Email address validation

Author: disco@discooctopus.com (discooctopus)

Hi,

Does anyone have a nice way of validating an email address (syntax string, proc code, field syntax properties, etc) in a field with a stringent set of rules?

I did find this... http://uniface.communityzero.com/uniface?go=2222502&content=entry ... but not sure if this type of thing will be considered???

Thanks

 

19 Comments

  1. Hi discooctopus,

    I do not have any way to validate an e-mail address natively in Uniface, and I consider this quite a big issue. Uniface has some validation, via syntax strings etc., but it is far too limited. See my wish for regular expressions to be implemented in Uniface. With them, it would be quite easy to validate many things, including e-mail addresses, phone numbers, postal codes etc. Of course, regular expressions can be used for many other things.

    Kind regards,
    Zdenek Socha


    Author: sochaz (zdenek.socha@fullsys.cz)
  2. Hi disco,

    it depends on what you consider a valid email address:

    If we start from:

    has to have a "@"
    something in front of the "@"
    and the text after the "@" has to have at least one "."

    a syntax string will do.

     

    For more sophisticated tests you can use the <VLDF> trigger:

    split the email address at the "@"
    examine both parts separately.
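
    As a rough illustration of the minimal rule set above (an "@", something in front of it, and at least one "." after it), here is a sketch in Python rather than Uniface Proc, chosen only because it is easy to run anywhere. The function name looks_like_email is made up for this example.

```python
def looks_like_email(address: str) -> bool:
    """Minimal check: split at the first "@", require a non-empty part
    in front of it and at least one "." somewhere after it."""
    local_part, at_sign, domain = address.partition("@")
    if not at_sign:          # no "@" at all
        return False
    if not local_part:       # nothing in front of the "@"
        return False
    return "." in domain     # at least one "." after the "@"


if __name__ == "__main__":
    for candidate in ("disco@discooctopus.com", "@web.de", "someone@localhost"):
        print(candidate, looks_like_email(candidate))
```

    This is deliberately loose: it says nothing about which characters are allowed, which is where the "examine both parts separately" step would come in.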

    Success, Uli


    Author: ulrich-merkel (ulrichmerkel@web.de)
  3. Hi Uli,

    I can't help myself, I have to argue with you. In this case, i.e. validating an e-mail address, a Uniface syntax string will NOT do the job. An e-mail address is not only about the at-sign ("@") and some dots (".") around it.

    You can have alphanumeric and some extra characters (e.g. dot, plus) before the at-sign, and a very similar string after it. After the very last dot, you should have a valid domain (or at least check that it is 2, 3 or 4 characters long, which covers 99% of domains, I believe). As far as I know, this is beyond the ability of Uniface syntax strings. Oh man, how I wish I was wrong!

    Of course, you can validate an e-mail address using Uniface 4GL, but it is neither an easy nor a nice task. There would be a bunch of code to do this.

    Just my 2 cents :-)
    Zdenek Socha


    Author: sochaz (zdenek.socha@fullsys.cz)
  4. Hi Zdenek,

    as I said, it is all about what you call a valid eMail.

    Validating an email is a very easy task, even in Uniface 4GL, because it's all about string handling.

     

    *) or do it the split way (divide and rule):

    Split what leads and trails "@"
    for both sides, split segments by the "." part.

    Examine the different segments.

    Expl. Each segment must consist only of 0-9,a-z,A-Z,-,_

    Expl. right side has to have at least 2 segments
    Expl: last segment right side has to be one of "com,eu,il, ...."
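
    The "split way" above could be sketched roughly like this (Python used purely as a runnable illustration, not Uniface Proc). The names check_split_way, ALLOWED and SAMPLE_TLDS are invented for the example, the TLD set only holds the three values named above because the rest of that list is left open, and treating an empty segment as an error is my own assumption.

```python
import string

# Characters allowed in a segment, per the first rule above: 0-9, a-z, A-Z, "-", "_".
ALLOWED = set(string.digits + string.ascii_letters + "-_")

# Sample only: the post names "com,eu,il" and leaves the rest of the list open.
SAMPLE_TLDS = {"com", "eu", "il"}


def check_split_way(address: str) -> list:
    """Return a list of problems found; an empty list means the address passed."""
    if address.count("@") != 1:
        return [f"@ found {address.count('@')} times"]
    problems = []
    person, hoster = address.split("@")
    person_segments = person.split(".")
    hoster_segments = hoster.split(".")
    for segment in person_segments + hoster_segments:
        if not segment:
            # Assumption: a leading, trailing or doubled "." is an error.
            problems.append("empty segment")
        for char in segment:
            if char not in ALLOWED:
                problems.append(f"illegal character {char!r}")
    if len(hoster_segments) < 2:
        problems.append("right side needs at least 2 segments")
    elif hoster_segments[-1].lower() not in SAMPLE_TLDS:
        problems.append(f"unknown last segment {hoster_segments[-1]!r}")
    return problems
```

    Returning a list of problems instead of a yes/no keeps all findings available for the error message.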

    *) Or do it the state-machine-way (eat the string):

    Just inspect any single character from left to right
     

    $position$ = 1
    $workchar$ = inspectstring[$position$:1]

    while ($workchar$ != "")

       .... do some checks here
      $position$ = $position$ + 1
      $workchar$ = inspectstring[$position$:1]

    endwhile

    Given a bit more time, I will create a dITo example how to do it. Here is the basic code:

    *************************************************************************
    entry split_adress ; separate the different segments AND check characters
    params
       string p_emailadress : IN
       string p_personlist  : OUT ; before "@" ("ton.blankers")
       string p_hosterlist  : OUT ; after  "@" ("nl·;compuware·;com")
       string p_errorsfound : OUT ; "illegal character 'ä' found"
       numeric p_nerrors    : OUT ;
    endparams
    variables
       string v_String, v_Char, v_allowedlist
       numeric nATfound
    endvariables  
       v_allowedlist = "0·;1·;2·;3·;4·;5·;6·;7·;8·;9·;a·;b·;c·;d·;e·;f·;g·;h·;i·;j·;k·;l·;m·;n·;o·;p·;q·;r·;s·;t·;u·;v·;w·;x·;y·;z·;-·;_"
       p_nErrors = 0
       nATfound = 0
       $1 = 1  ; position counter
       v_Char = p_emailadress[$1:1]
       while (v_char != "")
          selectcase (v_Char)
             case "@"  ; the character "@"
               nATfound = nATfound + 1
               p_personlist = v_String
               v_String = ""
             case "."
               v_string = "%%v_string%%%·;"
             elsecase
               v_string = "%%v_string%%%%%v_Char%%%"
               if ($item(v_Char,v_allowedlist) = "") ; not in List
                  p_nErrors  = p_nErrors + 1
                  p_errorsfound = "%%p_errorsfound%%%%%^illegal character '%%v_Char%%%'"
               endif
          endselectcase
          $1 = $1 + 1
          v_Char = p_emailadress[$1:1]
       endwhile
       p_hosterlist = v_String
       if (nATfound != 1)
          p_errorsfound = "%%p_errorsfound%%%%%^@ found %%nATfound%%% times"
          p_nErrors = p_nErrors + 1
       endif
    end ; split_adress
    **********************************************************************************************************

    Success, Uli

    P.S. To all interested in the address business,
    if you know other rules to validate, here is the place to collect them.


    Author: ulrich-merkel (ulrichmerkel@web.de)
  5. Hi Uli,

    great piece of code. I don't want to flame, but you said a syntax string will do [the job], and I can't see any use of syntax in your proc code. :-) Anyway, that proc is quite interesting, but as you can see, you need 30 lines of proc code just to validate (and split) an email address. We need similar validation for many, many other things, and writing a special proc entry for every single validation is not very effective, given that we are talking about a 4GL in the 21st century.

    For example, we need to validate product numbers in many different companies, with many different rules. The rules are not complicated, but they can't be expressed using just Uniface syntax. Yes, there are syntax codes for digits, letters, and letter/digit/underscore, but that's far too limited. Most of the time, we just need a combination of letters, digits and a few extra characters (like dot, comma, hyphen, underscore).

    Sorry for being a bit off-topic
    Zdenek


    Author: sochaz (zdenek.socha@fullsys.cz)
  6. Zdenek, so, write me the regexp functions you want to validate, and I'll put you a global proc together to do it....


    Author: Iain Sharp (i.sharp@pcisystems.co.uk)
  7. Hi Iain,

    I am not sure what you want from me... regexp functions or the regular expressions themselves? Just to be sure, I'll mention both.

    Regexp functions - the most important are compare, scan (search), replace, split, substring.

    As for regular expressions, just as Disco writes, I do not want to list concrete regexps here, since they are always subject to change. But, just as a guideline for what we need, I'll try to write a very short list (limited to validation purposes only):

    • email address - see my recent post for a simple regexp
    • postal (ZIP) code - different for each country, e.g. here in the Czech Republic it has 5 digits with an optional space, e.g. "[0-9]{3} ?[0-9]{2}"
    • bank account and bank code, IBAN, SWIFT code - but bank accounts and codes again differ between countries
    • product numbers - different for each company
    • product/goods codes - some international standards, for example for Intrastat and/or customs purposes, statistics etc.
    • date and time validation - for importing external txt files
    • any other proprietary codes (mainly during import/export of data on B2B level, for example with banks)
    • name validation - not only user names, but company names etc... first letter uppercase, limited to some characters (not the whole ASCII set), e.g. "[A-Z][a-z]+[a-z0-9]*"
    • phone and fax numbers - most of the time they have 9 digits with optional spaces (for example 123 456 789) and an optional international code (+420)
    • ... etc...

    I believe there are several other situations, which I just did not mention here.
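
    To make the list above a bit more concrete, here is a small sketch of how such patterns are used for validation in a language that already has regular expressions (Python, as an illustration only). The postal code and name patterns are quoted from the list; the phone pattern is my own guess at "9 digits with optional spaces and an optional international code" and is not from the original post. The helper name is_valid is invented.

```python
import re

# Patterns quoted from the list above; fullmatch() requires the whole value to match.
CZECH_POSTAL_CODE = re.compile(r"[0-9]{3} ?[0-9]{2}")
NAME = re.compile(r"[A-Z][a-z]+[a-z0-9]*")

# Assumption: one possible reading of "9 digits with optional spaces and an
# optional international code"; the list gives no pattern for this.
PHONE = re.compile(r"(\+[0-9]{1,3} ?)?[0-9]{3} ?[0-9]{3} ?[0-9]{3}")


def is_valid(pattern, value):
    """True when the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid(CZECH_POSTAL_CODE, "110 00"))
    print(is_valid(NAME, "Zdenek"))
    print(is_valid(PHONE, "+420 123 456 789"))
```

    The point is that each rule is one line of data, not one hand-written routine per rule.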

    Kind regards,
    Zdenek


    Author: sochaz (zdenek.socha@fullsys.cz)
  8. Hi Zdenek,

    looks like you misinterpreted my post:

    You can use syntax string if all you want to test is a simple "something@another.location" specification.

    For all other things, you have the validate trigger.

    About the 30 lines of code:

    just spent some 2 hrs browsing the web for "the" eMail-Address validation.
    The collection can be found on the "giveaways" page of www.uli-merkel.de/dito
    Believe me, the regexes alone span a couple of lines.
     

    The Uniface code I prepared is a Uniface 8 one and built for speed.
    It could have been much smaller.
    Using $SPLIT and $REPLACE in Uniface9 drastically reduces the amount of lines.

    And in this code there are reusable routines (write once, use many) like:
    check if the string is made only of listed characters (defined as you like).

    Another option:

    Uniface provides the use of a DLL,
    so if you find some 3GL (C, c++ preferred) code which evaluates regexes
    give me the link and we will have a dITo REGEX in a couple of days.

    Success, Uli

    P.S. I am not employed by CPWR, so no need for excuses if you criticise the product. I do it as well.


    Author: ulrich-merkel (ulrichmerkel@web.de)
  9. Hi Uli,

    as for your post(s) and my reaction (mainly about syntax), I have read your post once more, and I admit I misunderstood it. I'm really sorry for that, my fault.

    As for 3GL and regexp in some kind of DLL, please read our discussion about regexp (see the related content in my first post in this thread). You suggested the same to me there, and I answered that we already have regexp in a DLL (of course!) but there are limitations.

    Kind regards,
    Zdenek


    Author: sochaz (zdenek.socha@fullsys.cz)
  10. discooctopus, sochaz,

    Can you take one more try at explaining exactly what validation rules you want on an email address?

    I would like to take a shot at solving this.

    Theo.


    Author: Theo Neeskens (tneeskens@itblockz.nl)
  11. Hi Theo,

    I would not like to have to identify what I think a valid email address is, I would expect to have an industry/internet limitation of the allowed syntax.

    E.g. I may only validate the existence of the @ character. But then my rules might need to change to require at least one "." on the right side of the @ character. Then my rules might change to include a list of available TLDs... (.com, .sc, .au), each with their own set of sub-rules... (.com.au, .org.au, .co.uk, .co.cn, etc).

    Ideally, I want these rules to be defined at an industry/internet level.

    Thanks

     


    Author: discooctopus (disco@discooctopus.com)
  12. Hi Theo,

    from my point of view, rules are as follows:

    • a string with 2 parts, separated by an at-sign (@); let's call them username and domain, so it looks like this: username@domain
    • username is up to 64 characters long, made of these characters: lowercase and uppercase English letters (a-z, A-Z), digits (0-9), dot (.), underscore (_), hyphen (-), plus (+); the first and the last character can't be a dot (it's always better for the first and the last character to be a letter or digit)
    • domain should be up to approx. 255 characters, better kept under 100 characters, made of 2 parts separated by a dot (.); the first part (let's call it local domain) is made of lowercase English letters (a-z), digits (0-9), hyphen (-), up to 100 characters long; the second part (let's call it top domain) is 2 to 4 characters long, made of lowercase English letters (a-z)

    It is not perfect, nor does it match the RFC standard, but I believe it is strong enough to reduce user typos when entering an email address.

    As for me, it would be great to be able to do this validation using some kind of syntax, like a Uniface syntax string. Or, as I suggested, regexp. In that case, it would be very easy to define a field syntax (using a syntax string or regexp) to natively validate a field with a single email address. Or in proc code, one could use something like this:

    if (EMAIL.DUMMY != $regexp("^[a-zA-Z0-9_.+-]{2,64}@[a-z0-9-]{5,100}\.[a-z]{2,4}$"))
       message "Invalid email address!"
    endif

    Of course, for a field with more than one e-mail address, there could be a loop (splitting the data on whitespace with another regexp, since the separator can be a space, comma, tab etc.). One could split the string with a function like $regexp_split(...) that would split it into subparts, if needed. Or you could of course make a list of addresses with $regexp_replace(FIELD.DUMMY, 1, $regexp("\s"), ";", -1) where ";" is the item separator (that is gold ;). I know this is more than simple validation, but I can't resist mentioning it here.

    All in all, you could define a field in your model entity as a string field, VC255, with field syntax REGEXP("^[a-zA-Z0-9_.+-]{2,64}@[a-z0-9-]{5,100}\.[a-z]{2,4}$") - this would be just perfect. I can't think of any Uniface syntax string doing this job (so that we could enter it as field syntax).
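
    Since $regexp and friends are only a wish in Uniface, here is how the same pattern behaves in a language that has regular expressions today (Python, as an illustration only). The pattern is copied unchanged from the post; the function names and the separator set (whitespace, comma, semicolon) are assumptions made for this sketch.

```python
import re

# The pattern exactly as proposed in the post above.
EMAIL = re.compile(r"^[a-zA-Z0-9_.+-]{2,64}@[a-z0-9-]{5,100}\.[a-z]{2,4}$")

# Assumption: addresses in a multi-address field are separated by
# whitespace, commas or semicolons.
SEPARATORS = re.compile(r"[\s,;]+")


def is_valid_email(address):
    """True when the whole string matches the proposed pattern."""
    return EMAIL.fullmatch(address) is not None


def split_addresses(field_value):
    """Split a field holding several addresses into a list, dropping empties."""
    return [part for part in SEPARATORS.split(field_value) if part]


if __name__ == "__main__":
    field = "zdenek.socha@fullsys.cz, disco@discooctopus.com;\tulrichmerkel@web.de"
    for address in split_addresses(field):
        print(address, is_valid_email(address))
```

    Two things worth noticing when trying it out: the {5,100} quantifier demands at least five characters in front of the top domain, so a real address such as ulrichmerkel@web.de is rejected, and the pattern does not enforce the "no dot at the start or end of the username" rule from the list. Both would need adjusting before real use.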

    Kind regards
    Zdenek


    Author: sochaz (zdenek.socha@fullsys.cz)
  13. The specs that I found on the Internet were a bit difficult to understand.
    I have tried to write them down in a simpler form.

    An email address is: Local-part@Domain

    Local-part:
    One or more Words separated by periods.
    NB: So no period at the start or end of Local-part.
    A Word can be an Atom or a Quoted String.
    Quoted String is not relevant for our purpose.
     
    Domain:
    Two or more Sub-domains separated by periods.
    NB: So no period at the start or end of Domain.
    A Sub-domain can be an Atom (a.k.a. Domain-ref) or a Quoted String (a.k.a. Domain-literal)
    Quoted String is not relevant for our purpose.

    Atom:
    An Atom is one or more chars in the range #33..#126 except ()<>@,;:\/".[]
    That leaves us with:
    ! # $ % & ' * + - 0 1 2 3 4 5 6 7 8 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~

    Special considerations:
    In a Sub-domain, Atoms are further restricted to letters (upper and lowercase), digits and hyphens. A Sub-domain cannot end with a hyphen.

    These are the syntax requirements. I'll try to code this first.
    After that we can worry about semantic requirements, like what the content of sub-domains should be (.com, .org, .country etc.)
    And finally a check on the actual existence of the email address.

    Please post your comments if you think these specs are wrong!
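
    For anyone who wants to experiment with these rules outside Uniface, they can be sketched roughly as follows (Python, as an illustration only; quoted strings and domain literals are left out, as in the spec above). The function names and error texts are invented for the example.

```python
import string

# Atom characters for the Local-part, as listed in the post above.
LOCAL_CHARS = set("!#$%&'*+-=?^_`{|}~" + string.digits + string.ascii_letters)
# Sub-domains are restricted further to letters, digits and hyphens.
DOMAIN_CHARS = set("-" + string.digits + string.ascii_letters)


def check_part(part, allowed, min_words, label):
    """Check one side of the "@"; return an error text, or "" when it is fine."""
    if part == "":
        return f"{label}: empty"
    words = part.split(".")
    if len(words) < min_words:
        return f"{label}: expected at least {min_words} word(s)"
    for word in words:
        if word == "":
            # A period at the start, at the end, or two in a row.
            return f"{label}: misplaced period"
        for char in word:
            if char not in allowed:
                return f"{label}: invalid character {char!r}"
    return ""


def check_email(address):
    """Return "" when the address satisfies the syntax rules, else an error text."""
    local_part, at_sign, domain = address.partition("@")
    if not at_sign:
        return "no @ found"
    error = check_part(local_part, LOCAL_CHARS, 1, "local-part")
    if error == "":
        error = check_part(domain, DOMAIN_CHARS, 2, "domain")
    if error == "" and any(word.endswith("-") for word in domain.split(".")):
        error = "domain: sub-domain ends with a hyphen"
    return error
```

    With these rules an address such as anyone@127.12.100.5 passes, because each number is just a sub-domain made of digits.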


    Author: Theo Neeskens (tneeskens@itblockz.nl)
  14. Hi Theo,

    just spent some 2 hrs browsing the web for "the" eMail-Address validation.
    The collection can be found on the "giveaways" page of www.uli-merkel.de/dito

    To make it worse, even an IP-address is allowed in the domain part
    anyone@127.12.100.5
    and my valid email address may look like:
    "Ulrich Merkel" <ulrichmerkel@web.de>

    So I recommend when we create an eMail Validation Routine
    we start with a specification of WHAT WE SEE as a valid one (as you have done in your post)

    With all the new domain features (umlauts, numbers only, 2chars only) we can not cover all situations.
    But we can learn and add these in the course of the time.

    The important bit (as you see on my 30 lines above) is to get the coding started
    and be open for changes.

    Success, Uli


    Author: ulrich-merkel (ulrichmerkel@web.de)
  15. Uli said:
    To make it worse, even an IP-address is allowed in the domain part
    anyone@127.12.100.5
    and my valid email address may look like:
    "Ulrich Merkel" ulrichmerkel@web.de

    The IP address is not a problem as it fits within the validation rules that I found.


    The email address with the quoted string might be acceptable for a mail server,
    but we are looking at DATA ENTRY VALIDATION.
    I don't think many Uniface applications need data entered in this format.
    That is why I ignored the whole quoted string option in the specs.
    You send it TO a mail server in this format by combining name and email data from your application.


    Author: Theo Neeskens (tneeskens@itblockz.nl)
  16. Hi Theo,

    maybe it got lost during the transfer of data: in my example, the real email address was enclosed in "<" and ">" angle brackets.

    This is the format you will get from a lot of email programs if you use cut and paste or process emails (using UPOPMAIL).
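
    Outside Uniface, this "display name plus address in angle brackets" format is common enough that some standard libraries parse it directly. As an illustration only, a short Python sketch using email.utils.parseaddr from the standard library; the wrapper name extract_address is invented for the example.

```python
from email.utils import parseaddr


def extract_address(pasted):
    """Return just the address part of a pasted value such as
    '"Ulrich Merkel" <ulrichmerkel@web.de>'; a bare address is returned as is."""
    display_name, address = parseaddr(pasted)
    return address


if __name__ == "__main__":
    print(extract_address('"Ulrich Merkel" <ulrichmerkel@web.de>'))
```

    A data entry field could strip the display name this way first and then run whatever syntax rules have been agreed on against the bare address.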

    On the IP Address issue:

    It was not about your collection of rules, but about others expressed in this thread about checking the TLD (top-level domain like .com)

    Success, Uli


    Author: ulrich-merkel (ulrichmerkel@web.de)
  17. This is my attempt. My code is a bit longer but easier to understand for myself.
    Either regex or some enhancements to syntax strings would be nice to have in Uniface.

     

    Operation Checkmail
    params
        String vMail : In
        String vError : Out
    endparams
    variables
        string vLocalPart, vDomain
        numeric vStatus
    endvariables
     
        ; First we split the string in the part before and the part after @
        vStatus = $split(vMail, 1, "@", vLocalPart, vDomain)
        if (vStatus = 0)
            vError = "Error in email address: No @ found."
        else
            ; Decompose part before @ into sub-parts.
            call CheckDots(vLocalPart,"LOCAL",vError)
            if (vError = "")
                ; Decompose part after @ into sub-parts.
                call CheckDots(vDomain,"DOMAIN",vError)
            endif
        endif
     
        if (VError = "")
            return(0)
        else
            return(-1)
        endif
    end
     
    ;----------------------------------------------------
    Entry CheckDots
    params
        string vPart   : In
        string vType   : In
        string vError : Out
    endparams
    variables
        numeric vStatus
        string vPart1, vPart2 ; the two halves returned by $split are strings
    endvariables
     
        if (vType = "DOMAIN" & $scan(vPart,".") = 0)
            vError = "Error in email address: no . in domain"
        else
            repeat
                vStatus = $split(vPart, 1, ".", vPart1, vPart2)
     
                ; vStatus = 0 Means no more . found, stop
                ; vStatus = 1 Means . at the start, give error
                ; vPart2 = "" Means no characters after . , give error
     
                selectcase vStatus
                case 0
                   call CheckWord(vPart,vType,vError)
                case 1
                   vError = "Error in email address: Incorrect ."
                elsecase
                   if (vPart2 = "")
                       vError = "Error in email address: Incorrect ."
                   else
                       vPart = vPart2
                       call CheckWord(vPart1,vType,vError)
                   endif
                endselectcase
            until (vStatus=0 | vError <> "")
        endif
    end
     
    ;----------------------------------------------------
    Entry CheckWord
    params
        string vWord   : In
        string vType   : In
        string vError : Out
    endparams
    variables
        string vValid
    endvariables
     
        ; parts before and after @ have different valid characters
        if (vType = "LOCAL")
            vValid = "!#$%%%%&'*+-0123456789=?ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~"
        else
            vValid = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        endif
           
        while (vWord <> "")
            if ($scan(vValid, vWord[1:1]) > 0)
                vWord = vWord[2]
                if ($length(vWord) = 1 & vWord = "-" & vType = "DOMAIN")
                   vError = "Error in email address: Invalid character %%vWord[1:1]%%%"
                   vWord = ""
                endif             
            else
                vError = "Error in email address: Invalid character %%vWord[1:1]%%%"
                vWord = ""
            endif
        endwhile
    end
     
     

    Author: Theo Neeskens (tneeskens@itblockz.nl)
  18. Try using RegX for... Email Validation Lee


    Author: creiglee (creiglee@yahoo.com)
  19. Hi all,

    I am probably a little late. As some of you mentioned, this is exactly what regular expressions are for. We have a universal approach to this kind of request called LXScript: instead of focusing on specific topics (like creating Excel files through intermediate steps), we make the necessary technology directly available to Uniface by extending it.

    The extension is bidirectional and opens completely new possibilities to Uniface developers: regular expressions, reading from and writing to real Excel files (including formatting, charts, ...), easy-to-write web services, all kinds of networking, image processing, graphic rendering, real XML processing, encryption ... those fancy things that are difficult to achieve with Uniface alone. And all this in an intuitive way and fully embedded in the Uniface runtime environment, not like a spawned shell/bat command limited to one OS.

    With LXScript, regular expressions can be used within Uniface in just a single line of code, and regexp is just one of the many things that you will get with LXScript. You can google for labsolution and LXScript to get more information. Feel free to contact me for more information!

    Best regards,
    gerd


    Author: gerd (gerd.vassen@labsolution.lu)