Checking for correct e-mail address format

Currently, the complete project is here:

Adding a User to the database:
I want to check whether the provided e-mail address is valid, containing an @ sign as well as one of .com, .edu, .org (this isn’t real-world, just an exercise).

def add_user(self, name, email, user_books=None):
        if "@" not in email:
            print("Please enter a valid e-mail address")
            pass
        elif (".com" or ".edu" or ".org") not in email:
            print("Please enter a valid e-mail address")
            pass
        elif email in self.users.keys():
            print("User with email {email} already exists.".format(email=email))
            pass
        else:
            self.users.update({email: User(name, email)})
        if user_books:
            for book in user_books:
                self.add_book_to_user(book, email)
        return User(name, email)
  • I can’t tell at the moment if this OR logic is working or not. I tested it by printing the list of users afterwards to see if the user with invalid email had been added or not. However, the printing does not seem to happen. When there is nothing to print (no User objects), does Python maybe not show any output?
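Python does not hide output here: a for loop over an empty dict simply runs zero times, so nothing gets printed. A minimal sketch of a print_users that handles the empty case explicitly, assuming a plain dict (email → user) as a hypothetical stand-in for the project's self.users:

```python
def print_users(users):
    """Print every user; print a notice instead when the dict is empty."""
    if not users:                  # an empty dict is falsy
        print("No users yet.")
        return 0
    for user in users.values():    # would run zero times on an empty dict
        print(user)
    return len(users)


print_users({})                            # prints: No users yet.
print_users({"alan@turing.com": "Alan"})   # prints: Alan
```

The return value (number of users printed) is only there to make the behaviour easy to check.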

I’ll try to describe how I think it should be:

  1. If one of these three strings is present in the e-mail address, then go ahead and add the User:
if ".com" or ".edu" or ".org" in email:
    self.users.update({email: User(name, email)})
else:
    print("Please enter a valid e-mail address")  # don't add the user
  2. If none of these three strings is present in the e-mail address, print the error message and don’t add the user:
if (".com" or ".edu" or ".org") not in email:
    print("Please enter a valid e-mail address")
    pass
else:
    self.users.update({email: User(name, email)})  # go ahead and add the user

not in takes precedence over or, hence I need the brackets (?). Am I doing this control flow right?

  • Besides, I can never decide which of these two approaches is better in which case:
if x:
    do a
else:
    do b

versus:

if not x:
    do b
else:
    do a

Would you say it’s rather a matter of style, readability and so on, or do easily identifiable logical criteria play a role as well?
In this function, I thought I’d first exclude all the hindering factors with the if checks and put the function’s primary action at the end, which a string only reaches after passing all these checks.
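The plan of excluding the hindering factors first is often called the guard-clause (or early-return) style. A minimal sketch of it, assuming a plain dict and a hypothetical stripped-down User class instead of the project's real classes; the suffix test uses any(), one of the fixes discussed further down in this thread:

```python
class User:
    """Hypothetical stand-in for the project's User class."""
    def __init__(self, name, email):
        self.name = name
        self.email = email


GOOD_SUFFIXES = (".com", ".edu", ".org")


def add_user(users, name, email):
    """Reject bad input first with early returns; do the real work last."""
    if "@" not in email:
        print("Please enter a valid e-mail address")
        return None
    if not any(suffix in email for suffix in GOOD_SUFFIXES):
        print("Please enter a valid e-mail address")
        return None
    if email in users:
        print("User with email {email} already exists.".format(email=email))
        return None
    user = User(name, email)   # only reached when every check has passed
    users[email] = user
    return user
```

Each return ends the function immediately, so the "happy path" at the bottom never needs an else.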

Benny, there are a couple of issues here:

The short answer is that you can’t use or to create a sort of pseudo-list.

in takes precedence over or, so

if ".com" or ".edu" or ".org" in email: is parsed as if ".com" or ".edu" or (".org" in email):, i.e. the in test applies only to ".org".

Even if ".org" in email is False (let’s assume it is), that leaves if ".com" or ".edu" or False:

Now, or returns not necessarily True or False, but the value of the first operand (left-to-right) that has a truth value of True (or, if none does, the last operand). If the operands are expressions that evaluate to True or False, this works out as you expect. See the docs.
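A small sketch of that rule in action: or hands back one of its operands, and it stops evaluating as soon as it finds a truthy one (short-circuiting). The noisy() helper is purely illustrative:

```python
print(".com" or ".org")   # .com  (first truthy operand)
print("" or ".org")       # .org
print("" or 0)            # 0     (nothing truthy, so the last operand)

calls = []

def noisy():
    """Illustrative helper that records whether it was ever called."""
    calls.append("noisy")
    return True

result = ".com" or noisy()   # ".com" is truthy, so noisy() is never called
print(result)                # .com
print(calls)                 # []
```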

But if the operands are other objects, they are evaluated (i.e., return a truth value) like this:

  • False, 0 (zero), None, empty containers (dictionaries, strings, lists, tuples), and any expressions that evaluate to one of those return False
  • Everything else returns True.
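bool() shows the truth value Python uses in if, or, and and not, so the two rules above can be checked directly:

```python
falsy = [False, 0, 0.0, None, "", [], (), {}, set()]
truthy = [True, 1, -1, ".com", " ", [0], {"a": 1}]

print(all(not bool(x) for x in falsy))   # True
print(all(bool(x) for x in truthy))      # True
print(bool(".com"))                      # True: any non-empty string
```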

Since ".com" is a non-empty string, in a boolean expression it has the value of True. So by the rule flagged above, (".com" or ".org") returns ".com", the first "true" operand:

print(".com" or ".org")
# Output
.com

Bottom line: an if statement constructed like the one used will always trigger as long as one of the operands before the in test is a non-empty string.

email = "my_name@my_domain.org"
if "x" or "y" in email:
    print("in the if block")
else:
    print("not in the if block")

# Output:
in the if block

What to do? Maybe something like:

com = ".com" in email
org = ".org" in email
edu = ".edu" in email
if com or org or edu:
    print("valid e-mail")  # etc.

Regarding if vs. if not:

In my opinion, the former.


I did this:

def add_user(self, name, email, user_books=None):
        com = ".com" in email
        org = ".org" in email
        edu = ".edu" in email
        if com or org or edu:
            print("Please enter a valid e-mail address")
            return
#then try to add user with invalid email:
Tome_Rater.add_user("Alan Turing", "alan@turingcom")
Tome_Rater.print_users()
#outputs:
user: Alan Turing
e-mail: alan@turingcom
books read: {}

alan@turingcom passed the test!

On the other hand, I now went back to my previous OR logic (mind the brackets!) and fixed the problem of “nothing to print > nothing is printed”. I also replaced pass with return, so that a single failed check is enough to exit the function:

def add_user(self, name, email, user_books=None):
        if "@" not in email:
            print("Please enter a valid e-mail address")
            return
        elif (".com" or ".edu" or ".org") not in email:
            print("Please enter a valid e-mail address")
            return
        elif email in self.users.keys():
            print("User with email {email} already exists.".format(email=email))
            return
        else:
            self.users.update({email: User(name, email)})
        if user_books:
            for book in user_books:
                self.add_book_to_user(book, email)
        return User(name, email)
#for a list of users:
def print_users(self):
        if not self.users:
            print("No users yet.")
        else:
            for user in self.users.values():
                print(user)
#test the if-checks:
Tome_Rater.add_user("Alan Turing", "alan@turingcom")
Tome_Rater.print_users()
Tome_Rater.add_user("Alan Turing", "alanturing.com")
Tome_Rater.print_users()
Tome_Rater.add_user("Alan Turing", "alan@turing.com")
Tome_Rater.print_users()

Output is:

Please enter a valid e-mail address
No users yet.
Please enter a valid e-mail address
No users yet.
user: Alan Turing
e-mail: alan@turing.com
books read: {}

No, sorry, but it just won’t work:

emails = ["hotstuff@domain.xxx", "my_name@ msn.com", "student_activist@duke.edu",
 "natasha@internet_research_agency.ru", "clara_barton@red_cross.org"]

for email in emails:
    if (".com" or ".edu" or ".org") not in email:
        print("{} is invalid".format(email))
    else:
        print("Spam is on the way to {}".format(email))

# Output:
hotstuff@domain.xxx is invalid
Spam is on the way to my_name@ msn.com
student_activist@duke.edu is invalid   # should be valid
natasha@internet_research_agency.ru is invalid
clara_barton@red_cross.org is invalid   # should be valid

As I pointed out, the expression (".com" or ".edu" or ".org") is not an iterable object that you can somehow poll or enumerate.

It is a series of chained boolean operators that (in this case) always returns “.com”, reducing the if statement to if ".com" not in email: , which is False when there is a “.com” domain suffix, therefore moving on to elif. Every other suffix will return True and enter the if block.
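A quick way to check this claim is to evaluate the chained test and the reduced one side by side for the addresses above:

```python
emails = ["hotstuff@domain.xxx", "my_name@ msn.com", "student_activist@duke.edu",
          "natasha@internet_research_agency.ru", "clara_barton@red_cross.org"]

print(".com" or ".edu" or ".org")   # .com: the first truthy operand

for email in emails:
    chained = (".com" or ".edu" or ".org") not in email
    reduced = ".com" not in email
    print(email, chained, reduced)   # the two columns always match

same = all(((".com" or ".edu" or ".org") not in e) == (".com" not in e) for e in emails)
print(same)   # True
```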

With slight variations, the story is the same with or without parentheses, with or without not.

You just cannot use or to create an iterable object. That’s not what or does.

Here’s another way. This one does use an iterable, and has the advantage of being easily expandable as far as the suffixes go:

emails = ["hotstuff@domain.xxx", "my_name@ msn.com", "student_activist@duke.edu",
          "natasha@internet_research_agency.ru", "clara_barton@red_cross.org"]
good_suffixes = [".com", ".org", ".edu"]

for email in emails:
    good_email = False
    for suffix in good_suffixes:
        if suffix in email:
            print("Spam is on the way to {}".format(email))
            good_email = True
            break
    if not good_email:
        print("{} is invalid".format(email))

# Output:
hotstuff@domain.xxx is invalid
Spam is on the way to my_name@ msn.com
Spam is on the way to student_activist@duke.edu
natasha@internet_research_agency.ru is invalid
Spam is on the way to clara_barton@red_cross.org

… and, sorry, way off topic, but I can’t resist. Python has a slightly weird for - else construction that works perfectly with this to pare it down:

for email in emails:
    for suffix in good_suffixes:
        if suffix in email:
            print("Spam is on the way to {}".format(email))
            break
    else:            # This is only invoked if break is not invoked!
        print("{} is invalid".format(email))

# Output:
hotstuff@domain.xxx is invalid
Spam is on the way to my_name@ msn.com
Spam is on the way to student_activist@duke.edu
natasha@internet_research_agency.ru is invalid
Spam is on the way to clara_barton@red_cross.org

Totally off topic, but tangential in one respect…

https://newgtlds.icann.org/en/program-status/delegated-strings

There are hundreds of TLDs, and their strings have only two things in common: none is shorter than two characters, and all consist of letters only. These are two constraints we can explore further.

def is_alpha_tld(e):
  return e.split('.')[-1].isalpha()
def is_len_tld(e):
  return len(e.split('.')[-1]) > 1
email = "joe.blow@example.c0m"
print (is_alpha_tld(email))                          # False
print (is_alpha_tld(email) and is_len_tld(email))    # False
email = "joe.blow@example.com"
print (is_alpha_tld(email))                          # True
print (is_alpha_tld(email) and is_len_tld(email))    # True
email = "joe.blow@example.c"
print (is_len_tld(email))                            # False
print (is_alpha_tld(email) and is_len_tld(email))    # False

We can check the TLD right off the hop, as well as check for the ‘@’ character. That’s before we put any new objects in global scope.
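A minimal sketch that combines the two TLD checks above with the '@' check into one up-front screen. The looks_valid name is hypothetical, and this is only a rough filter (it still accepts "my_name@ msn.com", for instance):

```python
def is_alpha_tld(e):
    return e.split('.')[-1].isalpha()

def is_len_tld(e):
    return len(e.split('.')[-1]) > 1

def looks_valid(e):
    """Rough screen: contains '@' and ends in an alphabetic TLD of 2+ characters."""
    return "@" in e and is_alpha_tld(e) and is_len_tld(e)

for email in ["joe.blow@example.com", "alan@turingcom", "alanturing.com",
              "joe.blow@example.c0m", "joe.blow@example.c"]:
    print(email, looks_valid(email))
```

With no dot at all, as in "alan@turingcom", the last split piece is the whole string, and the '@' in it makes isalpha() return False.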


I was thinking that real-world programs that check whether an e-mail address is valid would probably import a module that is kept up to date, directly or indirectly, by some internet-related entity (new sets of TLDs have been added a couple of times), and devs would just check against that module.


Can you pinpoint the flaw in my test that indicated it did work? Because I can’t. I really am having trouble wrapping my head around the operator logic as you are explaining it. I’ll just have to painstakingly go through a lot of examples…

emails = ["hotstuff@domain.xxx", "my_name@ msn.com", "student_activist@duke.edu",
          "natasha@internet_research_agency.ru", "clara_barton@red_cross.org"]
good_suffixes = [".com", ".org", ".edu"]

for email in emails:
    good_email = False
    for suffix in good_suffixes:
        if suffix in email:
            print("Spam is on the way to {}".format(email))
            good_email = True
            break
    if not good_email:
        print("{} is invalid".format(email))
  • This has the “weakness” that you need to put tested strings into a list first, but oh well…
  • Instead of if not good_email:, I’d have used if good_email == False:. Would that work as well?
for email in emails:
    for suffix in good_suffixes:
        if suffix in email:
            print("Spam is on the way to {}".format(email))
            break
    else:            # This is only invoked if break is not invoked!
        print("{} is invalid".format(email))

Doesn’t seem off-topic to me at all, it’s probably what I tried to do half an hour ago but didn’t know this syntax for it :slight_smile:

You can simplify this code a lot. Here’s another way to do what you’re trying to do:

emails = ["hotstuff@domain.xxx", "my_name@ msn.com", "student_activist@duke.edu",
          "natasha@internet_research_agency.ru", "clara_barton@red_cross.org"]
good_suffixes = [".com", ".org", ".edu"]

for email in emails:
    good = any(suff in email for suff in good_suffixes)
    if good:
        print("spam them! {}".format(email))
    else:
        print("bad: {}".format(email))

As others mentioned, e-mail validation is pretty tricky. You may be interested in this Stack Overflow post on the topic, and Python Principles also has an e-mail validation project you may find useful.
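One example of the trickiness: a substring test such as ".com" in email also accepts joe@example.community.ru, because ".com" appears in the middle of the domain. A minimal sketch of a stricter suffix check, assuming the suffix should sit at the very end of the part after the last '@' (has_good_suffix is a hypothetical name). str.endswith() accepts a tuple of suffixes, so the "list of good suffixes" idea still applies:

```python
GOOD_SUFFIXES = (".com", ".org", ".edu")

def has_good_suffix(email):
    """True if the domain (text after the last '@') ends with a good suffix."""
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1]
    return domain.endswith(GOOD_SUFFIXES)   # endswith() accepts a tuple

for email in ["clara_barton@red_cross.org", "joe@example.community.ru", "alanturing.com"]:
    print(email, "| substring:", any(s in email for s in GOOD_SUFFIXES),
          "| endswith:", has_good_suffix(email))
```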


I don’t understand this line; I’m looking up any() now.
I’ve also never seen a construction like this: (a in b for a in c)

The former lets you check whether any item in an iterable (a list, for example) is true:

>>> any([True, False])
True

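The latter is a generator expression: it looks like a list comprehension without the square brackets, and it produces its values lazily, one at a time.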
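To see the two forms side by side, a small sketch using one of the addresses from above:

```python
good_suffixes = [".com", ".org", ".edu"]
email = "clara_barton@red_cross.org"

gen = (suff in email for suff in good_suffixes)   # generator expression: lazy
lst = [suff in email for suff in good_suffixes]   # list comprehension: built eagerly

print(type(gen).__name__)   # generator
print(lst)                  # [False, True, False]
print(any(lst))             # True
print(any(suff in email for suff in good_suffixes))   # True; any() stops at the first True
```

Passing the generator expression straight to any() avoids building the intermediate list.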

It was probably that “.com” was the first string in the OR chain (which I thought can’t matter), and I used a .com address for testing.

Oops, yeah, I have seen that, just didn’t recognize it.

It’s this:

 elif (".com" or ".edu" or ".org") not in email:

It makes sense in English, but not in code. The fact is that or just doesn’t work like a shopping list (“see if they have beef or pork or chicken”). It is a binary boolean operator. (It can be chained, but it only combines two things at a time.) It tells you something about the truth values of the thing on its left and the thing on its right, nothing more.

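If you structured the above like this:

elif (".com" not in email and ".edu" not in email and ".org" not in email):

… it would do what you want: the address is rejected only when none of the three suffixes is present. (Note the and: with or in their place, the condition would be true unless all three suffixes were present.)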
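The rule behind that and is De Morgan's law: "not (A or B or C)" is the same as "not A and not B and not C". A small sketch checking that three ways of saying "none of the suffixes is present" always agree:

```python
emails = ["hotstuff@domain.xxx", "student_activist@duke.edu",
          "clara_barton@red_cross.org", "alan@turing.com"]
suffixes = (".com", ".edu", ".org")

for email in emails:
    a = not (".com" in email or ".edu" in email or ".org" in email)
    b = ".com" not in email and ".edu" not in email and ".org" not in email
    c = not any(s in email for s in suffixes)
    assert a == b == c                       # all three forms agree
    print(email, "is invalid" if a else "is valid")

invalid = [e for e in emails if not any(s in e for s in suffixes)]
print(invalid)   # ['hotstuff@domain.xxx']
```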

It’s a stylistic preference, but most programmers (I think) would not use if condition == True: or if condition == False: because they are redundant: condition is an expression, which all by itself returns True or False. The forms if condition: or if not condition: are preferred.
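For good_email, which is always exactly True or False, both forms behave the same. They diverge for other values, because not v uses the truth value while v == False is an equality test:

```python
values = [False, 0, "", [], None]

for v in values:
    print(repr(v), "| not v:", not v, "| v == False:", v == False)

# False | not v: True | v == False: True
# 0 | not v: True | v == False: True
# '' | not v: True | v == False: False
# [] | not v: True | v == False: False
# None | not v: True | v == False: False
```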


I just used a list to represent any object that can be used to iterate through the desired suffixes. It could be a file on disk, or a function, for example. (But it cannot be a chain of objects linked by or’s.)
