This week was one where multiple requests converged. I had two friends that needed to send emails to their customers. One had their information in an Excel spreadsheet and the other exported their internal databases into Excel. Both thought there were problems with many email addresses and wanted to know if there was an easy way to have Excel do email validation. They wanted to find email addresses with improper formats before sending out the emails.
They weren’t looking for me to show that firstname.lastname@example.org existed. Instead, they just wanted to make sure the data record was in the correct email address format.
Excel is a great program for evaluating data. In some cases, it creates hyperlinks for email addresses. In the case of my friends, some of the bad entries were the result of how the info was captured from their website. Other times, typing errors were made or customers intentionally provided incorrect email addresses.
Email Address Rules
I’m sure there may be a way someone could write Excel code to validate email addresses using regular expressions or syntax rules. I didn’t want to take that path, as I’m not a coder. In doing some research, I also noticed that there were more possibilities for valid email addresses than I realized. Here are a few examples:
- They can contain the “+” sign even though many systems disallow this character. For example, Google allows you to use the plus sign for their version of disposable email addresses.
- They can contain multiple periods.
- The top-level domains are not always 3 letters like .com, .net or .org. All country codes are two letters and then there are TLDs like .travel or .museum
- They can contain hyphens.
- Email addresses are case-insensitive. It doesn’t matter if you use upper or lower case letters.
If you want to get the specifics, there is a standard RFC 822 document for internet messaging.
After scrolling through these Excel address lists, I settled on two rules, which could identify most of the bad email address entries. These aren’t email addressing laws or absolutes. Instead, these are simple rules to knock out 95% of the badly formatted addresses.
- The email address had to contain the “@” sign
- The email address had to contain at least 1 period.
The nice thing about these two rules is I could have Excel find the invalid emails using the Filter command. Below you will see a screen snap of the type of entries that were on these Excel spreadsheets. You can also see where Excel will create hyperlinks on improper entries.
To filter for invalid email addresses in Excel
- Click the column header that contains your email addresses.
- From the Data menu, click Filter. This will add a drop-down icon to your column header.
- Click the down arrow control in the top right of the column containing your email addresses.
- From the drop-down menu, select Text Filters.
- From the sub-menu select Custom Filter…
- In the Custom AutoFilter dialog, enter in two filter criteria using the “does not contain” selection. Make certain to use the Or radio button option.
- Click OK.
Once you click OK, Excel will show the emails that don’t have either the @ sign or a period. These are the problem records. You can either delete them or try to make corrections. The rules aren’t perfect as you may still get some incorrect ones that slip through such as anne.h@examplecom. In this example, both the period and “at sign” are present, but it is still invalid since a second period is needed between “example” and “com”.
After doing this exercise, I did see some other patterns. For example, some people would type some combination of “do not send email”. This tends to happen if the input form doesn’t do field validation. Other times, the reverse would happen where people used “shift 2” or [at] instead of the “@ sign”. This is a tactic some people use online to prevent email bots from scraping their address.
Other Steps You Might Take
I also saw a number of emails with spaces such as anne email@example.com. I’m not certain if spaces are legitimate, but I seem to recall AOL using them at one time. However, many email systems will not accept spaces in an email address.
The other item I noticed was many duplicate email addresses. Either people had entered the information multiple times or perhaps the address was shared by two or more people. You should take steps to remove these duplicates, as the recipients do not want multiple copies. Some may even get mad enough to flag your email as spam.
The above method for finding bad email addresses isn’t perfect. What it does do is give you an easy method to find the badly formatted addresses so you can act on them in a timely way. By removing these bad ones, you also prevent someone from getting those annoying bounce back messages from ISPs telling you the user doesn’t exist.
Disclaimer: Images from Amazon Product Advertising API. I may receive an affiliate commission on these products if you buy. Updated: 2020-12-02