Monday, March 10, 2014

How we resolved an email delivery issue when some mail servers ignored our MX record


We had a strange email delivery issue: some emails sent to our corporate email account from certain email domains were not being delivered. Some of the senders who ran into this called us when they didn't get a reply to their emails. We don't want this to happen to clients or prospects who are trying to contact us, and as a technology company we could not let it continue, so we took it up as a challenge to fix ASAP.

Our static site is hosted on Amazon S3, and we had set up a CNAME record on the bare domain so that the site is served at http://mydomain.com (I am replacing our real domain with mydomain.com throughout this post). Requests to http://www.mydomain.com were redirected to http://mydomain.com.

The problem was that sometimes, when an email was sent to an address at mydomain.com, the domain part got replaced with s3-website-ap-southeast-1.amazonaws.com during domain name resolution.

The replacement domain is the target of the domain's CNAME record:

CNAME
mydomain.com -> s3-website-ap-southeast-1.amazonaws.com

Our MX and A records were set up correctly: the MX record for mydomain.com points to email.mydomain.com, and an A record points email.mydomain.com to our mail server's IP address.

After some investigation, I found that the lookup logic of some mail servers checked for a CNAME record on the domain first and, when it found one, followed it instead of using our MX record. The usual fix is to stop using a CNAME on the bare domain and put the web server's IP address in an A record instead. For a static site hosted on Amazon S3 this is not an option, because the S3 website endpoint is a hostname rather than a fixed IP address.
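To make the failure mode concrete, here is a simplified model of what those mail servers appear to be doing. It assumes a resolver that returns the CNAME whenever one exists for a name, and an MTA that, roughly as RFC 5321 section 5.1 describes, treats the CNAME target as if it were the original domain before looking up MX records (falling back to the name itself when there is no MX). The zone dictionaries, the function name `resolve_mail_hosts`, and the IP address 203.0.113.10 (a documentation-only address) are hypothetical stand-ins, not our real records.

```python
"""Toy model of a mail server chasing a CNAME before looking up MX.

All records are hypothetical stand-ins for the real zone.
"""

# Zone as originally set up: a CNAME on the bare domain next to the MX record.
OLD_ZONE = {
    ("mydomain.com", "CNAME"): ["s3-website-ap-southeast-1.amazonaws.com"],
    ("mydomain.com", "MX"): ["email.mydomain.com"],
    ("email.mydomain.com", "A"): ["203.0.113.10"],  # placeholder IP
}

# Zone with the CNAME moved to www, leaving the bare domain with its MX.
NEW_ZONE = {
    ("www.mydomain.com", "CNAME"): ["s3-website-ap-southeast-1.amazonaws.com"],
    ("mydomain.com", "MX"): ["email.mydomain.com"],
    ("email.mydomain.com", "A"): ["203.0.113.10"],  # placeholder IP
}


def resolve_mail_hosts(zone, domain, max_hops=8):
    """Return (canonical_name, mail_hosts) for `domain`.

    Follows CNAMEs first, then uses the MX records of the final name,
    falling back to the name itself (the "implicit MX") if it has none.
    """
    name = domain
    hops = 0
    while (name, "CNAME") in zone:
        hops += 1
        if hops > max_hops:
            raise RuntimeError(f"CNAME chain from {domain} is too long")
        name = zone[(name, "CNAME")][0]  # follow the alias
    mx_hosts = zone.get((name, "MX"))
    return name, list(mx_hosts) if mx_hosts else [name]


if __name__ == "__main__":
    for label, zone in (("old", OLD_ZONE), ("new", NEW_ZONE)):
        print(label, resolve_mail_hosts(zone, "mydomain.com"))
```

With the old zone the MX record on mydomain.com is never consulted: the lookup ends at the S3 hostname, which does not accept mail. With the CNAME on www, the same lookup returns email.mydomain.com.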

We found a workaround. We moved our static content to the www subdomain, and direct requests to http://mydomain.com were redirected to http://www.mydomain.com (from the Amazon S3 bucket itself). We then created a subdomain named www in our DNS panel and moved the CNAME record to that subdomain. Finally, we removed the original CNAME entry on the mydomain.com domain.

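This resolved the issue. The "www" itself is not what makes it work: DNS does not allow a CNAME to share a name with other records (RFC 1034, section 3.6.2), so while the CNAME sat on mydomain.com, some mail servers followed it to the S3 endpoint and ignored our MX record. Once the CNAME moved to www.mydomain.com, the bare domain no longer had a CNAME, and MX lookups for it returned our mail server again.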
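A quick way to catch this kind of misconfiguration before it bites is to check a zone for names that carry a CNAME alongside other records, which RFC 1034 (section 3.6.2) and RFC 2181 (section 10.1) forbid. The sketch below does that over a plain list of (name, type, value) tuples; the record lists and the helper name `cname_conflicts` are hypothetical, and it only exempts the DNSSEC types RRSIG and NSEC, which are allowed to coexist with a CNAME. A real zone apex also has SOA and NS records, which a complete export would show as further conflicts.

```python
"""Flag DNS names that carry a CNAME alongside other record types.

Record lists below are hypothetical stand-ins for a DNS panel export.
"""
from collections import defaultdict

# DNSSEC record types that may legitimately share a name with a CNAME.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}

OLD_RECORDS = [
    ("mydomain.com", "CNAME", "s3-website-ap-southeast-1.amazonaws.com"),
    ("mydomain.com", "MX", "email.mydomain.com"),
    ("email.mydomain.com", "A", "203.0.113.10"),  # placeholder IP
]

NEW_RECORDS = [
    ("www.mydomain.com", "CNAME", "s3-website-ap-southeast-1.amazonaws.com"),
    ("mydomain.com", "MX", "email.mydomain.com"),
    ("email.mydomain.com", "A", "203.0.113.10"),  # placeholder IP
]


def cname_conflicts(records):
    """Return {name: sorted conflicting types} for names that mix CNAME with other data."""
    types_by_name = defaultdict(set)
    for name, rtype, _value in records:
        # Normalize case and a trailing root dot so "A.example." == "a.example".
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    conflicts = {}
    for name, types in types_by_name.items():
        others = types - {"CNAME"} - ALLOWED_WITH_CNAME
        if "CNAME" in types and others:
            conflicts[name] = sorted(others)
    return conflicts


if __name__ == "__main__":
    print("old:", cname_conflicts(OLD_RECORDS))
    print("new:", cname_conflicts(NEW_RECORDS))
```

Run against the original layout, the check flags the MX record on mydomain.com; the layout with the CNAME on www comes back clean.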

I hope this information helps anyone who runs into the same issue. Do let us know your questions or comments.

Thanks.