SMTP issue


Hi, I am having this SMTP issue and need your help. Thanks in advance for your input.

I have been using aspmx.l.google.com:25 as the SMTP server (no auth), and it worked well until last week. The error I get is:
Status : failed
Message : Failed to send email through the smtp server

If I run send-test-email under ncli, it returns the error "TimeOut 5000". But I have no problem resolving, pinging, or telnetting to the Google SMTP server from the cluster and all CVMs. I then changed the SMTP server to an internal email server, and this time I get the error "Exception reading response".

But if I do a straight mailx test to either aspmx.l.google.com or my internal mail server, the emails go through with no errors at all. The command below was run against the Google SMTP server; the output is similar for the internal mail server:
$ echo "This is a test email" | mailx -v -s "test smtp" -r "sender@mydomain.com" -S smtp="aspmx.l.google.com:25" recipient@mydomain.com
Resolving host aspmx.l.google.com . . . done.
Connecting to 173.194.205.26 . . . connected.
220 mx.google.com ESMTP t3si3635587qkb.18 - gsmtp
>>> HELO NTNX-XXXXXXXXX-B-CVM
250 mx.google.com at your service
>>> MAIL FROM:
250 2.1.0 OK t3si3635587qkb.18 - gsmtp
>>> RCPT TO:
250 2.1.5 OK t3si3635587qkb.18 - gsmtp
>>> DATA
354 Go ahead t3si3635587qkb.18 - gsmtp
>>> .
250 2.0.0 OK 1499904651 t3si3635587qkb.18 - gsmtp
>>> QUIT
221 2.0.0 closing connection t3si3635587qkb.18 - gsmtp

What can I do to fix this issue?

6 replies

There are a few things to look at inside Nutanix for email failures.

Checks:

First, the active email delivery agent logs email transactions in ~/data/logs/send-email.log on the current "zookeeper"/"zeus" leader on the cluster (cluster status | grep -i zeusleader). That log should have further details about the failures.

Secondly, since you noted this was working previously, what might have changed?
- Either on the Nutanix side, such as AOS upgrades, since the zeus leader can change.
- Or externally, such as networking, or even at the email server side?
- Do firewall rules include all the CVMs for port 25 or 2525 (for instance), since the leader CVM may change? It sounds like a "yes" from your post.
- Does your internal email server require authentication during its test?

Next steps:
- What does "ncli cluster get-smtp-server" show? All correct still?
- What AOS version are you on currently?
- What errors does the zeus leader's ~/data/logs/send-email.log show?
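The firewall question above can be checked from any host in the same subnet with a short script. This is only a sketch: it assumes Python 3 is available on the machine you run it from, and the function names `can_connect` and `check_ports` are made up for illustration. It tests TCP reachability only, so a "True" does not prove the server will accept the mail.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host, ports=(25, 2525), timeout=5.0):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, `check_ports("aspmx.l.google.com")` returns a dictionary keyed by port. Running it from each CVM's network segment covers the case where the zeus leader moves to a CVM that the firewall rule missed.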
Hi PaulR, thank you for the reply!

I don't think any changes (network, systems, or access) were made before the email stopped working. Port 25 is open from the entire subnet to Google MX, and internally no ports are blocked. The internal SMTP server does not require auth either.

As instructed, here is the send-email.log from the zeus leader:
2017-07-17 10:20:04 INFO send-email:278 0 emails are sent successfully
2017-07-17 10:21:02 INFO send-email:203 Email file /home/nutanix/data/email/1500213603.120250.new has matched the input file regexp
2017-07-17 10:21:03 INFO emailfile.py:521 Failed to send email through the smtp server
2017-07-17 10:21:03 ERROR emailfile.py:522 Traceback (most recent call last):
  File "/home/hudsonb/workspace/workspace/User_builds/builds/build-danube-4.5.1-stable-release/serviceability-python-tree/bdist.linux-x86_64/egg/serviceability/emailfile.py", line 518, in flush
  File "/usr/lib64/python2.6/smtplib.py", line 716, in sendmail
    raise SMTPRecipientsRefused(senderrs)
SMTPRecipientsRefused: {u'nos-alerts@nutanix.com': (552, '5.2.2 The email account that you tried to reach is over quota. Please direct5.2.2 the recipient to5.2.2 https://support.google.com/mail/?p=OverQuotaPerm h73si14639354qka.69 - gsmtp')}
2017-07-17 10:21:03 WARNING send-email:255 /home/nutanix/data/email/1500213603.120250.new: will be retried later: Failed to send email through the smtp server
2017-07-17 10:21:04 INFO send-email:181 Send email status zookeeper node stated changed to smtp_server {
  address_list: "aspmx.l.google.com"
  port: 25
}
email_send_status: kFailure
last_attempt_timestamp_usecs: 1500250863944895
failure_details: "Failed to send email through the smtp server"
last_successful_email_timestamp_usecs: 1500250327698930
2017-07-17 10:21:04 INFO send-email:278 0 emails are sent successfully

The Google MX server says the email recipient nos-alerts@nutanix.com is over quota. It seems this is what stops the email from being delivered. I am not sure whether this is fixable, or whether it happens only to me?
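For anyone scanning a long send-email.log for the same kind of rejection, a small script can pull out the refused recipient and the SMTP reply code. This is a rough sketch, not a Nutanix tool: the names `REFUSAL`, `find_refusals`, and `is_permanent` are hypothetical, and the pattern assumes the single-recipient `SMTPRecipientsRefused` line format shown in the log above.

```python
import re

# Matches lines like:
#   SMTPRecipientsRefused: {u'addr@example.com': (552, '5.2.2 some text')}
# Assumption: one refused recipient per line, as in the log excerpt above.
REFUSAL = re.compile(
    r"SMTPRecipientsRefused: \{u?'(?P<rcpt>[^']+)': "
    r"\((?P<code>\d{3}), '(?P<status>\d\.\d+\.\d+) [^']*'\)"
)


def find_refusals(log_text):
    """Return (recipient, reply_code, enhanced_status) for each refusal found."""
    return [
        (m.group("rcpt"), int(m.group("code")), m.group("status"))
        for m in REFUSAL.finditer(log_text)
    ]


def is_permanent(reply_code):
    """SMTP 5xx replies are permanent failures; 4xx replies are transient."""
    return 500 <= reply_code < 600
```

Applied to the log above, this yields the recipient nos-alerts@nutanix.com with reply code 552 and status 5.2.2. A 5xx code is a permanent failure as far as the server is concerned, so retrying the same message is unlikely to succeed on its own.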

PS: even if I untick "Email recipient to Nutanix support" under Prism - Email alert configuration, sending still fails with the same log message.

Thanks,
Badge
I have a problem registering SMTP.
Hi BenAus:

Thanks! Glad to offer a first-steps response.

From the log file, yes, there's a quota problem somewhere along the pathway.
- Nutanix corporate is not over-quota; other users are sending to Nutanix OK.
- Nutanix's Pulse emails sent to "nos-asups@nutanix.com" can be somewhat large (sometimes 10MB or more), and there might be a configured limit. That's why the tiny "test" email likely worked.

I wonder if there is a limit for either overall quota, or per-email size when going through your "aspmx.l.google.com" SMTP configuration?

Hi Hesham:

Your comment is very general.
- Check firewall settings, SMTP server authentication requirements and email forwarding settings, etc.
- Can you specify what issue you are having?
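One way to test the per-email size idea raised above is to ask the SMTP server what message size it advertises and compare that with the encoded size of a large attachment. The following is a sketch under assumptions: it needs Python 3, the function names are invented for illustration, and it only reads the standard ESMTP SIZE extension. It says nothing about per-user receiving quotas, which the server does not advertise this way.

```python
import math
import smtplib


def encoded_size(raw_bytes):
    """Approximate size of raw_bytes after MIME base64 encoding.

    Base64 turns every 3 bytes into 4 characters, and MIME wraps the
    result into 76-character lines, each ending in CRLF (2 bytes).
    """
    b64_chars = 4 * math.ceil(raw_bytes / 3)
    lines = math.ceil(b64_chars / 76)
    return b64_chars + 2 * lines


def fits(raw_bytes, limit):
    """True if the encoded payload fits the limit (None means no known limit)."""
    return limit is None or encoded_size(raw_bytes) <= limit


def advertised_size_limit(host, port=25, timeout=10.0):
    """Return the SIZE limit in bytes advertised in the EHLO reply, or None."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.ehlo()
        if not server.has_extn("size"):
            return None
        value = server.esmtp_features.get("size", "").strip()
        limit = int(value) if value.isdigit() else 0
        return limit or None  # a value of 0 or no value means no fixed limit
```

For example, `fits(10 * 1024 * 1024, advertised_size_limit("aspmx.l.google.com"))` estimates whether a 10MB attachment would pass the advertised limit. Because base64 inflates the payload by roughly a third, an attachment can exceed a limit that its raw size appears to fit under.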
Hi PaulR,

I checked the Google SMTP options, and they say aspmx.l.google.com has "Per user receiving limits". I am not sure what the limit is. I will switch to our internal SMTP server and add a relay there.

Another question: if the per-email quota is the problem, how can I stop the historical logs or alerts from being attached to the emails?

Thanks,
Hi BenAus,

As far as email sizes from the Nutanix cluster go, alert emails are not large. Pulse emails, however, carry data attachments whose size could be restricted. I think these Pulse emails are the ones you may be referring to as "historical logs".

In Prism, you can select if, and to whom, you send Pulse emails.
- Check status for Pulse emails, and adjust if needed.
- Disabling Pulse for a short while can help you debug if those are the ones causing the SMTP quota errors.
