Email duplicated from Barracuda Cloud Layer
February 26, 2014 7:18 AM

Some email to our server is being duplicated from the Barracuda Cloud Layer to our local Barracuda 300. Barracuda support says there are routing and latency issues. I don't see the evidence for that, but I'm not sure how to resolve it. Details inside...

Incoming email to our domain first goes to the Barracuda Cloud Protection Layer (CPL) before being passed to our local Barracuda 300 antispam appliance. It then goes to our email server (CommuniGate). Approximately 1 week ago users started getting duplicates of email. For example, a user receives an email at 10am. Then they might receive the same email again at 10:02 and 10:10. It is not specific to any single domain, and it does not occur for all inbound email. It never occurs for email within our local domain (which is not processed by the Barracuda CPL).

I opened a call with Barracuda and they confirmed emails were being resent from the CPL to the local Barracuda, but could not provide an explanation other than that several messages were getting a "Deferred 451 No HELO/EHLO" status. They did a packet capture on the local Barracuda device and say there are a lot of TCP retransmission packets coming from the Barracuda cloud control servers to our local Barracuda. Additionally, they say there is a routing issue based on this traceroute and want me to take it up with AT&T.

[root@192.168.200.20] # traceroute 64.235.154.194
traceroute to 64.235.154.194 (64.235.154.194), 30 hops max, 40 byte packets
1 99-72-249-174.uvs.mmphtn.sbcglobal.net (99.72.249.174) 2.628 ms 1.739 ms 1.810 ms
2 * * *
3 * * *
4 * 99.15.205.16 (99.15.205.16) 29.625 ms 26.311 ms
5 * * *
6 12.83.112.141 (12.83.112.141) 32.062 ms 31.286 ms 29.848 ms
7 12.122.104.69 (12.122.104.69) 152.513 ms 156.651 ms 96.317 ms
8 12.91.226.10 (12.91.226.10) 87.976 ms 85.129 ms 86.262 ms
9 oak1-ar1-xe-1-0-0-0.us.twtelecom.net (206.222.120.198) 95.168 ms 94.619 ms 108.430 ms
10 126.wecare.net (66.162.144.126) 100.729 ms 100.566 ms 100.232 ms
11 mail14.ess.barracuda.com (64.235.154.194) 100.931 ms 100.411 ms 100.244 ms

They also suggested I change the DNS server setting on the local Barracuda to something other than the OpenDNS servers, but added they didn't think this was the issue and gave no alternative.

I'm sort of at a loss as to what to do now. Does it really look like a routing issue?
posted by dukes909 to Computers & Internet (7 answers total)
 
"Deferred 451 No HELO/EHLO"

That's interesting. I only dabble with mail systems, but my understanding is that a 451 response is a bit of a trick to cut down on spam. The idea is that when a message comes through that you think might be spam, you tell the sender that your mail server is unavailable. If the sender is just shotgunning email out there, they won't bother queuing the message and trying again later per the RFC.
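For what it's worth, here is a minimal sketch of how SMTP reply codes are grouped by their first digit under RFC 5321, which is why a well-behaved sender queues and retries after a 451. The function names are made up for illustration and are not from Barracuda or CommuniGate.

```python
def classify_smtp_reply(code):
    """Group an SMTP reply code by its first digit (RFC 5321 reply classes)."""
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP completion/intermediate reply code: {code}")
    classes = {
        2: "success",            # e.g. 250: message accepted
        3: "intermediate",       # e.g. 354: go ahead and send the message data
        4: "transient failure",  # e.g. 451: try again later
        5: "permanent failure",  # e.g. 550: do not retry
    }
    return classes[code // 100]


def sender_should_retry(code):
    """A compliant sender keeps the message queued only on a 4xx reply."""
    return classify_smtp_reply(code) == "transient failure"


print(451, classify_smtp_reply(451), sender_should_retry(451))
```

So a "Deferred 451" on the CPL side just means the CPL saw a temporary failure and will attempt delivery again later.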

This suggests to me that something is running extra spam checks. It sounds like you're saying the local barracuda is sending the 451, in which case I'd be asking Barracuda how to fix that (presumably you don't need to be as strict with mail coming from CPL? I'm not entirely sure how that flow works for you). If in fact it's the CommuniGate box responding with 451, then you may need to tweak any spam checks on that device.


Of course, with all of those retransmission packets, I also wonder if the messages are being deferred because the extra traffic is confusing your local devices.
posted by Nonsteroidal Anti-Inflammatory Drug at 7:39 AM on February 26


Barracuda's latest assertion is: "If we have packets being retransmitted inappropriately, it may explain why we're getting duplicates of certain emails. Looking at the traceroute (above) I can see that we've got partial and complete timeouts, as well as high latency getting to the CPL."
posted by dukes909 at 8:13 AM on February 26


So if there are a lot of retransmissions, that plus the traceroute indicates that there COULD be a problem between hops 6 and 7 in the path through AT&T. Not so much a "routing issue" as congestion on the link between those two hops that is causing high queueing and packet loss (hence retransmissions). I say could, because there's no packet loss in the traceroute output itself, and that ~60 ms jump in latency could be due to distance. That sort of latency jump would normally indicate a cross-country type of distance, and the name of hop 9 (plus the latency from my desk in Seattle) puts the server somewhere in the Bay area. So if you are somewhere east of the Mississippi that is a not-great but also not actionable number. (Note that you already incurred 30 ms latency just getting to hop 5, so 100ms end-to-end isn't so bad.)
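If you want a rough sanity check on the distance argument, here is a back-of-the-envelope sketch. It assumes light in fiber covers about 200 km per millisecond (roughly two thirds of c), and the coordinates are approximate values I picked for Memphis and Oakland, not the real endpoints of the path.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: propagation speed in fiber, about 2/3 of c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km on a sphere with Earth's mean radius."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time: there and back at fiber speed."""
    return 2 * distance_km / FIBER_KM_PER_MS


MEMPHIS = (35.15, -90.05)   # approximate, for illustration only
OAKLAND = (37.80, -122.27)  # approximate, for illustration only

distance = great_circle_km(*MEMPHIS, *OAKLAND)
print(f"great-circle distance: {distance:.0f} km")
print(f"propagation-only RTT floor: {min_rtt_ms(distance):.0f} ms")
```

That floor is only a lower bound. Real fiber routes are longer than the great-circle path and every router adds some delay, so measured round-trip times come out higher.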

You could try running more packets and look for loss - traceroute -q 50 64.235.154.194 will send 50 packets at each stage rather than 3. You're looking for hops 8-10 to have a lot of asterisks interspersed with numbers. This is where it gets tricky - routers don't like responding to traceroute requests, so you really have to see it at multiple hops (therefore losing packets to multiple routers) in order to have a reliable indication. You could also test to points closer to your own network like your ISP's DNS server, in order to figure out if you have loss on your local link (the most common place to find it).
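If eyeballing asterisks gets tedious, a small script can count them per hop. This is only a sketch: it assumes the plain output format shown in the question (hop number first, `*` for a lost probe, `ms` after each answered probe), and the function names are mine. The sample lines are copied from the traceroute in the question.

```python
def hop_loss(traceroute_text):
    """Return {hop: (lost, sent)} parsed from plain traceroute output."""
    stats = {}
    for line in traceroute_text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip the header line and blank lines
        lost = tokens.count("*")       # each "*" is a probe with no reply
        answered = tokens.count("ms")  # each "ms" follows one reply time
        stats[int(tokens[0])] = (lost, lost + answered)
    return stats


def partial_loss_hops(stats):
    """Hops where some probes were answered and some were lost."""
    return [hop for hop, (lost, sent) in sorted(stats.items()) if 0 < lost < sent]


SAMPLE = """\
traceroute to 64.235.154.194 (64.235.154.194), 30 hops max, 40 byte packets
1 99-72-249-174.uvs.mmphtn.sbcglobal.net (99.72.249.174) 2.628 ms 1.739 ms 1.810 ms
2 * * *
4 * 99.15.205.16 (99.15.205.16) 29.625 ms 26.311 ms
6 12.83.112.141 (12.83.112.141) 32.062 ms 31.286 ms 29.848 ms
"""

stats = hop_loss(SAMPLE)
print(stats)
print(partial_loss_hops(stats))
```

A hop that shows only asterisks is usually a router that never answers traceroute at all, so the script reports only partial loss. Even then, a single hop with partial loss means little on its own; it needs to show up at several consecutive hops.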

tl;dr - I would not look at that traceroute and assume there's a network problem, unless your end is somewhere on the west coast of the US.
posted by five toed sloth at 9:45 AM on February 26 [1 favorite]


I'm east of the Mississippi - in fact, just south of Memphis ;). Thanks, I'll look at the traceroute with more packets.

I have an AT&T service tech coming tomorrow to do a line test.
posted by dukes909 at 9:53 AM on February 26


Ack... my version of traceroute on RHEL and on the Barracuda itself won't allow more than 10 packets per hop.
posted by dukes909 at 9:56 AM on February 26


But with 10:
traceroute to 64.235.154.194 (64.235.154.194), 30 hops max, 60 byte packets
1 99-72-249-174.uvs.mmphtn.sbcglobal.net (99.72.249.174) 2.953 ms 9.832 ms 9.825 ms 9.810 ms 9.794 ms 9.776 ms 9.763 ms 9.749 ms 9.732 ms 9.717 ms

2 * * * * * * * * * *

3 99.15.205.34 (99.15.205.34) 89.874 ms 91.102 ms 91.103 ms 91.090 ms 91.075 ms 97.633 ms 81.686 ms 80.883 ms * *

4 * * * * * * * * * *

5 * * * * * * * * * *

6 12.83.112.141 (12.83.112.141) 31.037 ms 37.269 ms 39.092 ms 37.250 ms 39.071 ms 39.059 ms 39.044 ms 39.468 ms 39.465 ms 44.155 ms

7 12.122.104.69 (12.122.104.69) 101.432 ms 101.433 ms 101.421 ms 102.157 ms 102.154 ms 102.142 ms 83.904 ms 83.689 ms 92.209 ms 94.205 ms

8 12.91.226.10 (12.91.226.10) 94.297 ms 94.308 ms 94.298 ms 94.288 ms 95.900 ms 95.898 ms 99.785 ms 99.784 ms 100.297 ms 100.294 ms

9 oak1-ar1-xe-1-0-0-0.us.twtelecom.net (206.222.120.198) 111.636 ms 111.593 ms 111.954 ms 123.704 ms 115.166 ms 113.168 ms 94.541 ms 94.663 ms 102.686 ms 103.122 ms

10 126.wecare.net (66.162.144.126) 109.174 ms 110.288 ms 110.286 ms 110.273 ms 110.259 ms 110.245 ms 115.818 ms 117.549 ms 117.546 ms 117.421 ms

11 mail14.ess.barracuda.com (64.235.154.194) 117.433 ms 118.246 ms 118.240 ms 100.575 ms 100.494 ms 101.291 ms * * * *
posted by dukes909 at 10:03 AM on February 26


So what you basically did there was to send 50 packets (the ones for hops 6-10) in quick succession without dropping one. The loss at the final hop is normal: servers limit the rate at which they'll respond to traceroutes.

There's no indication of a network problem there.
posted by five toed sloth at 12:31 PM on February 26 [1 favorite]

