I have a customer with a branch office in China. The folks over there need access to resources in the main office, so of course we implemented a VPN between the two offices.
It’s been a problem from day one.
Initially they had a fairly low-speed connection on the China side. Actually, the connection here wasn’t a lot better, but this side was boosted fairly soon after we added the VPN. So we had a good 512 Kbps here and half of that or less in China.
VPN performance was poor to fair. There were a lot of start-up problems, probably mostly to do with language problems: the tech folk in China didn’t speak English and I don’t speak Chinese, so there was more than a little confusion at first. But it got straightened out, and the system worked. Not well, but it worked.
It’s mostly a character-based app (Mas90) and email anyway. They’d like to do SMB file sharing, but it’s not critical, so the VPN shouldn’t be terribly stressed. But it seemed to be: as I said, the performance was “ok”, but not really good. And they still wanted that file sharing.
So the China side contracted for a better connection, bumping up to 512 Kbps themselves. Life would surely improve, right? Nope, it got worse.
And I mean really bad. Type a character and wait for it to be displayed. Really, really bad. Were they getting what they paid for, I asked? Maybe. I had them run a broadband speed test and got back conflicting results. Their upload speed might be as low as 17 Kbps to some locations. Yeah, 17: not 170, just 17. But download is fine, and they say web page access is fine.
Traceroutes to the main office show something holding things up badly just after the traffic leaves their office. I’d like to get traceroutes to other places, I’d like to know what happens if we pull out the router and connect a PC directly, I’d like to know a lot of things. But the lack of English-speaking techs has been a problem, so they’ve finally hired another tech firm that does have English-capable techs. I’m hoping that with their help we can get to the bottom of it.
So what could be wrong? Well, it could be something like ECN (Explicit Congestion Notification), but I’m betting it’s packet size or window size somewhere.
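First, there’s MTU (Maximum Transmission Unit) size, which is the largest packet a link will carry. If packets pass through a network with a smaller MTU, they are supposed to be broken up (fragmented) and reassembled at the other end, or, if they are marked “Don’t Fragment”, the sender is supposed to get an ICMP message telling it to send smaller packets. But sometimes this stops working somewhere along the way (a firewall blocking that ICMP message is a common culprit), and the packets just disappear instead. So I’d like to start by dropping the MTU at the China end and seeing what happens.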
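One way to find the largest packet that gets through unfragmented is to send pings with the Don’t Fragment bit set and binary-search on the size (on Windows XP that is `ping -f -l <size> <host>`). The sketch below only simulates that search: `probe` is a hypothetical stand-in for a real ping, the 1400-byte path MTU is an assumed example value, and the 28-byte overhead is the 20-byte IPv4 header plus the 8-byte ICMP echo header, assuming no IP options.

```python
from typing import Callable

# 20-byte IPv4 header (no options) + 8-byte ICMP echo header
IP_ICMP_OVERHEAD = 28


def largest_payload(probe: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes probe(lo) succeeds and that success is monotonic: if a size
    gets through, every smaller size does too.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid        # mid got through; answer is mid or larger
        else:
            hi = mid - 1    # mid was dropped; answer is smaller
    return lo


def make_simulated_probe(path_mtu: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for 'ping -f -l size': a DF-marked packet
    succeeds only if payload plus headers fits within the path MTU."""
    def probe(payload: int) -> bool:
        return payload + IP_ICMP_OVERHEAD <= path_mtu
    return probe


if __name__ == "__main__":
    probe = make_simulated_probe(path_mtu=1400)  # assumed bottleneck
    payload = largest_payload(probe, lo=0, hi=1472)
    print(f"largest unfragmented payload: {payload}")          # 1372
    print(f"implied path MTU: {payload + IP_ICMP_OVERHEAD}")   # 1400
```

With a real ping in place of the simulated probe, the implied path MTU is roughly the ceiling to stay under when lowering the MTU at the China end.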
But it could also be a matter of window size, which is a different concept. Each end of a TCP connection advertises a window based on how much data it can buffer, and the sender may not have more than that much unacknowledged data in flight. Fast machines with lots of available RAM can offer bigger windows. In fact, modern systems can handle far larger windows than the original design of TCP anticipated: the TCP window field is only 16 bits, so the maximum is 65,535 bytes, or about 64 KB.
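A rough way to see what a given window means: TCP can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at about window / RTT. The sketch below computes that ceiling and its inverse (the bandwidth-delay product). The 300 ms round-trip time is an assumed figure for a China-to-US path, not a measurement from this network.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput in bits/s: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.300          # assumed round-trip time in seconds
    unscaled = 65535     # largest value the 16-bit window field can hold
    print(f"{max_throughput_bps(unscaled, rtt):,.0f} bit/s")   # 1,747,600 bit/s
    print(f"{window_needed_bytes(512_000, rtt):,.0f} bytes")   # 19,200 bytes
```

Under that assumption, a 512 Kbps link needs only about 19 KB of window, so an unscaled 64 KB window would not be the bottleneck at these speeds; the concern is the scaling mechanism misbehaving.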
That’s too small, so in 1992 (RFC 1323) a “window scaling” option was added. If it’s used, the 16-bit window value is shifted left by a count (at most 14) carried in a separate TCP option, which multiplies the window by a power of two. But of course that raised an immediate problem: older systems might not understand it, so the rule is that a system that can do scaling says so in its initial SYN packet, and scaling is used only if the other end groks the concept and includes the option in its SYN-ACK reply.
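The negotiation rule can be modeled in a few lines. Per RFC 1323, each side announces its own shift count in its SYN, the count is capped at 14, and scaling is enabled only if both SYNs carry the option; each side’s shift applies to the windows that side advertises. This is a sketch of that rule only, and the function names are illustrative, not taken from any real TCP stack.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # RFC 1323 caps the window scale shift count at 14


def negotiate(syn_shift: Optional[int], synack_shift: Optional[int]) -> Tuple[int, int]:
    """Return the (initiator, responder) shift counts actually in effect.

    None means that side did not send the window scale option. Scaling is
    enabled only if both sides sent it; otherwise both use a shift of 0.
    """
    if syn_shift is None or synack_shift is None:
        return 0, 0
    return min(syn_shift, MAX_SHIFT), min(synack_shift, MAX_SHIFT)


def effective_window(raw_field: int, shift: int) -> int:
    """Convert the 16-bit window field in a segment to a byte count."""
    return (raw_field & 0xFFFF) << shift


if __name__ == "__main__":
    # Both sides understand scaling. Windows advertised by the initiator
    # use the initiator's shift: a raw value of 1024 with shift 7 is 128 KB.
    init_shift, resp_shift = negotiate(7, 2)
    print(effective_window(1024, init_shift))   # 131072
    # Responder is an older stack without the option: scaling is off.
    print(negotiate(7, None))                   # (0, 0)
```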
This hasn’t been a problem until recently, probably because any scaling that was done was modest. Linux prior to 2.6 only used a scale of 1 or 2, which isn’t all that aggressive. But in 2.6.7 the default scale was suddenly raised to 7 (a factor of 128), and things started breaking.
Apparently some routers are responding as though they’d be happy to do scaled windows when in fact they are not. As you might expect, things get very unhappy after that.
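To see why a confused box in the middle hurts so much, suppose something along the path leaves the two ends disagreeing about the shift. The sketch below assumes one such case: the receiver scales by 7 (the 2.6.7 default mentioned above) while the sender believes scaling is off. It illustrates the arithmetic only and is not a description of what any particular router does. The 256 KB receive buffer is an assumed value.

```python
def encode_window(buffer_bytes: int, shift: int) -> int:
    """What the receiver puts in the 16-bit field: buffer space >> shift."""
    return min(buffer_bytes >> shift, 0xFFFF)


def decode_window(raw_field: int, shift: int) -> int:
    """What the sender believes the window is: raw field << shift."""
    return raw_field << shift


if __name__ == "__main__":
    receiver_buffer = 262_144   # assumed 256 KB of receive buffer
    receiver_shift = 7          # receiver thinks scaling is on
    sender_shift = 0            # sender (wrongly) thinks scaling is off

    raw = encode_window(receiver_buffer, receiver_shift)
    seen = decode_window(raw, sender_shift)
    print(raw)                        # 2048
    print(seen)                       # 2048 bytes instead of 262144
    print(receiver_buffer // seen)    # 128
```

In this case the sender crawls along with a window 128 times smaller than intended; a mismatch in the other direction would instead let the sender overrun the receiver’s buffer. Either way, things get very unhappy.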
The machines in China run Windows XP, the email here is on a Mac Xserve, and Mas90 is on SCO Unix. The routers that provide the VPN are Linksys. But who knows what sits between us? Where are we losing it, and why? That’s what I need somebody in China for.
*Originally published at APLawrence.com
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com