Sometimes, putting an artificial limit on your network connection can actually improve performance.

A typical setup will look like this: your computer is connected at 54-100 Mbit/s to a router, which has a DSL line of let's say 2 Mbit/s downstream and 192 kbit/s upstream (and then tons of other links).

When you overload one of these connections, performance can go down a lot.

The simplest case where this happens is when you upload data. The 192 kbit/s upstream is the weakest link here, and rather easy to fill up.

What happens then is that your computer starts sending data at its full link speed, until the buffers at the router are full and it starts dropping packets. TCP will then adapt and - lacking ‘ACK’ packets - slow down sending data. It will likely still keep the router's send buffer full.
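You can actually watch this happen: start a big upload, then compare ping times to the router and to some host beyond the DSL line. A quick sketch - the router address 192.168.2.1 is just an assumption matching the local network used further down:

# with the uplink saturated by an upload, compare round-trip times:
ping -c 5 192.168.2.1     # router on the LAN: still fast
ping -c 5 www.debian.org  # beyond the DSL line: RTT shoots up while the buffer is full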

Now if the router isn’t very smart, it just keeps a first-in-first-out buffer. As long as you have just this single upload, you don’t need to care. But if you have other connections, like a download, you do care.

For your download to run at full speed, the sender needs to get those ACK packets from you. They’re basically a “got your data, send more” kind of message: really tiny packets, but essential for signaling.
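If you want to see them, you can watch them fly by with tcpdump. A sketch, assuming wifi0 as the interface name; bare IPv4 ACKs are roughly 52-66 bytes on the wire:

# TCP packets with the ACK flag set and (almost) no payload -
# the tiny packets your download depends on
tcpdump -ni wifi0 'tcp[tcpflags] & tcp-ack != 0 and less 70'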

A simple trick you can use now is to limit your bandwidth at the sending computer already. Since you’re only limiting it to just below the speed of the next link anyway, things won’t actually get much slower. But your network stack is smarter than the router’s (probably): it can send out those ACK packets at a higher priority, so you end up with better download-and-upload performance.
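In its most minimal form, this is a single token bucket on the outgoing interface. A sketch only - interface name and rate are assumptions, and the full setup below does quite a bit more:

# minimal version: shape all outgoing traffic to just below the uplink speed
/sbin/tc qdisc add dev wifi0 root tbf rate 160kbit burst 10k latency 50ms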

If you know your network connection speed, just use it. Otherwise you can measure it or make a conservative guess. I always go for 95% of the speed I see. You might want to limit it a bit more if you are sharing the connection with others.
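The arithmetic is trivial; a shell sketch, with the measured speed as an assumption:

UPLINK=192                   # measured upstream speed in kbit/s
RATE=$((UPLINK * 95 / 100))  # 95% of that: 182
echo "shaping to ${RATE}kbit"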

Now enough theory, here’s what I use:

IFACE=wifi0
# set up ingress policing for incoming traffic
/sbin/tc qdisc add dev $IFACE handle ffff: ingress
# local network is unrestricted
/sbin/tc filter add dev $IFACE parent ffff: protocol ip \
  prio 10 u32 match ip src 192.168.2.0/24 \
  police conform-exceed ok/ok flowid :1
# incoming internet traffic is policed to 1.5 Mbit/s.
/sbin/tc filter add dev $IFACE parent ffff: protocol ip \
  prio 50 u32 match ip src 0.0.0.0/0 police \
  rate 1500kbit  burst 10k drop flowid :2
# outgoing traffic: HTB tree (default 30 names no class below,
# so unclassified traffic goes out unshaped)
/sbin/tc qdisc add dev $IFACE root handle 1: htb default 30
# local network is unrestricted
/sbin/tc class add dev $IFACE parent 1: classid 1:10 htb \
  rate 54mbit burst 15k
/sbin/tc qdisc add dev $IFACE parent 1:10 handle 10: sfq perturb 10
/sbin/tc filter add dev $IFACE parent 1:0 protocol ip \
  prio 10 u32 match ip dst 192.168.2.0/24 flowid 1:10
# outgoing internet traffic is limited to 160 kbit/s (ceil 196 kbit/s).
/sbin/tc class add dev $IFACE parent 1: classid 1:20 htb \
  rate 160kbit ceil 196kbit burst 15k
/sbin/tc qdisc add dev $IFACE parent 1:20 handle 20: sfq perturb 10
/sbin/tc filter add dev $IFACE parent 1:0 protocol ip \
  prio 50 u32 match ip src 0.0.0.0/0 flowid 1:20

Long example, I know. The main reason is that I’m not limiting traffic to local computers. Furthermore, I’m also limiting my incoming traffic to use at most 1.5 Mbit/s of the 2 Mbit/s connection, so my housemates will still be able to use the internet (Debian mirrors send data fast - fast enough to kick them out of any online game they might be playing…).
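To check that the setup took effect - or to get rid of it again - the usual tc commands work, with the same $IFACE as above:

# show per-class counters; traffic should accumulate in 1:10 and 1:20
/sbin/tc -s class show dev $IFACE
# tear everything down again
/sbin/tc qdisc del dev $IFACE root
/sbin/tc qdisc del dev $IFACE ingress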

There is one important detail hidden in there: sfq. This stands for “Stochastic Fairness Queueing”. It tells the network stack to basically send one packet from each connection in turn. This way, the small packets needed for a download or an interactive SSH session will get out quickly even while a larger upload is running.
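Attaching sfq by itself is a one-liner - shown here purely for illustration, because (see the caveat at the end) it is not sufficient on its own:

# sfq alone, without any rate limiting - not enough by itself, see below
/sbin/tc qdisc add dev wifi0 root sfq perturb 10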

It works great for me: I get a good download rate, I’m using the 160 kbit/s upload limit completely (I don’t want to completely fill up the 192 kbit/s uplink either), and I’m actually writing this blog entry remotely via SSH. The lag when typing is okay, probably around 300 ms. Without shaping, this wouldn’t work nearly as well.

(And no: just using ‘sfq’ for your outgoing traffic is not enough, since the scheduling that matters happens at the weakest link, which sits inside the router, out of sfq’s reach. That is why you have to set your shaping limit just below the actual weakest link: it moves the queue onto your own machine, where sfq can do its work.)