(If you are not a cryptoplumber, the following words will be indistinguishable from random... that might be a good thing!)
When I and Zooko created the SDP1 layout (for "Secure Datagram Protocol #1") one of the requirements wasn't to avoid traffic analysis. So much so that we explicitly left all that out.
But, the times, they are a-changing. SDP1 is now in use in one app full-time, and 3 other apps are being coded up. So I have more experience in how to drive this process, and not a little idea about how to inform a re-design.
And the war news is bleak, we are getting beaten up in Eurasia. Our boys over there sure could use some help.
So how to avoid traffic analysis? A fundamental way is to be indistinguishable from random, as that reveals no information. Let's revisit this and see how far we can get.
One way is to make all packets around the same size, and same frequency. That's somewhat easy. Firstly, we expand out the (internal) Pad to include more mildly random data so that short packets are boosted to the average length. There's little we can do about long packets, except break them up, which really challenges the assumptions of datagrams and packet-independence.
Also, we can statistically generate some chit chat to keep the packets ticking away ... although I'd personally hold short of the dramatically costly demands of the Freedom design, which asked you to devote 64 or 256k permanent on-traffic to its cause. (A laughable demand, until you investigate what Skype is doing, now, successfully, on your computer this very minute.)
But a harder issue is the outer packet layer as it goes over the wire. It has structure, so someone can track it and learn information from it. Can we get rid of the structure?
The consists of open network layout consists of three parts - a token, a ciphertext and a MAC.
|Token||. . . enciphered . . . text . . .||SHA1-HMAC|
Each of these is an array, which itself consists of a leading length field followed by that many bytes.
|Length Field||. . . many . . . bytes . . . length . . . long . . .|
(Also, in all extent systems, there is a leading byte 0x5D that says "this is an SDP1, and not something else." That is, the application provides a little wrapping because there are cases where non-crypto traffic has to pass.)
Those arrays were considered necessary back then - but today I'm not so sure. Here's the logic.
Firstly the token creates an identifier for a cryptographic context. The token indexes into the keys, so it can be decrypted. The reason that this may not be so necessary is that there is generally another token already available in the network layer - in particular the UDP port number. At least one application I have coded up found itself having to use a different port number (create a new socket) for every logical channel, not because of the crypto needs, but because of how NAT works to track the sender and receiver.
(This same logic applies to the 0x5D in use.)
Secondly, skip to the MAC. This is in the outer layer - primarily because there is a paper (M. Bellare and C. Namprempre, Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm Asiacrypt 2000 LNCS 1976) that advises this. That is, SDP1 uses Encrypt-then-MAC mode.
But it turns out that this might have been an overly conservative choice. Earlier fears that MAC-then-Encrypt mode was insecure may have been overdone. That is, if due care is taken, then putting the MAC inside the cryptographic envelope could be strong. And thus eliminate the MAC as a hook to hang some traffic analysis on.
So let's assume that for now. We ditch the token, and we do MAC-then-encrypt. Which leaves us with the ciphertext. Now, because the datagram transport layer - again, UDP typically - will preserve the length of the content data, we do not need the array constructions that tell us how long the data is.
Now we have a clean encrypted text with no outer layer information. One furfie might have been that we would have to pass across the IV as is normally done in crypto protocols. But not in SDP1, as this is covered in the overall "context" - that session management that manages the keys structure also manages the IVs for each packet. By design, there is no IV passing needed.
Hey presto, we now have a clean encrypted datagram which is indistinguishable from random data.
Am I right? At the time, Zooko and I agreed it couldn't be done - but now I'm thinking we were overly cautious about the needs of encrypt-then-Mac, and the needs to identify each packet coming in.
(Hat tip to Todd for pushing me to put these thoughts down.)Posted by iang at May 20, 2006 04:02 AM | TrackBack