Financial Cryptography: Indistinguishable from random...

May 20, 2006

Indistinguishable from random...

(If you are not a cryptoplumber, the following words will be indistinguishable from random... that might be a good thing!)

When Zooko and I created the SDP1 layout (for "Secure Datagram Protocol #1"), avoiding traffic analysis was not one of the requirements. So much so that we explicitly left all of that out.

But, the times, they are a-changing. SDP1 is now in use in one app full-time, and 3 other apps are being coded up. So I have more experience in how to drive this process, and not a little idea about how to inform a re-design.

And the war news is bleak, we are getting beaten up in Eurasia. Our boys over there sure could use some help.

So how to avoid traffic analysis? A fundamental way is to be indistinguishable from random, as that reveals no information. Let's revisit this and see how far we can get.

One way is to make all packets around the same size, and same frequency. That's somewhat easy. Firstly, we expand out the (internal) Pad to include more mildly random data so that short packets are boosted to the average length. There's little we can do about long packets, except break them up, which really challenges the assumptions of datagrams and packet-independence.
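As a rough sketch of that padding step, in Python: the 256-byte target, the 2-byte inner length prefix and the function names below are assumptions made up for illustration, not anything taken from SDP1. The idea is only that the plaintext carries its own length, and random fill brings short packets up to a common size before encryption.

```python
import os
import struct

TARGET_LEN = 256  # assumed "average" body size; not an SDP1 constant


def pad_payload(payload: bytes, target_len: int = TARGET_LEN) -> bytes:
    """Prefix the payload with its length, then add random fill up to target_len.

    Payloads already longer than the target get the prefix but no fill;
    whether to split them is a separate decision.
    """
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length prefix")
    body = struct.pack(">H", len(payload)) + payload
    fill = max(0, target_len - len(body))
    return body + os.urandom(fill)


def unpad_payload(padded: bytes) -> bytes:
    """Read the length prefix and drop whatever fill follows the payload."""
    if len(padded) < 2:
        raise ValueError("missing length prefix")
    (n,) = struct.unpack(">H", padded[:2])
    if n > len(padded) - 2:
        raise ValueError("length prefix exceeds packet body")
    return padded[2:2 + n]
```

Because the fill sits inside the encrypted body, an observer sees only the padded size. Long packets still stand out, as noted above.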

Also, we can statistically generate some chit chat to keep the packets ticking away ... although I'd personally hold short of the dramatically costly demands of the Freedom design, which asked you to devote 64 or 256k permanent on-traffic to its cause. (A laughable demand, until you investigate what Skype is doing, now, successfully, on your computer this very minute.)

But a harder issue is the outer packet layer as it goes over the wire. It has structure, so someone can track it and learn information from it. Can we get rid of the structure?

The open network layout consists of three parts: a token, a ciphertext and a MAC.

Token | . . . enciphered . . . text . . . | SHA1-HMAC

Each of these is an array, which itself consists of a leading length field followed by that many bytes.

Length Field | . . . many . . . bytes . . . length . . . long . . .

(Also, in all extant systems, there is a leading byte 0x5D that says "this is an SDP1, and not something else." That is, the application provides a little wrapping because there are cases where non-crypto traffic has to pass.)

Len | token | Len | . . enciphered . . text . . | Len | SHA1-HMAC
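To make the visible structure concrete, here is a small Python sketch of that outer layout. The post does not say how wide the length field is, so the 2-byte big-endian length used here is an assumption, as are the function names; only the 0x5D tag and the token / ciphertext / MAC ordering come from the description above.

```python
import struct

SDP1_TAG = 0x5D  # leading byte described in the post


def encode_array(data: bytes) -> bytes:
    # Assumed encoding: 2-byte big-endian length, then that many bytes.
    return struct.pack(">H", len(data)) + data


def encode_packet(token: bytes, ciphertext: bytes, mac: bytes) -> bytes:
    """Tag byte followed by the three length-prefixed arrays."""
    arrays = b"".join(encode_array(p) for p in (token, ciphertext, mac))
    return bytes([SDP1_TAG]) + arrays


def decode_packet(packet: bytes):
    """Split a packet back into (token, ciphertext, mac), checking every length."""
    if not packet or packet[0] != SDP1_TAG:
        raise ValueError("not an SDP1 packet")
    parts, pos = [], 1
    for _ in range(3):
        if pos + 2 > len(packet):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(">H", packet, pos)
        pos += 2
        if pos + n > len(packet):
            raise ValueError("truncated array")
        parts.append(packet[pos:pos + n])
        pos += n
    if pos != len(packet):
        raise ValueError("trailing bytes after MAC")
    return tuple(parts)
```

Everything `decode_packet` relies on (the tag, the three length fields, the fixed-size MAC at the end) is also available to anyone watching the wire, which is the structure the rest of this post tries to remove.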

Those arrays were considered necessary back then - but today I'm not so sure. Here's the logic.

Firstly the token creates an identifier for a cryptographic context. The token indexes into the keys, so it can be decrypted. The reason that this may not be so necessary is that there is generally another token already available in the network layer - in particular the UDP port number. At least one application I have coded up found itself having to use a different port number (create a new socket) for every logical channel, not because of the crypto needs, but because of how NAT works to track the sender and receiver.

(This same logic applies to the 0x5D in use.)

Secondly, skip to the MAC. This is in the outer layer - primarily because there is a paper (M. Bellare and C. Namprempre, "Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm", Asiacrypt 2000, LNCS 1976) that advises this. That is, SDP1 uses Encrypt-then-MAC mode.

But it turns out that this might have been an overly conservative choice. Earlier fears that MAC-then-Encrypt mode was insecure may have been overdone. That is, if due care is taken, then putting the MAC inside the cryptographic envelope could be strong. And thus eliminate the MAC as a hook to hang some traffic analysis on.

So let's assume that for now. We ditch the token, and we do MAC-then-encrypt. Which leaves us with the ciphertext. Now, because the datagram transport layer - again, UDP typically - will preserve the length of the content data, we do not need the array constructions that tell us how long the data is.

Now we have a clean encrypted text with no outer layer information. One furphy might have been that we would have to pass across the IV as is normally done in crypto protocols. But not in SDP1, as this is covered in the overall "context" - the session management that manages the key structure also manages the IVs for each packet. By design, there is no IV passing needed.

Hey presto, we now have a clean encrypted datagram which is indistinguishable from random data.
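Here is a toy Python sketch of that stripped-down packet, under several loudly stated assumptions. Python's standard library has no block cipher, so a keystream built from HMAC-SHA256 stands in for the real cipher. The `Context` class, the use of a simple packet counter as the context-managed IV, and the lookup table keyed by UDP port are all hypothetical. None of this is the SDP1 code.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass

MAC_LEN = 20  # size of a SHA1-HMAC


@dataclass
class Context:
    """Hypothetical per-channel state, agreed by both ends out of band."""
    enc_key: bytes
    mac_key: bytes
    seq: int = 0  # packet counter standing in for the context-managed IV


def _keystream(key: bytes, seq: int, length: int) -> bytes:
    # Stand-in cipher: HMAC-SHA256 over (seq, block counter), used as a keystream.
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hmac.new(key, struct.pack(">QQ", seq, block), hashlib.sha256).digest()
        block += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(ctx: Context, plaintext: bytes) -> bytes:
    """MAC-then-encrypt: the MAC travels inside the encrypted envelope."""
    tag = hmac.new(ctx.mac_key, struct.pack(">Q", ctx.seq) + plaintext, hashlib.sha1).digest()
    inner = plaintext + tag
    packet = _xor(inner, _keystream(ctx.enc_key, ctx.seq, len(inner)))
    ctx.seq += 1
    return packet


def open_packet(ctx: Context, packet: bytes) -> bytes:
    """Decrypt using the context's own counter, then check the inner MAC."""
    if len(packet) < MAC_LEN:
        raise ValueError("packet too short")
    inner = _xor(packet, _keystream(ctx.enc_key, ctx.seq, len(packet)))
    plaintext, tag = inner[:-MAC_LEN], inner[-MAC_LEN:]
    expected = hmac.new(ctx.mac_key, struct.pack(">Q", ctx.seq) + plaintext, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    ctx.seq += 1  # only advance once the packet has been accepted
    return plaintext


# The receiver picks the context from the UDP port the datagram arrived on,
# which stands in for the dropped token.
contexts_by_port = {}
```

The datagram on the wire is just `seal(...)`'s output: no tag byte, no token, no length fields, no visible MAC, no IV. A bare counter like this would fall over as soon as a datagram is lost or reordered, so a real design needs something smarter in the context. The sketch only shows that nothing has to be sent in the clear.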

Am I right? At the time, Zooko and I agreed it couldn't be done - but now I'm thinking we were overly cautious about the needs of encrypt-then-MAC, and the need to identify each packet coming in.

(Hat tip to Todd for pushing me to put these thoughts down.)

Posted by iang at May 20, 2006 04:02 AM

Comments

It's too bad I have so many other things going on right now, including moving to Colorado.

The reason we thought it couldn't be done perfectly was that the recipient needs to know which key to use to decrypt, and that information, the "which key should you use to decrypt this packet" information, cannot itself be encrypted by that key, of course.

I remain interested! Keep me posted!

Posted by: Zooko at May 21, 2006 06:45 AM

FTR: it occurred to me that the datalength of the packet is always going to be a multiple of 16. That tells us that it is using a modern block cipher at least. If one wanted to hide that then adding 0-15 random bytes would do that.

Posted by: Iang at July 1, 2006 02:22 PM

....

That's not typical of malware. Most sophisticated malware employs stronger encryption, but the trade-off for the attacker is that its traffic can trigger a red flag at the network layer. "Entropy and complexity is used by most [malware developers]," James says. "In the world of encryption detection of malware at the network layer ... you watch the traffic generated by it and if the measure of randomness/entropy is high," that could be a sign of malware with crypto, he says.

Flame's creators either used easily cracked encryption to camouflage the attack, or it could be a function of the size of the overall code, he says. "They didn't want you to detect that they were hiding anything. They wanted to look like common data," James says. "It did the opposite of what everyone is expecting with malware. And that's what helped it stay undetected for so long."

....

Posted by: Flame isn't.... at June 28, 2012 07:49 PM