The Kid Sister Crypto System Manifesto
by Brian Hurt

This work is licensed under a Creative Commons Attribution 3.0 United States License.
1. Statement of Purpose
There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files.
-Bruce Schneier, “Applied Cryptography”
Kid Sister Crypto is a project dedicated to the idea that there are times when you don’t need, and are unwilling to pay for (in terms of code complexity and computational cost), crypto strong enough to defeat major governments. Sometimes, crypto systems only strong enough to defeat your kid sister are sufficient. Examples of uses that fall into these categories are hash functions for data structures and simple pseudo-random number generators.
But that even in cases where “kid sister” strength crypto is sufficient, you don’t want to use an algorithm designed by some dilettante with no knowledge of cryptography or even awareness that a cryptosystem (however weak) is what they’re designing- it is much better to use a system designed by some dilettante with some knowledge of cryptography, and doing so explicitly as designing a crypto system. It’d be better still if a real cryptanalyst were to do this, but they’re all busy inventing (and breaking) major-government-proof ciphers, so this is what you get.
The name is picked explicitly to remind everyone that this is not a cryptosystem that will protect you against major governments.
2. Design Goals
Though a superhero, Bruce Schneier disdains the use of a mask or secret identity as ‘security through obscurity’.
-From the website Bruce Schneier Facts
The design goals of Kid Sister Crypto system are as follows:
- Fast. We are trading “cryptographically secure” for “fast”, we want the crypto-system to be fast. The goal is to make the system at least an order of magnitude faster than the common “strong” cryptosystems.
- Reasonably Strong. Despite giving up cryptographically strong, we don’t want there to be simple correlations between the input and the output- these cause problems even if cracking the cryptosystem isn’t a problem (for example, simple recurrences can cause hash tables to exhibit very bad performance). The avalanche effect, where changing a single bit of input changes a large number of bits of output, isn’t sufficient- the changes also have to be (reasonably) non-correlated, i.e. not predictable.
- Simple. The system should be simple enough to allow a programmer to just remember it, and reimplement it trivially. Long sequences of constants that must be reproduced exactly, or tricky implementation requirements, are to be avoided.
- Portable. The system should not depend upon special instructions, inline assembly, arbitrary precision integers, or other features not commonly found in most if not all programming languages and on most computers.
- Software-centric. The main implementation focus will be on software implementations on 32- and 64-bit computers, not hardware implementations, 8-bit embedded implementations, or the like.
- Variable. The implementation should scale to words of different bit lengths (especially 32- and 64- bits).
- No “unexplained constants”- all aspects of the design are made explicit, and all decisions are explained.
3. Example Implementation
Bruce Schneier expects the Spanish Inquisition.
-From the website Bruce Schneier Facts
For the impatient people who skip to the end of the book to see how it all comes out, I first present an example implementation of the algorithm. Also, this will make presenting the design simpler.
This implementation is not optimized for performance, but is instead structured to be easy to understand. For simplicity, it assumes a standard 32- or 64-bit system. For 32-bit systems, it assumes longs are 32 bits and shorts are 16 bits. For 64-bit systems, it assumes longs are 64 bits and ints are 32 bits. This covers pretty much all current popular architectures- other architectures will have to modify the code in obvious ways.
#include <limits.h>
typedef unsigned long fullword_t;
#if ARCH_64_BIT
typedef unsigned int halfword_t;
#define NBITS (32) /* The number of bits in a half word */
#define MULT (707106781186547ul)
#define ADD (314159265358979ul)
#else /* ARCH_64_BIT */
typedef unsigned short halfword_t;
#define NBITS (16)
#define MULT (707106781ul)
#define ADD (314159265ul)
#endif /* ARCH_64_BIT */
#define LOW_WORD(x) ((x) & ((1ul << NBITS) – 1ul))
#define HIGH_WORD(x) ((x) >> NBITS)
/* The round function */
static halfword_t round_func(halfword_t p, halfword_t k) {
fullword temp;
/* Mix-in the subkey */
p ^= k;
/* Multiplication by a constant */
temp = MULT * p;
/* Addition Whitening */
temp += ADD;
/* Result compression */
return (LOW_WORD(temp) ^ HIGH_WORD(temp));
}
fullword_t kid_sister_crypto_encrypt(
fullword_t key,
fullword_t plaintext)
{
halfword_t k[4]; /* The subwords generated by the key */
halfword_t m, n;
/* Feistel network inputs */
m = plaintext & ((1ul << N) – 1ul);
n = plaintext >> N;
/* Subkey generation */
k[0] = LOW_WORD(key);
k[1] = HIGH_WORD(key);
/* Implement the PHT */
k[2] = k[0];
k[3] = k[1];
k[2] += k[3];
k[3] += k[2];
/* Feistel Network */
n ^= round_func(m, k[0]);
m ^= round_func(n, k[1]);
n ^= round_func(m, k[2]);
m ^= round_func(n, k[3]);
/* Ciphertext reconstitution */
return ((((fullword_t) n) << NBITS) | m);
}
4. Design
Bruce Schneier is the Bombe
-From the website Bruce Schneier Facts
The design of Kid Sister Crypto is as follows:
Private key
The core function of Kid Sister Crypto is a private-key crypto system, like DES and AES (except not as strong), which can then be used (efficiently) as a hash function or pseudo-random number generator in the same ways that have been developed to turn cryptographically strong crypto systems into hash functions or pseudo-random number generators.
Rationale: private key crypto systems are generally much faster than public key crypto systems. Also, private key crypto systems are easier to turn into hashes, random number generators, etc.
This is reflected in the example code above by the signature of the main function, which takes a key and a plain text block, and returns a cipher text block.
Single word key and block size.
The size of both the key, the plain text block, and the cipher text block, should be “one word”, where the size of a word is defined by the implementation.
Rationale: A single word is too short for a cryptographically secure cryptosystem, but we don’t care about that. Keeping both the keys and the blocks short means we can stay in registers, even in register-starved architectures like the x86-32. And generally what we want is a single word of “output” bits, so the extra work to generate extra bits which are just going to be ignored is worthless.
This is reflected in the example code above by the fact that the plain text block and key passed in are both full words, as is the cipher text block returned.
Feistel Network Structure
We assume the plain text to the cipher is an unsigned word of 2n bits (i.e. with a maximum value of 22n – 1). This input word is then split into two sub-words, each of n bits. The two subwords are, respectively, just the high order and low order bits- so if the input plain text is integer value X, then the two subwords are X/2n (round down) and X mod 2n. The two subwords are then used to construct a two-round Feistel network, like this:
Rationale: Feistel networks are the gold example of what I mean by using the knowledge gained in building strong crypto systems in a weak crypto system. Feistel networks have been extensively studied in the literature and found to be incredibly robust. They bring with them a lot of advantages, and yet they’re really simple and cheap to implement.
for example, every Feistel network cipher is automatically invertible, i.e. it can be decrypted (if you know the key). If, in the example code above, I reversed the order that the subround function was applied, so it was like this:
n ^= round_func(m, k[0]);
m ^= round_func(n, k[1]);
n ^= round_func(m, k[2]);
m ^= round_func(n, k[3]);The function would decrypt messages previously encrypted with Kid Sister Crypto. Note that it is then easy to show that, for every key, every plain text block encrypts to a unique cipher text block, and every cipher text block decrypts to a unique plain text block.
This property is a useful property- for example, it’s now easy to show that there are no “stuck bits” in the output- meaning I don’t have to worry about some values not being “reachable” in the cipher.
Two rounds was chosen to be enough to get good “mix-in” characteristics- several tests I ran show that after two rounds, every bit of input has the potential to change every bit of output. But this is short enough to keep Kid Sister Crypto about as fast, per byte, as FNV. Kid Sister should be slightly slower on 32-bit, somewhat faster on 64-bit, than FNV.
Design of the round function
Each round of the Feistel Network calls a round function (named f in the graphic above, and
round_funcin the example code). This function combines a halfword of the (partially-encrypted) plaintext and a halfword from the expanded key into a half word of output, which is then xor’ed into the other halfword of the (partially-encrypted) plain text. It is the round function where the real “mixing” goes on. The main requirements of the round function are two fold: 1. To be fast, 2. to have good mixing properties. It does not need to be invertible in any sane way.The design decisions of the round function are as follows:
Mix-in in the subkey
The plain text input is xored with the subkey input. It needs to be mixed in somewhere, and this is cheap.
Technically, this is addition of GF(2n) polynomials (where n is the number of bits in a half-word), but programmers just know this as “xor”. One advantage mixing xors with “normal” integer arithmetic like multiply and add is that it’s different algebras. A common cause of correlations is to use only operations from a single algebra- and thus making it easy to describe correlations within that algebra. By using operations from different algebras this is constrained. Every crypto system since IDEA has used operations from different algebras for this very reason (well, except AES, but that’s a rant for a different day) for this very reason.
In the example code, this is done by the code:
/* The round function */
static halfword_t round_func(halfword_t p, halfword_t k) {
…
/* Mix-in the subkey */
p ^= k;
…Multiplication by a constant.
Multiplication by a constant is a wonderful way to get lots of bits interacting with lots of other bits. If you have an N-bit constant, then bit i of the input half-word is going to have an effect of bits i to i+N on the output full-word.
The choice of a constant is tricky. You want a number with a lot of set bits. Multiplying by a constant with few 1 bits isn’t very helpful- if the constant has only a few bits set, any given bit will only interact with a few other bits. But you also want a lot of 0 bits in the constant- multiplying by -1 isn’t very helpful either. About equal numbers of 1′s and 0′s in the constant would be good. It doesn’t have to be exactly equal, just close.
It also helps if the highest order set bit of the constant is large, allowing low-order input bits to affect more higher order output bits. It doesn’t have to be equal to the full precision of the word, but it helps if it’s large.
Another constraint on the constant is that it needs to be something easy for programmers to remember. And, it’d be nice if, on a n-bit system (n = 32 or 64), that the constant have a GCD of 1 with both 2n and 2n-1, for various number theoretical reasons.
The first 9 or 15 digits of the square root of 2, divided by 2, taken as an integer number, fulfill these requirements. The square root of 2, divided by 2, is 0.70710678118654752…, so for the 32-bit implementation I use the constant 707,106,781, and for the 64-bit implementation I use the constant 707,106,781,186,547. The 32-bit constant has 17 set bits of 30, while the 64-bit constant has 20 bits set of 50- not perfect, but acceptable.
Note that multiplication by the constant expands a half-word input (the xor between the input subkey and the input plain text) into a full-word output. This is compressed back down into a half-word of output later.
For those playing along at home, the code that implements this in the example is:
#if ARCH_64_BIT
#define MULT (707106781186547ul)
#else /* ARCH_64_BIT */
#define MULT (707106781ul)
#endif /* ARCH_64_BIT */
static halfword_t round_func(halfword_t p, halfword_t k) {
fullword temp; /* The intermediate full-word value */
…
/* Multiplication by a constant */
temp = MULT * p;
…Addition Whitening
The original design of the Kid Sister Crypto System had the weakness that plain text 0, key 0 encrypted to the cipher text 0. While not technically a weakness, this struck me as being too obvious, so a late addition to the crypto system was some whitening. After the multiplication, an arbitrary constant is added just to mix things up. This whitening has no effect on the cryptographic strength of Kid Sister (if that phrase didn’t just make you smirk, you’re not on board with the whole idea here), except to make plain text 0, key 0 encrypt to something not 0.
As a constant, I picked the first 9 or 15 digits of Pi. Pi is 3.1415926535897932, so, after multiplication, the constant 314,159,265 (for 32-bit systems) or 314,159,265,358,979 (for 64-bit systems) is added in to the full word result. The main constraints on this number are that it is easy to remember, that it have about equal numbers of set and clear bits (13 of 29 for the 32-bit constant, 25 of 49 for the 64-bit constant), and that it be about the right size.
I used addition instead of xor for whitening because I used xor in the next phase (result compression, below). It’s OK that it’s the same algebra as the multiplication- it’s purpose isn’t to eliminate correlations.
For your scrap book collection, the code that implements this in the example code is:
#if ARCH_64_BIT
#define ADD (314159265358979ul)
#else /* ARCH_64_BIT */
#define ADD (314159265ul)
#endif /* ARCH_64_BIT */
/* The round function */
static halfword_t round_func(halfword_t p, halfword_t k) {
fullword temp; /* The intermediate full-word value */
…
/* Addition Whitening */
temp += ADD;
…Result compression
One of the problems with using multiplication by a constant for the avalanche effect is that a given bit of the input can only affect equal or higher order bits on the output- there is no way to ‘wrap around’ to affect lower order bits. Also, it generates a full word of output, which needs to be compressed back down into a half word to fit the Feistel network plan.
I solve both of these problems by treating the full word output of the multiplication and whitening as a polynomial in GF(22n) (where n is the number of bits in a half-word, so 2n is the number of bits in a full-word), and calculating that polynomial modulo the polynomial Xn + 1.
That sounds complicated, but what it boils down to in programmer terms is that I xor the high-order half-word with the low-order half-word. That’s it. It’s not a complicated operation, but it’s an important one- because high order bits of the input half-word can and do affect the low order bits of the high half-word of the result. This operation allows that affect to “wrap around” and affect the low-order bits of the result. And, because it’s an operation in a different operation, it helps prevent correlations.
In case you’re wondering, the code that does this in the example code is:
static halfword_t round_func(halfword_t p, halfword_t k) {
fullword temp; /* The intermediate full-word value */
…
/* Result compression */
return (LOW_WORD(t) ^ HIGH_WORD(t));
}
Key Schedule
For two rounds of the Feistel network, I need four half-words of key material. The first two words are obvious- the low and high half-words of the full word key. The rule is simple- low order bits goes with low order bits, and low order bits go before high order bits- so the low half-word of the key, along with the low half-word of the plain text, feed into the first call to the subround function, and so on.
For the next two half-words of key material, I’ve decided to go with a Pseudo-Hadamard Transformation on the original two half-words of the key. The PHT is defined as:
a’ = a + b mod 2n
b’ = a + 2b mod 2n
This is easier to understand if I expressive b’ as:
b’ = b + a’ mod 2n
At which point it’s easy to see that this is just two additions- add the high order half-word of the key into the low order half-word, and then add that back into the high order half-word.
Rationale: I don’t need a lot of security here, I just need to mix up the bits a little bit. And the PHT has the advantage of being fast. And the first operating the key bits are used for in the subround function is xor, so addition modulo 2n has the advantage of being a different algebra.
The one downside the PHT has, from my point of view, is that a key of 0 doesn’t change. I considered adding some whitening to the operation (adding in a constant, maybe 1), but felt that it was unnecessary. Instead, I added the whitening into the subround function. One advantage this decision had was that in cases where the key is fixed, the implementation can simply define the fixed key to be 0, and eliminate all the instructions associated with the key.
In the example code, I used the array
kto hold the four half-words of the key schedule. The code to fill those words in is:/* Subkey generation */
k[0] = LOW_WORD(key);
k[1] = HIGH_WORD(key);
/* Implement the PHT */
k[2] = k[0];
k[3] = k[1];
k[2] += k[3];
k[3] += k[2];
6. Conclusion
Bruce Schneier doesn’t own a dog. His doghouse is already filled with unusable cryptography products.
-From the website Bruce Schneier Facts
Kid Sister Crypto fulfills several of it’s design criteria by simple inspection- it’s a simple cipher, with few (two) constants, and those are easy to remember common numbers. There are no obvious congruences, and the common ones are explicitly protected against.
Experimentation with Kid Sister Crypto shows that it’s also fast- on a 1.8GHz AMD Athlon XP, a portable C implementation of the 32-bit version of the algorithm was benchmarked at 43 million encryptions per second, or about 42 clock cycles per word, less than 11 clock cycles per byte.
Performance on 64-bit systems is even better- an 800MHz AMD Turion was able to do 39 million encryptions per second, or about 20 clock cycles per word. And since words were 8 bytes, this equates to about 2.5 clock cycles per byte.
It’s also strong- as a random number generator, it passes the Dieharder suite of random number tests (include occasionally even passing “broken” test #2).
As such, I think Kid Sister Crypto forms a fine basis for pseudo-random number generators and hash functions in all languages.
Related posts:
