» Home » Rainbow Tables Packer # Rainbow Tables Packer

Author: EiNSTeiN_ EiNSTeiN@g3nius.org

Created: June 2006

**NOTE** This paper is still a draft, check again later for a full version.
__History__
__How do they work ?__
__Simple Optimisations__
__A Little More Trade-Off Between Time and Memory__

Since it will save an awfull lot of disk space, compression is a great idea, but not if it means trading this space for the time needed to decompress the rainbow tables each time we need to use them.

To overcome this decompression problem, I use a compression method that I (unimaginatively) called "block compression". With this method, a table is compressed in many blocks of a given size. An index is generated, containing details about each blocks. The details kept in this index include the ending point of the first and last chain in the block (so we can determine the*block interval*), the size of the compressed and
uncompressed block, and a checksum of the uncompressed block for sanity check.

Block compression allow rcrack to determine in which block a given hash may be found, only by searching in the index the block whose interval encompas the hash. Only this block is then uncompressed and searched, thus saving the trouble of reading the whole table, while keeping the table compressed to a very decent level.__Implementation Specifics__
__Step-by-Step example__
__Links__

Created: June 2006

__What is a Rainbow Table ?__

In short, a rainbow table is a file that is used to lookup an unknown plaintext from a known hash for an algorithm that does not usually permit this operation.

The idea behind rainbow tables was proposed by Martin Hellman in 1980 in an article
titled *A Cryptanalytic Time-Memory Trade-Off*[1]. It was further refined by Philippe
Oechslin in 2003 in an article titled *Making a Faster Cryptanalytic Time-Memory Trade-Off*[2]

For a very good vulgarisation of [2], see [3].

The RainbowCrack[4] suite of tools is a general purpose implementation of Philippe
Oechslin's Faster Cryptanalytic Time-Memory Trade-Off released by Zhu Shuanglei.
Since then RainbowCrack.com[5] published an improved version of the original tools,
with more supported algorithms.

The work described in this paper is based on RainbowCrack.com's version of the RainbowCrack
suite of tools.

The RainbowCrack suite of tools has three main components: rtgen will generate
the tables, rtsort will sort them and rcrack will use the sorted tables to find the plaintext of
any given hash. The rtgen tool works by performing the following steps: (1) generate a random number
that will be used as the starting point of a rainbow chain; (2) find the ending point of the chain
by successively applying a chosen *hashing function* plus a *reduction function*
up to *n* times on the starting point of the chain, where *n* is the desired length of the rainbow chain;
(3) write the starting point and the ending point in arainbow table; this pair of number is considered
to be the chain in the rest of this paper.
Rainbow tables are usualy split in several small files, typically 650 MB or 1 GB. Tables are also split in several indexes,
by choosing a different *reduction function* in the table generation process.
When the desired amount of chains have been generated, the table is complete and can be sorted.
Tables are sorted in ascending order on the ending point of each chain, and the sorting process is mandatory for
rcrack to work properly. After each table is sorted, the rcrack tool can be used to recover an
unknown plaintext from a known hash. It does this by reading each chain and comparing the hash to
the ending point of the chain. If the hash match, odds are good that the password is part of the
chain, and it may be recovered by generating the middle links of the chain by applying the
*hashing function* plus the the *reduction function* on the starting point of the chain (like in rtgen). If the
plaintext is not recovered then, the *reduction function* is applied *on the hash itself* and the lookup is restarted.

The optimisations described in this paper consists in several steps that one should take to further refine the way each table is
stored on disk. Given any normal rainbow table generated with the RainbowCrack
suite, it is easy to follow the steps described in this paper to transform them into more efficient tables.

As a first example, represent yourself two tables of the same index that would each contain three
chains ending respectively with 3, 149 and 300 for the first table and 18, 150 and 412 for the second table.
Rainbow tables may easily be over 1 GB in size, so while we could read the entire table in no time in
this example, it cost a lot of disk-access time with a real rainbow table. Rcrack should try
to minimize the time it spend accessing the disk. In this example we can easily determine by reading the first and
last chains from each table (32 bytes per table), that the ending values of the chains in the first table will
always be between 3 and 300, and between 18 and 412 for the second table. When looking up a value
such as 342, we know for sure that it can only be located in the second table, so we can spare some time
and not even read the first table. On the other hand, when looking up a value such as 150, we have
no choice but to read both tables.

Tables are divided into indexes because the algorithm require it, but tables of the same index
are divided into multiple files only for convenience (for example,
to burn them on CD or DVD). This mean that it is possible to merge all tables of the same index
into a single file given that they also have the same charset and chain length.
Merging both tables from this example (that is, appending one table at the end of the other and
then sorting the result) would result in a table containing six chains ending
with 3, 18, 149, 150, 300, 412. For convenience, the table can be divided in two halves again,
giving two tables whose chains would end with respectively 3, 18, 149 for the first table and
150, 300, 412 for the last table. The new arrangement of the tables offer a real advantage over thier previous
arrangement because now, when looking for any single hash between 3 and 412 we can complete the process
by reading only one of the two tables. The difference between the first and last values in a table
is called the *table interval* in the rest of this article. In this example, the interval of the
first table is (3;149), and the interval in the second table is (150;412).

At this point, the only difference with normal tables is that we are ensured that the interval
covered by all tables of the same index will never merge. If a table
has an interval (x;y), for any chain *n* where *x <= n <= y*,
one can be ensured that *n* can only be present in the said table.

It may happen that rtgen will generate the same chain multiple times; it happen very rarely
if the chains are generated using a proper random number generator, but it may still happen.
From my experience, a normal table of 1 GB will contain between 150 and 1500 duplicates. 1500
is not a significant ammount of chains, because 1 GB tables contains 67108864 chains total, so 1500 is
much less than 0,01%.

In the previous example, we did not have duplicate chains, but chances are that any real table contains
some. To be a real duplicate, not only the ending point of two chains must be the same,
but also the starting point of those two chains must be the same. It is not necessary to remove the
duplicates, but it is still possible to remove them from a table. In the previous example, when
all tables from the same index have been merged into a single file and this file have been sorted, it is really simple to
remove the duplicates before splitting the tables into multiple pieces again. This step ensure
that rcrack will never loose time checking twice the exact same chain.

Due to the fact that the chains are stored as pairs of 64 bits numbers (each pair is a single chain),
and depending on the hashing function and the charset used, all rainbow tables will contain a very
large percentage of null bytes. The larger the keyspace is, the less null bytes the table will contain.
From my experience, large tables (64 GB) generated using the LM algorithm and a 64 characters charset
will contain up to 40% of null bytes, wheras small LM tables (20 GB) generated with a 32 characters charset
will contain up to 60% of null bytes.

This large null bytes footprint means that rainbow tables can be easily compressed to take between
40% and 60% less space on disk. Try to generate a small file (2 MB) with a small
charset (numeric symbols) using the following commands:

rtgen xxx rtsort xxxThen generate an equally small file (2 MB) but with a much larger charset (all symbols):

rtgen xxx rtsort xxxNow try to compress both. Using normal ZIP compression, the first table will compress from 2 MB to x MB, and the second table will compress from 2 MB to x MB.

Since it will save an awfull lot of disk space, compression is a great idea, but not if it means trading this space for the time needed to decompress the rainbow tables each time we need to use them.

To overcome this decompression problem, I use a compression method that I (unimaginatively) called "block compression". With this method, a table is compressed in many blocks of a given size. An index is generated, containing details about each blocks. The details kept in this index include the ending point of the first and last chain in the block (so we can determine the

Block compression allow rcrack to determine in which block a given hash may be found, only by searching in the index the block whose interval encompas the hash. Only this block is then uncompressed and searched, thus saving the trouble of reading the whole table, while keeping the table compressed to a very decent level.

The RainbowCrack suite of tools are coded in C++, whereas I chose to translate the suite of tools in C
before implementing my own modifications in them.

One step of my implementation require to merge all tables of the same index into a single file, which cannot
be handled by the original RainbowCrack suite because the file-access functions used are limited to 4 GB files.
In my implementation, the file-access functions use 64 bits offsets in order to be able to handle files over 4 GB
in size.

...

...

[1] http://www-ee.stanford.edu/~hellman/publications/36.pdf

[2] http://lasecwww.epfl.ch/~oechslin/publications/crypto03.pdf

[3] http://kestas.kuliukas.com/RainbowTables/