What is a Rainbow Table ?
In short, a rainbow table is a file that is used to lookup an unknown plaintext from a known hash for an algorithm that does not usually permit this operation.History
The idea behind rainbow tables was proposed by Martin Hellman in 1980 in an article
titled A Cryptanalytic Time-Memory Trade-Off. It was further refined by Philippe
Oechslin in 2003 in an article titled Making a Faster Cryptanalytic Time-Memory Trade-Off
For a very good vulgarisation of , see .
The RainbowCrack suite of tools is a general purpose implementation of Philippe Oechslin's Faster Cryptanalytic Time-Memory Trade-Off released by Zhu Shuanglei. Since then RainbowCrack.com published an improved version of the original tools, with more supported algorithms.
The work described in this paper is based on RainbowCrack.com's version of the RainbowCrack suite of tools.
The RainbowCrack suite of tools has three main components: rtgen will generate the tables, rtsort will sort them and rcrack will use the sorted tables to find the plaintext of any given hash. The rtgen tool works by performing the following steps: (1) generate a random number that will be used as the starting point of a rainbow chain; (2) find the ending point of the chain by successively applying a chosen hashing function plus a reduction function up to n times on the starting point of the chain, where n is the desired length of the rainbow chain; (3) write the starting point and the ending point in arainbow table; this pair of number is considered to be the chain in the rest of this paper. Rainbow tables are usualy split in several small files, typically 650 MB or 1 GB. Tables are also split in several indexes, by choosing a different reduction function in the table generation process. When the desired amount of chains have been generated, the table is complete and can be sorted. Tables are sorted in ascending order on the ending point of each chain, and the sorting process is mandatory for rcrack to work properly. After each table is sorted, the rcrack tool can be used to recover an unknown plaintext from a known hash. It does this by reading each chain and comparing the hash to the ending point of the chain. If the hash match, odds are good that the password is part of the chain, and it may be recovered by generating the middle links of the chain by applying the hashing function plus the the reduction function on the starting point of the chain (like in rtgen). If the plaintext is not recovered then, the reduction function is applied on the hash itself and the lookup is restarted.Simple Optimisations
The optimisations described in this paper consists in several steps that one should take to further refine the way each table is
stored on disk. Given any normal rainbow table generated with the RainbowCrack
suite, it is easy to follow the steps described in this paper to transform them into more efficient tables.
As a first example, represent yourself two tables of the same index that would each contain three chains ending respectively with 3, 149 and 300 for the first table and 18, 150 and 412 for the second table. Rainbow tables may easily be over 1 GB in size, so while we could read the entire table in no time in this example, it cost a lot of disk-access time with a real rainbow table. Rcrack should try to minimize the time it spend accessing the disk. In this example we can easily determine by reading the first and last chains from each table (32 bytes per table), that the ending values of the chains in the first table will always be between 3 and 300, and between 18 and 412 for the second table. When looking up a value such as 342, we know for sure that it can only be located in the second table, so we can spare some time and not even read the first table. On the other hand, when looking up a value such as 150, we have no choice but to read both tables.
Tables are divided into indexes because the algorithm require it, but tables of the same index are divided into multiple files only for convenience (for example, to burn them on CD or DVD). This mean that it is possible to merge all tables of the same index into a single file given that they also have the same charset and chain length. Merging both tables from this example (that is, appending one table at the end of the other and then sorting the result) would result in a table containing six chains ending with 3, 18, 149, 150, 300, 412. For convenience, the table can be divided in two halves again, giving two tables whose chains would end with respectively 3, 18, 149 for the first table and 150, 300, 412 for the last table. The new arrangement of the tables offer a real advantage over thier previous arrangement because now, when looking for any single hash between 3 and 412 we can complete the process by reading only one of the two tables. The difference between the first and last values in a table is called the table interval in the rest of this article. In this example, the interval of the first table is (3;149), and the interval in the second table is (150;412).
At this point, the only difference with normal tables is that we are ensured that the interval covered by all tables of the same index will never merge. If a table has an interval (x;y), for any chain n where x <= n <= y, one can be ensured that n can only be present in the said table.
It may happen that rtgen will generate the same chain multiple times; it happen very rarely if the chains are generated using a proper random number generator, but it may still happen. From my experience, a normal table of 1 GB will contain between 150 and 1500 duplicates. 1500 is not a significant ammount of chains, because 1 GB tables contains 67108864 chains total, so 1500 is much less than 0,01%.
In the previous example, we did not have duplicate chains, but chances are that any real table contains some. To be a real duplicate, not only the ending point of two chains must be the same, but also the starting point of those two chains must be the same. It is not necessary to remove the duplicates, but it is still possible to remove them from a table. In the previous example, when all tables from the same index have been merged into a single file and this file have been sorted, it is really simple to remove the duplicates before splitting the tables into multiple pieces again. This step ensure that rcrack will never loose time checking twice the exact same chain.
Due to the fact that the chains are stored as pairs of 64 bits numbers (each pair is a single chain),
and depending on the hashing function and the charset used, all rainbow tables will contain a very
large percentage of null bytes. The larger the keyspace is, the less null bytes the table will contain.
From my experience, large tables (64 GB) generated using the LM algorithm and a 64 characters charset
will contain up to 40% of null bytes, wheras small LM tables (20 GB) generated with a 32 characters charset
will contain up to 60% of null bytes.
This large null bytes footprint means that rainbow tables can be easily compressed to take between 40% and 60% less space on disk. Try to generate a small file (2 MB) with a small charset (numeric symbols) using the following commands:
rtgen xxx rtsort xxxThen generate an equally small file (2 MB) but with a much larger charset (all symbols):
rtgen xxx rtsort xxxNow try to compress both. Using normal ZIP compression, the first table will compress from 2 MB to x MB, and the second table will compress from 2 MB to x MB.
The RainbowCrack suite of tools are coded in C++, whereas I chose to translate the suite of tools in C
before implementing my own modifications in them.
One step of my implementation require to merge all tables of the same index into a single file, which cannot be handled by the original RainbowCrack suite because the file-access functions used are limited to 4 GB files. In my implementation, the file-access functions use 64 bits offsets in order to be able to handle files over 4 GB in size.