Monthly Archives: December 2011

Bitcoin – An Analysis

Kay Hamacher and Stefan Katzenbeisser presented their analysis of Bitcoin at 28C3. Besides the usual cryptographic analysis, they pointed out some important aspects of the system:

Bitcoin doesn’t scale well, and no update mechanism has been designed!

Bitcoin was created as a total decentralized system. There is also no central authority or update mechanism, that could alter the system. As soon as Bitcoin hits it’s boundaries, all Bitcoin users need to agree on a change of the system and integrate it. Because every user of Bitcoin needs to keep a full log of all transactions, that have ever been executed on the Bitcoin network, it is clear that Bitcoin will hit it’s scaleability boundaries rather soon, if it gets more popular.

Bitcoins can bet lost and the number of Bitcoins that can be generated is fixed!

Bitcoins are a digital currency, that are stored on the current owners computer. If for example the harddisk crashes, or it is encrypted an the password is lost, these Bitcoins cannot be spend anymore. They still exist in the network, and also cannot be regenerated. One could also say, these Bitcoins are lost. Also, the total amount of Bitcoins, that can ever be generated is fixed. So we need to expect, that the total amount of Bitcoins will start to decline, as soon as all Bitcoins have been generated. Just assume, that 1% of all Bitcoins are lost per year, then after  \log_{0.99} (0.5) \approx 69  years, only 50% of the Bitcoins will still be there.

Bitcoin is not untraceable and anonymous!

Because Bitcoin keeps a log of all transactions, and this log is available to the public, one can trace which address has received how much money, and where it went. Bitcoin allows a user to have more than one identity, but as soon as money from more than one address is used in a single transaction, one can assume that these addresses belong to the same user. One can for example see, who spend money to Wikileaks, and where Wikileaks transferred that money. Also if you assume, that a person ears money only from Bitcoin, you know his total income. You also know when he transfers money on Bitcoin and when not, so you might find out when he sleeps and in which time zone he lives in.

What has not been found…

As mentioned at the beginning of the post, there is no central update mechanism. So if somebody would find a bug in the design of the system, that allows him to steal money from it, it cannot easily be fixed. So far, no attack one the basics of the bitcoin system has been found and bitcoin is running and getting more and more popular.

Time is on my Side – Exploiting Timing Side Channel Vulnerabilities on the Web

Sebastian Schinzel gave an interesting talk today at 28C3, about timing side channel attacks against web applications. (Timing-) Side channel attacks are known in the cryptography world for a long time, and many algorithms like RSA or AES have been successfully attacked. In a nutshell, an attacker measures the time a device needs to process a request (usually an encryption or decryption), and can draw conclusions from that to the values of secret input parameters (a plaintext or a secret key).

(cc)Sebastian showed, that this can be used against none cryptography web applications as well. Instead of just presenting his attacks, he presented general methods how to do timing measurements against web applications first. For example, a web application could perform the following sequence of checks during a user login:

  1. Does the account exist?
  2. Is the account of the user locked?
  3. Has the account expired?
  4. Is the password correct?

If one of these checks fails, the procedure is aborted and an error page is send to the user. Of course, each of these steps requires some time, and from the time it takes from the request to the generation of the error message, one might guess, which of these steps went wrong.

The second attack presented in this talk was a timing attack on an implementation of the XML encryption standard using a PKCS#1.5 padding. Here, the server needs a longer time to process a request, depending on the padding inside the encrypted payload.

For me, my personal highlight was the extension of timing based side channel attacks to none cartographic web applications. I assume, if one would check some famous web applications, many of such timing leaks could be found, because web developers usually don’t care about timing side channels. The timing difference could also be used to assist blind SQL injection attacks, where the timing difference could be the only channel back to the attacker.

Unfortunately, the slides are not (yet) available, but a previous paper describing the methods can be found at http://sebastian-schinzel.de/_download/cosade-2011-extended-abstract.pdf.

Effective DoS attacks against Web Application Plattforms – #hashDoS [UPDATE3]

Julian Wälde (zeri) and Alexander Klink (alech) presented a very nice way how to bring down many popular websites at 28C3. The idea is quiet simple, effective and general. Many programming languages, especially scripting languages, frequently use hash tables to store all kinds of data.

Hash Tables

Hash tables (which are very well explained on wikipedia) are data structures, that store key-value pairs very efficiently. Adding new entries to the table, looking up entries in the table for a given key, and deleting entries are usually executed in $latex O(1)$ in best and average case, which means that the time for each operation is usually constant, and doesn’t depend on the number of entries stored in the table. Other data structures like binary tree type structures need $latex O(\log(n))$ entries in the average case, which means that they need more time for these operations when a lot of entries are stored. (n is the number of entries stored) However, there is a drawback, hash tables need $latex O(n)$ operations for these operations in the worst case, compared to $latex O(\log(n))$ operations for binary trees, which means that they are much slower in very rare cases.

How the attack works

Many hash table implementations are very similar. They use a deterministic hash function to map a key k (usually a string or another complex data structure) to a single 32 or 64 bit value h. Let l be the size of the current table. Then  h%l is used as an index in this table, where the entry is stored. If more than one entry is mapped to the same index, then a linked list of entries is stored at this index position in the table. If many different keys are all mapped to the same hash value h, then all these entries are stored at the same index of the table, and the performance of the hash table goes down to a simple linked list, where all operations need $latex O(n)$ time.

Performance

Just to get an impression, how much CPU time it takes to process such a request on a Core i7 CPU with PHP, assuming that the processing time for a request is not limited (usually, PHP limits the processing time for a request to 1 minute):

  • 8 MB of POST data –  288 minutes of CPU time
  • 500k of POST data – 1 minute of CPU time
  • 300k  of POST data – 30 sec of CPU time

So you can keep about 10.000 Core i7 CPU cores busy processing PHP requests using a gigabit internet connection. Alternatively for ASP.NET, 30,000 Core2 CPU cores, or for Java Tomcat 100,000 Core i7 CPU cores, or for CRuby 1.8 1,000,000 Core i7 CPU cores, can be kept busy with a single gigabit connection.

Even though this blog is named cryptanalysis.eu, these hash functions used for these tables are not cryptographic hash functions like SHA256, instead simple hash functions like djb2 are used, and it is very easy to find many inputs mapping to the same output. Julian and Alexander did a great job with checking many programming languages used for web applications for their hash table implementation and hash functions. For all of them they checked, they managed to find a lot of keys mapping to the same output, except for Perl. Perl uses a randomized hash function, i.e. the hash doesn’t only depend on the key, but also on an additional value, that is chosen at startup of the application at random. All other languages also store the query parameters send in an HTTP GET or POST request in an hash table, so that a request with many query parameters all mapping to the same hash value will slowly fill such a hash table, before the first line of code written by the application programmer will be executed. Filling this hash table will usually take several minutes on a decent CPU, so that even a fast web server can be kept busy using a slow connection.

Countermeasures

One may ask, can this problem be fixed by using a modern hash function like SHA256? Unfortunately, this will not solve the problem, because hash tables are usually very small, like $latex 2^{32}$ entries, and one can still find a lot of values where SHA256(m) maps to the same value modulus $latex 2^{32}$. Also fixing this in the parser for the parameters string seems to be a bad solution, because many programmers use hash tables to store data retrieved from a user or another insecure source. From my point of view (which is also the opinion of the speakers), the best solution is to use randomized hash functions for hash tables. However, there are several kind of fixes deployed by the vendors:

Limiting HTTP POST and GET request lengths (Microsoft ASP.NET)

Microsoft suggests to limit the HTTP request length. Most applications don’t require very long HTTP requests, except for file uploads.

<configuration>
 <system.web>
 <httpRuntime maxRequestLength="200”/>
 </system.web>
</configuration>

As long as you don’t process data in a hash table from any other sources, except from the HTTP request (like external URLs), this should prevent the basic form of the attack.

Microsoft has also released an emergency patch against this attack: http://technet.microsoft.com/en-us/security/bulletin/ms11-100.mspx

Limiting the number of different HTTP request parameters (PHP, Tomcat)

PHP has added a new configuration variable max_input_vars, that can limit the number of parameters. This is similar to the first solution, except that here, not the total length of the request is limited, instead the number of different parameters that can be submitted in a single request is limited. This can be easier to use than limiting the length of the HTTP request, because application programmers usually know, how many parameters they need, while the maximum length of the request is harder to predict. However, if a single request parameter contains a data structure, that is later parsed and put into a hash table, the application might still be vulnerable.

Apache Tomcat also deployed a similar fix, that limits the number of parameters to 10000 by default.

Using different data structures

As mentioned before, other data structures (binary tree type structures like an AVL-tree gurantee, that all operations are always executed in $latex O(\log(n))$, and never need $latex O(n)$ time. This is what for example Daniel J. Bernstein suggests. This completely fixes the vulnerability, but requires fundamental changes to the runtime/compiler/interpreter, and might be harder to implement.

More Information

More information can be found on Twitter, using the hashtag #hashDoS or from the user @hashDoS!

oCERT has a good summary about affected and fixed software versions: http://www.ocert.org/advisories/ocert-2011-003.html

Video and Paper

The video of the talk is available on YouTube. There is an advisory on full disclosure that describes the attack in detail: http://permalink.gmane.org/gmane.comp.security.full-disclosure/83694

 

Encrypted Traffic Mining (TM) – e.g. Leaks in Skype

Stefan Burschka presented a nice attack against Skype on 28C3. The attack allows you to detect a sentence or a sequence of words in an encrypted Skype call, without having to break the cryptography used in Skype. Skype is a famous internet telephony application, that offers free voice calls. To save bandwidth, Skype uses an advanced audio codec. The used bandwidth and the size of the packets send by Skype depends heavily on the audio, that is encoded. The packet length is no hidden by the encryption of Skype and visible to an eavesdropper.  Just using these information, Burschka could distinguish between difference sentences in an encrypted Skype call.

Two methods were used to improve these results:

  • Dynamic Time Warping (DTW) is an algorithm used by many speech recognition systems, to compare an audio sample with a set of reference samples of slightly different length.
  • A Kalman Filter, which is a method to remove noise from measurements. I think here, noise is a slight variation of the packet lengths.

That the amount of communication between two parties and the actual timing of packets can leak information about the content of an encrypted communication was known for a long time, and has been successfully used for an attack against the TOR onion router network. However, this is the first time, that I have seen this method applied against Skype. The methods here are so generic, that they might be applied as well against other VoIP systems like SIP, if they are encrypted or communicate over a VPN.

The full paper is available at http://www.csee.usf.edu/~labrador/Share/Globecom/DATA/01-038-02.PDF and the slides can be downloaded from http://events.ccc.de/congress/2011/Fahrplan/attachments/1985_CCC.pptx.