MD5 and SHA-1 are compromised. They should not be used unless, in the given context, they are several times faster than SHA-256 or SHA-512. The remaining candidates are SHA-256 and SHA-512. Both belong to the SHA-2 family and are much more secure. SHA-256 is computed with 32-bit words, SHA-512 with 64-bit words.
Hash implementations
For generating cryptographic hashes in Java, the Apache Commons Codec library is very convenient.
Speed performance
The following sample code is used to measure the speed:
import java.util.UUID;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.lang.time.StopWatch;

public class Test {

    private static final int TIMES = 1_000_000;
    private static final String UUID_STRING = UUID.randomUUID().toString();

    public static void main(String[] args) {
        System.out.println(generateStringToHash());
        System.out.println("MD5: " + md5());
        System.out.println("SHA-1: " + sha1());
        System.out.println("SHA-256: " + sha256());
        System.out.println("SHA-512: " + sha512());
    }

    public static long md5() {
        StopWatch watch = new StopWatch();
        watch.start();
        for (int i = 0; i < TIMES; i++) {
            DigestUtils.md5Hex(generateStringToHash());
        }
        watch.stop();
        System.out.println(DigestUtils.md5Hex(generateStringToHash()));
        return watch.getTime();
    }

    public static long sha1() {
        ...
        System.out.println(DigestUtils.sha1Hex(generateStringToHash()));
        return watch.getTime();
    }

    public static long sha256() {
        ...
        System.out.println(DigestUtils.sha256Hex(generateStringToHash()));
        return watch.getTime();
    }

    public static long sha512() {
        ...
        System.out.println(DigestUtils.sha512Hex(generateStringToHash()));
        return watch.getTime();
    }

    public static String generateStringToHash() {
        return UUID.randomUUID().toString() + System.currentTimeMillis();
    }
}
Several measurements were taken in two groups: one with a shorter string to hash and one with a longer string. Each group used the following variations of the generateStringToHash() method:
cached UUID - no extra time should be consumed
cached UUID + current system time - in this case, time is consumed to get system time
new UUID + current system time - in this case, time is consumed for generating the UUID and to get system time
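The three variations above can be sketched as separate methods. This is only an illustration: the class name HashInputVariants and the method names are hypothetical, and the post itself swaps the body of a single generateStringToHash() method instead.

```java
import java.util.UUID;

class HashInputVariants {
    // Generated once, so no UUID cost is paid inside the timed loop.
    private static final String UUID_STRING = UUID.randomUUID().toString();

    // Variation 1: cached UUID only (36 characters).
    static String cachedUuid() {
        return UUID_STRING;
    }

    // Variation 2: cached UUID plus the current time in milliseconds.
    static String cachedUuidWithTime() {
        return UUID_STRING + System.currentTimeMillis();
    }

    // Variation 3: a fresh UUID plus the current time on every call.
    static String newUuidWithTime() {
        return UUID.randomUUID().toString() + System.currentTimeMillis();
    }

    public static void main(String[] args) {
        System.out.println(cachedUuid() + " (" + cachedUuid().length() + " chars)");
        System.out.println(cachedUuidWithTime() + " (" + cachedUuidWithTime().length() + " chars)");
        System.out.println(newUuidWithTime() + " (" + newUuidWithTime().length() + " chars)");
    }
}
```

Judging by the sample values in the raw results, the longer group seems to use the same UUID written twice before the optional timestamp, which would give the 72 and 85 character inputs.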
Raw results
Five measurements were made for each case and an average value was calculated. Time is in milliseconds per 1 000 000 calculations. The system is 64-bit Windows 10 with a 1-core Intel i7 at 2.60 GHz and 16 GB RAM.
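The "five runs, then average" procedure can be sketched with the JDK alone. The assumptions here are significant: the class AveragedTiming and its methods are hypothetical, java.security.MessageDigest stands in for DigestUtils, one digest instance is reused, and no hex encoding is done. Absolute numbers from this sketch are therefore not comparable with the ones reported in this post.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AveragedTiming {
    // Times `iterations` digests of `input`; returns elapsed milliseconds.
    static double timeOnce(String algorithm, String input, int iterations) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                digest.digest(bytes); // digest() resets the instance for reuse
            }
            return (System.nanoTime() - start) / 1_000_000.0;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Arithmetic mean of the measured run times.
    static double average(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Repeats the timed loop `runs` times and averages the results.
    static double averageOfRuns(String algorithm, String input, int iterations, int runs) {
        double[] times = new double[runs];
        for (int r = 0; r < runs; r++) {
            times[r] = timeOnce(algorithm, input, iterations);
        }
        return average(times);
    }

    public static void main(String[] args) {
        String input = "f5cdcda7-d873-455f-9902-dc9c7894bee0";
        for (String algorithm : new String[] {"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
            double ms = averageOfRuns(algorithm, input, 1_000_000, 5);
            System.out.printf("%s: %.1f ms per 1M%n", algorithm, ms);
        }
    }
}
```

The first run usually includes JIT warm-up, so a stricter benchmark would discard it or use a dedicated harness.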
generateStringToHash() with: return UUID_STRING;
Data to encode is ~36 characters in length (f5cdcda7-d873-455f-9902-dc9c7894bee0). UUID is cached and time stamp is not taken. No additional time is wasted.
Data to encode is ~49 characters in length (1af4a3e1-1d92-40e7-8a74-7bb7394211e01468216765464). New UUID is generated on each calculation so time for its generation is included in total time.
Data to encode is ~72 characters in length (57149cb6-991c-4ffd-9c98-d823ee8a61f757149cb6-991c-4ffd-9c98-d823ee8a61f7). UUID is cached and time stamp is not taken. No additional time is wasted.
Data to encode is ~85 characters in length (2734b31f-16db-4eba-afd5-121d0670ffa72734b31f-16db-4eba-afd5-121d0670ffa71468217683040). New UUID is generated on each calculation so time for its generation is included in total time.
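| Hash | #1 (ms) | #2 (ms) | #3 (ms) | #4 (ms) | #5 (ms) | Average per 1M (ms) |
|---|---|---|---|---|---|---|
| MD5 | 1753 | 1757 | 1739 | 1751 | 1691 | 1738.2 |
| SHA-1 | 1634 | 1634 | 1627 | 1634 | 1633 | 1632.4 |
| SHA-256 | 1962 | 1956 | 1988 | 1988 | 1924 | 1963.6 |
| SHA-512 | 1909 | 1946 | 1936 | 1929 | 1895 | 1923 |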
Aggregated results
Results from all iterations are aggregated and compared in the table below. There are 6 main cases. They are listed below and referenced in the table:
Case 1 - 36-character string, UUID is cached
Case 2 - 49-character string, UUID is cached and the system timestamp is read on each iteration
Case 3 - 49-character string, a new UUID is generated and the system timestamp is read on each iteration
Case 4 - 72-character string, UUID is cached
Case 5 - 85-character string, UUID is cached and the system timestamp is read on each iteration
Case 6 - 85-character string, a new UUID is generated and the system timestamp is read on each iteration
All times below are per 1 000 000 calculations:
| Hash | Case 1 (ms) | Case 2 (ms) | Case 3 (ms) | Case 4 (ms) | Case 5 (ms) | Case 6 (ms) |
|---|---|---|---|---|---|---|
| MD5 | 627.4 | 765.6 | 1488.8 | 839 | 1029.4 | 1738.2 |
| SHA-1 | 604 | 748.2 | 1325 | 916.8 | 1009.6 | 1632.4 |
| SHA-256 | 737.8 | 851 | 1504.4 | 1168.2 | 1260 | 1963.6 |
| SHA-512 | 1056.4 | 1158.8 | 1837.4 | 1118.4 | 1227.4 | 1923 |
Compare results
Some conclusions from the results, based on the two groups: short strings (36 and 49 chars) and longer strings (72 and 85 chars).
SHA-256 is faster than SHA-512 only when hashing short strings, where SHA-512 takes about 31% longer. For longer strings SHA-512 is faster, with SHA-256 taking about 2.9% longer.
Time to get system time stamp is ~121.6 ms per 1M iterations.
Time to generate UUID is ~670.4 ms per 1M iterations.
SHA-1 is the fastest hash function, with ~587.9 ms per 1M operations for short strings and ~881.7 ms per 1M for longer strings.
MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings.
SHA-256 is 15.5% slower than SHA-1 for short strings and 23.4% for longer strings.
SHA-512 is 51.7% slower than SHA-1 for short strings and 20% slower for longer strings.
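Several of the percentages in the list above appear consistent with comparing, for each hash, the mean of the three short-string cases (1 to 3) against the mean of the three long-string cases (4 to 6). That reading is an inference from the numbers, not something the post states. The sketch below applies it to the aggregated table; the class DerivedComparisons and its methods are hypothetical.

```java
class DerivedComparisons {
    static final int MD5 = 0, SHA1 = 1, SHA256 = 2, SHA512 = 3;

    // Rows: MD5, SHA-1, SHA-256, SHA-512. Columns: Case 1..6 (ms per 1M).
    static final double[][] MS = {
        {627.4, 765.6, 1488.8, 839.0, 1029.4, 1738.2},
        {604.0, 748.2, 1325.0, 916.8, 1009.6, 1632.4},
        {737.8, 851.0, 1504.4, 1168.2, 1260.0, 1963.6},
        {1056.4, 1158.8, 1837.4, 1118.4, 1227.4, 1923.0},
    };

    // Mean of three consecutive cases starting at column index firstCase
    // (0 for the short-string group, 3 for the long-string group).
    static double mean(int hash, int firstCase) {
        double[] row = MS[hash];
        return (row[firstCase] + row[firstCase + 1] + row[firstCase + 2]) / 3.0;
    }

    // How much longer `slower` takes than `faster`, in percent.
    static double percentSlower(double slower, double faster) {
        return (slower / faster - 1.0) * 100.0;
    }

    // Mean cost of generating a UUID: (Case 3 - Case 2) and (Case 6 - Case 5),
    // averaged over all four hashes.
    static double uuidOverhead() {
        double sum = 0;
        for (double[] row : MS) {
            sum += (row[2] - row[1]) + (row[5] - row[4]);
        }
        return sum / (2 * MS.length);
    }

    public static void main(String[] args) {
        System.out.printf("SHA-512 vs SHA-256, short: %.1f%%%n",
                percentSlower(mean(SHA512, 0), mean(SHA256, 0)));
        System.out.printf("SHA-256 vs SHA-512, long: %.1f%%%n",
                percentSlower(mean(SHA256, 3), mean(SHA512, 3)));
        System.out.printf("MD5 vs SHA-1, short: %.1f%%%n",
                percentSlower(mean(MD5, 0), mean(SHA1, 0)));
        System.out.printf("UUID generation: %.1f ms per 1M%n", uuidOverhead());
    }
}
```

Because the means include Case 3 and Case 6, they also include UUID and timestamp overhead, so the percentages understate the gap between the hash functions themselves.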
Hash sizes
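An important factor to consider is the size of the hash produced by each function:
| Hash | Size (bits) | Hex string length (chars) |
|---|---|---|
| MD5 | 128 | 32 |
| SHA-1 | 160 | 40 |
| SHA-256 | 256 | 64 |
| SHA-512 | 512 | 128 |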
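The output size of each function can be checked directly against the JDK. This is a minimal sketch using java.security.MessageDigest rather than Apache Commons Codec; the class name HashSizes and its helper methods are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashSizes {
    // Digest length in bits, as reported by the JDK provider.
    static int bits(String algorithm) {
        try {
            // getDigestLength() returns the length in bytes.
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // A hex string uses one character per 4 bits.
    static int hexChars(String algorithm) {
        return bits(algorithm) / 4;
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
            System.out.println(algorithm + ": " + bits(algorithm) + " bits, "
                    + hexChars(algorithm) + " hex chars");
        }
    }
}
```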
In the specific case this research was done for, the hashed string is passed in an API request. It is constructed from API Key + Secret Key + current time in seconds. So if the API Key is around 15-20 chars, the Secret Key 10-15 chars and the time 10 chars, the total length of the string to hash is 35-45 chars. Since the hash is passed as a request parameter, it should be as short as possible.
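The token construction described above can be sketched as follows. This is an illustration under assumptions: the class ApiToken, its method names and the sample key values are hypothetical, SHA-256 is used as the example algorithm, and the JDK's MessageDigest stands in for DigestUtils.sha256Hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ApiToken {
    // Lowercase hex SHA-256 of a UTF-8 string (always 64 characters).
    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // API Key + Secret Key + current time in seconds, then hashed.
    static String token(String apiKey, String secretKey, long epochSeconds) {
        return sha256Hex(apiKey + secretKey + epochSeconds);
    }

    public static void main(String[] args) {
        // Placeholder keys, chosen only to match the lengths discussed above.
        String apiKey = "example-api-key-01";
        String secretKey = "example-secret";
        long nowSeconds = System.currentTimeMillis() / 1000;
        String token = token(apiKey, secretKey, nowSeconds);
        System.out.println(token + " (" + token.length() + " chars)");
    }
}
```

The receiving side would have to rebuild the same string from its own copy of the keys and the same timestamp in order to compare tokens.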
Select hash function
Based on all the data so far, SHA-256 is selected. It belongs to the secure SHA-2 family, it is much faster than SHA-512 for shorter strings, and it produces a 64-character hash.
Conclusion
This post compares the MD5, SHA-1, SHA-256 and SHA-512 cryptographic hash functions. Note that the comparison depends heavily on the specific implementation (Apache Commons Codec) and on the specific purpose (generating a secure token to be sent with an API call). MD5 and SHA-1 are best avoided, as they are compromised and not secure. They can still be chosen if, in a given context, they are several times faster than the secure SHA-2 functions and security is not that important. The choice of a cryptographic hash function depends on the context of use, and benchmark tests for that context are needed.