[cdwg] Alternative non accelerated checksums

James A Simmons uja at ornl.gov
Fri Jun 21 07:31:29 PDT 2013


Hello

	Recently I have been evaluating the performance of checksumming
in the Lustre code base. As the results below show, it can be very
expensive when you lack hardware acceleration. Unfortunately, we have
some platforms where the cost is high but checksumming is a
requirement. Since the best case without acceleration is not good
enough, I have looked into other hashing algorithms to see whether
they can meet the community's needs.
        Besides the core algorithms, I have added a few of my own to
see how they measure up. There is csum, which is the normal IP header
checksum. A 32-bit version of murmur3 was implemented. I used
jhash.h from Linux to implement jenkins, and lastly siphash was
implemented. The siphash author claims it is a more secure algorithm
in the same category as md5 but with much better speed. I have tested
on four systems. For each system, the first set of results is for the
non-cryptographic hashes and the second is for the cryptographic set.
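
For readers unfamiliar with the IP header checksum mentioned above, here
is a minimal portable sketch of that family of checksum (the RFC 1071
ones'-complement sum). It is not the kernel's assembly-optimized routine
and not the code that was benchmarked; the function name inet_checksum is
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name is an assumption): RFC 1071 style
 * Internet checksum over a byte buffer, treating the data as
 * big-endian 16-bit words.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    /* A 64-bit accumulator cannot overflow for any realistic length. */
    uint64_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

The loop is nothing more than additions and a final fold, which is
consistent with csum being among the fastest entries in the results
that follow.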

AMD Opteron(tm) Processor 6168 @ 1.9GHz stepping 1
 
Lustre: Crypto hash algorithm adler32 speed = 754 MB/s
Lustre: Crypto hash algorithm csum speed = 1834 MB/s
Lustre: Crypto hash algorithm murmur speed = 1093 MB/s
Lustre: Crypto hash algorithm jenkins speed = 763 MB/s

Lustre: Crypto hash algorithm md5 speed = 306 MB/s
Lustre: Crypto hash algorithm sha1 speed = 123 MB/s
Lustre: Crypto hash algorithm siphash speed = 593 MB/s

AMD Opteron(TM) Processor 6274 @ 2.2GHz stepping 2

Lustre: Crypto hash algorithm adler32 speed = 670 MB/s
Lustre: Crypto hash algorithm csum speed = 802 MB/s
Lustre: Crypto hash algorithm murmur speed = 1511 MB/s
Lustre: Crypto hash algorithm jenkins speed = 614 MB/s

Lustre: Crypto hash algorithm md5 speed = 309 MB/s
Lustre: Crypto hash algorithm sha1 speed = 118 MB/s
Lustre: Crypto hash algorithm siphash speed = 669 MB/s

Intel(R) Xeon(R) CPU E5520  @ 2.27GHz stepping 05

Lustre: Crypto hash algorithm adler32 speed = 836 MB/s
Lustre: Crypto hash algorithm csum speed = 3639 MB/s
Lustre: Crypto hash algorithm murmur speed = 1261 MB/s
Lustre: Crypto hash algorithm jenkins speed = 766 MB/s

Lustre: Crypto hash algorithm md5 speed = 265 MB/s
Lustre: Crypto hash algorithm sha1 speed = 108 MB/s
Lustre: Crypto hash algorithm sha256 speed = 75 MB/s
Lustre: Crypto hash algorithm sha384 speed = 105 MB/s
Lustre: Crypto hash algorithm sha512 speed = 105 MB/s
Lustre: Crypto hash algorithm siphash speed = 570 MB/s

Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz stepping 07

Lustre: Crypto hash algorithm adler32 speed = 1476 MB/s
Lustre: Crypto hash algorithm csum speed = 1769 MB/s
Lustre: Crypto hash algorithm murmur speed = 1243 MB/s
Lustre: Crypto hash algorithm jenkins speed = 956 MB/s

Lustre: Crypto hash algorithm md5 speed = 277 MB/s
Lustre: Crypto hash algorithm sha1 speed = 118 MB/s
Lustre: Crypto hash algorithm sha256 speed = 75 MB/s
Lustre: Crypto hash algorithm sha384 speed = 112 MB/s
Lustre: Crypto hash algorithm sha512 speed = 114 MB/s
Lustre: Crypto hash algorithm siphash speed = 562 MB/s
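
The message does not describe how the in-kernel MB/s figures above were
produced. As a rough user-space analogue, assuming a simple "hash a
buffer repeatedly for a fixed time" approach, a throughput measurement
could be sketched as follows. All names here (hash_fn, byte_sum,
hash_speed_mb_per_s) are hypothetical, and byte_sum is only a stand-in
for a real algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed signature for an algorithm under test. */
typedef uint32_t (*hash_fn)(const uint8_t *data, size_t len);

/* Placeholder algorithm so the sketch is self-contained: a byte sum. */
uint32_t byte_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/*
 * Hash the same buffer repeatedly for at least min_seconds of CPU
 * time and return the observed throughput in MB/s (1 MB = 2^20 bytes).
 * clock() is called once per pass, so use a buffer large enough that
 * this overhead is negligible.
 */
double hash_speed_mb_per_s(hash_fn fn, const uint8_t *buf, size_t len,
                           double min_seconds)
{
    volatile uint32_t sink = 0; /* keeps the calls from being optimized out */
    uint64_t bytes = 0;
    double elapsed;
    clock_t start = clock();

    do {
        sink ^= fn(buf, len);
        bytes += len;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    (void)sink;
    if (elapsed <= 0.0)
        return 0.0; /* timer too coarse to measure anything */
    return ((double)bytes / (1024.0 * 1024.0)) / elapsed;
}
```

A call such as hash_speed_mb_per_s(byte_sum, buf, sizeof(buf), 1.0)
would report one number per algorithm, in the same spirit as the log
lines above. User-space numbers will not match in-kernel ones.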

From the data so far you can see that siphash is a clear winner in
performance over shaX and md5 among the cryptographic hashes. For the
non-cryptographic hashes, the IP checksum and murmur3 do the best.
This version of murmur3 only generates 32-bit checksums, but there
exists a 128-bit version that is supposed to be faster. It could be
worthwhile to explore. The IP checksum from the Linux kernel is
assembly optimized, but my additional algorithms are generic C. If
done right, we could speed up murmur3.
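
To make "generic C" concrete, here is a minimal sketch of the publicly
known 32-bit murmur3 algorithm (MurmurHash3_x86_32). It is an
illustration under that assumption, not the code that was benchmarked;
the function name and the seed parameter are choices made for this
example.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* Sketch of MurmurHash3_x86_32; the name murmur3_32 is an assumption. */
uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed)
{
    const uint32_t c1 = 0xcc9e2d51u;
    const uint32_t c2 = 0x1b873593u;
    const size_t nblocks = len / 4;
    const uint8_t *tail = data + 4 * nblocks;
    uint32_t h = seed;
    uint32_t k;

    /* Body: mix one little-endian 32-bit block at a time. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;

        k = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
            ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64u;
    }

    /* Tail: mix in the remaining 1 to 3 bytes, if any. */
    k = 0;
    switch (len & 3) {
    case 3:
        k ^= (uint32_t)tail[2] << 16;
        /* fall through */
    case 2:
        k ^= (uint32_t)tail[1] << 8;
        /* fall through */
    case 1:
        k ^= tail[0];
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
    }

    /* Finalization: fold in the length and avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

The body handles only four bytes per iteration, which is one reason a
wider variant or a hand-tuned version might do better than this plain
form.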

The final question: is the Lustre community interested in the new
algorithms? If so, I can push that work forward.


