[cdwg] Alternative non accelerated checksums

Wed Jun 26 05:23:31 PDT 2013

> >        Besides the core algorithms I have added a few of my own to
> > see how they measure up. We have csum, which is your normal IP header
> > checksum.
> 
> The IP header checksum is only 16-bit, so it is only suitable for a small
> amount of data. It is definitely not suitable for 1MB or 4MB RPC sizes. 

The size is just one factor making it vulnerable to collisions. The data
position in the stream is not taken into consideration either. There are many
papers on its weaknesses.
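To make the ordering weakness concrete: the Internet checksum (RFC 1071) is a ones' complement sum of 16-bit words, and that addition is commutative, so reordering whole 16-bit words leaves the result unchanged. With only 16 bits of output, two unrelated buffers also collide roughly 1 time in 65536. A minimal user-space sketch follows; the name inet_csum is hypothetical, and the kernel's assembly csum routines have a different interface.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: the ones' complement of the ones' complement
 * sum of 16-bit big-endian words. A trailing odd byte is padded with zero. */
uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0; /* 64-bit accumulator so MB-sized buffers cannot overflow */
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Feeding it the words 0x1234 0x5678 and then the swapped sequence 0x5678 0x1234 yields the same checksum, so that reordering goes undetected.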

> > For the non-cryptographic hashes, the IP checksum and murmur3
> > do the best. This version of murmur3 only generates 32-bit
> > checksums, but there exists a 128-bit version that is supposed to be
> > faster. It could be worthwhile to explore. The IP checksum from the Linux
> > kernel is assembly-optimized, but my additional algorithms are generic
> > C.
> 
> You should test with the kernel cryptoapi code, since AFAIK there are
> assembly versions of the common algorithms already. Check out how the 
> libcfs code is already handling the crc32 code - it benchmarks each 
> algorithm at startup and dumps the results in the Lustre debug log.  
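For reference, the 32-bit murmur3 discussed above is presumably Austin Appleby's MurmurHash3_x86_32 (the 128-bit variant is MurmurHash3_x64_128). A portable generic-C sketch of the 32-bit version, assuming that variant; murmur3_32 is an illustrative name, not the code that was benchmarked:

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* Final avalanche step (fmix32) from MurmurHash3. */
static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* MurmurHash3_x86_32. Blocks are read little-endian so the result matches
 * the reference code on x86 regardless of host byte order. */
uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed)
{
    const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    const size_t nblocks = len / 4;
    uint32_t h = seed, k;

    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        k = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
            (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64u;
    }

    /* Mix in the 0-3 trailing bytes. */
    const uint8_t *tail = data + 4 * nblocks;
    k = 0;
    switch (len & 3) {
    case 3: k ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k ^= tail[0];
            k *= c1; k = rotl32(k, 15); k *= c2;
            h ^= k;
    }

    h ^= (uint32_t)len; /* the reference code takes an int length */
    return fmix32(h);
}
```

Unlike the IP checksum, each block is multiplied and rotated into the running state, so the position of the data does affect the result.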

Currently the most useful optimized algorithms are in the 3.10-rcX
kernels. I have only run Lustre up to a Linux 3.9.4 kernel so far. My
first data-collection runs have used the Lustre debug logs as
well as tcrypt.
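The startup benchmark that libcfs performs can be imitated in user space to compare generic-C candidates before porting them. A rough sketch, assuming a hypothetical struct csum_algo table and pick_fastest() helper (neither is libcfs API); the two candidates, a plain RFC 1071 checksum and a deliberately slow bitwise CRC32C, are only there to keep the example self-contained:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint32_t (*csum_fn)(const uint8_t *buf, size_t len);

/* Candidate: RFC 1071 Internet checksum, widened to the common return type. */
static uint32_t cand_inet(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Candidate: table-free, bit-at-a-time CRC32C (reflected poly 0x82F63B78). */
static uint32_t cand_crc32c_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical descriptor: name, function, and measured throughput. */
struct csum_algo {
    const char *name;
    csum_fn fn;
    double mbps;
};

/* Hypothetical helper: checksum a len-byte buffer iters times with each
 * algorithm, store its MB/s, and return the index of the fastest one
 * (or -1 if the buffer cannot be allocated). */
int pick_fastest(struct csum_algo *algos, int n, size_t len, int iters)
{
    uint8_t *buf = malloc(len);
    volatile uint32_t sink = 0; /* keeps the results from being optimized away */
    int best = -1;

    if (!buf)
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(i * 31 + 7);

    for (int a = 0; a < n; a++) {
        clock_t t0 = clock();
        for (int it = 0; it < iters; it++)
            sink ^= algos[a].fn(buf, len);
        double dt = (double)(clock() - t0) / CLOCKS_PER_SEC;
        algos[a].mbps = dt > 0 ? (double)len * iters / dt / 1e6 : 0.0;
        if (best < 0 || algos[a].mbps > algos[best].mbps)
            best = a;
    }
    free(buf);
    return best;
}
```

clock() measures processor time, which is adequate for a single-threaded compute loop but coarse for very short runs, so the buffer size and iteration count should be large enough to run for a noticeable fraction of a second.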

> > The final question: is the Lustre community interested in the new
> > algorithms? If so, I can push that work forward.
> 
> I'm not against it if there are significant improvements to be had.

Of the test algorithms I implemented, I would say siphash might be
interesting to the UI guys.
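SipHash is a keyed 64-bit pseudorandom function by Aumasson and Bernstein; SipHash-2-4 uses 2 compression rounds per 8-byte block and 4 finalization rounds. The sketch below follows the published reference algorithm, but the name siphash24 and its signature are illustrative, not the code from these tests. Unlike murmur3 or the IP checksum, its output depends on a 128-bit secret key, so inputs cannot be crafted to collide without knowing that key.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int b)
{
    return (x << b) | (x >> (64 - b));
}

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t r = 0;
    for (int i = 7; i >= 0; i--)
        r = (r << 8) | p[i];
    return r;
}

/* One SipRound: add-rotate-xor mixing of the four state words. */
static void sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* SipHash-2-4 of len bytes under a 16-byte key. */
uint64_t siphash24(const uint8_t key[16], const uint8_t *in, size_t len)
{
    uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v[4] = {
        0x736f6d6570736575ULL ^ k0,
        0x646f72616e646f6dULL ^ k1,
        0x6c7967656e657261ULL ^ k0,
        0x7465646279746573ULL ^ k1,
    };
    size_t end = len - (len % 8);

    for (size_t i = 0; i < end; i += 8) {
        uint64_t m = load_le64(in + i);
        v[3] ^= m;
        sipround(v);
        sipround(v);
        v[0] ^= m;
    }

    /* Last block: leftover bytes plus the length (mod 256) in the top byte. */
    uint64_t b = (uint64_t)len << 56;
    for (size_t i = 0; i < len % 8; i++)
        b |= (uint64_t)in[end + i] << (8 * i);
    v[3] ^= b;
    sipround(v);
    sipround(v);
    v[0] ^= b;

    /* Finalization: 4 rounds. */
    v[2] ^= 0xff;
    for (int r = 0; r < 4; r++)
        sipround(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

With the key bytes 00..0f and the 15-byte message 00..0e, this should reproduce the example output a129ca6149be45e5 given in the SipHash paper.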

> It surprises me that newer CPUs do not have hardware-accelerated checksums
> of some sort.  Is it just that the assembly versions have not been implemented
> in the kernels that Lustre is running on?  Could they be implemented in libcfs
> as was done with crc32 and then submitted to the upstream kernel (so everyone
> benefits and we don't have to maintain them forever)?

We have some production systems that only support up to SSE3, so we can't
take advantage of any hardware acceleration for crc32 or crc32c. Down
the line, I will most likely need to implement this.
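On x86 the hardware CRC32 instruction arrived with SSE4.2 and computes only the CRC32C (Castagnoli) polynomial, so SSE3-only machines need a software path. A minimal table-driven, byte-at-a-time fallback might look like the following; crc32c_sw is a hypothetical name, and faster software variants such as slicing-by-8 exist.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC32C polynomial 0x1EDC6F41 in bit-reversed (reflected) form. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

static uint32_t crc32c_table[256];

/* Build the 256-entry lookup table (one entry per possible byte value). */
static void crc32c_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
        crc32c_table[n] = c;
    }
}

/* Continue a CRC32C over buf; pass crc = 0 for the first chunk. */
uint32_t crc32c_sw(uint32_t crc, const uint8_t *buf, size_t len)
{
    static int table_ready;

    if (!table_ready) { /* lazy init is not thread-safe; init up front in real code */
        crc32c_init_table();
        table_ready = 1;
    }
    crc = ~crc;
    while (len--)
        crc = crc32c_table[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
    return ~crc;
}
```

The standard CRC32C check value over the ASCII string "123456789" is 0xE3069283, which is a convenient sanity test when comparing a software fallback against a hardware-accelerated implementation.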