[cdwg] Lustre 2.5 Development Planning

Simmons, James A. simmonsja at ornl.gov
Wed Jun 5 11:06:43 PDT 2013


>>> > LU-684 - dev_rdonly patch is replaced by the Linux fault injection framework.
>>>
>>> This one doesn't have any existing patches, so needs new development
>>>work.
>>
>>I started to play with some code locally. It's pretty easy, but it does
>>require you to build a kernel with CONFIG_FAIL_MAKE_REQUEST. I put some
>>notes in the JIRA ticket about how to use it. With the test shots for 2.4 I
>>haven't gotten around to really testing it yet. Now I will have some free
>>time to play with this again.

>I expect it might also be possible to build the dm-fail module against an
>existing kernel?  I don't think we'll have any control over which options
>the vendor kernels will be built with.

       We could use the dm-flakey module, but that adds the requirement of always
using device mapper, and you need some userland setup to create special
targets on top of the default targets; from there we use the special
target to perform the failover testing. Of course we have other requirements for
the fault injection system, as I posted in the LU-684 ticket. Can you use device
mapper with loopback devices for the most basic testing? Since fail_make_request
works at the block layer, I know it will work with loop devices. Looking at
dm-flakey.c, that code is a lot like the fail_make_request code, so it is not a big
challenge either way. The cost, whichever way we choose to go, is not the coding
but the proper setup of the test bed.
       The only control we have over vendor options is if we build the kernels
ourselves, which we already do at this time. I see that for the current RHEL 6.4
kernel DM_FLAKEY is enabled by default, but currently you need a debug kernel
for fail_make_request. So the question is how much pain do we want to suffer
in the test setup? I think this discussion should be cross-referenced in the
JIRA ticket.
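
For reference, here is a rough sketch of what each setup looks like on a test
node with a loop device behind the OST (device names and numbers below are only
illustrative, not what our test scripts would use). fail_make_request is driven
through debugfs and needs CONFIG_FAIL_MAKE_REQUEST; the dm-flakey table passes
I/O for <up interval> seconds and then fails it for <down interval> seconds:

    # fail_make_request: needs CONFIG_FAIL_MAKE_REQUEST and debugfs mounted
    echo 1   > /sys/block/loop0/make-it-fail
    echo 100 > /sys/kernel/debug/fail_make_request/probability
    echo -1  > /sys/kernel/debug/fail_make_request/times
    echo 1   > /sys/kernel/debug/fail_make_request/verbose

    # dm-flakey: no special kernel config, but needs a device-mapper target
    # table format: <start> <len> flakey <dev> <offset> <up interval> <down interval>
    SIZE=$(blockdev --getsz /dev/loop0)
    dmsetup create ost0-flakey --table "0 $SIZE flakey /dev/loop0 0 30 5"
    # ...then the OST is formatted/mounted on /dev/mapper/ost0-flakey

Either way the commands themselves are trivial; the real work is wiring them
into the test framework.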

>>Do you have any mailing list links to these attribute discussions? It would be
>>nice to see the framework so we can move to it. Patchless ldiskfs is
>>also a big task, so we might not finish by the 2.5 release.
>
>There was only a short discussion about this on the list:
>http://lists.openwall.net/linux-ext4/2012/08/11/8

Thank you for the link. From my web search nothing has moved in that
direction. Maybe I missed something.

>>> >**********************************************************
>>> >LNET changes I would like to see in 2.4.1 if possible as well :-)
>>> >**********************************************************
>>> >LU-2212 - add crc32c module loading to libcfs
>>>
>>> There is no objection to this patch landing, except that nobody
>>> commenting on that bug has actually said that the patch fixed
>>> the problem for them.
>>
>>I don't know what happened to JNET2000, but this patch makes my life a
>>little easier. Without this patch I have to manually modprobe the crc32c
>>module before starting Lustre on my Cray test bed compute nodes. It
>>auto-magically happens with this patch.
>
>Could you please post this information into the Jira ticket.  If this patch
>fixes the problem for you, it should be landed for 2.4 and 2.1.

Sure can.
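
For reference, the manual workaround mentioned above is just loading the hash
module on each node before Lustre starts; the patch only makes that happen
automatically:

    # manual workaround on each compute node, before mounting Lustre
    modprobe crc32c
    lsmod | grep crc32c        # confirm the module is loaded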

>> >Wish list work for myself that I most likely will not have ready for
>> >2.5.X
>> >
>> >Enable compression of LNET traffic.
>>
>> Is there an expectation that this will improve throughput under normal
>> usage, or would this only be good for WAN data transfers?  As it is, the
>> clients are already using considerable CPU for data handling, so I could
>> only see this helping if the client data compression went all the way to
>> the disk (i.e. it is compressed at the Lustre client, saved to disk in
>> compressed form, and then decompressed again at the Lustre client), not
>> at the LNET level.
>
>Exactly what I was thinking. Both ZFS and btrfs support transparent
>compression. Also we have the e2compr project so ldiskfs could also
>support compression in place.

>I don't think e2compr ever made it anywhere, and I haven't heard about it
>in many years.  In any case, there is an open question of whether the OSD
>filesystems can be told that they are getting already-compressed data.

This is a big project under the wish list heading; again, likely a 2.6+ thing.
Perhaps e2compr is something to look at?
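
For what it is worth, the server-side half of this already exists today: a
ZFS-backed OST can have transparent compression turned on with a single
property change (the pool/dataset name below is only an example):

    # enable LZ4 compression on a hypothetical ZFS OST dataset
    zfs set compression=lz4 ostpool/ost0
    zfs get compression,compressratio ostpool/ost0

The open question above is the end-to-end path, i.e. compressing at the client
and keeping the data compressed all the way to disk.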

>>> >Fix up Lustre so it can be built with llvm. First step to compile some
>>> >of the more cpu intensive code in lustre as TSGI code to be executed by
>>> >the GPU.
>>>
>>> This is theoretically interesting, but I'm not sure if there is any
>>>piece
>>> of Lustre code which would actually benefit from GPU offloading.  I
>>>think
>>> that is only useful for CPU-intensive code that is run in a tight loop.
>>> It would also suffer if the code is doing data access, since it would
>>>need
>>> to do all the data access over the PCI bus in addition to the existing
>>>two
>>> network<->CPU<->storage transfers.
>>
>>Today we have the PCI bus access penalty, but that will be going away
>>over the next few years. For example, AMD has hUMA in the works:
>>
>>http://www.theregister.co.uk/2013/05/01/amd_huma
>>
>>As for what code I would target: well, the compression code :-)
>
>This puts this firmly into the "some day when it is ready" category, and
>not really "ready for 2.5"...

Agreed. If you look at the earlier emails, I labeled it the wish list section. Mind you,
making Lustre build against LLVM would not be a huge barrier. One nice outcome
of supporting LLVM is that it is the back end of clang, which puts us in a position
to handle the case where gcc is ever replaced by clang. Plus, several years from
now it could be that GPU cores are booting a Linux OS. We just don't know,
but it never hurts to be ready. I will be spending a lot of cycles on other things,
so most likely the LLVM work is a 2.6+ thing.
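
A first experiment would probably just be pointing the build at clang and
cataloguing what breaks; something along these lines (flags are only a guess,
and the kernel-module side of the tree would still need something like the
LLVMLinux patches at this point):

    # user-space tools and libraries only
    ./configure CC=clang
    make CC=clang 2>&1 | tee clang-build.log
    grep -c 'error:' clang-build.log    # rough measure of how much work is left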

>>Besides compression, I can see encryption handling also benefiting. It is
>>true that checksumming could also benefit. The most common use case for GPU
>>offloading that has already been done is software RAID, but I have no
>>idea if LAID will ever become a reality anymore.

>Lustre hashing is already done by using the kernel cryptoapi, and I don't
>think it makes sense to expose Lustre to the details of the implementation.
>If there is a fast GPU-based crypto or hashing code then it should be added
>as a cryptoapi module and every kernel component can benefit.  Similarly,
>if clients require this functionality without a GPU, they need to do this in
>software and cryptoapi is expected to have the most efficient assembly
>versions of the various algorithms.

You have a very good point about pushing such work upstream rather than
carrying it in the Lustre source.
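
As an illustration of that point, the kernel already registers multiple crc32c
implementations with different priorities and transparently picks the fastest
one available, so a GPU-backed driver added to cryptoapi would be selected the
same way without Lustre changing at all:

    # list the crc32c implementations cryptoapi knows about and their priorities;
    # the highest-priority driver (e.g. crc32c-intel vs. crc32c-generic) wins
    grep -A4 '^name *: crc32c' /proc/crypto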

