[cdwg] Lustre 2.5 Development Planning

James A Simmons uja at ornl.gov
Tue Jun 4 07:22:25 PDT 2013


> >LU-3406 - merge raid5-mmp-unplug patch upstream
> >LU-2442, LU-3305 - quota scaling improvements, need to be pushed
> >  upstream
> 
> These depend on the willingness of the upstream patch maintainers to
> accept. Definitely something to track, but no guarantee of completion in 2.5.

True, it could be a while before these patches get merged upstream. We
have to be persistent, otherwise they will get dropped.

> > LU-684 - dev_rdonly patch is replaced by the Linux fail framework.
> 
> This one doesn't have any existing patches, so needs new development work.

I started to play with some code locally. It's pretty easy, but it does
require you to build a kernel with CONFIG_FAIL_MAKE_REQUEST. I put some
notes in the JIRA ticket about how to use it. With the test shots for
2.4 I haven't gotten around to really testing it yet. Now I will have
some free time to play with this again.
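
For anyone who wants to try it in the meantime, here is a minimal
userspace sketch of arming the knobs. It assumes a kernel built with
CONFIG_FAIL_MAKE_REQUEST and debugfs mounted at /sys/kernel/debug; the
knob names come from the kernel's fault-injection documentation, and
"sdb" is purely an example device.

#include <stdio.h>
#include <stdlib.h>

/* Write one value into a fault-injection knob, bailing out on error. */
static void write_knob(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (f == NULL) {
                perror(path);
                exit(1);
        }
        fprintf(f, "%s\n", val);
        fclose(f);
}

int main(void)
{
        /* Fail every matching I/O, forever, and log each injection. */
        write_knob("/sys/kernel/debug/fail_make_request/probability", "100");
        write_knob("/sys/kernel/debug/fail_make_request/times", "-1");
        write_knob("/sys/kernel/debug/fail_make_request/verbose", "1");

        /* Opt a single block device in so only the target disk fails. */
        write_knob("/sys/block/sdb/make-it-fail", "1");
        return 0;
}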

> >And the project I would like to work on related to ldiskfs is to make
> >ldiskfs patchless against the tip of Linus' tree. Anything that will
> >not be pushed upstream will be moved into osd-ldiskfs. No JIRA ticket
> >for this work yet.
> 
> I'm definitely in support of this.  One of the main patches that needs work
> to be accepted upstream is the large_xattr patch.  Please see LU-908 for
> what needs to be done before this patch can land upstream.
> 
> The other major file system feature that is not yet landed upstream is the
> dirdata feature, used for FID-in-dirent.  This doesn't have a lot of appeal
> to non-Lustre users today, but there may be a way to get this included
> upstream as part of an "attributes in dirent" feature that Ted discussed at
> one time.  I'm not sure if he is working on that, but I can ask him.

Do you have mailing list links to these attribute discussions? It would
be nice to see the framework so we can move to it. Patchless ldiskfs is
also a big task, so we might not finish it by the 2.5 release.

> >LNET work
> >---------------------------------------------------------------
> >LU-2456 - Dynamic LNET config support
> >LU-2950 - LNET route config
> >LU-2466 - LNET hash tables
> >LU-2934 - Router Priority
> >
> >Enable LNET to process its own checksums and do handshaking with
> >the ptlrpc layer. No JIRA ticket for this yet.
> 
> Could you please explain this more?  What is the benefit of adding another
> layer of checksumming at the LNET level vs. the existing Lustre-level
> checksums, except overhead?

There are companies that are using LNET for more than Lustre. These
other, non-Lustre software products have a need to guarantee the data
over the fabric as well. Lustre itself only does checksumming for
bulk messages and ignores small messages, which can also suffer
corruption. I agree that if we already have a checksum done by the
Lustre layer then we have no need to redo it. That is what I'm referring
to by handshaking: if it's done already, don't do it again.
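
To make the handshake idea concrete, here is a hypothetical sketch.
None of these names exist in LNET today; LNET_MSG_F_CSUMMED, the header
fields, and lnet_msg_checksum() are placeholders for the proposed
feature, and crc32c() is the kernel helper from linux/crc32c.h.

#include <linux/crc32c.h>

#define LNET_MSG_F_CSUMMED      0x01    /* payload already checksummed */

struct lnet_msg_hdr {
        unsigned int    mh_flags;
        unsigned int    mh_csum;        /* filled in by whoever checksums */
};

/* Checksum a payload only if no upper layer (e.g. ptlrpc bulk) already
 * did, so the work is never done twice for the same message. */
static void lnet_msg_checksum(struct lnet_msg_hdr *hdr,
                              const void *payload, unsigned int len)
{
        if (hdr->mh_flags & LNET_MSG_F_CSUMMED)
                return;         /* done already, don't do it again */

        hdr->mh_csum = crc32c(0, payload, len);
        hdr->mh_flags |= LNET_MSG_F_CSUMMED;
}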

> >**********************************************************
> >LNET changes I would like to see in 2.4.1 if possible as well :-)
> >**********************************************************
> >LU-2212 - add crc32c module loading to libcfs
> 
> There is no objection to this patch landing, except that nobody has
> reported back on that bug to say that the patch actually fixed the
> problem for them.

I don't know what happened to JNET2000, but this patch makes my life a
little easier. Without it I have to manually modprobe the crc32c module
before starting Lustre on my Cray test bed compute nodes. It
auto-magically happens with this patch.
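
The idea behind the patch is roughly the following sketch, assuming
libcfs sets up its checksum support in a module-init path like this one
(cfs_crypto_register() is a placeholder name). Going through the crypto
API makes the kernel request_module() an available crc32c provider
(crc32c-intel on capable CPUs) instead of failing when none is loaded.

#include <linux/err.h>
#include <crypto/hash.h>

static int cfs_crypto_register(void)
{
        struct crypto_shash *tfm;

        /* Asking the crypto API for "crc32c" triggers module loading
         * of a provider, so nobody has to modprobe it by hand. */
        tfm = crypto_alloc_shash("crc32c", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        crypto_free_shash(tfm);
        return 0;
}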

> >Wish list work for myself that I most likely will not have ready for
> >2.5.X
> >
> >Enable compression of LNET traffic.
> 
> Is there an expectation that this will improve throughput under normal
> usage, or would this only be good for WAN data transfers?  As it is, the
> clients are already using considerable CPU for data handling, so I could
> only see this helping if the client data compression went all the way to
> the disk (i.e. it is compressed at the Lustre client, saved to disk in
> compressed form, and then decompressed again at the Lustre client), not
> at the LNET level.

Exactly what I was thinking. Both ZFS and btrfs support transparent
compression, and we have the e2compr project, so ldiskfs could also
support compression in place. As you pointed out, this would be a plus
for WAN data transfers. At the same time it is expensive, and should not
be used unless it is needed or your client happens to have the compute
power to spare. In the case of clients with Intel Xeon Phi cards or
GPUs, the native CPUs now often sit idle.
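
As a minimal sketch of what client-side compression could look like
using the kernel crypto compression API: the wrapper name and the idea
of squeezing a payload before it hits the wire are mine, while
crypto_comp itself is the real API, assuming a "deflate" provider is
available.

#include <linux/crypto.h>
#include <linux/err.h>

static int compress_payload(const u8 *src, unsigned int slen,
                            u8 *dst, unsigned int *dlen)
{
        struct crypto_comp *tfm;
        int rc;

        tfm = crypto_alloc_comp("deflate", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        /* *dlen is in/out: dst capacity on entry, bytes used on exit.
         * A real implementation would send the data uncompressed if
         * this fails or doesn't actually shrink it. */
        rc = crypto_comp_compress(tfm, src, slen, dst, dlen);

        crypto_free_comp(tfm);
        return rc;
}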

> >Fix up Lustre so it can be built with llvm. First step to compile some
> >of the more cpu intensive code in lustre as TSGI code to be executed by
> >the GPU.
> 
> This is theoretically interesting, but I'm not sure if there is any piece
> of Lustre code which would actually benefit from GPU offloading.  I think
> that is only useful for CPU-intensive code that is run in a tight loop.
> It would also suffer if the code is doing data access, since it would need
> to do all the data access over the PCI bus in addition to the existing two
> network<->CPU<->storage transfers.

Today we have the PCI bus access penalty, but that will be going away
over the next few years. For example, AMD has hUMA (heterogeneous
Uniform Memory Access) in the works.

http://www.theregister.co.uk/2013/05/01/amd_huma

As for what code I would target: well, the compression code :-)

> Given that Lustre does auto-negotiation of the best checksum algorithms
> between the client and OST to use the hardware CRC support of the CPU, do
> you have any candidates that might benefit from this?

Besides compression, I can see encryption handling also benefiting. It
is true that checksumming could benefit as well. The most common GPU
offloading use case that has already been done is software RAID, but I
have no idea if LAID will ever become a reality anymore.



