[cdwg] Lustre release notes

John Carrier carrier at cray.com
Thu Jul 12 07:30:40 PDT 2012


During the course of several discussions with OpenSFS members since LUG,
I have come to the following understanding of how Whamcloud manages
Lustre releases.  I prepared the notes below for my team in Cray.  Peter
and Pam thought these notes might also help provide a baseline for
further discussion within the cdwg.

Errors or omissions are mine -- please send corrections and comments to
this reflector.

thanks,

--jc

Whamcloud produces two Lustre release streams: a feature release every
6 months and a maintenance release every 3 months.  Feature releases are
new branches off the master codeline that include all new code and
the bugfixes accumulated during the previous 6 months. Maintenance
releases apply bugfixes to the existing maintenance branch, which is
currently Lustre 2.1.

Whamcloud documents the release roadmap for both streams through its
"Community Lustre Roadmap" wiki [1].  Feature landings are projected
based on code Whamcloud produces for NRE contracts as well as code
received from third parties.  These features and expected landings are
detailed on a "Development in Progress" wiki [2].

OpenSFS currently has two contracts with Whamcloud.  The first is a
development contract to produce features intended to improve Lustre
metadata performance. This contract stipulates that all features must
land in the master codeline (aka the canonical tree).  The second
contract jointly supports, together with Whamcloud, the release manager
and gatekeeper positions for the feature releases.

It is this latter contract which has caused confusion within the
Community.  The contract states that

   Ongoing Lustre development requires periodic Lustre feature
   releases which are fully tested and qualified Lustre releases
   featuring the latest code (both features and bugfixes) from the
   master codeline (git://git.opensfs.org/fs/lustre-release.git).
   It is expected that feature releases should occur roughly every
   six months. These are distinct from Lustre maintenance
   releases, which are funded by support contracts.

Nonetheless, perhaps because the title of the contract is "Lustre
Development Community Tree Maintenance", the expectation among OpenSFS
members has been that bugfixes found after a feature release would be
applied back to the feature release branch where the bugs were found.
This level of maintenance is not, however, what Whamcloud agreed to
provide with this contract.

Landing bugfixes is a two-step process. Whamcloud lands all bugfixes
against the master codeline.  They then backport these patches to the
maintenance branch.  Because it takes considerable resources (both
people and hardware time) to backport the patches to a feature branch,
test the new code and release a maintenance update, Whamcloud only
produces these updates when it is viable to do so.  Subsequent feature
releases will incorporate all bugfixes and features that have landed on
the master codeline by the time the new branch is created.

Because the Lustre 2 codeline has undergone substantial re-engineering
and has only recently begun to get wider testing exposure on a greater
variety of hardware and production workloads, the initial 2.x releases
have uncovered a number of bugs, especially in the new client I/O code
(CLIO) and in interoperability with Lustre 1.8. Very few bugs have been
found in the new features themselves. Over time, as the Community gains
more experience and fixes more bugs in the Lustre 2 codeline, fewer bugs
should be found after each feature release.

For now, it will be easier for community members to base their
production releases of Lustre on a Whamcloud maintenance release,
because it includes all of the bugfixes made since the release branch
was created.
Community members can pull the bugfixes and apply the patches to a
feature release themselves (as Cray and ORNL have done for Lustre 2.2),
but this requires tracking master for relevant updates that should be
applied to support their production use of the feature branch.

A more strategic solution is to do more testing of a feature release
candidate _before_ it is released.  Even if a Community member has no
interest in using a feature release in production, early testing of
pre-release versions will help identify instabilities that new features
introduce under that member's workloads and on its hardware before the
release is official.

OpenSFS is working with Whamcloud to identify which future release
branch will become the new maintenance branch.

[1] http://wiki.whamcloud.com/display/PUB/Community+Lustre+Roadmap
[2] http://wiki.whamcloud.com/display/PUB/Lustre+Community+Development+in+Progress
