public inbox for linux-fsdevel@vger.kernel.org
* [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
@ 2025-01-24 20:50 Day, Timothy
  2025-01-24 21:09 ` Matthew Wilcox
  2025-01-28  6:14 ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Day, Timothy @ 2025-01-24 20:50 UTC (permalink / raw)
  To: lsf-pc@lists.linux-foundation.org
  Cc: linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger, neilb@suse.de

Lustre is a high-performance parallel filesystem, available under
GPLv2, used for HPC and AI/ML compute clusters. Lustre is
currently used by 65% of the Top-500 (9 of Top-10) systems in
HPC [7]. Outside of HPC, Lustre is used by many of the largest
AI/ML clusters in the world, and is commercially supported by
numerous vendors and cloud service providers [1].

After 21 years and an ill-fated stint in staging, Lustre is still
maintained as an out-of-tree module [6]. The previous upstreaming
effort suffered from a lack of developer focus and user adoption,
which eventually led to Lustre being removed from staging
altogether [2].

However, the work to improve Lustre has continued regardless. In
the intervening years, the code improvements that previously
prevented a return to mainline have been steadily progressing. At
least 25% of patches accepted for Lustre 2.16 were related to the
upstreaming effort [3]. And all of the remaining work is
in-flight [4][5][8].

Our eventual goal is to get both the Lustre client and server
(on ext4/ldiskfs) along with at least TCP/IP networking to an
acceptable quality before submitting to mainline. The remaining
network support would follow soon afterwards.

I propose to discuss:

- As we alter our development model [8] to support upstream development,
  what is a sufficient demonstration of commitment that our model works?
- Should the client and server be submitted together? Or split?
- Expectations for a new filesystem to be accepted to mainline
- How to manage inclusion of a large code base (the client alone is
  200kLoC) without increasing the burden on fs/net maintainers

Lustre has already received a plethora of feedback in the past.
While much of that has been addressed since - the kernel is a
moving target. Several filesystems have been merged (or removed)
since Lustre left staging. We're aiming to avoid the mistakes of
the past and hope to address as many concerns as possible before
submitting for inclusion.

Thanks!

Timothy Day (Amazon Web Services - AWS)
James Simmons (Oak Ridge National Labs - ORNL)

[1] Wikipedia: https://en.wikipedia.org/wiki/Lustre_(file_system)#Commercial_technical_support
[2] Kicked out of staging: https://lwn.net/Articles/756565/
[3] This is a heuristic, based on the combined commit counts of
    ORNL, Aeon, SUSE, and AWS - which have been primarily working
    on upstreaming issues: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
[4] LUG24 Upstreaming Update: https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_02-Native_Linux_client_status.pdf
[5] Lustre Jira Upstream Progress: https://jira.whamcloud.com/browse/LU-12511
[6] Out-of-tree codebase: https://git.whamcloud.com/?p=fs/lustre-release.git;a=tree
[7] Graph: https://8d118135-f68b-475d-9b6d-ef84c0db1e71.usrfiles.com/ugd/8d1181_bb8f9405d77a4e2bad53531aa94e8868.pdf
[8] Project Wiki: https://wiki.lustre.org/Upstream_contributing



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-24 20:50 [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming Day, Timothy
@ 2025-01-24 21:09 ` Matthew Wilcox
  2025-01-24 22:56   ` NeilBrown
  2025-01-25  6:33   ` Day, Timothy
  2025-01-28  6:14 ` Christoph Hellwig
  1 sibling, 2 replies; 17+ messages in thread
From: Matthew Wilcox @ 2025-01-24 21:09 UTC (permalink / raw)
  To: Day, Timothy
  Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, Andreas Dilger, neilb@suse.de

On Fri, Jan 24, 2025 at 08:50:02PM +0000, Day, Timothy wrote:
> Lustre has already received a plethora of feedback in the past.
> While much of that has been addressed since - the kernel is a
> moving target. Several filesystems have been merged (or removed)
> since Lustre left staging. We're aiming to avoid the mistakes of
> the past and hope to address as many concerns as possible before
> submitting for inclusion.

I'm broadly in favour of adding Lustre; however, I'd really like it to not
increase my workload substantially.  Ideally it would use iomap instead of
buffer heads (although maybe that's not feasible).

What's not negotiable for me is the use of folios; Lustre must be
fully converted to the folio API.  No use of any of the functions in
mm/folio-compat.c.  If you can grep for 'struct page' in Lustre and
find nothing, that's a great place to be (not all filesystems in the
kernel have reached that stage yet, and there are good reasons why some
filesystems still use bare pages).
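A quick way to audit that state is to grep for the bare-page API and for a few of the wrappers that live in mm/folio-compat.c. This is just a sketch: the fs/lustre path is a placeholder (Lustre is not in-tree), and the helper list is illustrative, not exhaustive.

```shell
# Count bare-page and folio-compat usage in a source tree. SRC is a
# placeholder; point it at the Lustre sources (the in-tree path is hypothetical).
SRC="${SRC:-fs/lustre}"
echo "bare 'struct page' references:"
grep -rn 'struct page' "$SRC" | wc -l
# A few helpers implemented in mm/folio-compat.c (illustrative subset):
echo "folio-compat callers:"
grep -rnE 'grab_cache_page_write_begin|unlock_page|mark_page_accessed' "$SRC" | wc -l
```

Both counts would need to reach zero for the conversion to be considered complete.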

Support for large folios would not be a requirement.  It's just a really
good idea if you care about performance ;-)

I hope it doesn't still use ->writepage.  We're almost rid of it in
filesystems.

Ultimately, I think you'll want to describe the workflow you see Lustre
adopting once it's upstream -- I've had too many filesystems say to me
"Oh, you have to submit your patch against our git tree and then we'll
apply it to the kernel later".  That's not acceptable; the kernel is
upstream, not your private git tree.



* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-24 21:09 ` Matthew Wilcox
@ 2025-01-24 22:56   ` NeilBrown
  2025-01-25  6:33   ` Day, Timothy
  1 sibling, 0 replies; 17+ messages in thread
From: NeilBrown @ 2025-01-24 22:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Day, Timothy, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger

On Sat, 25 Jan 2025, Matthew Wilcox wrote:
> 
> Ultimately, I think you'll want to describe the workflow you see Lustre
> adopting once it's upstream -- I've had too many filesystems say to me
> "Oh, you have to submit your patch against our git tree and then we'll
> apply it to the kernel later".  That's not acceptable; the kernel is
> upstream, not your private git tree.
> 
> 

While I generally agree with your sentiment, I think there is more
nuance in the details than you portray.
With nfsd, for example, I can sometimes submit patches against mainline
but sometimes need to submit against nfsd-next or even nfsd-testing if
someone else has been working in the same area.  And it may well be a
couple of releases "later" that it lands in Linus' kernel - though that
isn't the norm.

But certainly we need to be clear about the workflow - not least within
the lustre community who are used to a very different work-flow and will
need to learn.

Thanks,
NeilBrown


* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-24 21:09 ` Matthew Wilcox
  2025-01-24 22:56   ` NeilBrown
@ 2025-01-25  6:33   ` Day, Timothy
  1 sibling, 0 replies; 17+ messages in thread
From: Day, Timothy @ 2025-01-25  6:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, Andreas Dilger, neilb@suse.de



> On 1/24/25, 4:09 PM, "Matthew Wilcox" <willy@infradead.org <mailto:willy@infradead.org>> wrote:
> > On Fri, Jan 24, 2025 at 08:50:02PM +0000, Day, Timothy wrote:
> > Lustre has already received a plethora of feedback in the past.
> > While much of that has been addressed since - the kernel is a
> > moving target. Several filesystems have been merged (or removed)
> > since Lustre left staging. We're aiming to avoid the mistakes of
> > the past and hope to address as many concerns as possible before
> > submitting for inclusion.
>
>
> I'm broadly in favour of adding Lustre; however, I'd really like it to not
> increase my workload substantially. Ideally it would use iomap instead of
> buffer heads (although maybe that's not feasible).

The place Lustre uses buffer heads is osd-ldiskfs (the interface between
the Lustre server and ext4). And that's an artifact of ext4's usage of
buffer heads. I don't see buffer head usage anywhere else. The way the Lustre server
interfaces with ext4 is probably a bigger question.

> What's not negotiable for me is the use of folios; Lustre must be
> fully converted to the folio API. No use of any of the functions in
> mm/folio-compat.c. If you can grep for 'struct page' in Lustre and
> find nothing, that's a great place to be (not all filesystems in the
> kernel have reached that stage yet, and there are good reasons why some
> filesystems still use bare pages).
>
>
> Support for large folios would not be a requirement. It's just a really
> good idea if you care about performance ;-)

There's been some work towards folios, but nothing comprehensive - i.e. we
still have a bunch of users of mm/folio-compat.c. I've seen some patches
in-flight for large folios, but nothing has landed.

> I hope it doesn't still use ->writepage. We're almost rid of it in
> filesystems.

It's still there - I don't think anyone has seriously looked at how well
Lustre behaves without it.

> Ultimately, I think you'll want to describe the workflow you see Lustre
> adopting once it's upstream -- I've had too many filesystems say to me
> "Oh, you have to submit your patch against our git tree and then we'll
> apply it to the kernel later". That's not acceptable; the kernel is
> upstream, not your private git tree.

This is probably the biggest question. Staging didn't fare well with most
development happening out-of-tree. We have to rework the development
workflow so that patches are generated against an actual kernel
tree rather than our separate git tree. I have a high-level idea of how we'd
get there - in terms of reorganizing and splitting the existing repo [1].

That's what I'd be most interested in discussing at LSF. If we change
the development model, how do we demonstrate the model is effective?

I guess another interesting question would be: has any other subsystem
or major driver undergone a transition like this before?

Tim Day

[1] https://wiki.lustre.org/Upstream_contributing



* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-24 20:50 [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming Day, Timothy
  2025-01-24 21:09 ` Matthew Wilcox
@ 2025-01-28  6:14 ` Christoph Hellwig
  2025-01-28 16:35   ` Day, Timothy
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-28  6:14 UTC (permalink / raw)
  To: Day, Timothy
  Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, Andreas Dilger, neilb@suse.de

On Fri, Jan 24, 2025 at 08:50:02PM +0000, Day, Timothy wrote:
> While much of that has been addressed since - the kernel is a
> moving target. Several filesystems have been merged (or removed)
> since Lustre left staging. We're aiming to avoid the mistakes of
> the past and hope to address as many concerns as possible before
> submitting for inclusion.

That's because they have a (more or less normal) development model
and a stable on-disk / on-the-wire protocol.

I think you guys need to sort your internal mess out first.
Consolidate the half a dozen incompatible versions, make sure you
have a documented and stable on-disk format, and don't require
all participants to run exactly the same version.  After that, just
send patches like everyone else.



* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-28  6:14 ` Christoph Hellwig
@ 2025-01-28 16:35   ` Day, Timothy
  2025-01-30 14:28     ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Day, Timothy @ 2025-01-28 16:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, Andreas Dilger, neilb@suse.de



> On 1/28/25, 1:14 AM, "Christoph Hellwig" <hch@infradead.org <mailto:hch@infradead.org>> wrote:
> > On Fri, Jan 24, 2025 at 08:50:02PM +0000, Day, Timothy wrote:
> > While much of that has been addressed since - the kernel is a
> > moving target. Several filesystems have been merged (or removed)
> > since Lustre left staging. We're aiming to avoid the mistakes of
> > the past and hope to address as many concerns as possible before
> > submitting for inclusion.
>
> That's because they have a (more or less normal) development model
> and a stable on-disk / on-the-wire protocol.
>
> I think you guys need to sort your internal mess out first.
> Consolidate the half a dozen incompatible versions, make sure you
> have a documented and stable on-disk format, and don't require
> all participants to run exactly the same version. After that, just
> send patches like everyone else.

The network and disk formats are pretty stable at this point. All of the
Lustre versions released over the past 6 years (at least) interoperate
over the network just fine. I don't have personal experience with a
larger version difference - but the Lustre protocol negotiation is pretty
solid and I've heard of larger version gaps working fine.

For the disk format, Lustre uses a minimally patched ext4 for the
servers. That's well documented - although perhaps a bit odd
compared to NFS or SMB. The number of patches needed for
ext4 has decreased a lot over time. Convergence with regular
ext4 is feasible. But that's a deeper discussion with the ext4
developers, I think.

My biggest question for LSF is around development model:
Our current development model is still orthogonal to what
most other subsystems/drivers do. But as we evolve, how do
we demonstrate that our development model is reasonable?
Sending the initial patches is one thing. Convincing everyone
that the model is sustainable is another.

Tim Day



* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-28 16:35   ` Day, Timothy
@ 2025-01-30 14:28     ` Theodore Ts'o
  2025-01-30 16:18       ` Day, Timothy
       [not found]       ` <4044F3FF-D0CE-4823-B104-0544A986DF7B@ddn.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Theodore Ts'o @ 2025-01-30 14:28 UTC (permalink / raw)
  To: Day, Timothy
  Cc: Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger, neilb@suse.de

On Tue, Jan 28, 2025 at 04:35:46PM +0000, Day, Timothy wrote:
> My biggest question for LSF is around development model:
> Our current development model is still orthogonal to what
> most other subsystems/drivers do. But as we evolve, how do
> we demonstrate that our development model is reasonable?
> Sending the initial patches is one thing. Convincing everyone
> that the model is sustainable is another.

I suspect one of the reasons why most development is happening out-of-tree
is that pretty much all of the users of Lustre are using distro (and very
often, Enterprise) kernels.  Are there any people outside of the core
Lustre team (most of whom are probably working for DDN?) that use
Lustre or can even test Lustre using the upstream kernel?

I'll let Andreas comment further, but from my perspective, if we
want upstreaming Lustre to be successful, perhaps one strategy
would be to make it easier for upstream users and developers to use
Lustre, perhaps at a smaller scale than what a typical DDN customer
would run.

Cheers,

					- Ted


* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-30 14:28     ` Theodore Ts'o
@ 2025-01-30 16:18       ` Day, Timothy
  2025-01-30 16:56         ` Theodore Ts'o
       [not found]       ` <4044F3FF-D0CE-4823-B104-0544A986DF7B@ddn.com>
  1 sibling, 1 reply; 17+ messages in thread
From: Day, Timothy @ 2025-01-30 16:18 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger, neilb@suse.de

On 1/30/25, 9:28 AM, "Theodore Ts'o" <tytso@mit.edu <mailto:tytso@mit.edu>> wrote:
> On Tue, Jan 28, 2025 at 04:35:46PM +0000, Day, Timothy wrote:
> > My biggest question for LSF is around development model:
> > Our current development model is still orthogonal to what
> > most other subsystems/drivers do. But as we evolve, how do
> > we demonstrate that our development model is reasonable?
> > Sending the initial patches is one thing. Convincing everyone
> > that the model is sustainable is another.
>
> I suspect one of the reasons why most development is happening out-of-tree
> is that pretty much all of the users of Lustre are using distro (and very
> often, Enterprise) kernels. Are there any people outside of the core
> Lustre team (most of whom are probably working for DDN?) that use
> Lustre or can even test Lustre using the upstream kernel?

Lustre has a lot of usage and development outside of DDN/Whamcloud [1][2].
HPE, AWS, SUSE, Azure, etc. And at least at AWS, we use Lustre on fairly
up-to-date kernels [3][4]. And I think this is becoming more common - although
I don't have solid data on that.

[1] https://en.wikipedia.org/wiki/Lustre_(file_system)#Commercial_technical_support
[2] https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
[3] AL2023 6.1 - https://github.com/amazonlinux/linux/commit/ef9660091712fa9edd137180b8925ea6316c8043
[4] AL2023 6.12 (Soon) - https://github.com/amazonlinux/linux/commits/amazon-6.12.y/mainline/

> I'll let Andreas comment further, but from my perspective, if we
> want upstreaming Lustre to be successful, perhaps one strategy
> would be to make it easier for upstream users and developers to use
> Lustre, perhaps at a smaller scale than what a typical DDN customer
> would run.

If we upstreamed the server alongside the client - it'd be easy enough
for upstream developers to set up a collocated Lustre client/server
and run xfstests at least. At some point (in the near-ish future), I want
to put together a patch series for xfstests/Lustre support.

And if you have dedicated hardware - setting up a small filesystem over
TCP/IP isn't much harder than an NFS server IMHO. Just a mkfs and
mount per storage target. With a single MDS and OSS, you only need two
disks. So I think we have everything we need to enable upstream
users/devs to use Lustre without too much hassle. I think it's mostly a
matter of documentation and scripting.
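For reference, the mkfs-and-mount-per-target flow described above might look roughly like this. It's a setup sketch, not a tested recipe: device names, the fsname, mount points, and the MGS NID (10.0.0.1@tcp) are placeholders, and it assumes the Lustre kernel modules and userspace tools are already installed.

```shell
# Minimal single-MDS/single-OSS Lustre filesystem over TCP/IP (sketch).
# All device names, NIDs, and paths below are placeholders.
mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/vdb   # combined MGS+MDT
mkfs.lustre --fsname=testfs --ost --index=0 \
            --mgsnode=10.0.0.1@tcp /dev/vdc                  # one OST
mount -t lustre /dev/vdb /mnt/mdt                            # start the MDS
mount -t lustre /dev/vdc /mnt/ost                            # start the OSS
# On a client node (or the same node, co-located):
mount -t lustre 10.0.0.1@tcp:/testfs /mnt/lustre
```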

Tim Day



* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-30 16:18       ` Day, Timothy
@ 2025-01-30 16:56         ` Theodore Ts'o
  2025-01-30 17:32           ` Day, Timothy
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2025-01-30 16:56 UTC (permalink / raw)
  To: Day, Timothy
  Cc: Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger, neilb@suse.de

On Thu, Jan 30, 2025 at 04:18:29PM +0000, Day, Timothy wrote:
> 
> Lustre has a lot of usage and development outside of DDN/Whamcloud
> [1][2].  HPE, AWS, SUSE, Azure, etc. And at least at AWS, we use
> Lustre on fairly up-to-date kernels [3][4]. And I think this is
> becoming more common - although I don't have solid data on that.

I agree that I am seeing more use/interest of Lustre in various Cloud
deployments, and to the extent that Cloud clients tend to use newer
Linux kernels (e.g., commonly, the LTS from the year before), that
certainly does make them use kernels newer than a typical RHEL kernel.

It's probably inherent in the nature of cluster file systems that they
won't be of interest to home users, who aren't going to be paying the
cost of a dozen or so cloud VMs being up on a more-or-less continuous
basis.  However, the reality is that, more likely than not, the developers
who are most likely to be using the latest upstream kernel, or maybe
even linux-next, are not going to be using cloud VMs.

> And if you have dedicated hardware - setting up a small filesystem over
> TCP/IP isn't much harder than an NFS server IMHO. Just a mkfs and
> mount per storage target. With a single MDS and OSS, you only need two
> disks. So I think we have everything we need to enable upstream
> users/devs to use Lustre without too much hassle. I think it's mostly a
> matter of documentation and scripting.

Hmm... would it be possible to set up a simple toy Lustre file system
using a single system running in qemu --- i.e., using something like a
kvm-xfstests[1] test appliance?  TCP/IP over loopback might be
interesting, if it's posssible to run the Lustre MDS, OSS, and client
on the same kernel.  This would make repro testing a whole lot easier,
if all someone had to do was run the command "kvm-xfstests -c lustre smoke".

[1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

							- Ted


* Re: [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-30 16:56         ` Theodore Ts'o
@ 2025-01-30 17:32           ` Day, Timothy
  0 siblings, 0 replies; 17+ messages in thread
From: Day, Timothy @ 2025-01-30 17:32 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	Andreas Dilger, neilb@suse.de

On 1/30/25, 11:57 AM, "Theodore Ts'o" <tytso@mit.edu <mailto:tytso@mit.edu>> wrote:
> On Thu, Jan 30, 2025 at 04:18:29PM +0000, Day, Timothy wrote:
> >
> > Lustre has a lot of usage and development outside of DDN/Whamcloud
> > [1][2]. HPE, AWS, SUSE, Azure, etc. And at least at AWS, we use
> > Lustre on fairly up-to-date kernels [3][4]. And I think this is
> > becoming more common - although I don't have solid data on that.
>
>
> I agree that I am seeing more use/interest of Lustre in various Cloud
> deployments, and to the extent that Cloud clients tend to use newer
> Linux kernels (e.g., commonly, the LTS from the year before), that
> certainly does make them use kernels newer than a typical RHEL kernel.
>
>
> It's probably inherent in the nature of cluster file systems that they
> won't be of interest to home users, who aren't going to be paying the
> cost of a dozen or so cloud VMs being up on a more-or-less continuous
> basis. However, the reality is that, more likely than not, the developers
> who are most likely to be using the latest upstream kernel, or maybe
> even linux-next, are not going to be using cloud VMs.

I don't have a good sense of who's running the absolute latest mainline or
linux-next. But agreed, I doubt there will be tons of home users of Lustre
post-upstreaming. Although you can definitely play Counter-Strike on
a home Lustre setup. I've personally validated that. :)

> > And if you have dedicated hardware - setting up a small filesystem over
> > TCP/IP isn't much harder than an NFS server IMHO. Just a mkfs and
> > mount per storage target. With a single MDS and OSS, you only need two
> > disks. So I think we have everything we need to enable upstream
> > users/devs to use Lustre without too much hassle. I think it's mostly a
> > matter of documentation and scripting.
>
> Hmm... would it be possible to set up a simple toy Lustre file system
> using a single system running in qemu --- i.e., using something like a
> kvm-xfstests[1] test appliance? TCP/IP over loopback might be
> interesting, if it's possible to run the Lustre MDS, OSS, and client
> on the same kernel. This would make repro testing a whole lot easier,
> if all someone had to do was run the command "kvm-xfstests -c lustre smoke".
>
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>

Definitely possible. You can run all of the Lustre services on the same kernel.
I have Lustre working on a similar QEMU setup as part of Kent's ktest repo [1].
I use it to test/develop Lustre patches against mainline kernels - mostly for the
Lustre in-memory OSD (i.e. storage backend) [2]. So we can get a Lustre
development workflow that's pretty similar to the existing workflow for in-tree
filesystems, I think.

Tim Day

[1] https://github.com/koverstreet/ktest
[2] https://review.whamcloud.com/c/fs/lustre-release/+/55594



* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
       [not found]       ` <4044F3FF-D0CE-4823-B104-0544A986DF7B@ddn.com>
@ 2025-01-31 22:11         ` Amir Goldstein
  2025-01-31 23:01           ` Day, Timothy
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2025-01-31 22:11 UTC (permalink / raw)
  To: Andreas Dilger, Day, Timothy
  Cc: Theodore Ts'o, Christoph Hellwig,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, neilb@suse.de

On Fri, Jan 31, 2025 at 3:35 AM Andreas Dilger via Lsf-pc
<lsf-pc@lists.linux-foundation.org> wrote:
>
>
> As Tim mentioned, it is possible to mount a Lustre client (or two) plus one or
> more MDT/OST on a single ~3GB VM with loopback files in /tmp and run testing.
> There is a simple script we use to format and mount 4 MDTs and 4 OSTs on
> temporary loop files and mount a client from the Lustre build tree.
>
> There haven't been any VFS patches needed for years for Lustre to be run,
> though there are a number patches needed against a copied ext4 tree to
> export some of the functions and add some filesystem features.  Until the
> ext4 patches are merged, it would also be possible to run light testing with
> Tim's RAM-based OSD without any loopback devices at all (though with a
> hard limitation on the size of the filesystem).
>

Recommendation: if it is easy to set up a loopback lustre server, then the best
practice would be to add lustre fstests support, the same way nfs/afs/cifs can be
tested with fstests.

Adding fstests support will not guarantee that vfs developers will run fstests
with your filesystem, but if you make it super easy for vfs developers to
test your filesystem with the de-facto standard for fs testing, then at least
they have an option to verify that their vfs changes are not breaking your
filesystem, which is what upstreaming is supposed to provide.
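Concretely, a Lustre section in fstests' local.config could one day look like the fragment below. This is purely hypothetical: fstests does not recognize FSTYP=lustre today, and the NIDs, fsnames, and mount points are placeholders modeled on the existing nfs/cifs convention.

```shell
# Hypothetical local.config fragment for fstests. lustre is NOT yet a
# supported FSTYP; values are placeholders modeled on the nfs style.
export FSTYP=lustre
export TEST_DEV=192.168.0.10@tcp:/testfs      # MGS NID + fsname
export TEST_DIR=/mnt/test
export SCRATCH_DEV=192.168.0.10@tcp:/scratch
export SCRATCH_MNT=/mnt/scratch
```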

Thanks,
Amir.


* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-31 22:11         ` [Lsf-pc] " Amir Goldstein
@ 2025-01-31 23:01           ` Day, Timothy
  2025-02-01 10:55             ` Amir Goldstein
  0 siblings, 1 reply; 17+ messages in thread
From: Day, Timothy @ 2025-01-31 23:01 UTC (permalink / raw)
  To: Amir Goldstein, Andreas Dilger
  Cc: Theodore Ts'o, Christoph Hellwig,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, neilb@suse.de

On 1/31/25, 5:11 PM, "Amir Goldstein" <amir73il@gmail.com <mailto:amir73il@gmail.com>> wrote:
> On Fri, Jan 31, 2025 at 3:35 AM Andreas Dilger via Lsf-pc
> <lsf-pc@lists.linux-foundation.org <mailto:lsf-pc@lists.linux-foundation.org>> wrote:
> >
> >
> > As Tim mentioned, it is possible to mount a Lustre client (or two) plus one or
> > more MDT/OST on a single ~3GB VM with loopback files in /tmp and run testing.
> > There is a simple script we use to format and mount 4 MDTs and 4 OSTs on
> > temporary loop files and mount a client from the Lustre build tree.
> >
> > There haven't been any VFS patches needed for years for Lustre to be run,
> > though there are a number of patches needed against a copied ext4 tree to
> > export some of the functions and add some filesystem features. Until the
> > ext4 patches are merged, it would also be possible to run light testing with
> > Tim's RAM-based OSD without any loopback devices at all (though with a
> > hard limitation on the size of the filesystem).
>
>
> Recommendation: if it is easy to set up a loopback lustre server, then the best
> practice would be to add lustre fstests support, the same way nfs/afs/cifs can be
> tested with fstests.
>
>
> Adding fstests support will not guarantee that vfs developers will run fstests
> with your filesystem, but if you make it super easy for vfs developers to
> test your filesystem with the de-facto standard for fs testing, then at least
> they have an option to verify that their vfs changes are not breaking your
> filesystem, which is what upstreaming is supposed to provide.

I was hoping to do exactly that. I've been able to run fstests on Lustre
(in an ad hoc manner), but I wanted to put together a patch series to
add proper support. Would fstests accept Lustre support before Lustre
gets accepted upstream? Or should it be maintained as a separate
branch?

Tim Day



* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-01-31 23:01           ` Day, Timothy
@ 2025-02-01 10:55             ` Amir Goldstein
  2025-02-01 13:59               ` Zorro Lang
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2025-02-01 10:55 UTC (permalink / raw)
  To: Day, Timothy
  Cc: Andreas Dilger, Theodore Ts'o, Christoph Hellwig,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, neilb@suse.de, fstests, Zorro Lang

On Sat, Feb 1, 2025 at 12:01 AM Day, Timothy <timday@amazon.com> wrote:
>
> On 1/31/25, 5:11 PM, "Amir Goldstein" <amir73il@gmail.com <mailto:amir73il@gmail.com>> wrote:
> > On Fri, Jan 31, 2025 at 3:35 AM Andreas Dilger via Lsf-pc
> > <lsf-pc@lists.linux-foundation.org <mailto:lsf-pc@lists.linux-foundation.org>> wrote:
> > >
> > >
> > > As Tim mentioned, it is possible to mount a Lustre client (or two) plus one or
> > > more MDT/OST on a single ~3GB VM with loopback files in /tmp and run testing.
> > > There is a simple script we use to format and mount 4 MDTs and 4 OSTs on
> > > temporary loop files and mount a client from the Lustre build tree.
> > >
> > > There haven't been any VFS patches needed for years for Lustre to be run,
> > > though there are a number of patches needed against a copied ext4 tree to
> > > export some of the functions and add some filesystem features. Until the
> > > ext4 patches are merged, it would also be possible to run light testing with
> > > Tim's RAM-based OSD without any loopback devices at all (though with a
> > > hard limitation on the size of the filesystem).
> >
> >
> > Recommendation: if it is easy to set up a loopback lustre server, then the best
> > practice would be to add lustre fstests support, the same way nfs/afs/cifs can be
> > tested with fstests.
> >
> >
> > Adding fstests support will not guarantee that vfs developers will run fstests
> > with your filesystem, but if you make it super easy for vfs developers to
> > test your filesystem with the de-facto standard for fs testing, then at least
> > they have an option to verify that their vfs changes are not breaking your
> > filesystem, which is what upstreaming is supposed to provide.
>
> I was hoping to do exactly that. I've been able to run fstests on Lustre
> (in an ad hoc manner), but I wanted to put together a patch series to
> add proper support. Would fstests accept Lustre support before Lustre
> gets accepted upstream? Or should it be maintained as a separate
> branch?
>

Up to the maintainer (CC'd), but in any case, you will need to maintain
a development branch until the fstests patches are reviewed, so I do
not see much difference for the process.

My own vote would be to merge lustre support into fstests *before*
merging lustre into the linux-next tree (via the fs-next branch), so that lustre
could potentially be tested by third parties when it hits linux-next.

IMO, if lustre is on track for upstreaming with all the open questions
addressed, I see no reason not to merge fstests support earlier.

I was going to recommend that you consider adding lustre support to
one or more of the available "fstest runners" to provide a turnkey solution
for the standalone test setup, but I see that you already contributed to ktest,
so that's great! And one more reason to merge fstests support sooner.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-02-01 10:55             ` Amir Goldstein
@ 2025-02-01 13:59               ` Zorro Lang
  2025-02-02 15:26                 ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Zorro Lang @ 2025-02-01 13:59 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Day, Timothy, Andreas Dilger, Theodore Ts'o,
	Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	neilb@suse.de, fstests

On Sat, Feb 01, 2025 at 11:55:21AM +0100, Amir Goldstein wrote:
> On Sat, Feb 1, 2025 at 12:01 AM Day, Timothy <timday@amazon.com> wrote:
> >
> > On 1/31/25, 5:11 PM, "Amir Goldstein" <amir73il@gmail.com <mailto:amir73il@gmail.com>> wrote:
> > > On Fri, Jan 31, 2025 at 3:35 AM Andreas Dilger via Lsf-pc
> > > <lsf-pc@lists.linux-foundation.org <mailto:lsf-pc@lists.linux-foundation.org>> wrote:
> > > >
> > > >
> > > > As Tim mentioned, it is possible to mount a Lustre client (or two) plus one or
> > > > more MDT/OST on a single ~3GB VM with loopback files in /tmp and run testing.
> > > > There is a simple script we use to format and mount 4 MDTs and 4 OSTs on
> > > > temporary loop files and mount a client from the Lustre build tree.
> > > >
> > > > There haven't been any VFS patches needed for years for Lustre to be run,
> > > > though there are a number of patches needed against a copied ext4 tree to
> > > > export some of the functions and add some filesystem features. Until the
> > > > ext4 patches are merged, it would also be possible to run light testing with
> > > > Tim's RAM-based OSD without any loopback devices at all (though with a
> > > > hard limitation on the size of the filesystem).
> > >
> > >
> > > Recommendation: if it is easy to set up a loopback lustre server, then the best
> > > practice would be to add lustre fstests support, same as nfs/afs/cifs can be
> > > tested with fstests.
> > >
> > >
> > > Adding fstests support will not guarantee that vfs developers will run fstests
> > > with your filesystem, but if you make it super easy for vfs developers to
> > > test your filesystem with the de-facto standard for fs testing, then at least
> > > they have an option to verify that their vfs changes are not breaking your
> > > filesystem, which is what upstreaming is supposed to provide.
> >
> > I was hoping to do exactly that. I've been able to run fstests on Lustre
> > (in an adhoc manner), but I wanted to put together a patch series to
> > add proper support. Would fstests accept Lustre support before Lustre
> > gets accepted upstream? Or should it be maintained as a separate
> > branch?
> >
> 
> Up to the maintainer (CC) but in any case, you will need to maintain
> a development branch until the fstests patches are reviewed, so I do
> not see much difference for the process.
> 
> My own vote would be to merge lustre support to fstests *before*
> merging lustre to linux-next tree (to fs-next branch), so that lustre
> could potentially be tested by 3rd party when it hits linux-next.
> 
> IMO, if lustre is on track for upstreaming with all the open questions
> addressed, I see no reason not to merge fstests support earlier.
> 
> I was going to recommend that you consider adding lustre support to
> one or more of the available "fstest runners" to provide a turnkey solution
> for the standalone test setup, but I see that you already contributed to ktest,
> so that's great! And one more reason to merge fstests support sooner.

Thanks Amir, I think fstests has nothing to lose by supporting one more filesystem :)

Let's see the patchset (to fstests@) first. If it's simple enough, like
4cbc0a0fa8b ("fstests: add GlusterFS support"), then it will be easy to merge.

If it needs to change many common things or generic cases, we'd better
give it more review and testing, and maybe merge it into a separate
branch at first. Anyway, let's check the patches first :)

Meanwhile, I'd like to track the patches you send to the Linux VFS list, to
think about when it's time to have the testing part ready, so feel free to CC me.

Thanks,
Zorro

> 
> Thanks,
> Amir.
> 



* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-02-01 13:59               ` Zorro Lang
@ 2025-02-02 15:26                 ` Theodore Ts'o
  2025-02-03  6:44                   ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2025-02-02 15:26 UTC (permalink / raw)
  To: Zorro Lang
  Cc: Amir Goldstein, Day, Timothy, Andreas Dilger, Christoph Hellwig,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, neilb@suse.de, fstests

On Sat, Feb 01, 2025 at 09:59:11PM +0800, Zorro Lang wrote:
> Thanks Amir, I think fstests has nothing to lose by supporting one more filesystem :)

Well, in the past I attempted to land a "local" file system type that
could be used for file systems that were available via docker (and so
there was no block device to mount and unmount).  This was useful for
testing gVisor[1] and could also be used for testing Windows Subsystem
for Linux v1.  As I recall, either Dave or Christoph objected, even
though the diffstat was +73, -4 lines in common/rc.

[1] https://gvisor.dev/

So if it is really simple, it's also not hard to keep it in an
out-of-tree patch.  I've been maintaining [2] in my personal xfstests
branch, and kept it rebased on top of next, for years.  I figured it
was easier to keep it out of tree than to try to fight through Dave's or
Christoph's objections to get it upstream....

[2] https://github.com/tytso/xfstests/commit/7f8047273c8c963fdb9c3d441fe6d0f2a50cd4a3

So even if we can't get Lustre support upstream, I'm happy to maintain
it at my xfstests tree on github, much like I have with the "local"
file system type, for the past 8+ years.

Cheers,

						- Ted



* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-02-02 15:26                 ` Theodore Ts'o
@ 2025-02-03  6:44                   ` Christoph Hellwig
  2025-02-03 16:06                     ` Day, Timothy
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-02-03  6:44 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Zorro Lang, Amir Goldstein, Day, Timothy, Andreas Dilger,
	Christoph Hellwig, lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, jsimmons@infradead.org,
	neilb@suse.de, fstests

On Sun, Feb 02, 2025 at 10:26:28AM -0500, Theodore Ts'o wrote:
> Well, in the past I attempted to land a "local" file system type that
> could be used for file systems that were available via docker (and so
> there was no block device to mount and unmount).  This was useful for
> testing gVisor[1] and could also be used for testing Windows Subsystem
> for Linux v1.  As I recall, either Dave or Christoph objected, even
> though the diffstat was +73, -4 lines in common/rc.

Yes, xfstests should just support upstream code.  Even for things where
we thought it would get upstream ASAP, like the XFS
rtrmap/reflink/metadir work (which finally did get upstream now) having
the half-finished support in xfstests without actually landing the rest
caused more than enough problems.  Something like lustre that has
historically been a complete trainwreck, and where I have strong doubts
that the maintainers will get their act together, is even worse.


* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming
  2025-02-03  6:44                   ` Christoph Hellwig
@ 2025-02-03 16:06                     ` Day, Timothy
  0 siblings, 0 replies; 17+ messages in thread
From: Day, Timothy @ 2025-02-03 16:06 UTC (permalink / raw)
  To: Christoph Hellwig, Theodore Ts'o
  Cc: Zorro Lang, Amir Goldstein, Andreas Dilger,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	jsimmons@infradead.org, neilb@suse.de, fstests

On 2/3/25, 1:44 AM, "Christoph Hellwig" <hch@infradead.org <mailto:hch@infradead.org>> wrote:
> On Sun, Feb 02, 2025 at 10:26:28AM -0500, Theodore Ts'o wrote:
> > Well, in the past I attempted to land a "local" file system type that
> > could be used for file systems that were available via docker (and so
> > there was no block device to mount and unmount). This was useful for
> > testing gVisor[1] and could also be used for testing Windows Subsystem
> > for Linux v1. As I recall, either Dave or Christoph objected, even
> > though the diffstat was +73, -4 lines in common/rc.
>
> Yes, xfstests should just support upstream code. Even for things where
> we thought it would get upstream ASAP, like the XFS
> rtrmap/reflink/metadir work (which finally did get upstream now) having
> the half-finished support in xfstests without actually landing the rest
> caused more than enough problems.

That’s fair. From the perspective of someone making changes to xfstests,
it'd probably be hard to change any of the Lustre stuff without an easy
way to validate that the tests are still working correctly.

But I gave it another look yesterday - the changes needed to support
Lustre are pretty minimal. fstests assumes that any miscellaneous filesystem
is disk based. So either relaxing that assumption or adding an explicit Lustre
$FSTYPE is enough. Of course, in the fullness of time - Lustre ought to have
its own tests exercising striping and such. We have our own test scripts to
cover this - but porting a subset to fstests would be helpful. But the initial
support doesn't need all that.
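
As a concrete illustration, the minimal end state might be nothing more than a local.config along these lines. Every name below is an assumption: a "lustre" FSTYP does not exist in fstests today, and the device strings just mimic the host:/export convention the existing network-filesystem configs use:

```shell
# Hypothetical fstests local.config for a Lustre client, modeled on the
# existing NFS/CIFS network-filesystem configs. The "lustre" FSTYP and
# the mgsnode@tcp:/fsname device strings are assumptions, not an
# existing fstests interface.
export FSTYP=lustre

# fstests wants two independent instances: TEST (long-lived, preserved
# between runs) and SCRATCH (freely reused and remounted by tests).
export TEST_DEV=mgsnode@tcp:/testfs
export TEST_DIR=/mnt/testfs
export SCRATCH_DEV=mgsnode@tcp:/scratchfs
export SCRATCH_MNT=/mnt/scratchfs
```

With something like that in place, ./check -g quick could drive the generic tests against a Lustre client mount, the same way the nfs support works today.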

But I suppose I could keep this all downstream until Lustre gets closer
to acceptance.

> Something like lustre that has
> historically been a complete trainwreck and where I have strong doubts
> that the maintainers get their act together is even worse.

Did you have any specific doubts, beyond the development model and
disk/wire protocol stability? In other words, is the development model
the primary concern? Or are there technical concerns with Lustre itself?

Tim Day



end of thread, other threads:[~2025-02-03 16:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-24 20:50 [LSF/MM/BPF TOPIC] Lustre filesystem upstreaming Day, Timothy
2025-01-24 21:09 ` Matthew Wilcox
2025-01-24 22:56   ` NeilBrown
2025-01-25  6:33   ` Day, Timothy
2025-01-28  6:14 ` Christoph Hellwig
2025-01-28 16:35   ` Day, Timothy
2025-01-30 14:28     ` Theodore Ts'o
2025-01-30 16:18       ` Day, Timothy
2025-01-30 16:56         ` Theodore Ts'o
2025-01-30 17:32           ` Day, Timothy
     [not found]       ` <4044F3FF-D0CE-4823-B104-0544A986DF7B@ddn.com>
2025-01-31 22:11         ` [Lsf-pc] " Amir Goldstein
2025-01-31 23:01           ` Day, Timothy
2025-02-01 10:55             ` Amir Goldstein
2025-02-01 13:59               ` Zorro Lang
2025-02-02 15:26                 ` Theodore Ts'o
2025-02-03  6:44                   ` Christoph Hellwig
2025-02-03 16:06                     ` Day, Timothy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox