From: "Francesco Mazzoli" <f@mazzo.li>
To: "Amir Goldstein" <amir73il@gmail.com>
Cc: linux-fsdevel@vger.kernel.org,
	"Christian Brauner" <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	"Bernd Schubert" <bernd.schubert@fastmail.fm>,
	"Miklos Szeredi" <miklos@szeredi.hu>
Subject: Re: Mainlining the kernel module for TernFS, a distributed filesystem
Date: Fri, 03 Oct 2025 16:01:56 +0100
Message-ID: <34918add-4215-4bd3-b51f-9e47157501a3@app.fastmail.com>
In-Reply-To: <CAOQ4uxi_Pas-kd+WUG0NFtFZHkvJn=vgp4TCr0bptCaFpCzDyw@mail.gmail.com>

On Fri, Oct 3, 2025, at 15:22, Amir Goldstein wrote:
> First of all, the project looks very impressive!
> 
> The first thing to do to understand the prospect of upstreaming is exactly
> what you did - send this email :)
> It is very detailed and the linked design doc is very thorough.

Thanks for the kind words!

> A codebase with only one major user is a red flag.
> I am sure that you and your colleagues are very talented,
> but if your employer decides to cut down on upstreaming budget,
> the kernel maintainers would be left with an effectively orphaned filesystem.
> 
> This is especially true when the client is used in house, most likely
> not on a distro running the latest upstream kernel.
> 
> So yeh, it's a bit of a chicken and egg problem,
> but if you get community adoption for the server code,
> it will make a big difference on the prospect of upstreaming the client code.

Understood, we can definitely wait and see if TernFS gains wider adoption
before making concrete plans to upstream.

> I am very interested in this part, because that is IMO a question that
> we need to ask every new filesystem upstream attempt:
> "Can it be implemented in FUSE?"

Yes, and we have done so:
<https://github.com/XTXMarkets/ternfs/blob/main/go/ternfuse/ternfuse.go>.

> So my question is:
> Why is the FUSE client slower?
> Did you analyse the bottlenecks?
> Do these bottlenecks exist when using the FUSE-iouring channel?
> Mind you that FUSE-iouring was developed by DDN developers specifically
> for the use case of very fast distributed filesystems in userspace.
> ...
> I mean it sounds very cool from an engineering POV that you managed to
> remove unneeded constraints (a.k.a. the POSIX standard) and make a better
> product due to the simplifications, but that's exactly what userspace
> filesystems are for - for doing whatever you want ;)

These are all good questions, and while we have not profiled the FUSE driver
extensively, my impression is that relying critically on FUSE would be risky.
Some specific things would be difficult today. For instance, FUSE does not
expose `d_revalidate`, which means that dentries would be dropped needlessly
in cases where we know they can be left in place.
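
To make that concrete, here is a rough sketch (not the actual TernFS code) of
the kind of `d_revalidate` hook an in-kernel client can install. The
`ternfs_dentry_is_current()` helper is made up for illustration, and I'm using
the long-standing signature that takes a dentry and the lookup flags:

#include <linux/dcache.h>
#include <linux/namei.h>	/* LOOKUP_RCU */

static int ternfs_d_revalidate(struct dentry *dentry, unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;	/* can't sleep in RCU-walk, retry in ref-walk */

	/*
	 * We often know the dentry cannot have gone stale (e.g. the file
	 * is immutable once written), so keep it rather than letting the
	 * VFS drop and re-look it up.
	 */
	if (ternfs_dentry_is_current(dentry))	/* hypothetical helper */
		return 1;	/* dentry is still valid */

	return 0;		/* invalid: drop and re-lookup */
}

static const struct dentry_operations ternfs_dentry_ops = {
	.d_revalidate	= ternfs_d_revalidate,
};

With FUSE, as far as I know, the closest you get is entry timeouts and
invalidation notifications; the filesystem is never asked this question
directly at lookup time.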

There are also some higher-level FUSE design points that concerned us
(although I'm not up to speed with the FUSE over io_uring work). One obvious
concern is that with FUSE it's much harder to minimize copying. FUSE
passthrough helps, but it would have made the read path significantly more
complex given the need to juggle file descriptors between user space and the
kernel. Also, TernFS uses Reed-Solomon coding to recover when some parts of a
file are unreadable, and in that case we'd have had to fall back to a
non-passthrough path anyway. Another possible FUSE performance pitfall is that
you're liable to be bottlenecked by the FUSE request queue, whereas working
directly in the kernel you're not.
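
To illustrate the Reed-Solomon point, here is a hypothetical sketch (all the
ternfs_* types and helpers below are made up, this is not TernFS code) of how
an in-kernel read path can choose between a direct fetch and parity
reconstruction on a per-request basis, whereas a passthrough fd commits you to
a single path once installed:

static ssize_t ternfs_read_span(struct ternfs_file *f, char __user *buf,
				size_t len, loff_t off)
{
	struct ternfs_span *span = ternfs_find_span(f, off);

	/*
	 * Fast path: all data blocks readable, fetch them straight into
	 * the page cache with no userspace round trip.
	 */
	if (ternfs_span_blocks_healthy(span))
		return ternfs_read_direct(span, buf, len, off);

	/*
	 * Degraded path: some data blocks are unreadable, rebuild them
	 * from the Reed-Solomon parity blocks.
	 */
	return ternfs_read_rs_recover(span, buf, len, off);
}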

And of course, before BPF we wouldn't have been able to track the nature of
file closes to the degree needed for a FUSE driver to implement TernFS
semantics correctly.
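
For what it's worth, today one can observe closes from userspace with
something like the following BPF program (purely an illustration of the
mechanism, not what TernFS actually ships):

/* close_trace.bpf.c - loadable with libbpf/bpftool; illustration only */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_close")
int trace_close(struct trace_event_raw_sys_enter *ctx)
{
	/*
	 * Log which process closed which fd; a real daemon would push
	 * these events into a map consumed by the FUSE server.
	 */
	bpf_printk("close(fd=%ld) tgid=%d", ctx->args[0],
		   (int)(bpf_get_current_pid_tgid() >> 32));
	return 0;
}

char LICENSE[] SEC("license") = "GPL";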

This is not to say that a FUSE driver couldn't possibly work, but I think
there are good reasons for wanting to work directly with the kernel if you
want to be sure you're using resources effectively.

> Except for the wide adoption of the open source ceph server ;)

Oh, absolutely, I was just talking about how the code would look :).

Thanks,
Francesco
