From: Ric Wheeler <ricwheeler@gmail.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	Zach Brown <zab@zabbo.net>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] Design challenges for a new file system that needs to support multiple billions of files
Date: Mon, 3 Feb 2025 17:18:48 +0100
Message-ID: <cf648cfb-7d2d-4c36-8282-fe3333a182c3@gmail.com>
In-Reply-To: <CAOQ4uxjN5oedNhZ2kCJC2XLncdkSFMYJOWmSEC3=a-uGjd=w7Q@mail.gmail.com>


On 2/3/25 4:22 PM, Amir Goldstein wrote:
> On Sun, Feb 2, 2025 at 10:40 PM Ric Wheeler <ricwheeler@gmail.com> wrote:
>>
>> I have always been super interested in how much we can push the
>> scalability limits of file systems and for the workloads we need to
>> support, we need to scale up to supporting absolutely ridiculously large
>> numbers of files (a few billion files doesn't meet the need of the
>> largest customers we support).
>>
> Hi Ric,
>
> Since LSFMM is not about presentations, it would be better if the proposed
> topic tried to address specific technical questions that developers
> could discuss.

Totally agree - from the ancient history of LSF (before MM or BPF!) we 
also pushed for discussions over talks.

>
> If a topic cannot generate a discussion on the list, it is not very
> likely that it will
> generate a discussion on-prem.
>
> Where does the scaling with the number of files in a filesystem affect existing
> filesystems? What are the limitations that you need to overcome?

Local file systems like xfs running on "scale up" giant systems (think 
of the old super-sized HP Superdomes and the like) would likely handle 
this well.

In a lot of ways, ngnfs aims to replicate that scalability for "scale 
out" (hate buzz words!) systems that are more affordable. In effect, you 
can size your system by just adding more servers with their local NVMe 
devices and build up performance and capacity incrementally.

Shared disk file systems like scoutfs (also GPL'ed but not upstream) 
scale pretty well in file count, but they have coarse-grained locking 
that causes performance bumps, plus the added complexity of needing 
RAID heads or SAN systems.


>
>> Zach Brown is leading a new project on ngnfs (FOSDEM talk this year gave
>> a good background on this -
>> https://www.fosdem.org/2025/schedule/speaker/zach_brown/).  We are
>> looking at taking advantage of modern low latency NVME devices and
> today's networks to implement a distributed file system that provides the
> concurrency that high object counts need while still having the
> bandwidth needed to support the backend archival systems we feed.
>>
> I heard this talk and it was very interesting.
> Here's a direct link to the slides for people who may be too lazy to
> follow 3 clicks:
> https://www.fosdem.org/2025/events/attachments/fosdem-2025-5471-ngnfs-a-distributed-file-system-using-block-granular-consistency/slides/236150/zach-brow_aqVkVuI.pdf
>
> I was both very impressed by the cache coherent rename example
> and very puzzled - I do not know any filesystem where rename can be
> synchronized on a single block io, and looking up ancestors is usually
> done on in-memory dentries, so I may not have understood the example.
>
>> ngnfs as a topic would go into the coherence design (and code) that
>> underpins the increased concurrency it aims to deliver.
>>
>> Clear that the project is in early days compared to most of the proposed
>> content, but it can be useful to spend some of the time on new ideas.
>>
> This sounds like an interesting topic to discuss.
> I would love it if you or Zach could share more details on the list so that more
> people could participate in the discussion leading to LSFMM.
>
> Also, I think it is important to mention, as you told me, that the
> server implementation
> of ngnfs is GPL and to provide some pointers, because IMO this is very important
> when requesting community feedback on a new filesystem.
>
> Thanks,
> Amir.

All of ngnfs is GPL'ed - there is no non-open-source client or similar 
component.


Regards,


Ric


