From: Matthew Wilcox <willy@infradead.org>
To: John Groves <John@groves.net>
Cc: Jonathan Corbet <corbet@lwn.net>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
nvdimm@lists.linux.dev, John Groves <jgroves@micron.com>,
john@jagalactic.com, Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>,
dave.hansen@linux.intel.com, gregory.price@memverge.com,
Randy Dunlap <rdunlap@infradead.org>,
Jerome Glisse <jglisse@google.com>,
Aravind Ramesh <arramesh@micron.com>,
Ajay Joshi <ajayjoshi@micron.com>,
Eishan Mirakhur <emirakhur@micron.com>,
Ravi Shankar <venkataravis@micron.com>,
Srinivasulu Thanneeru <sthanneeru@micron.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Amir Goldstein <amir73il@gmail.com>,
Chandan Babu R <chandanbabu@kernel.org>,
Bagas Sanjaya <bagasdotme@gmail.com>,
"Darrick J . Wong" <djwong@kernel.org>,
Kent Overstreet <kent.overstreet@linux.dev>,
Steve French <stfrench@microsoft.com>,
Nathan Lynch <nathanl@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Thomas Zimmermann <tzimmermann@suse.de>,
Julien Panis <jpanis@baylibre.com>,
Stanislav Fomichev <sdf@google.com>,
Dongsheng Yang <dongsheng.yang@easystack.cn>
Subject: Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
Date: Tue, 30 Apr 2024 22:01:15 +0100
Message-ID: <ZjFcG9Q1CegMPj_7@casper.infradead.org>
In-Reply-To: <c3mhc33u4yqhd75xc2ew53iuumg3c2vi3nk3msupt35fj7qkrp@pve6htn64e7c>
On Mon, Apr 29, 2024 at 09:11:52PM -0500, John Groves wrote:
> On 24/04/29 07:32PM, Matthew Wilcox wrote:
> > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > CXL-specific in any way.
> > >
> > > * Famfs creates a simple access method for storing and sharing data in
> > > sharable memory. The memory is exposed and accessed as memory-mappable
> > > dax files.
> > > * Famfs supports multiple hosts mounting the same file system from the
> > > same memory (something existing fs-dax file systems don't do).
> >
> > Yes, but we do already have two filesystems that support shared storage,
> > and are rather more advanced than famfs -- GFS2 and OCFS2. What are
> > the pros and cons of improving either of those to support DAX rather
> > than starting again with a new filesystem?
> >
>
> Thanks for paying attention to this, Willy.
Well, don't mistake this for an endorsement! I remain convinced that
this is a science project, not a product. I am hugely sceptical of
disaggregated systems, mostly because I've seen so many fail. And they
rarely attempt to answer the "janitor tripped over the cable" problem,
the "we need to upgrade the firmware on the switch" problem, or a bunch
of other problems I've outlined in the past on this list.
So I am not supportive of any changes you want to make to the core kernel
to support this kind of adventure. Play in your own sandbox all you
like, but not one line of code change in the core. Unless it's something
generally beneficial, of course; you mentioned refactoring DAX and that
might be a good thing for everybody.
> * Famfs is not, not, not a general purpose file system.
> * One can think of famfs as a shared memory allocator where allocations can be
> accessed as files. For certain data analytics work flows (especially
> involving Apache Arrow data frames) this is really powerful. Consumers of
> data frames commonly use mmap(MAP_SHARED); they benefit from the
> de-duplication of shared memory and need no new abstractions.
... and are OK with the extra latency?
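For concreteness, here is a minimal consumer sketch. The mount point and
file name are hypothetical, and nothing below is famfs-specific; the point
is that an ordinary mmap(MAP_SHARED) of a dax-backed file is the entire
access model:

	/* Map a (hypothetical) famfs file read-only and consume it in place. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		const char *path = "/mnt/famfs/frame0";	/* hypothetical path */
		int fd = open(path, O_RDONLY);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		struct stat st;
		if (fstat(fd, &st) < 0) {
			perror("fstat");
			return 1;
		}

		/* MAP_SHARED: every host mapping the same backing memory sees
		 * one copy rather than a private replica. */
		void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* ... hand p to the data-frame consumer, e.g. Arrow ... */

		munmap(p, st.st_size);
		close(fd);
		return 0;
	}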
> * Famfs is not really a data storage tool. It's more of a shared-memory
>   allocation tool that has the benefit of allocations being accessible
>   (and memory-mappable) as files. So a lot of software can automatically use
>   it.
> * Famfs is oriented to dumping sharable data into files and then letting a
>   scale-out cluster access a single copy in shared memory (often
>   read-only).
Depending on the exact workload, I can see this being more efficient
than replicating the data to each member of the cluster. In other
workloads, it'll be a loss, of course.
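To put rough, purely illustrative numbers on it: a 1 TiB data set
replicated to 16 hosts costs 16 TiB of local DRAM; one shared copy in FAM
costs 1 TiB plus whatever latency the fabric adds to every access. Whether
the capacity saving pays for the added latency is entirely
workload-dependent.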
> * I'm no expert on GFS2 or OCFS2, but I've been around memory, file systems
> and storage since well before the turn of the century...
> * If you had brought up the existing fs-dax file systems, I would have pointed
>   out that they use write-back metadata, which does not reconcile with shared
>   access to media - but GFS2/OCFS2 do handle that.
> * The shared media file systems are still oriented to block devices that
> provide durable storage and page-oriented access. CXL DRAM is a character
I'd say "block oriented" rather than page oriented, but I agree.
> dax (devdax) device and does not provide durable storage.
> * fs-dax-style memory mapping for volatile cxl memory requires the
> dev_dax_iomap portion of this patch set - or something similar.
> * A scale-out shared media file system presumably requires some commitment to
>   configuring and managing complexity in a distributed environment; whether
>   that should be mandatory for enablement of shared memory is worthy of
>   discussion.
> * Adding memory to the storage tier for GFS2/OCFS2 would add non-persistent
> media to the storage tier; whether this makes sense would be a topic that
> GFS2/OCFS2 developers/architects should get involved in if they're
> interested.
>
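For anyone unfamiliar with the devdax model John refers to above: a
character dax device is consumed by mmap()ing the char device directly.
A rough sketch follows (/dev/dax0.0 is an assumed device name, and real
devdax mappings must respect the device's base alignment, often 2MiB):

	/* Raw devdax access: mmap the character device itself. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/dax0.0", O_RDWR);	/* assumed device */
		if (fd < 0) {
			perror("open");
			return 1;
		}

		size_t len = 2UL << 20;		/* one 2MiB extent, for illustration */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* Loads and stores to p go straight to the dax memory: no
		 * page cache, and nothing here is durable the way block
		 * storage is. */

		munmap(p, len);
		close(fd);
		return 0;
	}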
> Although disaggregated shared memory is not commercially available yet, famfs
> is being actively tested by multiple companies for several use cases and
> patterns with real and simulated shared memory. Demonstrations will start to
> surface in the coming weeks & months.
I guess we'll see. SGI died for a reason.