linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: David Howells <dhowells@redhat.com>, Steve French <smfrench@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Marc Dionne <marc.dionne@auristor.com>,
	 Paulo Alcantara <pc@manguebit.com>,
	Shyam Prasad N <sprasad@microsoft.com>,
	Tom Talpey <tom@talpey.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Christian Brauner <christian@brauner.io>,
	linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
	 linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	 ceph-devel@vger.kernel.org, v9fs@lists.linux.dev,
	linux-fsdevel@vger.kernel.org,  linux-mm@kvack.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib
Date: Thu, 14 Dec 2023 09:11:53 -0500	[thread overview]
Message-ID: <ede57e618abfc38229157447fb152f6027eb1b8e.camel@kernel.org> (raw)
In-Reply-To: <20231213152350.431591-1-dhowells@redhat.com>

On Wed, 2023-12-13 at 15:23 +0000, David Howells wrote:
> Hi Jeff, Steve, Dominique,
> 
> I have been working on my netfslib helpers to the point that I can run
> xfstests on AFS to completion (both with write-back buffering and, with a
> small patch, write-through buffering in the pagecache).  I have a patch for
> 9P, but am currently unable to test it.
> 
> The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
> around 3000 lines added to netfs.  Hopefully, I will be able to remove a bunch
> of lines from Ceph too.
> 
> I've split the CIFS patches out to a separate branch, cifs-netfs, where a
> further 2000+ lines are removed.  I can run a certain amount of xfstests on
> CIFS, though I'm running into ksmbd issues and not all the tests work
> correctly because of issues between fallocate and what the SMB protocol
> actually supports.
> 
> I've also dropped the content-crypto patches out for the moment as they're
> only usable by the ceph changes which I'm still working on.
> 
> The patch to use PG_writeback instead of PG_fscache for writing to the
> cache has also been deferred, pending 9p, afs, ceph and cifs all being
> converted.
> 
> The main aims of these patches are to get high-level I/O and knowledge of
> the pagecache out of the filesystem drivers as much as possible and to get
> rid, as much of possible, of the knowledge that pages/folios exist.
> 
> Further, I would like to see ->write_begin, ->write_end and ->launder_folio
> go away.
> 
> Features that are added by these patches to that which is already there in
> netfslib:
> 
>  (1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
>      prevent these from happening at the same time.  mmap'd I/O can, of
>      necessity, happen at any time ignoring these locks.
> 
>  (2) Support for unbuffered I/O.  The data is kept in the bounce buffer and
>      the pagecache is not used.  This can be turned on with an inode flag.
> 
>  (3) Support for direct I/O.  This is basically unbuffered I/O with some
>      extra restrictions and no RMW.
> 
>  (4) Support for using a bounce buffer in an operation.  The bounce buffer
>      may be bigger than the target data/buffer, allowing for crypto
>      rounding.
> 
>  (5) ->write_begin() and ->write_end() are ignored in favour of merging all
>      of that into one function, netfs_perform_write(), thereby avoiding the
>      function pointer traversals.
> 
>  (6) Support for write-through caching in the pagecache.
>      netfs_perform_write() adds the pages is modifies to an I/O operation
>      as it goes and directly marks them writeback rather than dirty.  When
>      writing back from write-through, it limits the range written back.
>      This should allow CIFS to deal with byte-range mandatory locks
>      correctly.
> 
>  (7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
>      the pagecache and then flushing afterwards.  An AIO O_*SYNC write will
>      notify of completion when the sub-writes all complete.
> 
>  (8) Support for write-streaming where modifed data is held in !uptodate
>      folios, with a private struct attached indicating the range that is
>      valid.
> 
>  (9) Support for write grouping, multiplexing a pointer to a group in the
>      folio private data with the write-streaming data.  The writepages
>      algorithm only writes stuff back that's in the nominated group.  This
>      is intended for use by Ceph to write is snaps in order.
> 
> (10) Skipping reads for which we know the server could only supply zeros or
>      EOF (for instance if we've done a local write that leaves a hole in
>      the file and extends the local inode size).
> 
> General notes:
> 
>  (1) The fscache module is merged into the netfslib module to avoid cyclic
>      exported symbol usage that prevents either module from being loaded.
> 
>  (2) Some helpers from fscache are reassigned to netfslib by name.
> 
>  (3) netfslib now makes use of folio->private, which means the filesystem
>      can't use it.
> 
>  (4) The filesystem provides wrappers to call the write helpers, allowing
>      it to do pre-validation, oplock/capability fetching and the passing in
>      of write group info.
> 
>  (5) I want to try flushing the data when tearing down an inode before
>      invalidating it to try and render launder_folio unnecessary.
> 
>  (6) Write-through caching will generate and dispatch write subrequests as
>      it gathers enough data to hit wsize and has whole pages that at least
>      span that size.  This needs to be a bit more flexible, allowing for a
>      filesystem such as CIFS to have a variable wsize.
> 
>  (7) The filesystem driver is just given read and write calls with an
>      iov_iter describing the data/buffer to use.  Ideally, they don't see
>      pages or folios at all.  A function, extract_iter_to_sg(), is already
>      available to decant part of an iterator into a scatterlist for crypto
>      purposes.
> 
> 
> 9P notes:
> 
>  (1) I haven't managed to test this as I haven't been able to get Ganesha
>      to work correctly with 9P.
> 
>  (2) Writes should now occur in larger-than-page-sized chunks.
> 
>  (3) It should be possible to turn on multipage folio support in 9P now.
> 
> 
> Changes
> =======
> ver #4)
>  - Slimmed down the branch:
>    - Split the cifs-related patches off to a separate branch (cifs-netfs)
>    - Deferred the content-encryption to the in-progress ceph changes.
>    - Deferred the use-PG_writeback rather than PG_fscache patch
>  - Rebased on a later linux-next with afs-rotation patches.
> 
> ver #3)
>  - Moved the fscache module into netfslib to avoid export cycles.
>  - Fixed a bunch of bugs.
>  - Got CIFS to pass as much of xfstests as possible.
>  - Added a patch to make 9P use all the helpers.
>  - Added a patch to stop using PG_fscache, but rather dirty pages on
>    reading and have writepages write to the cache.
> 
> ver #2)
>  - Folded the addition of NETFS_RREQ_NONBLOCK/BLOCKED into first patch that
>    uses them.
>  - Folded addition of rsize member into first user.
>  - Don't set rsize in ceph (yet) and set it in kafs to 256KiB.  cifs sets
>    it dynamically.
>  - Moved direct_bv next to direct_bv_count in struct netfs_io_request and
>    labelled it with a __counted_by().
>  - Passed flags into netfs_xa_store_and_mark() rather than two bools.
>  - Removed netfs_set_up_buffer() as it wasn't used.
> 
> David
> 
> Link: https://lore.kernel.org/r/20231013160423.2218093-1-dhowells@redhat.com/ # v1
> Link: https://lore.kernel.org/r/20231117211544.1740466-1-dhowells@redhat.com/ # v2
> 
> David Howells (39):
>   netfs, fscache: Move fs/fscache/* into fs/netfs/
>   netfs, fscache: Combine fscache with netfs
>   netfs, fscache: Remove ->begin_cache_operation
>   netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a
>     symlink
>   netfs: Move pinning-for-writeback from fscache to netfs
>   netfs: Add a procfile to list in-progress requests
>   netfs: Allow the netfs to make the io (sub)request alloc larger
>   netfs: Add a ->free_subrequest() op
>   afs: Don't use folio->private to record partial modification
>   netfs: Provide invalidate_folio and release_folio calls
>   netfs: Implement unbuffered/DIO vs buffered I/O locking
>   netfs: Add iov_iters to (sub)requests to describe various buffers
>   netfs: Add support for DIO buffering
>   netfs: Provide tools to create a buffer in an xarray
>   netfs: Add bounce buffering support
>   netfs: Add func to calculate pagecount/size-limited span of an
>     iterator
>   netfs: Limit subrequest by size or number of segments
>   netfs: Export netfs_put_subrequest() and some tracepoints
>   netfs: Extend the netfs_io_*request structs to handle writes
>   netfs: Add a hook to allow tell the netfs to update its i_size
>   netfs: Make netfs_put_request() handle a NULL pointer
>   netfs: Make the refcounting of netfs_begin_read() easier to use
>   netfs: Prep to use folio->private for write grouping and streaming
>     write
>   netfs: Dispatch write requests to process a writeback slice
>   netfs: Provide func to copy data to pagecache for buffered write
>   netfs: Make netfs_read_folio() handle streaming-write pages
>   netfs: Allocate multipage folios in the writepath
>   netfs: Implement support for unbuffered/DIO read
>   netfs: Implement unbuffered/DIO write support
>   netfs: Implement buffered write API
>   netfs: Allow buffered shared-writeable mmap through
>     netfs_page_mkwrite()
>   netfs: Provide netfs_file_read_iter()
>   netfs, cachefiles: Pass upper bound length to allow expansion
>   netfs: Provide a writepages implementation
>   netfs: Provide a launder_folio implementation
>   netfs: Implement a write-through caching option
>   netfs: Optimise away reads above the point at which there can be no
>     data
>   afs: Use the netfs write helpers
>   9p: Use netfslib read/write_iter
> 
>  Documentation/filesystems/netfs_library.rst   |   23 +-
>  MAINTAINERS                                   |    2 +-
>  fs/9p/vfs_addr.c                              |  352 +----
>  fs/9p/vfs_file.c                              |   89 +-
>  fs/9p/vfs_inode.c                             |    5 +-
>  fs/9p/vfs_super.c                             |   14 +-
>  fs/Kconfig                                    |    1 -
>  fs/Makefile                                   |    1 -
>  fs/afs/file.c                                 |  213 +--
>  fs/afs/inode.c                                |   26 +-
>  fs/afs/internal.h                             |   72 +-
>  fs/afs/super.c                                |    2 +-
>  fs/afs/write.c                                |  826 +----------
>  fs/cachefiles/internal.h                      |    2 +-
>  fs/cachefiles/io.c                            |   10 +-
>  fs/cachefiles/ondemand.c                      |    2 +-
>  fs/ceph/addr.c                                |   25 +-
>  fs/ceph/cache.h                               |   35 +-
>  fs/ceph/inode.c                               |    2 +-
>  fs/fs-writeback.c                             |   10 +-
>  fs/fscache/Kconfig                            |   40 -
>  fs/fscache/Makefile                           |   16 -
>  fs/fscache/internal.h                         |  277 ----
>  fs/netfs/Kconfig                              |   39 +
>  fs/netfs/Makefile                             |   22 +-
>  fs/netfs/buffered_read.c                      |  229 ++-
>  fs/netfs/buffered_write.c                     | 1247 +++++++++++++++++
>  fs/netfs/direct_read.c                        |  252 ++++
>  fs/netfs/direct_write.c                       |  170 +++
>  fs/{fscache/cache.c => netfs/fscache_cache.c} |    0
>  .../cookie.c => netfs/fscache_cookie.c}       |    0
>  fs/netfs/fscache_internal.h                   |   14 +
>  fs/{fscache/io.c => netfs/fscache_io.c}       |   42 +-
>  fs/{fscache/main.c => netfs/fscache_main.c}   |   25 +-
>  fs/{fscache/proc.c => netfs/fscache_proc.c}   |   23 +-
>  fs/{fscache/stats.c => netfs/fscache_stats.c} |    4 +-
>  .../volume.c => netfs/fscache_volume.c}       |    0
>  fs/netfs/internal.h                           |  288 ++++
>  fs/netfs/io.c                                 |  214 ++-
>  fs/netfs/iterator.c                           |   97 ++
>  fs/netfs/locking.c                            |  215 +++
>  fs/netfs/main.c                               |  110 ++
>  fs/netfs/misc.c                               |  260 ++++
>  fs/netfs/objects.c                            |   63 +-
>  fs/netfs/output.c                             |  478 +++++++
>  fs/netfs/stats.c                              |   31 +-
>  fs/nfs/Kconfig                                |    4 +-
>  fs/nfs/fscache.c                              |    7 -
>  fs/smb/client/cifsfs.c                        |    9 +-
>  fs/smb/client/file.c                          |   18 +-
>  fs/smb/client/fscache.c                       |    2 +-
>  include/linux/fs.h                            |    2 +-
>  include/linux/fscache.h                       |   45 -
>  include/linux/netfs.h                         |  176 ++-
>  include/linux/writeback.h                     |    2 +-
>  include/trace/events/afs.h                    |   31 -
>  include/trace/events/netfs.h                  |  155 +-
>  mm/filemap.c                                  |    1 +
>  58 files changed, 4197 insertions(+), 2123 deletions(-)
>  delete mode 100644 fs/fscache/Kconfig
>  delete mode 100644 fs/fscache/Makefile
>  delete mode 100644 fs/fscache/internal.h
>  create mode 100644 fs/netfs/buffered_write.c
>  create mode 100644 fs/netfs/direct_read.c
>  create mode 100644 fs/netfs/direct_write.c
>  rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
>  rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
>  create mode 100644 fs/netfs/fscache_internal.h
>  rename fs/{fscache/io.c => netfs/fscache_io.c} (86%)
>  rename fs/{fscache/main.c => netfs/fscache_main.c} (84%)
>  rename fs/{fscache/proc.c => netfs/fscache_proc.c} (58%)
>  rename fs/{fscache/stats.c => netfs/fscache_stats.c} (97%)
>  rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)
>  create mode 100644 fs/netfs/locking.c
>  create mode 100644 fs/netfs/misc.c
>  create mode 100644 fs/netfs/output.c
> 

This all looks pretty great, David. Nice work! I had a few comments on a
few of them, but most are no big deal. It'd be nice to get this into
linux-next soon.

On the ones where I didn't have comments, you can add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2023-12-14 14:11 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-13 15:23 [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib David Howells
2023-12-13 15:23 ` [PATCH v4 01/39] netfs, fscache: Move fs/fscache/* into fs/netfs/ David Howells
2023-12-13 15:23 ` [PATCH v4 02/39] netfs, fscache: Combine fscache with netfs David Howells
2023-12-13 15:23 ` [PATCH v4 03/39] netfs, fscache: Remove ->begin_cache_operation David Howells
2023-12-13 15:23 ` [PATCH v4 04/39] netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink David Howells
2023-12-13 15:23 ` [PATCH v4 05/39] netfs: Move pinning-for-writeback from fscache to netfs David Howells
2023-12-13 15:23 ` [PATCH v4 06/39] netfs: Add a procfile to list in-progress requests David Howells
2023-12-13 15:59   ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 07/39] netfs: Allow the netfs to make the io (sub)request alloc larger David Howells
2023-12-13 15:23 ` [PATCH v4 08/39] netfs: Add a ->free_subrequest() op David Howells
2023-12-13 15:23 ` [PATCH v4 09/39] afs: Don't use folio->private to record partial modification David Howells
2023-12-13 15:23 ` [PATCH v4 10/39] netfs: Provide invalidate_folio and release_folio calls David Howells
2023-12-13 16:05   ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 11/39] netfs: Implement unbuffered/DIO vs buffered I/O locking David Howells
2023-12-13 16:08   ` Jeff Layton
2023-12-13 16:30     ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 12/39] netfs: Add iov_iters to (sub)requests to describe various buffers David Howells
2023-12-13 16:37   ` Jeff Layton
2023-12-19 14:31   ` David Howells
2023-12-19 14:40   ` David Howells
2023-12-13 15:23 ` [PATCH v4 13/39] netfs: Add support for DIO buffering David Howells
2023-12-13 15:23 ` [PATCH v4 14/39] netfs: Provide tools to create a buffer in an xarray David Howells
2023-12-13 15:23 ` [PATCH v4 15/39] netfs: Add bounce buffering support David Howells
2023-12-13 15:23 ` [PATCH v4 16/39] netfs: Add func to calculate pagecount/size-limited span of an iterator David Howells
2023-12-13 15:23 ` [PATCH v4 17/39] netfs: Limit subrequest by size or number of segments David Howells
2023-12-13 15:23 ` [PATCH v4 18/39] netfs: Export netfs_put_subrequest() and some tracepoints David Howells
2023-12-13 18:01   ` Jeff Layton
2023-12-19 14:42   ` David Howells
2023-12-19 14:48   ` David Howells
2023-12-13 15:23 ` [PATCH v4 19/39] netfs: Extend the netfs_io_*request structs to handle writes David Howells
2023-12-13 15:23 ` [PATCH v4 20/39] netfs: Add a hook to allow tell the netfs to update its i_size David Howells
2023-12-13 15:23 ` [PATCH v4 21/39] netfs: Make netfs_put_request() handle a NULL pointer David Howells
2023-12-13 15:23 ` [PATCH v4 22/39] netfs: Make the refcounting of netfs_begin_read() easier to use David Howells
2023-12-13 15:23 ` [PATCH v4 23/39] netfs: Prep to use folio->private for write grouping and streaming write David Howells
2023-12-13 15:23 ` [PATCH v4 24/39] netfs: Dispatch write requests to process a writeback slice David Howells
2023-12-13 15:23 ` [PATCH v4 25/39] netfs: Provide func to copy data to pagecache for buffered write David Howells
2023-12-13 15:23 ` [PATCH v4 26/39] netfs: Make netfs_read_folio() handle streaming-write pages David Howells
2023-12-13 15:23 ` [PATCH v4 27/39] netfs: Allocate multipage folios in the writepath David Howells
2023-12-13 15:23 ` [PATCH v4 28/39] netfs: Implement support for unbuffered/DIO read David Howells
2023-12-14 12:43   ` Jeff Layton
2023-12-19 15:46   ` David Howells
2023-12-13 15:23 ` [PATCH v4 29/39] netfs: Implement unbuffered/DIO write support David Howells
2023-12-13 15:23 ` [PATCH v4 30/39] netfs: Implement buffered write API David Howells
2023-12-13 15:23 ` [PATCH v4 31/39] netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite() David Howells
2023-12-13 15:23 ` [PATCH v4 32/39] netfs: Provide netfs_file_read_iter() David Howells
2023-12-13 15:23 ` [PATCH v4 33/39] netfs, cachefiles: Pass upper bound length to allow expansion David Howells
2023-12-13 15:23 ` [PATCH v4 34/39] netfs: Provide a writepages implementation David Howells
2023-12-13 15:23 ` [PATCH v4 35/39] netfs: Provide a launder_folio implementation David Howells
2023-12-13 15:23 ` [PATCH v4 36/39] netfs: Implement a write-through caching option David Howells
2023-12-14 13:49   ` Jeff Layton
2023-12-19 16:51   ` David Howells
2023-12-19 17:19     ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 37/39] netfs: Optimise away reads above the point at which there can be no data David Howells
2023-12-14 14:07   ` Jeff Layton
2023-12-19 16:56   ` David Howells
2023-12-13 15:23 ` [PATCH v4 38/39] afs: Use the netfs write helpers David Howells
2023-12-13 15:23 ` [PATCH v4 39/39] 9p: Use netfslib read/write_iter David Howells
2023-12-13 15:39   ` Christian Schoenebeck
2023-12-14 14:11 ` Jeff Layton [this message]
2023-12-15 12:03 ` [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib Christian Brauner
2023-12-15 13:29   ` Dominique Martinet
2023-12-18 11:05     ` Christian Brauner
2023-12-20 10:04   ` David Howells
2023-12-20 13:26     ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ede57e618abfc38229157447fb152f6027eb1b8e.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=asmadeus@codewreck.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=idryomov@gmail.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    --cc=netdev@vger.kernel.org \
    --cc=pc@manguebit.com \
    --cc=smfrench@gmail.com \
    --cc=sprasad@microsoft.com \
    --cc=tom@talpey.com \
    --cc=v9fs@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).