public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: dhowells@redhat.com, Matthew Wilcox <willy@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Darrick J . Wong" <djwong@kernel.org>,
	Aravinda Herle <araherle@in.ibm.com>
Subject: Re: [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance
Date: Thu, 03 Nov 2022 14:51:10 +0000	[thread overview]
Message-ID: <7699.1667487070@warthog.procyon.org.uk> (raw)
In-Reply-To: <Y2IyTx0VwXMxzs0G@infradead.org>

Christoph Hellwig <hch@infradead.org> wrote:

> > filesystems right now.  Dave Howells' netfs infrastructure is trying
> > to solve the problem for everyone (and he's been looking at iomap as
> > inspiration for what he's doing).
> 
> Btw, I never understod why the network file systems don't just use
> iomap.  There is nothing block specific in the core iomap code.

It calls creates and submits bio structs all over the place.  This seems to
require a blockdev.

Anyway, netfs lib supports, or hopefully will support in the future, the
following:

 (1) Fscache.  netfslib will construct a read you're asking for from cached
     data and data from the server and stitch them together (where a folio may
     comprise pieces from more than once source), and then write the bits it
     read from the server out to the cache...  And handle content encryption
     for you such that the data stored in the cache is content-encrypted.

     On writeback, the dirty data must be written to both the cache (if you
     have one) and the server (if you're not in disconnected operation).

 (2) Disconnected operation.  netfslib will, in the future, handle storing
     data and changes in the cache and then sync'ing on reconnection of an
     object.

 (3) I want to hand persistent (for the life of an op) iov_iters to the
     filesystem so that the filesystem can, if it wants to, pass these to the
     kernel_sendmsg() and kernel_recvmsg() in the bottom.

     The aim is to get knowledge of pages out of the network filesystem
     entirely.  A network filesystem would then provide two basic hooks to the
     server: async direct read and as async direct write.  netfslib will use
     these to access the pagecache on behalf of the filesystem.

 (4) Reads and writes might want to/need to be non-block-size aligned.  If we
     have a byte-range file lock, for example, or if we have a max block size
     (eg. rsize/wsize) set that's not a multiple of 512, say.

 (5) Compressed I/O.  You get back more data than you asked for and you want
     to paste the rest into the pagecache (if buffered) or discard it (if
     DIO).  Further, to make this work on write, we may need to hold on to
     pages on the sides of the one we modified to make sure we keep the right
     size blob of data to recompress and send back.

 (6) Larger cache block granularity.  One thing I want to explore is the
     ability to have blocks in the cache that are larger than PAGE_SIZE.  If I
     can't use the backing filesystem's knowledge of holes in a file, then I
     have to store my own metadata (ie. effectively build a filesystem on top
     of a filesystem).  To reduce that amount of metadata that I need, I can
     make the cache granule size larger.

     In both 5 and 6, netfslib gets to tell the VM layer to increase the size
     of the blob in readahead() - and then may have to forcibly keep the pages
     surrounding the page of interest if it gets modified in order to be able
     to write to the cache correctly, depending on how much integrity I want
     to try and keep in the cache.

 (7) Not-quite-direct-I/O.  cifs, for example, has a number of variations on
     read and write modes that are kind of but not quite direct I/O.

David


  parent reply	other threads:[~2022-11-03 14:52 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-28  4:30 [RFC 0/2] iomap: Add support for subpage dirty state tracking to improve write performance Ritesh Harjani (IBM)
2022-10-28  4:30 ` [RFC 1/2] iomap: Change uptodate variable name to state Ritesh Harjani (IBM)
2022-10-28 16:31   ` Darrick J. Wong
2022-10-29  3:09     ` Ritesh Harjani (IBM)
2022-10-28  4:30 ` [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance Ritesh Harjani (IBM)
2022-10-28 12:42   ` Matthew Wilcox
2022-10-29  3:05     ` Ritesh Harjani (IBM)
2022-10-28 17:01   ` Darrick J. Wong
2022-10-28 18:15     ` Matthew Wilcox
2022-10-29  3:25     ` Ritesh Harjani (IBM)
2022-10-28 21:04   ` Dave Chinner
2022-10-30  3:27     ` Ritesh Harjani (IBM)
2022-10-30 22:31       ` Dave Chinner
2022-10-31  3:43     ` Matthew Wilcox
2022-10-31  7:08       ` Dave Chinner
2022-10-31 10:27         ` Matthew Wilcox
2022-11-02  8:57           ` Christoph Hellwig
2022-11-03  0:38             ` Dave Chinner
2022-11-02  9:03       ` Christoph Hellwig
2022-11-02 17:35         ` Darrick J. Wong
2022-11-04  7:27           ` Christoph Hellwig
2022-11-04 14:15             ` Ritesh Harjani (IBM)
2022-11-03 14:51         ` David Howells [this message]
2022-11-04  7:30           ` Christoph Hellwig
2022-11-07 13:03             ` David Howells
2022-11-03 14:12       ` David Howells
2022-11-04 11:28       ` Ritesh Harjani (IBM)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7699.1667487070@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=araherle@in.ibm.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ritesh.list@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox