Linux network filesystem support library
From: David Howells <dhowells@redhat.com>
To: David Laight <david.laight.linux@gmail.com>
Cc: dhowells@redhat.com, Christian Brauner <christian@brauner.io>,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	Paulo Alcantara <pc@manguebit.org>, Jens Axboe <axboe@kernel.dk>,
	Leon Romanovsky <leon@kernel.org>,
	Steve French <sfrench@samba.org>,
	ChenXiaoSong <chenxiaosong@chenxiaosong.com>,
	Marc Dionne <marc.dionne@auristor.com>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Trond Myklebust <trondmy@kernel.org>,
	netfs@lists.linux.dev, linux-afs@lists.infradead.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	ceph-devel@vger.kernel.org, v9fs@lists.linux.dev,
	linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 00/21] netfs: Keep track of folios in a segmented bio_vec[] chain
Date: Tue, 19 May 2026 09:56:35 +0100
Message-ID: <586308.1779180995@warthog.procyon.org.uk>
In-Reply-To: <20260519091545.171c4b85@pumpkin>

David Laight <david.laight.linux@gmail.com> wrote:

> > 	struct bvecq {
> > 		struct bvecq		*next;
> > 		struct bvecq		*prev;
> > 		unsigned long long	fpos;
> > 		refcount_t		ref;
> > 		u32			priv;
> > 		u16			nr_segs;
> > 		u16			max_segs;
> > 		enum bvecq_mem		mem_type:2;
> > 		bool			inline_bv:1;
> > 		bool			discontig:1;
> 
> There doesn't seem to be any point using bitfields.
> There is a massive hole here anyway.

Depends on how you define "massive".  On a 64-bit machine, the whole thing
fits into 48 bytes - 6 words (or 3 bio_vec slots).  next, prev, fpos, bv and
ref+priv take up 5 of those words; nr_segs and max_segs take up half of the
6th, leaving a 4 byte hole.
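
As a rough userspace check of that arithmetic, the following sketch lays out a
stand-in for the struct and measures it.  Several things here are assumptions
rather than the patch's definitions: refcount_t is replaced by a plain int
wrapper, the flags are written as non-bitfield bytes, and ->bv is assumed to
be a pointer placed directly after the flags.  The helper
bvecq_gap_after_counts() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types (assumptions for illustration). */
typedef struct { int refs; } refcount_t;
struct bio_vec;			/* opaque here; only a pointer is stored */

struct bvecq {
	struct bvecq		*next;
	struct bvecq		*prev;
	unsigned long long	fpos;
	refcount_t		ref;
	uint32_t		priv;
	uint16_t		nr_segs;
	uint16_t		max_segs;
	uint8_t			mem_type;	/* non-bitfield, byte-sized */
	bool			inline_bv;
	bool			discontig;
	struct bio_vec		*bv;		/* assumed position and type */
};

/* Bytes between the end of max_segs and the start of ->bv.  The flags
 * live inside this gap; whatever they don't use is padding.
 */
static inline size_t bvecq_gap_after_counts(void)
{
	return offsetof(struct bvecq, bv) -
	       (offsetof(struct bvecq, max_segs) + sizeof(uint16_t));
}
```

On an LP64 target, sizeof(struct bvecq) should come to 48 with this layout and
the gap to 4 bytes, so three byte-sized flags fit without growing the struct.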

You're right, though, I could make them all non-bitfields as the enum is
marked mode(byte).

> >  (1) next, prev - Link segments together in a list.  I want this to be
> >      NULL-terminated linear rather than circular to make it possible to
> >      arbitrarily glue bits on the front.
> 
> Do you ever need to follow the list backwards?

iov_iter_revert() exists, unfortunately, but yes, I would like to avoid having
a prev pointer.

I have a couple of ideas on how to get rid of that - or at least store the
start in struct iov_iter and always work forwards - but I haven't got round to
trying that yet.
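
To illustrate the "store the start and always work forwards" idea, here is a
minimal userspace sketch of a revert that needs no prev pointer.  None of this
is from the patches: struct fseg, struct fwd_iter and the fwd_iter_*()
functions are hypothetical, and each segment is reduced to a single length.

```c
#include <stddef.h>

/* Hypothetical, simplified segment: singly linked, one length each. */
struct fseg {
	struct fseg	*next;
	size_t		len;
};

struct fwd_iter {
	struct fseg	*head;		/* start of chain, kept for reverting */
	struct fseg	*cur;		/* current segment, NULL at the end */
	size_t		seg_off;	/* offset into cur */
	size_t		consumed;	/* total bytes advanced from head */
};

static void fwd_iter_init(struct fwd_iter *it, struct fseg *head)
{
	it->head = head;
	it->cur = head;
	it->seg_off = 0;
	it->consumed = 0;
}

/* Advance by up to n bytes; returns the number of bytes advanced. */
static size_t fwd_iter_advance(struct fwd_iter *it, size_t n)
{
	size_t done = 0;

	while (it->cur && done < n) {
		size_t avail = it->cur->len - it->seg_off;
		size_t step = n - done < avail ? n - done : avail;

		it->seg_off += step;
		done += step;
		if (it->seg_off == it->cur->len) {
			it->cur = it->cur->next;
			it->seg_off = 0;
		}
	}
	it->consumed += done;
	return done;
}

/* Revert by n bytes by restarting at head and walking forwards. */
static void fwd_iter_revert(struct fwd_iter *it, size_t n)
{
	size_t target = n > it->consumed ? 0 : it->consumed - n;

	fwd_iter_init(it, it->head);
	fwd_iter_advance(it, target);
}
```

The trade-off in this sketch is that a revert costs a walk from the head of
the chain rather than a short step backwards, in exchange for dropping one
pointer per segment.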

> >  (2) fpos, discontig - Note the current file position of the first byte of
> >      the segment; all the bio_vecs in ->bv[] must be contiguous in the file
> >      space.  The fpos can be used to find the folio by file position rather
> >      than from the info in the bio_vec.
> 
> Should fpos be off_t (or u64) rather than 'long long' (they are all the
> same underlying type).

It's 'unsigned long long', not 'long long', and off_t is actually 'long' in
asm-generic.  Actually, I should probably switch to using uoff_t.  Note that
this file position should never be seen as negative; I think loff_t should
only really be used in llseek.

> >      If there's a discontiguity, this should break over into a new bvecq
> >      segment with the discontig flag set (though this is redundant if you
> >      keep track of the file position).  Note that the beginning and end
> >      file positions in a segment need not be aligned to any filesystem
> >      block size.
> 
> At this point you lose me :-)

Apologies, but I'm trying to define how a bvecq chain works.  I need to codify
it more coherently.

So there are a number of reasons I want to be able to maintain the file
position information in the chain:

 (1) I can treat buffered writeback and DIO write more similarly if there's no
     requirement to access the folios in the list to get file position
     information.

 (2) When cleaning up lists of folios in buffered writeback, the file position
     is needed to access the i_pages xarray in order to clean up the marks on
     it.  This means I don't need to go from my list to access each folio, but
     can look them up through the xarray instead.

 (3) Some network filesystems, e.g. ceph, allow discontiguous (sparse) writes
     to be made to the server in a single RPC operation.  This gives a means
     to convey that information to them, but then allows the data to be
     conveyed in a single blob to the socket (the mapping between blob offsets
     and file regions is tabulated separately within the RPC call).

Note that some of this applies to reads too.
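
As a sketch of point (3), the per-segment file position is enough to tabulate
the mapping between blob offsets and file regions while the data itself stays
one contiguous blob.  This relies only on fpos (which is why the discontig
flag is redundant when positions are tracked).  The types and the
build_region_table() function are hypothetical, with each segment reduced to
a start position and a byte count.

```c
#include <stddef.h>

/* Hypothetical, pared-down segment: a byte count instead of bv[]. */
struct pseg {
	struct pseg		*next;
	unsigned long long	fpos;	/* file position of first byte */
	size_t			len;	/* bytes covered by this segment */
};

struct region {
	unsigned long long	fpos;		/* start in the file */
	size_t			blob_off;	/* start within the data blob */
	size_t			len;
};

/* Walk the chain and record one region per contiguous run of segments.
 * Returns the number of entries written; stops early if tab[] is full.
 */
static size_t build_region_table(const struct pseg *s,
				 struct region *tab, size_t max)
{
	size_t n = 0, blob_off = 0;

	for (; s; s = s->next) {
		if (n > 0 && tab[n - 1].fpos + tab[n - 1].len == s->fpos) {
			/* Contiguous with the previous region: extend it. */
			tab[n - 1].len += s->len;
		} else {
			/* Discontiguity: start a new region. */
			if (n == max)
				break;
			tab[n].fpos = s->fpos;
			tab[n].blob_off = blob_off;
			tab[n].len = s->len;
			n++;
		}
		blob_off += s->len;
	}
	return n;
}
```

For a chain covering file ranges 0-4095, 4096-4195 and 65536-66047, this
produces two regions: one of 4196 bytes at blob offset 0 and one of 512 bytes
at blob offset 4196.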

The last bit about filesystem block size alignment is because network
filesystems don't typically require any block alignment; any read-modify-write
is done locally on the server.  I should really have separated that from the
discontiguity bit.

David

