From: Christoph Hellwig <hch@infradead.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <willy@infradead.org>,
Christian Brauner <brauner@kernel.org>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>, Jan Kara <jack@suse.cz>,
Luis Chamberlain <mcgrof@kernel.org>
Subject: Re: [PATCH] Documentation: document the design of iomap and how to port
Date: Tue, 11 Jun 2024 09:12:13 -0700 [thread overview]
Message-ID: <Zmh3XTDLM1TToQ2g@infradead.org> (raw)
In-Reply-To: <20240609155506.GT52987@frogsfrogsfrogs>
On Sun, Jun 09, 2024 at 08:55:06AM -0700, Darrick J. Wong wrote:
> HTML version here, text version below.
That is so much nicer than all the RST stuff..
> iomap is a filesystem library for handling various filesystem
> operations that involves mapping of file's logical offset ranges
> to physical extents. This origins of this library is the file I/O
> path that XFS once used; it has now been extended to cover several
> other operations. The library provides various APIs for
> implementing various file and pagecache operations, such as:
Does anyone care about the origin?
>
> * Pagecache reads and writes
>
> * Folio write faults to the pagecache
>
> * Writeback of dirty folios
>
> * Direct I/O reads and writes
>
> * FIEMAP
>
> * lseek SEEK_DATA and SEEK_HOLE
>
> * swapfile activation
One useful bit might be that there are two layer in iomap.
1) the very simply underlying layer in iter.c that just provides
a nicer iteration over logical file offsets
2) anything built on top. That's the things mentioned above plus
DAX.
What is also kinda interesting as it keeps confusing people is that
nothing in the iterator is block device specific. In fact the DAX
code now has no block device dependencies, as does the lseek and
FIEMAP code.
Because of that it might make sense to split this document up a bit
for the different layers and libraries. Or maybe not if too many
documents are too confusing.
> 2. For each sub-unit of work...
>
> 1. Revalidate the mapping and go back to (1) above, if
> necessary
That's something only really done in the buffered write path.
> Each iomap operation will be covered in more detail below. This
> library was covered previously by an LWN article and a
> KernelNewbies page.
Maybe these are links in other formats, but if not this information
isn't very useful. Depending on how old that information is it
probably isn't even with links.
> The filesystem returns the mappings via the following structure.
> For documentation purposes, the structure has been reordered to
> group fields that go together logically.
I don't think putting a different layout in here is a good idea.
In fact duplicating the definition means it will be out of sync
rather sooner than later. Given that we have to deal with RST anyway
we might as well want to pull this in as kerneldoc comments.
And maybe reorder the actual definition while we're at it,
as the version below still packs nicely.
> struct block_device *bdev;
> struct dax_device *dax_dev;
> void *inline_data;
Note: The could become a union these days. I tried years ago
before fully decoupling the DAX code and that didn't work,
but we should be fine now.
> * type describes the type of the space mapping:
>
> * IOMAP_HOLE: No storage has been allocated. This type
> must never be returned in response to an IOMAP_WRITE
> operation because writes must allocate and map space,
> and return the mapping. The addr field must be set to
> IOMAP_NULL_ADDR. iomap does not support writing
> (whether via pagecache or direct I/O) to a hole.
...
These should probably also be kerneldoc comments instead of being
away from the definitions?
>
> * IOMAP_F_XATTR: The mapping is for extended attribute
> data, not regular file data. This is only useful for
> FIEMAP.
.. and only used inside XFS. Maybe we should look into killing it.
> These struct kiocb flags are significant for buffered I/O with
> iomap:
>
> * IOCB_NOWAIT: Only proceed with the I/O if mapping data are
> already in memory, we do not have to initiate other I/O, and
> we acquire all filesystem locks without blocking. Neither
> this flag nor its definition RWF_NOWAIT actually define what
> this flag means, so this is the best the author could come
> up with.
I don't think that's true. But if it feels true to you submitting
a patch to describe it better is probably more helpful than this.
> iomap internally tracks two state bits per fsblock:
>
> * uptodate: iomap will try to keep folios fully up to date. If
> there are read(ahead) errors, those fsblocks will not be
> marked uptodate. The folio itself will be marked uptodate
> when all fsblocks within the folio are uptodate.
>
> * dirty: iomap will set the per-block dirty state when
> programs write to the file. The folio itself will be marked
> dirty when any fsblock within the folio is dirty.
>
> iomap also tracks the amount of read and write disk IOs that are
> in flight. This structure is much lighter weight than struct
> buffer_head.
Is this really something that should go into an API documentation?
Note that the structure not only is lighter weight than a buffer_head,
but more importantly there are a lot less of them as there is only
one per folio and not one per FSB.
> Why Convert to iomap?
Make this a separate document?
next prev parent reply other threads:[~2024-06-11 16:12 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-08 0:17 [PATCH] Documentation: document the design of iomap and how to port Darrick J. Wong
2024-06-09 6:36 ` Christoph Hellwig
2024-06-09 15:55 ` Darrick J. Wong
2024-06-10 14:18 ` Jan Kara
2024-06-10 21:59 ` Darrick J. Wong
2024-06-10 22:25 ` Jan Kara
2024-06-11 1:32 ` Dave Chinner
2024-06-12 0:37 ` Darrick J. Wong
2024-06-11 16:12 ` Christoph Hellwig [this message]
2024-06-11 21:43 ` Darrick J. Wong
2024-06-10 8:57 ` Ritesh Harjani
2024-06-10 23:11 ` Darrick J. Wong
2024-06-11 6:43 ` Ritesh Harjani
2024-06-11 21:50 ` Darrick J. Wong
2024-06-12 6:55 ` Ritesh Harjani
2024-06-11 10:45 ` Ritesh Harjani
2024-06-11 23:47 ` Darrick J. Wong
2024-06-12 6:37 ` Ritesh Harjani
2024-06-12 22:15 ` Darrick J. Wong
2024-06-12 13:24 ` Ritesh Harjani
2024-06-13 17:58 ` Darrick J. Wong
2024-06-14 15:01 ` Ritesh Harjani
2024-06-14 20:41 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zmh3XTDLM1TToQ2g@infradead.org \
--to=hch@infradead.org \
--cc=brauner@kernel.org \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).