From: Matthew Wilcox <willy@infradead.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: josef@toxicpanda.com, linux-f2fs-devel@lists.sourceforge.net,
clm@fb.com, terrelln@fb.com, dsterba@suse.com,
linux-btrfs@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH 02/14] btrfs: convert get_next_extent_buffer() to take a folio
Date: Fri, 23 Aug 2024 16:38:27 +0100
Message-ID: <Zsis82IKSAq6Mgms@casper.infradead.org>
In-Reply-To: <7a04ac3b-e655-4a80-89dc-19962db50f05@gmx.com>
On Fri, Aug 23, 2024 at 11:43:41AM +0930, Qu Wenruo wrote:
> On 2024/8/23 07:55, Qu Wenruo wrote:
> > On 2024/8/22 21:37, Matthew Wilcox wrote:
> > > On Thu, Aug 22, 2024 at 08:28:09PM +0930, Qu Wenruo wrote:
> > > > But what will happen if some writes happened to that larger folio?
> > > > Does the MM layer detect that and split the folios? Or the fs has to go the
> > > > subpage routine (with an extra structure recording all the subpage flags
> > > > bitmap)?
> > >
> > > Entirely up to the filesystem. It would help if btrfs used the same
> > > terminology as the rest of the filesystems instead of inventing its own
> > > "subpage" thing. As far as I can tell, "subpage" means "fs block size",
> > > but maybe it has a different meaning that I haven't ascertained.
> >
> > Then tell me the correct terminology to describe fs block size smaller
> > than page size in the first place.
> >
> > "fs block size" is not good enough, we want a terminology to describe
> > "fs block size" smaller than page size.

Oh dear.  btrfs really has corrupted your brain.  Here's the terminology
used in the rest of Linux:

SECTOR_SIZE.  Fixed at 512 bytes.  This is the unit used to communicate
with the block layer.  It has no real meaning, other than Linux doesn't
support block devices with 128 and 256 byte sector sizes (I have used
such systems, but not in the last 30 years).

LBA size.  This is the unit that the block layer uses to communicate
with the block device.  Must be at least SECTOR_SIZE.  I/O cannot be
performed in smaller chunks than this.

Physical block size.  This is the unit that the device advertises as
its efficient minimum size.  I/Os smaller than this / not aligned to
this will probably incur a performance penalty as the device will need
to do a read-modify-write cycle.

fs block size.  Known as s_blocksize or i_blocksize.  Must be a multiple
of LBA size, but may be smaller than physical block size.  Files are
allocated in multiples of this unit.

PAGE_SIZE.  Unit that memory can be mapped in.  This applies to both
userspace mapping of files as well as calls to kmap_local_*().

folio size.  The size that the page cache has decided to manage this
chunk of the file in.  A multiple of PAGE_SIZE.

I've mostly listed these from smallest to largest.  The relationships that
must be true:

SS <= LBA <= Phys
LBA <= fsb
PS <= folio
fsb <= folio

ocfs2 supports fsb > PAGE_SIZE, but this is a rarity.  Most filesystems
require fsb <= PAGE_SIZE.

Filesystems like UFS also support a fragment size which is less than fs
block size.  It's kind of like tail packing.  Anyway, that's internal to
the filesystem and not exposed to the VFS.
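
The relationships listed above can be written down as a small userspace
check.  This is only a sketch: struct io_geometry and geometry_is_valid()
are hypothetical names made up for illustration, not kernel interfaces,
and the "multiple of" conditions are taken from the definitions of fs
block size and folio size given earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u	/* fixed unit used to talk to the block layer */

/* Hypothetical container for the sizes discussed above; not a kernel struct. */
struct io_geometry {
	size_t lba_size;	/* unit between block layer and device */
	size_t phys_block_size;	/* device's efficient minimum I/O size */
	size_t fs_block_size;	/* s_blocksize / i_blocksize */
	size_t page_size;	/* PAGE_SIZE */
	size_t folio_size;	/* size chosen by the page cache */
};

/* Returns true if every relationship from the list holds. */
static bool geometry_is_valid(const struct io_geometry *g)
{
	assert(g != NULL);

	return g->page_size > 0 &&
	       SECTOR_SIZE <= g->lba_size &&		/* SS <= LBA */
	       g->lba_size <= g->phys_block_size &&	/* LBA <= Phys */
	       g->lba_size <= g->fs_block_size &&	/* LBA <= fsb */
	       g->fs_block_size % g->lba_size == 0 &&	/* fsb multiple of LBA */
	       g->page_size <= g->folio_size &&		/* PS <= folio */
	       g->folio_size % g->page_size == 0 &&	/* folio multiple of PS */
	       g->fs_block_size <= g->folio_size;	/* fsb <= folio */
}
```

Note that nothing here requires fsb <= PAGE_SIZE; that is a restriction of
most individual filesystems, not one of the relationships that must hold.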
> > > I have no idea why btrfs thinks it needs to track writeback, ordered,
> > > checked and locked in a bitmap. Those make no sense to me. But they
> > > make no sense to me if you're supporting a 4KiB filesystem on a machine
> > > with a 64KiB PAGE_SIZE, not just in the context of "larger folios".
> > > Writeback is something the VM tells you to do; why do you need to tag
> > > individual blocks for writeback?
> >
> > Because there are cases where btrfs needs to only write back part of the
> > folio independently.
iomap manages to do this with only tracking per-block dirty bits.
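
To make "per-block dirty bits" concrete, here is a minimal userspace
sketch of one dirty bit per fs block in a folio, plus a walk that finds
each contiguous dirty range so it can be written back on its own.  It is
not iomap's code; struct folio_dirty_state, MAX_BLOCKS and the helper
names are assumptions invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MAX_BLOCKS	512	/* e.g. a 2MB folio with 4KiB blocks */

/* Hypothetical per-folio state: one dirty bit per fs block. */
struct folio_dirty_state {
	unsigned int nr_blocks;
	unsigned long dirty[MAX_BLOCKS / (sizeof(unsigned long) * CHAR_BIT)];
};

static bool block_is_dirty(const struct folio_dirty_state *s, unsigned int blk)
{
	return (s->dirty[blk / BITS_PER_WORD] >> (blk % BITS_PER_WORD)) & 1UL;
}

/* Mark blocks [first, first + count) dirty. */
static void set_range_dirty(struct folio_dirty_state *s,
			    unsigned int first, unsigned int count)
{
	assert(first + count <= s->nr_blocks);
	for (unsigned int blk = first; blk < first + count; blk++)
		s->dirty[blk / BITS_PER_WORD] |= 1UL << (blk % BITS_PER_WORD);
}

/*
 * Find the next run of dirty blocks at or after @start.  On success,
 * store its first block and length and return true; return false when
 * no dirty block remains.
 */
static bool next_dirty_range(const struct folio_dirty_state *s,
			     unsigned int start,
			     unsigned int *first, unsigned int *count)
{
	unsigned int blk = start;

	while (blk < s->nr_blocks && !block_is_dirty(s, blk))
		blk++;
	if (blk >= s->nr_blocks)
		return false;

	*first = blk;
	while (blk < s->nr_blocks && block_is_dirty(s, blk))
		blk++;
	*count = blk - *first;
	return true;
}
```

With 4KiB blocks in a 64KiB folio, dirtying block 0 and blocks 8..11
yields two separate ranges, [0, 4K) and [32K, 48K), which a caller could
submit independently.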
> > And especially for mixing compression and non-compression writes inside
> > a page, e.g:
> >
> >     0         16K       32K       48K       64K
> >     |//|                |/////////|
> >        4K
> >
> > In above case, if we need to writeback above page with 4K sector size,
> > then the first 4K is not suitable for compression (result will still
> > take a full 4K block), while the range [32K, 48K) will be compressed.
> >
> > In that case, [0, 4K) range will be submitted directly for IO.
> > Meanwhile [32K, 48K) will be submitted for compression in another wq.
> > (Or time consuming compression will delay the writeback of the remaining
> > pages)
> >
> > This means the dirty/writeback flags will have a different timing to be
> > changed.
>
> Just in case if you mean using an atomic to trace the writeback/lock
> progress, then it's possible to go that path, but for now it's not space
> efficient.
>
> For 16 blocks per page case (4K sectorsize 64K page size), each atomic
> takes 4 bytes while a bitmap only takes 2 bytes.
>
> And for 4K sectorsize 16K page size case, it's worse and btrfs compacts
> all the bitmaps into a larger one to save more space, while each atomic
> still takes 4 bytes.

Sure, but it doesn't scale up well.  And it's kind of irrelevant whether
you occupy 2 or 4 bytes at the low end because you're allocating all
this through slab, so you get rounded to 8 bytes anyway.

iomap_folio_state currently occupies 12 bytes + 2 bits per block.  So
for a 16 block folio (4k in 64k), that's 32 bits for a total of 16
bytes.  For a 2MB folio on a 4kB block size fs, that's 1024 bits for
a total of 140 bytes (rounded to 192 bytes by slab).

Hm, it might be worth adding a kmalloc-160, we'd get 25 objects per 4KiB
page instead of 21 192-byte objects ...
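
The arithmetic above can be reproduced with a few lines of C.  This is a
sketch of the numbers quoted in this message only: the "12 bytes + 2 bits
per block" model is taken from the text rather than from the structure
definition, the size-class table is an assumed list of common kmalloc
caches, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model from the text: 12 bytes of header plus 2 bits per fs block. */
static size_t folio_state_bytes(size_t folio_size, size_t block_size)
{
	size_t blocks, bits;

	assert(block_size > 0);
	blocks = folio_size / block_size;
	bits = 2 * blocks;

	return 12 + (bits + 7) / 8;	/* round the bitmap up to whole bytes */
}

/* Assumed slab size classes; returns 0 if the request is too large. */
static size_t slab_rounded(size_t bytes)
{
	static const size_t classes[] = {
		8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096,
	};

	for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		if (bytes <= classes[i])
			return classes[i];
	}
	return 0;
}

/* How many objects of @obj_size fit in one slab page of @page_size. */
static size_t objects_per_page(size_t obj_size, size_t page_size)
{
	assert(obj_size > 0);
	return page_size / obj_size;
}
```

For a 2MB folio with 4kB blocks this gives 140 bytes, which lands in the
192-byte class; a 160-byte class would fit 25 objects in a 4KiB page where
the 192-byte class fits 21.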