public inbox for linux-mm@kvack.org
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org,  linux-block@vger.kernel.org,
	 Christian Brauner <brauner@kernel.org>,
	 Al Viro <viro@zeniv.linux.org.uk>,
	linux-ext4@vger.kernel.org,  Ted Tso <tytso@mit.edu>,
	 "Tigran A. Aivazian" <aivazian.tigran@gmail.com>,
	 David Sterba <dsterba@suse.com>,
	Muchun Song <muchun.song@linux.dev>,
	 Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@kernel.org>,
	 linux-mm@kvack.org, linux-aio@kvack.org,
	 Benjamin LaHaise <bcrl@kvack.org>
Subject: Re: [PATCH 15/42] fat: Sync and invalidate metadata buffers from fat_evict_inode()
Date: Wed, 01 Apr 2026 21:50:42 +0900
Message-ID: <87pl4ideu5.fsf@mail.parknet.co.jp>
In-Reply-To: <dvo2jvc6upmzrgqxx3n6er3e2gvujheongtzkihzmllwxqiyuq@h3qrrnwxlwo4>

Jan Kara <jack@suse.cz> writes:

>> Hm, a metadata block is shared by several inodes, so an earlier flush
>> means fewer chances to combine multiple dirty updates into one write.
>> 
>> For example,
>>   create dir-A
>>   dir-A is reclaimed and flushed
>>   add new entries to dir-A
>>   lost the chance to combine the re-dirtying of dir-A
>
> Yes, but for this to be possible you would have to:
> 1) stop using dir-A between dir create & file creates
> 2) create enough memory pressure to cycle the dentry of dir-A through the
> LRU and reclaim it
> 3) continue memory pressure to cycle the inode for dir-A through the LRU
> and reclaim it
>
> So the amount of work that already has to happen to trigger flushing of
> a single block is so large that IMHO that flush will be lost in the noise.
>  
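A toy Python model of the write-combining concern in the dir-A example (not kernel code; the event names "dirty", "evict" and "writeback" are hypothetical). One metadata block shared by several inodes is written only when flushed, so repeated dirtying between flushes costs a single write. An extra flush at eviction time splits those updates into two writes, but only if eviction actually happens between them, which is why Jan argues that the memory pressure needed to get there dwarfs the extra write.

```python
from dataclasses import dataclass


@dataclass
class MetadataBlock:
    """Toy model of one on-disk metadata block shared by several inodes."""
    dirty: bool = False
    writes: int = 0

    def mark_dirty(self) -> None:
        # Re-dirtying an already dirty block costs nothing extra: pending
        # changes are combined into the next write of this block.
        self.dirty = True

    def flush(self) -> None:
        # Write the block only if it holds unwritten changes.
        if self.dirty:
            self.writes += 1
            self.dirty = False


def run(events: list[str], flush_on_evict: bool) -> int:
    """Replay hypothetical events against one shared block; return block writes."""
    blk = MetadataBlock()
    for ev in events:
        if ev == "dirty":
            blk.mark_dirty()
        elif ev == "writeback" or (ev == "evict" and flush_on_evict):
            blk.flush()
        # Without flush_on_evict, "evict" leaves the block dirty for writeback.
    blk.flush()  # final writeback (e.g. at unmount)
    return blk.writes


# create dir-A, dir-A is reclaimed, new entries are added, then writeback
scenario = ["dirty", "evict", "dirty", "dirty", "writeback"]
print(run(scenario, flush_on_evict=False))  # 1
print(run(scenario, flush_on_evict=True))   # 2
```

Without an "evict" event in the sequence, both modes produce a single write in this model.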
>> >> Anyway, with this change, the metadata of reclaimed
>> >> inodes will be flushed forcibly and frequently (yes, that may not be
>> >> significant, but I can't see the benefit to users from this
>> >> change), and we lose the chance to combine multiple dirty updates
>> >> while copying many files.
>> >
>> > The benefit for users is 24 bytes saved for the majority of inodes in
>> > the system: all the virtual inodes on sysfs / proc filesystems,
>> > all tmpfs inodes, all XFS inodes, all ext4 inodes when using a journal
>> > (once I optimize the ext4 code a bit), etc. So quite a bit of kernel
>> > memory is actually saved in common configurations.
>> >
>> > Another win is that with metadata buffer head tracking now separated, I can
>> > modify that code (which will require growing the tracking structure) to
>> > properly track the buffer head containing the inode and flush it on fsync(2).
>> > Currently there's a race: if the flush worker writes out the inode before
>> > fsync(2), then fsync(2) does not write out the buffer containing the inode
>> > at all, and thus the data is not really persistent. This is actually my initial
>> > motivation for this refactoring, since growing the inode for everybody to fix
>> > data consistency issues of FAT/ext2/udf isn't popular these days...
>> 
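The fsync(2) race Jan describes can be sketched with a simplified state model. This is an illustration under stated assumptions, not the FAT/ext2/udf code: `ToyInode`, `flush_worker()` and the `track_inode_buffer` flag are hypothetical. The model assumes the flush worker copies the inode into its buffer without waiting for that buffer to reach the disk; afterwards the inode looks clean, so an fsync that only checks inode dirtiness skips the still-dirty buffer.

```python
from dataclasses import dataclass


@dataclass
class ToyInode:
    """Simplified model: in-memory inode plus the buffer holding it on disk."""
    inode_dirty: bool = False   # in-memory inode has changes not yet in its buffer
    buffer_dirty: bool = False  # buffer holding the on-disk inode is unwritten
    persisted: bool = True      # latest inode state is on stable storage
    track_inode_buffer: bool = False  # hypothetical: fsync knows the inode's buffer

    def modify(self) -> None:
        self.inode_dirty = True
        self.persisted = False

    def flush_worker(self) -> None:
        # Background writeback: copy the inode into its buffer and mark the
        # buffer dirty, without waiting for the buffer to be written.
        if self.inode_dirty:
            self.inode_dirty = False
            self.buffer_dirty = True

    def _write_buffer(self) -> None:
        if self.buffer_dirty:
            self.buffer_dirty = False
            self.persisted = True

    def fsync(self) -> None:
        if self.inode_dirty:
            # Inode still dirty: copy it into the buffer and write that buffer.
            self.inode_dirty = False
            self.buffer_dirty = True
            self._write_buffer()
        elif self.track_inode_buffer:
            # With the inode's buffer tracked, fsync can still find and write it.
            self._write_buffer()
        # Otherwise the dirty buffer is invisible to fsync: that is the race.


racy = ToyInode()
racy.modify()
racy.flush_worker()    # background writeback runs before fsync(2)
racy.fsync()
print(racy.persisted)  # False
```

Constructing `ToyInode(track_inode_buffer=True)` stands in for the proposed tracking; with it, the same sequence ends with `persisted` set to True.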
>> Agreed, that is good. I'm only talking about the earlier flushing. To
>> implement it, is the earlier flush really necessary?
>
> Yes, to separate metadata buffer head tracking into a separate structure we
> must remove the handling of buffer head list from generic inode reclaim (as
> the filesystem has no way to provide the separate tracking structure there).
> Of course we could add a filesystem hook to inode reclaim to allow for
> handling of metadata bhs but:
>
> a) I'd rather do that in a way that is usable also for other issues
> filesystems have with inode reclaim as I mentioned in this thread before
>
> b) I don't think it's warranted for FAT etc. at this point as I don't think
> the possible overhead of metadata bh flushing on inode reclaim will be a
> problem in practice.
>
> But of course we can reevaluate if my gut feeling is wrong and someone
> comes up with a workload which significantly regresses due to these changes.
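A rough Python sketch of the design Jan describes, where dirty metadata buffers are tracked in the filesystem-private part of the inode and the filesystem's own evict hook syncs and drops them. The class `MetadataBhs` is loosely named after `mapping_metadata_bhs` from the patch titles in this series, but its fields and methods here are assumptions; `Inode`, `FsInode` and `fs_evict_inode()` are hypothetical stand-ins, and `disk` is just a list recording written block numbers. The generic `Inode` carries no tracking field, which is why generic reclaim code cannot flush the list and the filesystem has to.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: buffers compare by identity, not by contents
class Buffer:
    blocknr: int
    dirty: bool = False


@dataclass
class MetadataBhs:
    """Hypothetical per-inode list of dirty metadata buffers (fs-private)."""
    bhs: list = field(default_factory=list)

    def mark_dirty(self, bh: Buffer) -> None:
        # Associate the buffer with this inode once, however often it is dirtied.
        bh.dirty = True
        if bh not in self.bhs:
            self.bhs.append(bh)

    def sync(self, disk: list) -> None:
        # Write out every still-dirty buffer; "disk" records block numbers.
        for bh in self.bhs:
            if bh.dirty:
                disk.append(bh.blocknr)
                bh.dirty = False

    def invalidate(self) -> None:
        # Drop the associations; the buffers themselves are not freed here.
        self.bhs.clear()


@dataclass
class Inode:
    """Generic inode: no metadata-buffer tracking field at all."""
    ino: int


@dataclass
class FsInode(Inode):
    """Filesystem-private inode that embeds the tracking structure."""
    metadata: MetadataBhs = field(default_factory=MetadataBhs)


def fs_evict_inode(inode: FsInode, disk: list) -> None:
    # Generic reclaim only sees Inode, so the filesystem flushes its own list.
    inode.metadata.sync(disk)
    inode.metadata.invalidate()
```

Only inodes of filesystems that embed the structure pay for it, which mirrors the per-inode memory saving argued for above.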

OK. I still think we should go the way of reducing write amplification,
for performance and storage lifetime, where possible, rather than increasing it.

However, the discussion looks sufficient for us, and it looks like we
just give different priorities to these trade-offs.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

Thread overview: 62+ messages
2026-03-26  9:53 [PATCH v3 0/42] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-26  9:53 ` [PATCH 01/42] ext4: Use inode_has_buffers() Jan Kara
2026-03-26  9:53 ` [PATCH 02/42] gfs2: Don't zero i_private_data Jan Kara
2026-03-26  9:53 ` [PATCH 03/42] ntfs3: Drop pointless sync_mapping_buffers() and invalidate_inode_buffers() calls Jan Kara
2026-03-26  9:53 ` [PATCH 04/42] ocfs2: Drop pointless sync_mapping_buffers() calls Jan Kara
2026-03-26  9:53 ` [PATCH 05/42] bdev: Drop pointless invalidate_inode_buffers() call Jan Kara
2026-03-27  6:20   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 06/42] ufs: Drop pointless invalidate_mapping_buffers() call Jan Kara
2026-03-26  9:54 ` [PATCH 07/42] exfat: Drop pointless invalidate_inode_buffers() call Jan Kara
2026-03-26  9:54 ` [PATCH 08/42] fs: Remove inode lock from __generic_file_fsync() Jan Kara
2026-03-27  6:20   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 09/42] udf: Switch to generic_buffers_fsync() Jan Kara
2026-03-26  9:54 ` [PATCH 10/42] minix: " Jan Kara
2026-03-26  9:54 ` [PATCH 11/42] bfs: " Jan Kara
2026-03-26  9:54 ` [PATCH 12/42] fat: Switch to generic_buffers_fsync_noflush() Jan Kara
2026-03-26  9:54 ` [PATCH 13/42] fs: Drop sync_mapping_buffers() from __generic_file_fsync() Jan Kara
2026-03-27  6:21   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 14/42] fs: Rename generic_file_fsync() to simple_fsync() Jan Kara
2026-03-27  6:22   ` Christoph Hellwig
2026-03-27 16:26     ` Jan Kara
2026-03-26  9:54 ` [PATCH 15/42] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
2026-03-29 13:55   ` OGAWA Hirofumi
2026-03-30  9:08     ` Jan Kara
2026-03-30 11:29       ` OGAWA Hirofumi
2026-03-31  8:49         ` Jan Kara
2026-03-31 10:40           ` OGAWA Hirofumi
2026-04-01  9:11             ` Jan Kara
2026-04-01  9:41               ` OGAWA Hirofumi
2026-04-01 10:36                 ` Jan Kara
2026-04-01 12:50                   ` OGAWA Hirofumi [this message]
2026-03-26  9:54 ` [PATCH 16/42] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 17/42] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 18/42] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 19/42] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 20/42] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 21/42] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
2026-03-26  9:54 ` [PATCH 22/42] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
2026-03-27  6:22   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 23/42] fs: Stop using i_private_data for metadata bh tracking Jan Kara
2026-03-26  9:54 ` [PATCH 24/42] hugetlbfs: Stop using i_private_data Jan Kara
2026-03-26  9:54 ` [PATCH 25/42] aio: Stop using i_private_data and i_private_lock Jan Kara
2026-03-26  9:54 ` [PATCH 26/42] fs: Remove i_private_data Jan Kara
2026-03-26  9:54 ` [PATCH 27/42] kvm: Use private inode list instead of i_private_list Jan Kara
2026-03-26  9:54 ` [PATCH 28/42] fs: Drop osync_buffers_list() Jan Kara
2026-03-26  9:54 ` [PATCH 29/42] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
2026-03-26  9:54 ` [PATCH 30/42] fs: Move metadata bhs tracking to a separate struct Jan Kara
2026-03-26  9:54 ` [PATCH 31/42] fs: Make bhs point to mapping_metadata_bhs Jan Kara
2026-03-26  9:54 ` [PATCH 32/42] fs: Switch inode_has_buffers() to take mapping_metadata_bhs Jan Kara
2026-03-26  9:54 ` [PATCH 33/42] fs: Provide functions for handling mapping_metadata_bhs directly Jan Kara
2026-03-27  6:23   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 34/42] ext2: Track metadata bhs in fs-private inode part Jan Kara
2026-03-26  9:54 ` [PATCH 35/42] affs: " Jan Kara
2026-03-26  9:54 ` [PATCH 36/42] bfs: " Jan Kara
2026-03-26  9:54 ` [PATCH 37/42] fat: " Jan Kara
2026-03-26  9:54 ` [PATCH 38/42] udf: " Jan Kara
2026-03-26  9:54 ` [PATCH 39/42] minix: " Jan Kara
2026-03-26  9:54 ` [PATCH 40/42] ext4: " Jan Kara
2026-03-26  9:54 ` [PATCH 41/42] fs: Drop mapping_metadata_bhs from address space Jan Kara
2026-03-27  6:24   ` Christoph Hellwig
2026-03-26  9:54 ` [PATCH 42/42] fs: Drop i_private_list from address_space Jan Kara
2026-03-27  6:24   ` Christoph Hellwig
2026-03-26 14:06 ` [PATCH v3 0/42] fs: Move metadata bh tracking " Christian Brauner