All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>,
	linux-mm@kvack.org, brauner@kernel.org, willy@infradead.org,
	jack@suse.cz, hch@infradead.org, jlayton@kernel.org,
	linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 12/12] iomap: add granular dirty and writeback accounting
Date: Wed, 3 Sep 2025 19:52:10 -0700	[thread overview]
Message-ID: <20250904025210.GO8117@frogsfrogsfrogs> (raw)
In-Reply-To: <CAJnrk1Z6qKqkOwHJwaBfE9FEGABGD4JKoEwNbRJTpOWL-VtPrg@mail.gmail.com>

On Wed, Sep 03, 2025 at 05:35:51PM -0700, Joanne Koong wrote:
> On Wed, Sep 3, 2025 at 11:44 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Tue, Sep 02, 2025 at 04:46:04PM -0700, Darrick J. Wong wrote:
> > > On Fri, Aug 29, 2025 at 04:39:42PM -0700, Joanne Koong wrote:
> > > > Add granular dirty and writeback accounting for large folios. These
> > > > stats are used by the mm layer for dirty balancing and throttling.
> > > > Having granular dirty and writeback accounting helps prevent
> > > > over-aggressive balancing and throttling.
> > > >
> > > > There are 4 places in iomap this commit affects:
> > > > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > > > b) writeback_iter with setting the wbc->no_stats_accounting bit and
> > > > calling clear_dirty_for_io_stats()
> > > > c) starting writeback, which now calls __folio_start_writeback()
> > > > d) ending writeback, which now calls folio_end_writeback_pages()
> > > >
> > > > This relies on using the ifs->state dirty bitmap to track dirty pages in
> > > > the folio. As such, this can only be utilized on filesystems where the
> > > > block size >= PAGE_SIZE.
> > >
> > > Er... is this statement correct?  I thought that you wanted the granular
> > > dirty page accounting when it's possible that individual sub-pages of a
> > > folio could be dirty.
> > >
> > > If i_blocksize >= PAGE_SIZE, then we'll have set the min folio order and
> > > there will be exactly one (large) folio for a single fsblock.  Writeback
> 
> Oh interesting, this is the part I'm confused about. With i_blocksize
> >= PAGE_SIZE, isn't there still the situation where the folio itself
> could be a lot larger, like 1MB?

Yes, that's quite possible.  IIRC you can get 2MB folios for 8k
fsblocks.

>                                  That's what I've been seeing on fuse
> where "blocksize" == PAGE_SIZE == 4096. I see that xfs sets the min
> folio order through mapping_set_folio_min_order() but I'm not seeing
> how that ensures "there will be exactly one large folio for a single
> fsblock"?

I misspoke -- should have said "there will be no more than one (large)
folio for a single fsblock".  Sorry about the confusion; my old brain is
still stuck in 2015 sometimes.

>           My understanding is that that only ensures the folio is at
> least the size of the fsblock but that the folio size can be larger
> than that too. Am I understanding this incorrectly?

Nope, your understanding is correct. :)

> > > must happen in units of fsblocks, so there's no point in doing the extra
> > > accounting calculations if there's only one fsblock.
> > >
> > > Waitaminute, I think the logic to decide if you're going to use the
> > > granular accounting is:
> > >
> > >       (folio_size > PAGE_SIZE && folio_size > i_blocksize)
> > >
> 
> Yeah, you're right about this - I had used "ifs && i_blocksize >=
> PAGE_SIZE" as the check, which translates to "i_blocks_per_folio > 1
> && i_block_size >= PAGE_SIZE", which in effect does the same thing as
> what you wrote but has the additional (and now I'm realizing,
> unnecessary) stipulation that block_size can't be less than PAGE_SIZE.

Oh!  Yes, that's right, they /are/ equivalent!

> > > Hrm?
> > >
> >
> > I'm also a little confused why this needs to be restricted to blocksize
> > gte PAGE_SIZE. The lower level helpers all seem to be managing block
> > ranges, and then apparently just want to be able to use that directly as
> > a page count (for accounting purposes).
> >
> > Is there any reason the lower level functions couldn't return block
> > units, then the higher level code can use a blocks_per_page or some such
> > to translate that to a base page count..? As Darrick points out I assume
> > you'd want to shortcut the folio_nr_pages() == 1 case to use a min page
> > count of 1, but otherwise ISTM that would allow this to work with
> > configs like 64k pagesize and 4k blocks as well. Am I missing something?
> >
> 
> No, I don't think you're missing anything, it should have been done
> like this in the first place.
> 
> > Brian
> >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c | 140 ++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 132 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index 4f021dcaaffe..bf33a5361a39 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -20,6 +20,8 @@ struct iomap_folio_state {
> > > >     spinlock_t              state_lock;
> > > >     unsigned int            read_bytes_pending;
> > > >     atomic_t                write_bytes_pending;
> > > > +   /* number of pages being currently written back */
> > > > +   unsigned                nr_pages_writeback;
> > > >
> > > >     /*
> > > >      * Each block has two bits in this bitmap:
> > > > @@ -139,6 +141,29 @@ static unsigned ifs_next_clean_block(struct folio *folio,
> > > >             blks + start_blk) - blks;
> > > >  }
> > > >
> > > > +static unsigned ifs_count_dirty_pages(struct folio *folio)
> > > > +{
> > > > +   struct inode *inode = folio->mapping->host;
> > > > +   unsigned block_size = i_blocksize(inode);
> > > > +   unsigned start_blk, end_blk;
> > > > +   unsigned blks, nblks = 0;
> > > > +
> > > > +   start_blk = 0;
> > > > +   blks = i_blocks_per_folio(inode, folio);
> > > > +   end_blk = (i_size_read(inode) - 1) >> inode->i_blkbits;
> > > > +   end_blk = min(end_blk, i_blocks_per_folio(inode, folio) - 1);
> > > > +
> > > > +   while (start_blk <= end_blk) {
> > > > +           start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
> > > > +           if (start_blk > end_blk)
> > > > +                   break;
> > >
> > > Use your new helper?
> > >
> > >               nblks = ifs_next_clean_block(folio, start_blk + 1,
> > >                               end_blk) - start_blk?
> > > > +           nblks++;
> > > > +           start_blk++;
> > > > +   }
> > > > +
> > > > +   return nblks * (block_size >> PAGE_SHIFT);
> > >
> > > I think this returns the number of dirty basepages in a given large
> > > folio?  If that's the case then shouldn't this return long, like
> > > folio_nr_pages does?
> > >
> > > > +}
> > > > +
> > > >  static unsigned ifs_find_dirty_range(struct folio *folio,
> > > >             struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
> > > >  {
> > > > @@ -220,6 +245,58 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
> > > >             ifs_set_range_dirty(folio, ifs, off, len);
> > > >  }
> > > >
> > > > +static long iomap_get_range_newly_dirtied(struct folio *folio, loff_t pos,
> > > > +           unsigned len)
> > >
> > > iomap_count_clean_pages() ?
> 
> Nice, a much clearer name.
> 
> I'll make the suggestions you listed above too, thanks for the pointers.
> 
> Thanks for taking a look at this, Darrick and Brian!
> > >
> > > --D
> > >


  reply	other threads:[~2025-09-04  2:52 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-29 23:39 [PATCH v2 00/12] mm/iomap: add granular dirty and writeback accounting Joanne Koong
2025-08-29 23:39 ` [PATCH v2 01/12] mm: pass number of pages to __folio_start_writeback() Joanne Koong
2025-09-03 11:48   ` David Hildenbrand
2025-09-03 20:02     ` Darrick J. Wong
2025-09-03 20:05       ` David Hildenbrand
2025-09-03 23:12         ` Joanne Koong
2025-08-29 23:39 ` [PATCH v2 02/12] mm: pass number of pages to __folio_end_writeback() Joanne Koong
2025-08-29 23:39 ` [PATCH v2 03/12] mm: add folio_end_writeback_pages() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 04/12] mm: pass number of pages dirtied to __folio_mark_dirty() Joanne Koong
2025-08-29 23:39 ` [PATCH v2 05/12] mm: add filemap_dirty_folio_pages() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 06/12] mm: add __folio_clear_dirty_for_io() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 07/12] mm: add no_stats_accounting bitfield to wbc Joanne Koong
2025-08-29 23:39 ` [PATCH v2 08/12] mm: refactor clearing dirty stats into helper function Joanne Koong
2025-08-29 23:39 ` [PATCH v2 09/12] mm: add clear_dirty_for_io_stats() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 10/12] iomap: refactor dirty bitmap iteration Joanne Koong
2025-09-03 18:53   ` Brian Foster
2025-09-03 19:59     ` Darrick J. Wong
2025-10-03 22:27       ` Joanne Koong
2025-10-04  1:11         ` Joanne Koong
2025-08-29 23:39 ` [PATCH v2 11/12] iomap: refactor uptodate " Joanne Koong
2025-08-29 23:39 ` [PATCH v2 12/12] iomap: add granular dirty and writeback accounting Joanne Koong
2025-09-02 23:46   ` Darrick J. Wong
2025-09-03 18:48     ` Brian Foster
2025-09-04  0:35       ` Joanne Koong
2025-09-04  2:52         ` Darrick J. Wong [this message]
2025-09-04 11:47         ` Brian Foster
2025-09-04 20:07           ` Darrick J. Wong
2025-09-05  0:14             ` Joanne Koong
2025-09-05 11:19               ` Brian Foster
2025-09-05 12:43                 ` Jan Kara
2025-09-05 23:30                   ` Joanne Koong
2025-09-04  8:53 ` [PATCH v2 00/12] mm/iomap: " Jan Kara
2025-09-04 23:59   ` Joanne Koong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250904025210.GO8117@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.