linux-fsdevel.vger.kernel.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <clm@fb.com>, Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"dchinner@redhat.com" <dchinner@redhat.com>
Subject: Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()
Date: Fri, 3 Jun 2022 11:06:22 -0400
Message-ID: <YpojbvB/+wPqHT8y@cmpxchg.org>
In-Reply-To: <20220603052047.GJ1098723@dread.disaster.area>

Hello Dave,

On Fri, Jun 03, 2022 at 03:20:47PM +1000, Dave Chinner wrote:
> On Fri, Jun 03, 2022 at 01:29:40AM +0000, Chris Mason wrote:
> > As you describe above, the loops are definitely coming from higher
> > in the stack.  wb_writeback() will loop as long as
> > __writeback_inodes_wb() returns that it’s making progress and
> > we’re still globally over the bg threshold, so write_cache_pages()
> > is just being called over and over again.  We’re coming from
> > wb_check_background_flush(), so:
> > 
> >                 struct wb_writeback_work work = {
> >                         .nr_pages       = LONG_MAX,
> >                         .sync_mode      = WB_SYNC_NONE,
> >                         .for_background = 1,
> >                         .range_cyclic   = 1,
> >                         .reason         = WB_REASON_BACKGROUND,
> >                 };
> 
> Sure, but we end up in writeback_sb_inodes() which does this after
> the __writeback_single_inode()->do_writepages() call that iterates
> the dirty pages:
> 
>                 if (need_resched()) {
>                         /*
>                          * We're trying to balance between building up a nice
>                          * long list of IOs to improve our merge rate, and
>                          * getting those IOs out quickly for anyone throttling
>                          * in balance_dirty_pages().  cond_resched() doesn't
>                          * unplug, so get our IOs out the door before we
>                          * give up the CPU.
>                          */
>                         blk_flush_plug(current->plug, false);
>                         cond_resched();
>                 }
> 
> So if there is a pending IO completion on this CPU on a work queue
> here, we'll reschedule to it because the work queue kworkers are
> bound to CPUs and they take priority over user threads.

The flusher thread is also a kworker, though. So it may hit this
cond_resched(), but it doesn't yield until the timeslice expires.

> Also, this then requeues the inode on the b_more_io queue, and
> wb_check_background_flush() won't come back to it until all other
> inodes on all other superblocks on the bdi have had writeback
> attempted. So if the system truly is over the background dirty
> threshold, why is writeback getting stuck on this one inode in this
> way?

The explanation for this part at least is that the bdi/flush domain is
split per cgroup. The cgroup in question is over its proportional bg
thresh. It has very few dirty pages, but it also has very few
*dirtyable* pages, which makes for a high dirty ratio. And that
handful of dirty pages are the unflushable ones past EOF.
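
To make the ratio argument concrete, here is a rough sketch of the
arithmetic only. The struct, the function name and the numbers are
hypothetical; the real check (wb_over_bg_thresh() and friends) is
considerably more involved, so treat this as an illustration rather
than the kernel's logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for one flush domain (e.g. one
 * cgroup). Field names are made up for this example.
 */
struct wb_domain_sketch {
	unsigned long dirty;		/* dirty pages charged to the domain */
	unsigned long dirtyable;	/* pages the domain could dirty at all */
};

/*
 * True if dirty / dirtyable exceeds bg_ratio_pct percent.
 * Cross-multiplied to avoid integer division; assumes the page
 * counts are small enough not to overflow.
 */
static bool over_bg_thresh_sketch(const struct wb_domain_sketch *d,
				  unsigned int bg_ratio_pct)
{
	assert(bg_ratio_pct <= 100);
	return d->dirty * 100 > d->dirtyable * bg_ratio_pct;
}
```

With an assumed 10% background ratio, 8 dirty pages out of 40
dirtyable ones is 20% and trips the check, while the same 8 pages out
of a million dirtyable ones would not come close.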

There is no next inode to move on to on subsequent loops.
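
A toy model of why that spins, under stated assumptions: there is a
single inode, every pass over it reports progress (as Chris describes
above), and the dirty count only drops if the pages past EOF get
invalidated, which is what the RFC in the subject line proposes. All
names here are hypothetical and the pass cap exists only so the
example terminates; none of this is the actual wb_writeback() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flush domain with exactly one inode; names are made up. */
struct toy_wb {
	unsigned long dirty;		/* dirty pages, all past EOF here */
	unsigned long bg_thresh;	/* background threshold, in pages */
	bool invalidate_past_eof;	/* models the idea in the RFC */
};

/* One writeback pass over the only inode; true means "made progress". */
static bool toy_writeback_pass(struct toy_wb *wb)
{
	if (wb->dirty == 0)
		return false;
	if (wb->invalidate_past_eof)
		wb->dirty = 0;	/* pages dropped, no longer counted dirty */
	/* Otherwise the pages stay dirty, yet progress is still reported. */
	return true;
}

/*
 * Loop while over the threshold and the last pass reported progress.
 * max_passes is an artificial cap so the toy always terminates.
 * Returns the number of passes made.
 */
static unsigned int toy_background_flush(struct toy_wb *wb,
					 unsigned int max_passes)
{
	unsigned int passes = 0;

	assert(max_passes > 0);
	while (passes < max_passes && wb->dirty > wb->bg_thresh) {
		passes++;
		if (!toy_writeback_pass(wb))
			break;
	}
	return passes;
}
```

Without invalidation the loop only stops at the artificial cap,
because nothing ever brings the domain back under its threshold; with
invalidation it finishes after a single pass.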
