From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christoph Hellwig <hch@lst.de>,
	Jan Engelhardt <jengelh@medozas.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/5] writeback: avoid livelocking WB_SYNC_ALL writeback
Date: Wed, 10 Nov 2010 10:26:24 +0800
Message-ID: <20101110022624.GA5167@localhost>
In-Reply-To: <20101109231840.GC11214@quack.suse.cz>

On Wed, Nov 10, 2010 at 07:18:40AM +0800, Jan Kara wrote:
> On Tue 09-11-10 14:43:46, Andrew Morton wrote:

> > I don't really see how this patch changes anything.  For WB_SYNC_ALL
> > requests the code will still try to write out 2^63 pages, only it does
> > it all in a single writeback_inodes_wb() call.  What prevents that call

Sorry, sync() works on one superblock after another, so it is actually a
__writeback_inodes_sb() call. I'll update the comment.

> > itself from getting livelocked?

Livelock in __writeback_inodes_sb() is prevented by

- working on a finite set of files by doing queue_io() once at the beginning
- working on a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging
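
The page tagging point can be illustrated with a minimal user-space sketch.
This is not kernel code: the sim_* names, the fixed-size page array and the
"dirtier" that keeps dirtying pages ahead of the writer are all assumptions
made for illustration. The towrite flag only stands in for
PAGECACHE_TAG_TOWRITE.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_PAGES 16

/* Hypothetical stand-in for per-page state in the page cache. */
struct sim_page {
	bool dirty;	/* models the dirty tag */
	bool towrite;	/* models PAGECACHE_TAG_TOWRITE */
};

struct sim_file {
	struct sim_page pages[SIM_NR_PAGES];
};

/* Models a concurrent writer: dirty the first clean page at or after 'from'. */
static void sim_dirty_next_clean_page(struct sim_file *f, size_t from)
{
	for (size_t i = from; i < SIM_NR_PAGES; i++) {
		if (!f->pages[i].dirty) {
			f->pages[i].dirty = true;
			return;
		}
	}
}

static size_t sim_count_dirty(const struct sim_file *f)
{
	size_t n = 0;

	for (size_t i = 0; i < SIM_NR_PAGES; i++)
		n += f->pages[i].dirty;
	return n;
}

/*
 * Sync one file: snapshot the dirty pages by tagging them once, then
 * write only tagged pages. Pages dirtied while we are writing carry no
 * tag, so the amount of work is bounded by the snapshot.
 */
static size_t sim_sync_file(struct sim_file *f)
{
	size_t written = 0;

	/* Step 1: tag the pages that are dirty right now. */
	for (size_t i = 0; i < SIM_NR_PAGES; i++)
		if (f->pages[i].dirty)
			f->pages[i].towrite = true;

	/* Step 2: write tagged pages only. */
	for (size_t i = 0; i < SIM_NR_PAGES; i++) {
		if (!f->pages[i].towrite)
			continue;
		f->pages[i].towrite = false;
		f->pages[i].dirty = false;
		written++;
		/* The dirtier keeps producing new dirty pages ahead of us. */
		sim_dirty_next_clean_page(f, i + 1);
	}
	return written;
}
```

With four pages dirty at the start, sim_sync_file() writes exactly those four,
even though the dirtier leaves new dirty pages behind for a later pass.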

>   I'm referring to the livelock avoidance using page tagging. Fengguang
> actually added a note about this into a comment in the code but it's not
> in the changelog. And you're right it should be here.

OK, I'll add the above to changelog.

> > Perhaps the unmentioned problem here is that each call to
> > writeback_inodes_wb(MAX_WRITEBACK_PAGES) will restart its walk across
> > the inode lists.  So instead of giving up on a being-written-to-file,
> > we continuously revisit it again and again and again.
> > 
> > Correct?  If so, please add the description.  If incorrect, please add
> > the description as well ;)
>   Yes, that's the problem.

writeback_inodes_wb(MAX_WRITEBACK_PAGES) will put the not fully written
inode at the head of b_more_io, and pick up the next inode from the tail
of b_io the next time it is called. Here the tail of b_io serves as the
cursor.

         b_io             b_more_io
        |----------------|-----------------|
        ^head            ^cursor           ^tail
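
The cursor behaviour in the diagram can be sketched in user space as follows.
This is only an illustration under assumptions: the sim_* types, the
array-backed lists (item[0] is the head, item[len - 1] is the tail) and
SIM_CHUNK (standing in for MAX_WRITEBACK_PAGES) are hypothetical and do not
mirror the kernel's list_head code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX_INODES	8
#define SIM_CHUNK	1024L	/* stands in for MAX_WRITEBACK_PAGES */

struct sim_inode {
	int ino;
	long dirty_pages;
};

/* item[0] is the head of the list, item[len - 1] is the tail. */
struct sim_list {
	struct sim_inode *item[SIM_MAX_INODES];
	size_t len;
};

struct sim_wb {
	struct sim_list b_io;
	struct sim_list b_more_io;
};

static void sim_add_head(struct sim_list *l, struct sim_inode *inode)
{
	assert(l->len < SIM_MAX_INODES);
	memmove(&l->item[1], &l->item[0], l->len * sizeof(l->item[0]));
	l->item[0] = inode;
	l->len++;
}

static struct sim_inode *sim_pop_tail(struct sim_list *l)
{
	return l->len ? l->item[--l->len] : NULL;
}

/*
 * One chunked writeback call: take the inode at the tail of b_io (the
 * cursor), write at most SIM_CHUNK pages, and park it at the head of
 * b_more_io if it still has dirty pages. Returns the inode number that
 * was serviced, or -1 if b_io is empty.
 */
static int sim_writeback_one_chunk(struct sim_wb *wb)
{
	struct sim_inode *inode = sim_pop_tail(&wb->b_io);
	long n;

	if (!inode)
		return -1;

	n = inode->dirty_pages < SIM_CHUNK ? inode->dirty_pages : SIM_CHUNK;
	inode->dirty_pages -= n;

	if (inode->dirty_pages > 0)
		sim_add_head(&wb->b_more_io, inode);

	return inode->ino;
}
```

In this model a large inode at the tail of b_io is serviced once, moved to
b_more_io, and the next call moves on to the following inode instead of
revisiting it.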

> > Root cause time: it's those damn per-sb inode lists *again*.  They're
> > just awful.  We need some data structure there which is more amenable
> > to being iterated over.  Something against which we can store cursors,
> > for a start.
>   This would be definitely nice. But in this particular case, since we have
> that page tagging livelock avoidance, we can just do all we need in a one
> big sweep so we are OK.

The main problem with list_head is the awkward superblock walks in
move_expired_inodes(), which may hold inode_lock for too long.

It would help to break up b_dirty into an rb-tree. That would make
redirty_tail() more straightforward, too.

> Suggestion for the new changelog:
> When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
> usually set to LONG_MAX. The logic in wb_writeback() then calls
> __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and

> we easily end up with negative nr_to_write after the function returns.
> This is because write_cache_pages() does not stop writing when
> nr_to_write drops to zero in WB_SYNC_ALL mode.

It will return with (nr_to_write <= 0) regardless of the
write_cache_pages() trick of ignoring nr_to_write. So I changed the
above to:

        we easily end up with non-positive nr_to_write after the function
        returns, if the inode has more than MAX_WRITEBACK_PAGES dirty pages
        at the moment.
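
The arithmetic behind that wording can be checked with a small user-space
model. It is a sketch only: sim_write_inode() and SIM_MAX_WRITEBACK_PAGES are
hypothetical names, and the loop merely imitates a per-page budget check that
stops early in WB_SYNC_NONE mode but not in WB_SYNC_ALL mode.

```c
#include <stdbool.h>

#define SIM_MAX_WRITEBACK_PAGES 1024L

/*
 * Hypothetical model of writing one inode under a page budget.
 * sync_all == true models WB_SYNC_ALL: the loop keeps writing after the
 * budget is used up. sync_all == false models WB_SYNC_NONE: the loop
 * stops once the budget reaches zero.
 * Returns the remaining budget (nr_to_write) after the call.
 */
static long sim_write_inode(long dirty_pages, long nr_to_write, bool sync_all)
{
	while (dirty_pages > 0) {
		dirty_pages--;
		nr_to_write--;
		if (nr_to_write <= 0 && !sync_all)
			break;
	}
	return nr_to_write;
}
```

For an inode with 5000 dirty pages and a budget of SIM_MAX_WRITEBACK_PAGES,
the model returns a negative value in the WB_SYNC_ALL case and zero in the
WB_SYNC_NONE case, so "non-positive" covers both. An inode with fewer dirty
pages than the budget leaves a positive remainder.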

The rest looks good. I'll repost the series with an updated changelog.

Thanks,
Fengguang
