From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Hellwig <hch@infradead.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Minchan Kim <minchan.kim@gmail.com>
Subject: Re: [PATCH 0/5] [RFC] transfer ASYNC vmscan writeback IO to the flusher threads
Date: Fri, 30 Jul 2010 21:18:00 +0800
Message-ID: <20100730131800.GB6262@localhost>
In-Reply-To: <20100730111244.GC2126@dastard>
On Fri, Jul 30, 2010 at 07:12:44PM +0800, Dave Chinner wrote:
> On Fri, Jul 30, 2010 at 03:58:19PM +0800, Wu Fengguang wrote:
> > On Fri, Jul 30, 2010 at 07:23:30AM +0800, Dave Chinner wrote:
> > > On Thu, Jul 29, 2010 at 07:51:42PM +0800, Wu Fengguang wrote:
> > > > Andrew,
> > > >
> > > > It's possible to transfer ASYNC vmscan writeback IOs to the flusher threads.
> > > > This simple patchset shows the basic idea. Since it's a big behavior change,
> > > > there are inevitably lots of details to sort out. I don't know where it will
> > > > go after tests and discussions, so the patches are intentionally kept simple.
> > > >
> > > > sync livelock avoidance (needs more to be complete, but this is the minimum required for the last two patches)
> > > > [PATCH 1/5] writeback: introduce wbc.for_sync to cover the two sync stages
> > > > [PATCH 2/5] writeback: stop periodic/background work on seeing sync works
> > > > [PATCH 3/5] writeback: prevent sync livelock with the sync_after timestamp
> > > >
> > > > let the flusher threads do ASYNC writeback for pageout()
> > > > [PATCH 4/5] writeback: introduce bdi_start_inode_writeback()
> > > > [PATCH 5/5] vmscan: transfer async file writeback to the flusher
> > >
> > > I really do not like this - all it does is transfer random page writeback
> > > from vmscan to the flusher threads rather than avoiding random page
> > > writeback altogether. Random page writeback is nasty - just say no.
> >
> > There are cases where we have to do pageout().
> >
> > - a stressed memcg with lots of dirty pages
> > - a large NUMA system whose nodes have unbalanced vmscan rate and dirty pages
> >
> > In the above cases, the whole system may not be that stressed,
> > except for some local LRU list being busily scanned. If the local
> > memory stress leads to lots of pageout(), it could bring down the whole
> > system by congesting the disks with many small seeky IOs.
> >
> > It may be overkill to push global writeback (i.e. it's silly to sync
> > 1GB dirty data because there is a small stressed 100MB LRU list).
>
> No it isn't. Dirty pages have to be cleaned sometime, and if reclaim has
> a need to clean pages, we may as well start cleaning them all.
> Kicking background writeback is effectively just starting work we
> have already delayed into the future a little bit earlier than we
> otherwise would have.
>
> Doing this is only going to hurt performance if the same pages are
> being frequently dirtied, but the changes to flush expired inodes
> first in background writeback should avoid the worst of that
> behaviour. Further, the more clean pages we have, the faster
> subsequent memory reclaims are going to free up pages....

You have a point here: the data has to be synced anyway, sooner or
later.

However, it still helps to clean the right data first. With
write-around, we may get clean pages in the stressed LRU in 10ms.
Blindly syncing the global inodes may take 10s if we are unlucky.

So pageout() is still good to keep. But we certainly need to improve it
(transfer work to the flusher, do write-around, throttle) as well as
reduce it (kick global writeback and knock down global dirty pages).

> > The
> > obvious solution is to keep the pageout() calls and make them more IO
> > wise by doing write-around at the same time. The write-around pages
> > will likely be in the same stressed LRU list, hence will do good for
> > page reclaim as well.
>
> You've kind of already done that by telling it to writeback 1024
> pages starting with a specific page. However, the big problem with
> this is that it assumes that the inode has contiguous dirty pages in

Right. We could use .writeback_index/.nr_to_write instead of
.range_start/.range_end as the writeback parameters. It's a bit racy
to use mapping->writeback_index though.

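To illustrate the difference between the two parameter styles, here is
a minimal sketch. It is a toy user-space model in Python, not kernel
code: the function names pick_by_range() and pick_by_count() are
hypothetical, offsets are counted in pages rather than bytes, and
wrap-around and locking are ignored.

```python
from bisect import bisect_left, bisect_right


def pick_by_range(dirty, range_start, range_end):
    """Toy model of .range_start/.range_end: only the dirty pages that
    fall inside the fixed window [range_start, range_end] are written."""
    lo = bisect_left(dirty, range_start)
    hi = bisect_right(dirty, range_end)
    return dirty[lo:hi]


def pick_by_count(dirty, writeback_index, nr_to_write):
    """Toy model of .writeback_index/.nr_to_write: the next nr_to_write
    dirty pages at or after writeback_index, however sparse they are."""
    lo = bisect_left(dirty, writeback_index)
    return dirty[lo:lo + nr_to_write]


# Hypothetical sparse file: every 512th page is dirty (sorted offsets).
dirty = list(range(0, 16 * 512, 512))

# A 1024-page window starting at page 1024 only finds two dirty pages,
by_range = pick_by_range(dirty, 1024, 1024 + 1023)
# while a count-based request keeps going until it has collected eight.
by_count = pick_by_count(dirty, 1024, 8)
```

With sparse dirty pages the range-based request does little work per
call, while the count-based one always gets its quota if enough dirty
pages remain past the index.
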
> the cache. That assumption falls down in many cases e.g. when you
> are writing lots of small files like kernel trees contain, and so
> you still end up with random IO patterns coming out of reclaim.

Small files lead to random IO anyway, don't they? You may mean the case
of .offset=1, where dirty page 0 will be left out. I do plan to do
write-around to cover such issues, since this would be a very common
case. Imagine that the dirty page at offset N lies in the Normal zone
and page N+1 in the DMA32 zone. If DMA32 is scanned slightly before
Normal, we get page N+1 first, while we should actually start with
page N.

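One possible shape for such write-around, sketched as a toy Python
model rather than kernel code. The helper name write_around_window()
and the choice of an aligned window are assumptions for illustration;
the 1024-page chunk size is the figure mentioned above.

```python
def write_around_window(offset, chunk_pages=1024):
    """Return (first, last) page offsets of the aligned chunk that
    contains the page at `offset`.

    Instead of writing chunk_pages pages starting at the page reclaim
    happened to find, the window is rounded down to a chunk boundary,
    so pages just before `offset` are covered as well.
    """
    first = (offset // chunk_pages) * chunk_pages
    return first, first + chunk_pages - 1


# Reclaim finds page 1 first: the window still covers page 0.
w_small = write_around_window(1)

# Reclaim finds page N+1 before page N (here N = 2048): both requests
# map to the same window, so the scan order no longer matters.
w_n = write_around_window(2048)
w_n_plus_1 = write_around_window(2049)
```

Note that plain alignment does not help when pages N and N+1 straddle
a chunk boundary (e.g. N = 1023); a real scheme would need to handle
that case separately.
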
> > Transferring ASYNC work to the flushers helps the
> > kswapd-vs-flusher priority problem too. Currently the
> > kswapd/direct reclaim either have to skip dirty pages on
> > congestion, or to risk being blocked in get_request_wait(), both
> > are not good options. However the use of
> > bdi_start_inode_writeback() does ask for a good vmscan throttling
> > scheme to prevent a false OOM before the flusher is able to
> > clean the transferred pages. This would be tricky.
>
> I have no problem with that aspect of the patch - my issue is that it
> does nothing to prevent the problem that causes excessive congestion
> in the first place...

No problem. It's merely the first step; stay tuned :)

Thanks,
Fengguang