From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Dave Chinner <david@fromorbit.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Hellwig <hch@infradead.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Minchan Kim <minchan.kim@gmail.com>
Subject: Re: [PATCH 0/5] [RFC] transfer ASYNC vmscan writeback IO to the flusher threads
Date: Fri, 30 Jul 2010 20:25:53 +0800
Message-ID: <20100730122553.GA6262@localhost>
In-Reply-To: <20100730181014.4AEA.A69D9226@jp.fujitsu.com>
> > There are cases where we have to do pageout().
> >
> > - a stressed memcg with lots of dirty pages
> > - a large NUMA system whose nodes have unbalanced vmscan rate and dirty pages
>
> - 32bit highmem system too
Ah yes!
> Can you please see the following commit? It describes the current design.
Good stuff. Thanks.
Thanks,
Fengguang
>
>
>
> commit c4e2d7ddde9693a4c05da7afd485db02c27a7a09
> Author: akpm <akpm>
> Date: Sun Dec 22 01:07:33 2002 +0000
>
> [PATCH] Give kswapd writeback higher priority than pdflush
>
> The `low latency page reclaim' design works by preventing page
> allocators from blocking on request queues (and by preventing them from
> blocking against writeback of individual pages, but that is immaterial
> here).
>
> This has a problem under some situations. pdflush (or a write(2)
> caller) could be saturating the queue with highmem pages. This
> prevents anyone from writing back ZONE_NORMAL pages. We end up doing
> enormous amounts of scanning.
>
> A test case is to mmap(MAP_SHARED) almost all of a 4G machine's memory,
> then kill the mmapping applications. The machine instantly goes from
> 0% of memory dirty to 95% or more. pdflush kicks in and starts writing
> the least-recently-dirtied pages, which are all highmem. The queue is
> congested so nobody will write back ZONE_NORMAL pages. kswapd chews
> 50% of the CPU scanning past dirty ZONE_NORMAL pages and page reclaim
> efficiency (pages_reclaimed/pages_scanned) falls to 2%.
>
> So this patch changes the policy for kswapd. kswapd may use all of a
> request queue, and is prepared to block on request queues.
>
> What will now happen in the above scenario is:
>
> 1: The page allocator scans some pages, fails to reclaim enough
> memory and takes a nap in blk_congestion_wait().
>
> 2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
> back pages. (These pages will be rotated to the tail of the
> inactive list at IO-completion interrupt time).
>
> This writeback will saturate the queue with ZONE_NORMAL pages.
> Conveniently, pdflush will avoid the congested queues. So we end up
> writing the correct pages.
>
> In this test, kswapd CPU utilisation falls from 50% to 2%, page reclaim
> efficiency rises from 2% to 40% and things are generally a lot happier.
>
>
> The downside is that kswapd may now do a lot less page reclaim,
> increasing page allocation latency, causing more direct reclaim,
> increasing lock contention in the VM, etc. But I have not been able to
> demonstrate that in testing.
>
>
> The other problem is that there is only one kswapd, and there are lots
> of disks. That is a generic problem - without being able to co-opt
> user processes we don't have enough threads to keep lots of disks saturated.
>
> One fix for this would be to add an additional "really congested"
> threshold in the request queues, so kswapd can still perform
> nonblocking writeout. This gives kswapd priority over pdflush while
> allowing kswapd to feed many disk queues. I doubt if this will be
> called for.
>
> BKrev: 3e051055aitHp3bZBPSqmq21KGs5aQ
>
>
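The policy in the quoted commit message can be illustrated with a minimal
userspace sketch. This is not kernel code: the enum, the struct, the
function names and the threshold field are all hypothetical, chosen only
to model the rule "kswapd may use all of a request queue and block on it,
while pdflush and direct reclaimers avoid congested queues", plus the
pages_reclaimed/pages_scanned efficiency ratio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of who is trying to start writeback. */
enum writer {
	WRITER_PDFLUSH,
	WRITER_DIRECT_RECLAIM,
	WRITER_KSWAPD,
};

/* Hypothetical request queue: congested once in_flight reaches congest_on. */
struct queue_model {
	int in_flight;   /* requests currently queued */
	int congest_on;  /* assumed congestion threshold */
};

static bool queue_congested(const struct queue_model *q)
{
	assert(q != NULL);
	return q->in_flight >= q->congest_on;
}

/*
 * Policy as described in the commit message: kswapd may always submit
 * writeback (and is prepared to block on the queue); every other writer
 * backs off when the queue is congested.
 */
static bool may_write(enum writer w, const struct queue_model *q)
{
	if (w == WRITER_KSWAPD)
		return true;
	return !queue_congested(q);
}

/* Page reclaim efficiency in percent: pages_reclaimed / pages_scanned. */
static unsigned long reclaim_efficiency_pct(unsigned long reclaimed,
					    unsigned long scanned)
{
	if (scanned == 0)
		return 0;
	return 100UL * reclaimed / scanned;
}
```

With a congested queue (say in_flight = 128 against an assumed threshold
of 100), may_write() returns true only for WRITER_KSWAPD, which is how the
queue ends up filled with the ZONE_NORMAL pages kswapd is scanning. For
scale, reclaim_efficiency_pct(2, 100) gives 2 and
reclaim_efficiency_pct(40, 100) gives 40, the before and after figures
quoted in the commit message.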