* [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
@ 2009-11-12 19:30 Mel Gorman
  2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
  ` (5 more replies)
  0 siblings, 6 replies; 57+ messages in thread

From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando>
Cc: linux-kernel, linux-mm@kvack.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman

Sorry for the long delay in posting another version. Testing is extremely
time-consuming and I wasn't getting to work on this as much as I'd have liked.

Changelog since V2
o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
  testing, it made latencies even worse as kswapd slept more on high-order
  congestion causing order-0 direct reclaims.
o Added changes to how congestion_wait() works
o Added a number of new patches altering the behaviour of reclaim

Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
failures. A significant number of these have been high-order GFP_ATOMIC
failures and while they are generally brushed away, there has been a large
increase in them recently and there are a number of possible areas the
problem could be in - core vm, page writeback and a specific driver. The
bugs affected by this that I am aware of are;

[Bug #14141] order 2 page allocation failures in iwlagn
[Bug #14141] order 2 page allocation failures (generic)
[Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
[No BZ ID]   Kernel crash on 2.6.31.x (kcryptd: page allocation failure..)
[No BZ ID]   page allocation failure message kernel 2.6.31.4 (tty-related)

The following are a series of patches that bring the behaviour of reclaim
and the page allocator more in line with 2.6.30. Patches 1-3 should be
tested first. The testing I've done shows that the page allocator and the
behaviour of congestion_wait() are more in line with 2.6.30 than the vanilla
kernels. It'd be nice to have 2 more tests, applying each patch on top and
noting any behaviour change, i.e. ideally there would be results for

o patches 1+2+3
o patches 1+2+3+4
o patches 1+2+3+4+5

Of course, any test results are welcome. The rest of the mail is the
results of my own tests.

I've tested against 2.6.31 and 2.6.32-rc6. I've somewhat replicated the
problem in Bug #14141 and believe the other bugs are variations of the same
style of problem. The basic reproduction case was;

1. X86-64 AMD Phenom and X86 P4 booted with mem=512MB. Expectation is any
   machine will do as long as it's 512MB for the size of workload involved.

2. A crypted work partition and swap partition was created. On my own setup,
   I gave no passphrase so it'd be easier to activate without interaction
   but there are multiple options. I should have taken better notes but the
   setup goes something like this;

   cryptsetup create -y crypt-partition /dev/sda5
   pvcreate /dev/mapper/crypt-partition
   vgcreate crypt-volume /dev/mapper/crypt-partition
   lvcreate -L 5G -n crypt-logical crypt-volume
   lvcreate -L 2G -n crypt-swap crypt-volume
   mkfs -t ext3 /dev/crypt-volume/crypt-logical
   mkswap /dev/crypt-volume/crypt-swap

3. With the partition mounted on /scratch, I

   cd /scratch
   mkdir music
   git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
4. On a normal partition, I expand a tarball containing test scripts available at
   http://www.csn.ul.ie/~mel/postings/latency-20091112/latency-tests-with-results.tar.gz

   There are two helper programs that run as part of the test - a fake music
   player and a fake gitk.

   The fake music player uses rsync with bandwidth limits to start
   downloading a music folder from another machine. It's bandwidth limited
   to simulate playing music over NFS. I believe it generates similar if not
   exact traffic to a music player. It occurred to me afterwards that if one
   patched ogg123 to print a line when 1/10th of a second's worth of music
   was played, it could be used as an indirect measure of desktop
   interactivity and help pin down pesky "audio skips" bug reports.

   The fake gitk is based on observing roughly what gitk does using strace.
   It loads all the logs into a large buffer and then builds a very basic
   hash map of parent to child commits. The data is stored because it was
   insufficient just to read the logs. It had to be kept in an in-memory
   buffer to generate swap. It then discards the data and does it over again
   in a loop for a small number of times so the test is finite. When it
   processes a large number of commits, it outputs a line to stdout so that
   stalls can be observed. Ideal behaviour is that commits are read at a
   constant rate and latencies look flat.

   Output from the two programs is piped through another script -
   latency-output. It records how far into the test it was when the line was
   output and what the latency was since the last line appeared. The latency
   should always be very smooth. Because pipes buffer IO, they are all run
   by expect_unbuffered which is available from expect-dev on Debian at
   least.

   All the tests are driven via run-test.sh. While the tests run, it records
   the kern.log to track page allocation failures, records nr_writeback at
   regular intervals and tracks Page IO and Swap IO.

5. For running an actual test, a kernel is built, booted, the crypted
   partition activated, lvm restarted, /dev/crypt-volume/crypt-logical
   mounted on /scratch, all swap partitions turned off and then the swap
   partition on /dev/crypt-volume/crypt-swap activated. I then run
   run-test.sh from the tarball.

6. Run the test script

To evaluate the patches, I considered three basic metrics.

o The length of time it takes fake-gitk to complete on average
o How often and how long fake-gitk stalled for
o How long was spent in congestion_wait

All generated data is in the tarball. On X86, the results I got were

2.6.30-0000000-force-highorder          Elapsed:10:59.095 Failures:0
2.6.31-0000000-force-highorder          Elapsed:11:53.505 Failures:0
2.6.31-revert-8aa7e847                  Elapsed:14:01.595 Failures:0
2.6.31-0000012-pgalloc-2.6.30           Elapsed:13:32.237 Failures:0
2.6.31-0000123-congestion-both          Elapsed:12:44.170 Failures:0
2.6.31-0001234-kswapd-quick-recheck     Elapsed:10:35.327 Failures:0
2.6.31-0012345-adjust-priority          Elapsed:11:02.995 Failures:0
2.6.32-rc6-0000000-force-highorder      Elapsed:18:18.562 Failures:0
2.6.32-rc6-revert-8aa7e847              Elapsed:10:29.278 Failures:0
2.6.32-rc6-0000012-pgalloc-2.6.30       Elapsed:13:32.393 Failures:0
2.6.32-rc6-0000123-congestion-both      Elapsed:14:55.265 Failures:0
2.6.32-rc6-0001234-kswapd-quick-recheck Elapsed:13:35.628 Failures:0
2.6.32-rc6-0012345-adjust-priority      Elapsed:12:41.278 Failures:0

The 0000000-force-highorder is a vanilla kernel patched so that network
receive always results in an order-2 allocation. This machine wasn't
suffering page allocation failures even under this circumstance.
However, note how slow 2.6.32-rc6 is and how much the revert helps. With the
patches applied, there is comparable performance.

Latencies were generally reduced with the patches applied. 2.6.32-rc6 was
particularly crazy with long stalls measured over the duration of the test
but has comparable latencies to 2.6.30 with the patches applied.

congestion_wait behaviour is more in line with 2.6.30 after the patches,
with similar amounts of time being spent. In general,
2.6.32-rc6-0012345-adjust-priority waits for longer than 2.6.30 or the
reverted kernels did. It also waits in more instances, such as inside
shrink_inactive_list(), where it didn't before. Forcing behaviour like
2.6.30 resulted in good figures but I couldn't justify the patches with
anything more solid than "in tests, it behaves well even though it doesn't
make a lot of sense".

On X86-64, the results I got were

2.6.30-0000000-force-highorder          Elapsed:09:48.545 Failures:0
2.6.31-0000000-force-highorder          Elapsed:09:13.020 Failures:0
2.6.31-revert-8aa7e847                  Elapsed:09:02.120 Failures:0
2.6.31-0000012-pgalloc-2.6.30           Elapsed:08:52.742 Failures:0
2.6.31-0000123-congestion-both          Elapsed:08:59.375 Failures:0
2.6.31-0001234-kswapd-quick-recheck     Elapsed:09:19.208 Failures:0
2.6.31-0012345-adjust-priority          Elapsed:09:39.225 Failures:0
2.6.32-rc6-0000000-force-highorder      Elapsed:19:38.585 Failures:5
2.6.32-rc6-revert-8aa7e847              Elapsed:17:21.257 Failures:0
2.6.32-rc6-0000012-pgalloc-2.6.30       Elapsed:18:56.682 Failures:1
2.6.32-rc6-0000123-congestion-both      Elapsed:16:08.340 Failures:0
2.6.32-rc6-0001234-kswapd-quick-recheck Elapsed:18:11.200 Failures:7
2.6.32-rc6-0012345-adjust-priority      Elapsed:21:33.158 Failures:0

Failures were down and my impression was that it was much harder to cause
failures. Performance on mainline is still not as good as 2.6.30. On this
particular machine, I was able to force performance to be in line but not
with any patch I could justify in the general case.

Latencies were slightly reduced by applying the patches against 2.6.31.
Against 2.6.32-rc6, applying the patches significantly reduced the latencies
but they are still significant. I'll continue to investigate what can be
done to improve this further.

Again, congestion_wait() is more in line with 2.6.30 when the patches are
applied. Similarly to X86, almost identical behaviour can be forced by
waiting on BLK_RW_BOTH for each caller to congestion_wait() in the reclaim
and allocator paths.

Bottom line, the patches made triggering allocation failures much harder in
a number of instances, and latencies are reduced when the system is under
load. I will keep looking around this area - particularly the performance
under load on 2.6.32-rc6 - but with 2.6.32 almost out the door, I am
releasing what I have now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 57+ messages in thread
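The fake gitk workload described in step 4 above boils down to pulling a
large amount of data into anonymous memory, doing some light parent-to-child
bookkeeping over it, and printing progress lines so that stalls show up as
latency spikes. The following is a minimal, hypothetical C equivalent rather
than the actual script from the tarball; the buffer size, commit count and
loop count are made-up parameters chosen only to illustrate the shape of the
load and of the latency reporting.

/*
 * Minimal sketch of a fake-gitk style load generator: fill a large
 * anonymous buffer (to force swap on a 512MB box), do some trivial
 * "parent -> child" bookkeeping over it, and print a progress line at
 * regular intervals so that stalls show up as latency spikes.
 *
 * Sizes and iteration counts below are illustrative, not the values
 * used in the latency-tests tarball.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUFFER_BYTES	(400UL << 20)	/* ~400MB of anonymous memory */
#define COMMITS		200000
#define REPORT_EVERY	10000
#define LOOPS		3

static double now_seconds(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	double start = now_seconds(), last = start;
	unsigned long *parent_of;
	char *logbuf;
	int loop, i;

	for (loop = 0; loop < LOOPS; loop++) {
		/* "Read the logs" into one big in-memory buffer */
		logbuf = malloc(BUFFER_BYTES);
		parent_of = malloc(COMMITS * sizeof(*parent_of));
		if (!logbuf || !parent_of)
			return 1;
		memset(logbuf, 'x', BUFFER_BYTES);

		/* Very basic parent -> child map, touching the buffer as we go */
		for (i = 0; i < COMMITS; i++) {
			parent_of[i] = (i * 2654435761UL) % COMMITS;
			logbuf[(i * 4096UL) % BUFFER_BYTES] = (char)i;

			if (i % REPORT_EVERY == 0) {
				double t = now_seconds();
				/* offset into the test and latency since the last line */
				printf("%8.2f commits %d latency %.3f\n",
				       t - start, i, t - last);
				fflush(stdout);
				last = t;
			}
		}

		/* Discard the data and do it all over again so the test is finite */
		free(parent_of);
		free(logbuf);
	}
	return 0;
}

Piped through something like latency-output, the gap between successive
progress lines is the latency figure the tests graph; ideal behaviour is a
flat line.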
* [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
  2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
@ 2009-11-12 19:30 ` Mel Gorman
  2009-11-13  5:23   ` KOSAKI Motohiro
  2009-11-12 19:30 ` [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER Mel Gorman
  ` (4 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread

From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando>
Cc: linux-kernel, linux-mm@kvack.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as well
but currently, kswapd is not woken up after direct reclaim fails. For order-0
allocations, this makes little difference but if there is a heavy mix of
higher-order allocations that direct reclaim is failing for, it might mean
that kswapd is not rewoken for higher orders as much as it did previously.

This patch wakes up kswapd when an allocation is being retried after a direct
reclaim failure. It would be expected that kswapd is already awake, but
this has the effect of telling kswapd to reclaim at the higher order as well.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cdcedf6..250d055 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;
 
+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);
 
-restart:
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
-- 
1.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 57+ messages in thread
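As background for the change above: wake_all_kswapd() walks the zonelist and
pokes each zone's kswapd with the caller's order, and wakeup_kswapd() records
that order for kswapd to balance against, so re-running it on every trip
through restart: is what "telling kswapd to reclaim at the higher order"
amounts to. The helper looked roughly like this in 2.6.31-era
mm/page_alloc.c; it is quoted from memory for reference, so treat the details
as approximate.

/*
 * Roughly the 2.6.31-era helper (from memory, may differ in detail):
 * each retry through "restart:" now re-records the allocation order
 * with every kswapd in the zonelist via wakeup_kswapd().
 */
static void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
			    enum zone_type high_zoneidx)
{
	struct zoneref *z;
	struct zone *zone;

	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
		wakeup_kswapd(zone, order);	/* kswapd remembers the order */
}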
* Re: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed 2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman @ 2009-11-13 5:23 ` KOSAKI Motohiro 2009-11-13 13:55 ` Mel Gorman 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 5:23 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > If a direct reclaim makes no forward progress, it considers whether it > should go OOM or not. Whether OOM is triggered or not, it may retry the > application afterwards. In times past, this would always wake kswapd as well > but currently, kswapd is not woken up after direct reclaim fails. For order-0 > allocations, this makes little difference but if there is a heavy mix of > higher-order allocations that direct reclaim is failing for, it might mean > that kswapd is not rewoken for higher orders as much as it did previously. > > This patch wakes up kswapd when an allocation is being retried after a direct > reclaim failure. It would be expected that kswapd is already awake, but > this has the effect of telling kswapd to reclaim at the higher order as well. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > Reviewed-by: Christoph Lameter <cl@linux-foundation.org> > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Umm, My mail box have the carbon copy of akpm sent this patch to linus. (bellow subject and data) Does this have any update? ----------------------------------------------------------------- Subject: [patch 07/35] page allocator: always wake kswapd when restarting an allocation attempt after direct reclaim failed To: torvalds@linux-foundation.org Cc: akpm@linux-foundation.org, mel@csn.ul.ie, cl@linux-foundation.org, kosaki.motohiro@jp.fujitsu.com, penberg@cs.helsinki.fi, stable@kernel.org From: akpm@linux-foundation.org Date: Wed, 11 Nov 2009 14:26:14 -0800 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed 2009-11-13 5:23 ` KOSAKI Motohiro @ 2009-11-13 13:55 ` Mel Gorman 0 siblings, 0 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-13 13:55 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 02:23:21PM +0900, KOSAKI Motohiro wrote: > > If a direct reclaim makes no forward progress, it considers whether it > > should go OOM or not. Whether OOM is triggered or not, it may retry the > > application afterwards. In times past, this would always wake kswapd as well > > but currently, kswapd is not woken up after direct reclaim fails. For order-0 > > allocations, this makes little difference but if there is a heavy mix of > > higher-order allocations that direct reclaim is failing for, it might mean > > that kswapd is not rewoken for higher orders as much as it did previously. > > > > This patch wakes up kswapd when an allocation is being retried after a direct > > reclaim failure. It would be expected that kswapd is already awake, but > > this has the effect of telling kswapd to reclaim at the higher order as well. > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > Reviewed-by: Christoph Lameter <cl@linux-foundation.org> > > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > Umm, > My mail box have the carbon copy of akpm sent this patch to linus. > (bellow subject and data) > > Does this have any update? > No, it doesn't. I included it in the series for the benefit of testers. > ----------------------------------------------------------------- > Subject: [patch 07/35] page allocator: always wake kswapd when restarting an allocation attempt after direct reclaim failed > To: torvalds@linux-foundation.org > Cc: akpm@linux-foundation.org, > mel@csn.ul.ie, > cl@linux-foundation.org, > kosaki.motohiro@jp.fujitsu.com, > penberg@cs.helsinki.fi, > stable@kernel.org > From: akpm@linux-foundation.org > Date: Wed, 11 Nov 2009 14:26:14 -0800 > > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
  2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
@ 2009-11-12 19:30 ` Mel Gorman
  [not found]   ` <1258054235-3208-3-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>
  2009-11-12 19:30 ` [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim Mel Gorman
  ` (3 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread

From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando>
Cc: linux-kernel, linux-mm@kvack.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman

Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
slightly by allowing rt_tasks that are handling an interrupt to set
ALLOC_HARDER. This patch brings the watermark logic more in line with
2.6.30.

[rientjes@google.com: Spotted the problem]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 250d055..2bc2ac6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
 		 */
 		alloc_flags &= ~ALLOC_CPUSET;
-	} else if (unlikely(rt_task(p)))
+	} else if (unlikely(rt_task(p)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
 	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-- 
1.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 57+ messages in thread
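For context on the one-liner above: in gfp_to_alloc_flags(), p is current,
and in interrupt context current is simply whichever task happened to be
interrupted, so a GFP_ATOMIC allocation from an interrupt could inherit
ALLOC_HARDER just because it preempted a realtime task. The surrounding
branch looked roughly like the following with the patch applied; this is an
approximate reconstruction of the 2.6.31-era code from memory, not a quote
from the tree.

	/* __GFP_HIGH callers are allowed to dip further into the reserves */
	alloc_flags |= (gfp_mask & __GFP_HIGH);

	if (!wait) {
		/* GFP_ATOMIC (no __GFP_WAIT) already gets ALLOC_HARDER here... */
		alloc_flags |= ALLOC_HARDER;
		/*
		 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
		 */
		alloc_flags &= ~ALLOC_CPUSET;
	} else if (unlikely(rt_task(p)) && !in_interrupt())
		/*
		 * ...whereas this branch is only meant for realtime tasks
		 * allocating on their own behalf, not for interrupts that
		 * merely preempted one.
		 */
		alloc_flags |= ALLOC_HARDER;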
* Re: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER [not found] ` <1258054235-3208-3-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 5:24 ` KOSAKI Motohiro 2009-11-13 13:56 ` Mel Gorman 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 5:24 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic > slightly by allowing rt_tasks that are handling an interrupt to set > ALLOC_HARDER. This patch brings the watermark logic more in line with > 2.6.30. ditto. afaik, this patch have been sent to linus. > > [rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org: Spotted the problem] > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > Reviewed-by: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org> > Reviewed-by: Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> > --- > mm/page_alloc.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 250d055..2bc2ac6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > * See also cpuset_zone_allowed() comment in kernel/cpuset.c. > */ > alloc_flags &= ~ALLOC_CPUSET; > - } else if (unlikely(rt_task(p))) > + } else if (unlikely(rt_task(p)) && !in_interrupt()) > alloc_flags |= ALLOC_HARDER; > > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > -- > 1.6.5 > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER 2009-11-13 5:24 ` KOSAKI Motohiro @ 2009-11-13 13:56 ` Mel Gorman 0 siblings, 0 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-13 13:56 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 02:24:13PM +0900, KOSAKI Motohiro wrote: > > Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic > > slightly by allowing rt_tasks that are handling an interrupt to set > > ALLOC_HARDER. This patch brings the watermark logic more in line with > > 2.6.30. > > ditto. > afaik, this patch have been sent to linus. > And ditto, it's there for testers, particularly when patching against 2.6.31. > > > > > [rientjes@google.com: Spotted the problem] > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > > Reviewed-by: Rik van Riel <riel@redhat.com> > > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > --- > > mm/page_alloc.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 250d055..2bc2ac6 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > * See also cpuset_zone_allowed() comment in kernel/cpuset.c. > > */ > > alloc_flags &= ~ALLOC_CPUSET; > > - } else if (unlikely(rt_task(p))) > > + } else if (unlikely(rt_task(p)) && !in_interrupt()) > > alloc_flags |= ALLOC_HARDER; > > > > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > > -- > > 1.6.5 > > > > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim
  2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
  2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
  2009-11-12 19:30 ` [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER Mel Gorman
@ 2009-11-12 19:30 ` Mel Gorman
  [not found]   ` <1258054235-3208-4-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>
  2009-11-12 19:30 ` [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep Mel Gorman
  ` (2 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread

From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando>
Cc: linux-kernel, linux-mm@kvack.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman

Testing by Frans Pop indicated that, in the 2.6.30..2.6.31 window at least,
commits 373c0a7e and 8aa7e847 dramatically increased the number of
GFP_ATOMIC failures that were occurring within a wireless driver. Reverting
this patch seemed to help a lot even though it was pointed out that the
congestion changes were very far away from high-order atomic allocations.

The key to why the revert makes such a big difference is down to timing and
how long direct reclaimers wait versus kswapd. With the patch reverted,
the congestion_wait() is on the SYNC queue instead of the ASYNC. As a
significant part of the workload involved reads, it makes sense that the
SYNC list is what was truly congested and, with the revert, processes were
waiting on congestion as expected. Hence, direct reclaimers stalled
properly and kswapd was able to do its job with fewer stalls.

This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC
for direct reclaimers. Instead of making congestion_wait() wait on the SYNC
queue, which would only fix a particular type of workload, this patch adds a
third type of congestion_wait - BLK_RW_BOTH - which first waits on the ASYNC
and then the SYNC queue if the timeout has not been reached. In tests, this
counter-intuitively results in kswapd stalling less and freeing up pages
resulting in fewer allocation failures and fewer direct-reclaim-orientated
stalls.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/backing-dev.h |    1 +
 mm/backing-dev.c            |   25 ++++++++++++++++++++++---
 mm/page_alloc.c             |    4 ++--
 mm/vmscan.c                 |    2 +-
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index b449e73..b35344c 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -276,6 +276,7 @@ static inline int bdi_rw_congested(struct backing_dev_info *bdi)
 enum {
 	BLK_RW_ASYNC	= 0,
 	BLK_RW_SYNC	= 1,
+	BLK_RW_BOTH	= 2,
 };
 
 void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 1065b71..ea9ffc3 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -736,22 +736,41 @@ EXPORT_SYMBOL(set_bdi_congested);
 
 /**
  * congestion_wait - wait for a backing_dev to become uncongested
- * @sync: SYNC or ASYNC IO
+ * @sync: SYNC, ASYNC or BOTH IO
  * @timeout: timeout in jiffies
  *
  * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
  * write congestion.  If no backing_devs are congested then just wait for the
  * next write to be completed.
  */
-long congestion_wait(int sync, long timeout)
+long congestion_wait(int sync_request, long timeout)
 {
 	long ret;
 	DEFINE_WAIT(wait);
-	wait_queue_head_t *wqh = &congestion_wqh[sync];
+	int sync;
+	wait_queue_head_t *wqh;
+
+	/* If requested to sync both, wait on ASYNC first, then SYNC */
+	if (sync_request == BLK_RW_BOTH)
+		sync = BLK_RW_ASYNC;
+	else
+		sync = sync_request;
+
+again:
+	wqh = &congestion_wqh[sync];
 
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = io_schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
+
+	if (sync_request == BLK_RW_BOTH) {
+		sync_request = 0;
+		sync = BLK_RW_SYNC;
+		timeout = ret;
+		if (timeout)
+			goto again;
+	}
+
 	return ret;
 }
 EXPORT_SYMBOL(congestion_wait);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2bc2ac6..f6ed41c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1727,7 +1727,7 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 			preferred_zone, migratetype);
 
 		if (!page && gfp_mask & __GFP_NOFAIL)
-			congestion_wait(BLK_RW_ASYNC, HZ/50);
+			congestion_wait(BLK_RW_BOTH, HZ/50);
 	} while (!page && (gfp_mask & __GFP_NOFAIL));
 
 	return page;
@@ -1898,7 +1898,7 @@ rebalance:
 	pages_reclaimed += did_some_progress;
 	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
-		congestion_wait(BLK_RW_ASYNC, HZ/50);
+		congestion_wait(BLK_RW_BOTH, HZ/50);
 		goto rebalance;
 	}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 777af57..190bae1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1793,7 +1793,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 
 		/* Take a nap, wait for some writeback to complete */
 		if (sc->nr_scanned && priority < DEF_PRIORITY - 2)
-			congestion_wait(BLK_RW_ASYNC, HZ/10);
+			congestion_wait(BLK_RW_BOTH, HZ/10);
 	}
 	/* top priority shrink_zones still had more to do? don't OOM, then */
 	if (!sc->all_unreclaimable && scanning_global_lru(sc))
-- 
1.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 57+ messages in thread
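One detail that helps when reading the BLK_RW_BOTH logic above: a sleeper on
congestion_wqh[] is only woken early when some backing device clears the
matching congestion bit; otherwise io_schedule_timeout() just runs the clock
down, so waiting on a queue that never becomes congested degenerates into a
fixed sleep. For reference, the wake side looked roughly like this in
2.6.31-era mm/backing-dev.c (quoted from memory, so approximate):

static wait_queue_head_t congestion_wqh[2] = {
		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[0]),
		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
	};

void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
{
	enum bdi_state bit;
	wait_queue_head_t *wqh = &congestion_wqh[sync];

	bit = sync ? BDI_sync_congested : BDI_async_congested;
	clear_bit(bit, &bdi->state);
	smp_mb__after_clear_bit();
	/* Only a device leaving congestion wakes congestion_wait() sleepers */
	if (waitqueue_active(wqh))
		wake_up(wqh);
}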
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <1258054235-3208-4-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 11:20 ` KOSAKI Motohiro [not found] ` <20091113142526.33B3.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 11:20 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Jens Axboe (cc to Jens) > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > this patch seemed to help a lot even though it was pointed out that the > congestion changes were very far away from high-order atomic allocations. > > The key to why the revert makes such a big difference is down to timing and > how long direct reclaimers wait versus kswapd. With the patch reverted, > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > significant part of the workload involved reads, it makes sense that the > SYNC list is what was truely congested and with the revert processes were > waiting on congestion as expected. Hence, direct reclaimers stalled > properly and kswapd was able to do its job with fewer stalls. > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > queue which would only fix a particular type of workload, this patch adds a > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > and then the SYNC queue if the timeout has not been reached. In tests, this > counter-intuitively results in kswapd stalling less and freeing up pages > resulting in fewer allocation failures and fewer direct-reclaim-orientated > stalls. Honestly, I don't like this patch. page allocator is not related to sync block queue. vmscan doesn't make read operation. This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. Please don't make mysterious heuristic code. Sidenode: I doubt this regression was caused from page allocator. Probably we need to confirm caller change.... 
> > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > --- > include/linux/backing-dev.h | 1 + > mm/backing-dev.c | 25 ++++++++++++++++++++++--- > mm/page_alloc.c | 4 ++-- > mm/vmscan.c | 2 +- > 4 files changed, 26 insertions(+), 6 deletions(-) > > diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h > index b449e73..b35344c 100644 > --- a/include/linux/backing-dev.h > +++ b/include/linux/backing-dev.h > @@ -276,6 +276,7 @@ static inline int bdi_rw_congested(struct backing_dev_info *bdi) > enum { > BLK_RW_ASYNC = 0, > BLK_RW_SYNC = 1, > + BLK_RW_BOTH = 2, > }; > > void clear_bdi_congested(struct backing_dev_info *bdi, int sync); > diff --git a/mm/backing-dev.c b/mm/backing-dev.c > index 1065b71..ea9ffc3 100644 > --- a/mm/backing-dev.c > +++ b/mm/backing-dev.c > @@ -736,22 +736,41 @@ EXPORT_SYMBOL(set_bdi_congested); > > /** > * congestion_wait - wait for a backing_dev to become uncongested > - * @sync: SYNC or ASYNC IO > + * @sync: SYNC, ASYNC or BOTH IO > * @timeout: timeout in jiffies > * > * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit > * write congestion. If no backing_devs are congested then just wait for the > * next write to be completed. > */ > -long congestion_wait(int sync, long timeout) > +long congestion_wait(int sync_request, long timeout) > { > long ret; > DEFINE_WAIT(wait); > - wait_queue_head_t *wqh = &congestion_wqh[sync]; > + int sync; > + wait_queue_head_t *wqh; > + > + /* If requested to sync both, wait on ASYNC first, then SYNC */ > + if (sync_request == BLK_RW_BOTH) > + sync = BLK_RW_ASYNC; > + else > + sync = sync_request; > + > +again: > + wqh = &congestion_wqh[sync]; > > prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); > ret = io_schedule_timeout(timeout); > finish_wait(wqh, &wait); > + > + if (sync_request == BLK_RW_BOTH) { > + sync_request = 0; > + sync = BLK_RW_SYNC; > + timeout = ret; > + if (timeout) > + goto again; > + } > + > return ret; > } > EXPORT_SYMBOL(congestion_wait); > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 2bc2ac6..f6ed41c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1727,7 +1727,7 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, > preferred_zone, migratetype); > > if (!page && gfp_mask & __GFP_NOFAIL) > - congestion_wait(BLK_RW_ASYNC, HZ/50); > + congestion_wait(BLK_RW_BOTH, HZ/50); > } while (!page && (gfp_mask & __GFP_NOFAIL)); > > return page; > @@ -1898,7 +1898,7 @@ rebalance: > pages_reclaimed += did_some_progress; > if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) { > /* Wait for some write requests to complete then retry */ > - congestion_wait(BLK_RW_ASYNC, HZ/50); > + congestion_wait(BLK_RW_BOTH, HZ/50); > goto rebalance; > } > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 777af57..190bae1 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1793,7 +1793,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, > > /* Take a nap, wait for some writeback to complete */ > if (sc->nr_scanned && priority < DEF_PRIORITY - 2) > - congestion_wait(BLK_RW_ASYNC, HZ/10); > + congestion_wait(BLK_RW_BOTH, HZ/10); > } > /* top priority shrink_zones still had more to do? don't OOM, then */ > if (!sc->all_unreclaimable && scanning_global_lru(sc)) > -- > 1.6.5 > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <20091113142526.33B3.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 11:55 ` Jens Axboe [not found] ` <20091113115558.GY8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Jens Axboe @ 2009-11-13 11:55 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13 2009, KOSAKI Motohiro wrote: > (cc to Jens) > > > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > > this patch seemed to help a lot even though it was pointed out that the > > congestion changes were very far away from high-order atomic allocations. > > > > The key to why the revert makes such a big difference is down to timing and > > how long direct reclaimers wait versus kswapd. With the patch reverted, > > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > > significant part of the workload involved reads, it makes sense that the > > SYNC list is what was truely congested and with the revert processes were > > waiting on congestion as expected. Hence, direct reclaimers stalled > > properly and kswapd was able to do its job with fewer stalls. > > > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > > queue which would only fix a particular type of workload, this patch adds a > > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > > and then the SYNC queue if the timeout has not been reached. In tests, this > > counter-intuitively results in kswapd stalling less and freeing up pages > > resulting in fewer allocation failures and fewer direct-reclaim-orientated > > stalls. > > Honestly, I don't like this patch. page allocator is not related to > sync block queue. vmscan doesn't make read operation. > This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. > > Please don't make mysterious heuristic code. > > > Sidenode: I doubt this regression was caused from page allocator. > Probably we need to confirm caller change.... See the email from Chris from yesterday, he nicely explains why this change made a difference with dm-crypt. dm-crypt needs fixing, not a hack like this added. The vm needs to drop congestion hints and usage, not increase it. The above changelog is mostly hand-wavy nonsense, imho. -- Jens Axboe ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <20091113115558.GY8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> @ 2009-11-13 12:28 ` Mel Gorman [not found] ` <20091113122821.GC29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-13 12:28 UTC (permalink / raw) To: Jens Axboe Cc: KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 12:55:58PM +0100, Jens Axboe wrote: > On Fri, Nov 13 2009, KOSAKI Motohiro wrote: > > (cc to Jens) > > > > > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > > > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > > > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > > > this patch seemed to help a lot even though it was pointed out that the > > > congestion changes were very far away from high-order atomic allocations. > > > > > > The key to why the revert makes such a big difference is down to timing and > > > how long direct reclaimers wait versus kswapd. With the patch reverted, > > > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > > > significant part of the workload involved reads, it makes sense that the > > > SYNC list is what was truely congested and with the revert processes were > > > waiting on congestion as expected. Hence, direct reclaimers stalled > > > properly and kswapd was able to do its job with fewer stalls. > > > > > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > > > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > > > queue which would only fix a particular type of workload, this patch adds a > > > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > > > and then the SYNC queue if the timeout has not been reached. In tests, this > > > counter-intuitively results in kswapd stalling less and freeing up pages > > > resulting in fewer allocation failures and fewer direct-reclaim-orientated > > > stalls. > > > > Honestly, I don't like this patch. page allocator is not related to > > sync block queue. vmscan doesn't make read operation. > > This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. > > > > Please don't make mysterious heuristic code. > > > > > > Sidenode: I doubt this regression was caused from page allocator. Probably not. As noted, the major change is really in how long callers are waiting on congestion_wait. The tarball includes graphs from an instrumented kernel that shows how long callers are waiting due to congestion_wait(). This has changed significantly. I'll queue up tests over the weekend that test without dm-crypt being involved. > > Probably we need to confirm caller change.... > > See the email from Chris from yesterday, he nicely explains why this > change made a difference with dm-crypt. Indeed. But bear in mind that it also possible that direct reclaimers are also congesting the queue due to swap-in. > dm-crypt needs fixing, not a hack like this added. > As noted by Chris in the same mail, dm-crypt has not changed. What has changed is how long callers wait in congestion_wait. 
> The vm needs to drop congestion hints and usage, not increase it. The > above changelog is mostly hand-wavy nonsense, imho. > Suggest an alternative that brings congestion_wait() more in line with 2.6.30 behaviour then. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <20091113122821.GC29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 13:32 ` Jens Axboe [not found] ` <20091113133211.GA8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> 2009-11-20 14:56 ` Mel Gorman 0 siblings, 2 replies; 57+ messages in thread From: Jens Axboe @ 2009-11-13 13:32 UTC (permalink / raw) To: Mel Gorman Cc: KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13 2009, Mel Gorman wrote: > On Fri, Nov 13, 2009 at 12:55:58PM +0100, Jens Axboe wrote: > > On Fri, Nov 13 2009, KOSAKI Motohiro wrote: > > > (cc to Jens) > > > > > > > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > > > > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > > > > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > > > > this patch seemed to help a lot even though it was pointed out that the > > > > congestion changes were very far away from high-order atomic allocations. > > > > > > > > The key to why the revert makes such a big difference is down to timing and > > > > how long direct reclaimers wait versus kswapd. With the patch reverted, > > > > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > > > > significant part of the workload involved reads, it makes sense that the > > > > SYNC list is what was truely congested and with the revert processes were > > > > waiting on congestion as expected. Hence, direct reclaimers stalled > > > > properly and kswapd was able to do its job with fewer stalls. > > > > > > > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > > > > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > > > > queue which would only fix a particular type of workload, this patch adds a > > > > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > > > > and then the SYNC queue if the timeout has not been reached. In tests, this > > > > counter-intuitively results in kswapd stalling less and freeing up pages > > > > resulting in fewer allocation failures and fewer direct-reclaim-orientated > > > > stalls. > > > > > > Honestly, I don't like this patch. page allocator is not related to > > > sync block queue. vmscan doesn't make read operation. > > > This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. > > > > > > Please don't make mysterious heuristic code. > > > > > > > > > Sidenode: I doubt this regression was caused from page allocator. > > Probably not. As noted, the major change is really in how long callers > are waiting on congestion_wait. The tarball includes graphs from an > instrumented kernel that shows how long callers are waiting due to > congestion_wait(). This has changed significantly. > > I'll queue up tests over the weekend that test without dm-crypt being involved. > > > > Probably we need to confirm caller change.... > > > > See the email from Chris from yesterday, he nicely explains why this > > change made a difference with dm-crypt. > > Indeed. > > But bear in mind that it also possible that direct reclaimers are also > congesting the queue due to swap-in. Are you speculating, or has this been observed? 
While I don't contest that that could happen, it's also not a new thing. And it should be an unlikely event. > > dm-crypt needs fixing, not a hack like this added. > > > > As noted by Chris in the same mail, dm-crypt has not changed. What has > changed is how long callers wait in congestion_wait. Right dm-crypt didn't change, it WAS ALREADY BUGGY. > > The vm needs to drop congestion hints and usage, not increase it. The > > above changelog is mostly hand-wavy nonsense, imho. > > > > Suggest an alternative that brings congestion_wait() more in line with > 2.6.30 behaviour then. I don't have a good explanation as to why the delays have changed, unfortunately. Are we sure that they have between .30 and .31? The dm-crypt case is overly complex and lots of changes could have broken that house of cards. -- Jens Axboe ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <20091113133211.GA8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> @ 2009-11-13 13:41 ` Pekka Enberg 2009-11-13 15:22 ` Chris Mason 2009-11-13 14:16 ` Mel Gorman 1 sibling, 1 reply; 57+ messages in thread From: Pekka Enberg @ 2009-11-13 13:41 UTC (permalink / raw) To: Jens Axboe Cc: Mel Gorman, KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Linus Torvalds Hi Jens, On Fri, Nov 13, 2009 at 3:32 PM, Jens Axboe <jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: >> Suggest an alternative that brings congestion_wait() more in line with >> 2.6.30 behaviour then. > > I don't have a good explanation as to why the delays have changed, > unfortunately. Are we sure that they have between .30 and .31? The > dm-crypt case is overly complex and lots of changes could have broken > that house of cards. Hand-waving or not, we have end user reports stating that reverting commit 8aa7e847d834ed937a9ad37a0f2ad5b8584c1ab0 ("Fix congestion_wait() sync/async vs read/write confusion") fixes their (rather serious) OOM regression. The commit in question _does_ introduce a functional change and if this was your average regression, people would be kicking and screaming to get it reverted. So is there a reason we shouldn't send a partial revert of the commit (switching to BLK_RW_SYNC) to Linus until the "real" issue gets resolved? Yes, I realize it's ugly voodoo magic but dammit, it used to work! Pekka ^ permalink raw reply [flat|nested] 57+ messages in thread
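Concretely, the partial revert being suggested here would amount to flipping
the reclaim and allocator call sites that 8aa7e847 converted back to the
sync queue, rather than adding a new BLK_RW_BOTH mode - along these lines (a
sketch of the idea, not a posted patch):

	/* mm/page_alloc.c retry path: wait on the sync list again,
	 * as 2.6.30 effectively did */
	congestion_wait(BLK_RW_SYNC, HZ/50);

	/* mm/vmscan.c, do_try_to_free_pages(): likewise */
	if (sc->nr_scanned && priority < DEF_PRIORITY - 2)
		congestion_wait(BLK_RW_SYNC, HZ/10);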
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim 2009-11-13 13:41 ` Pekka Enberg @ 2009-11-13 15:22 ` Chris Mason 0 siblings, 0 replies; 57+ messages in thread From: Chris Mason @ 2009-11-13 15:22 UTC (permalink / raw) To: Pekka Enberg Cc: Jens Axboe, Mel Gorman, KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Linus Torvalds On Fri, Nov 13, 2009 at 03:41:46PM +0200, Pekka Enberg wrote: > Hi Jens, > > On Fri, Nov 13, 2009 at 3:32 PM, Jens Axboe <jens.axboe@oracle.com> wrote: > >> Suggest an alternative that brings congestion_wait() more in line with > >> 2.6.30 behaviour then. > > > > I don't have a good explanation as to why the delays have changed, > > unfortunately. Are we sure that they have between .30 and .31? The > > dm-crypt case is overly complex and lots of changes could have broken > > that house of cards. > > Hand-waving or not, we have end user reports stating that reverting > commit 8aa7e847d834ed937a9ad37a0f2ad5b8584c1ab0 ("Fix > congestion_wait() sync/async vs read/write confusion") fixes their > (rather serious) OOM regression. The commit in question _does_ > introduce a functional change and if this was your average regression, > people would be kicking and screaming to get it reverted. > > So is there a reason we shouldn't send a partial revert of the commit > (switching to BLK_RW_SYNC) to Linus until the "real" issue gets > resolved? Yes, I realize it's ugly voodoo magic but dammit, it used to > work! If the workload didn't involve dm-crypt everything would make more sense. With dm-crypt, things look like this: VM: Someone submits a write (sync or not) to clean a page out of pages, I'll wait for the disk dm-crypt: Async threads wakeup and start encrypting submitted writes Async threads finish and hand off to other async threads to submit new writes or dm encryption thread submits the write directly (without my patch) The problem with all of this is that waiting for congestion doesn't actually have very much to do with waiting for dm-crypt IO. We might be checking congestion long before dm-cryptd actually gets around to sending the IO to the disk (so we'll just wait for the full timeout), or the congestion check could be waiting for entirely unrelated IO. Because dm-crypt uses the same queue for encrypting and decrypting reads and writes, if writepage needs to read a block (say a metadata mapping block), waiting for that block to become unencrypted will have to wait for the entire queue of both encrypted and decrypted blocks to finish. [ Unless I completely misread dm-crypt, which is always possible ] So, cleaning pages can take an arbitrary amount of time, based on how much CPU time is available, how many other pages are in the worker threads and how fast the disk is once we finally hand the pages over to it. Tuning the VM timeouts based on this seems like a bad idea to me. If the VM is waiting for pages to leave writeback, we should just add a sequence counter and wait queue that gets bumped as pages leave writeback (or something that wouldn't do horrible things to cachelines). Otherwise we're going to end up waiting way too long for the sync congestion on configs that don't intermix sync and async in such complex ways. -chris -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
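The closing suggestion above - wait for writeback completions directly
instead of for queue congestion - could look something like the sketch
below. This is not existing kernel code: the function names and the hook
placement are invented for illustration, and as Chris notes, a single global
counter like this would need care to avoid doing horrible things to
cachelines.

#include <linux/wait.h>
#include <linux/atomic.h>

/* Hypothetical: a counter bumped whenever a page leaves writeback */
static atomic_long_t writeback_done_seq = ATOMIC_LONG_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(writeback_done_wqh);

/* Would be called from the end-of-writeback path, e.g. end_page_writeback() */
static inline void note_page_left_writeback(void)
{
	atomic_long_inc(&writeback_done_seq);
	if (waitqueue_active(&writeback_done_wqh))
		wake_up(&writeback_done_wqh);
}

/*
 * Would be called from reclaim instead of congestion_wait(): sleep until
 * at least one page has left writeback since we sampled the counter, or
 * until the timeout expires. Returns the remaining jiffies, as
 * wait_event_timeout() does.
 */
static long wait_for_writeback_progress(long timeout)
{
	long seq = atomic_long_read(&writeback_done_seq);

	return wait_event_timeout(writeback_done_wqh,
				  atomic_long_read(&writeback_done_seq) != seq,
				  timeout);
}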
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim [not found] ` <20091113133211.GA8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> 2009-11-13 13:41 ` Pekka Enberg @ 2009-11-13 14:16 ` Mel Gorman 1 sibling, 0 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-13 14:16 UTC (permalink / raw) To: Jens Axboe Cc: KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 02:32:12PM +0100, Jens Axboe wrote: > On Fri, Nov 13 2009, Mel Gorman wrote: > > On Fri, Nov 13, 2009 at 12:55:58PM +0100, Jens Axboe wrote: > > > On Fri, Nov 13 2009, KOSAKI Motohiro wrote: > > > > (cc to Jens) > > > > > > > > > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > > > > > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > > > > > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > > > > > this patch seemed to help a lot even though it was pointed out that the > > > > > congestion changes were very far away from high-order atomic allocations. > > > > > > > > > > The key to why the revert makes such a big difference is down to timing and > > > > > how long direct reclaimers wait versus kswapd. With the patch reverted, > > > > > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > > > > > significant part of the workload involved reads, it makes sense that the > > > > > SYNC list is what was truely congested and with the revert processes were > > > > > waiting on congestion as expected. Hence, direct reclaimers stalled > > > > > properly and kswapd was able to do its job with fewer stalls. > > > > > > > > > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > > > > > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > > > > > queue which would only fix a particular type of workload, this patch adds a > > > > > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > > > > > and then the SYNC queue if the timeout has not been reached. In tests, this > > > > > counter-intuitively results in kswapd stalling less and freeing up pages > > > > > resulting in fewer allocation failures and fewer direct-reclaim-orientated > > > > > stalls. > > > > > > > > Honestly, I don't like this patch. page allocator is not related to > > > > sync block queue. vmscan doesn't make read operation. > > > > This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. > > > > > > > > Please don't make mysterious heuristic code. > > > > > > > > > > > > Sidenode: I doubt this regression was caused from page allocator. > > > > Probably not. As noted, the major change is really in how long callers > > are waiting on congestion_wait. The tarball includes graphs from an > > instrumented kernel that shows how long callers are waiting due to > > congestion_wait(). This has changed significantly. > > > > I'll queue up tests over the weekend that test without dm-crypt being involved. > > > > > > Probably we need to confirm caller change.... > > > > > > See the email from Chris from yesterday, he nicely explains why this > > > change made a difference with dm-crypt. > > > > Indeed. 
> > > > But bear in mind that it also possible that direct reclaimers are also > > congesting the queue due to swap-in. > > Are you speculating, or has this been observed? Speculating. I'll need to adjust the test and instrumentation to know for sure. > While I don't contest > that that could happen, it's also not a new thing. And it should be an > unlikely event. > > > > dm-crypt needs fixing, not a hack like this added. > > > > > > > As noted by Chris in the same mail, dm-crypt has not changed. What has > > changed is how long callers wait in congestion_wait. > > Right dm-crypt didn't change, it WAS ALREADY BUGGY. > Fair point. > > > The vm needs to drop congestion hints and usage, not increase it. The > > > above changelog is mostly hand-wavy nonsense, imho. > > > > > > > Suggest an alternative that brings congestion_wait() more in line with > > 2.6.30 behaviour then. > > I don't have a good explanation as to why the delays have changed, > unfortunately. Are we sure that they have between .30 and .31? Fairly sure. The original reporter first reported that the problem was in this range and reverting one of the congestion_wait() patches appeared to help. What the revert was really doing was making congestion_wait() on SYNC instead of ASYNC in a number of cases. > The > dm-crypt case is overly complex and lots of changes could have broken > that house of cards. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
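To make the BLK_RW_BOTH behaviour described in the quoted changelog concrete, here is a minimal sketch of the idea. It assumes only the congestion_wait() interface of this era, which returns the unexpired portion of the timeout in jiffies; the wrapper form and the helper name are illustrative, not how the actual patch implements it (the patch folds this into congestion_wait() itself as a third BLK_RW_BOTH case).

#include <linux/backing-dev.h>

/*
 * Illustrative sketch only, not the patch: wait on the ASYNC congestion
 * queue first and, if any of the timeout is left over, wait the
 * remainder on the SYNC queue. congestion_wait() returns the unexpired
 * part of the timeout, so a full ASYNC wait leaves nothing for SYNC.
 */
static long congestion_wait_both(long timeout)
{
	long remaining;

	remaining = congestion_wait(BLK_RW_ASYNC, timeout);
	if (remaining)
		remaining = congestion_wait(BLK_RW_SYNC, remaining);

	return remaining;
}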
* Re: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim 2009-11-13 13:32 ` Jens Axboe [not found] ` <20091113133211.GA8742-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> @ 2009-11-20 14:56 ` Mel Gorman 1 sibling, 0 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-20 14:56 UTC (permalink / raw) To: Jens Axboe Cc: KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 02:32:12PM +0100, Jens Axboe wrote: > On Fri, Nov 13 2009, Mel Gorman wrote: > > On Fri, Nov 13, 2009 at 12:55:58PM +0100, Jens Axboe wrote: > > > On Fri, Nov 13 2009, KOSAKI Motohiro wrote: > > > > (cc to Jens) > > > > > > > > > Testing by Frans Pop indicated that in the 2.6.30..2.6.31 window at least > > > > > that the commits 373c0a7e 8aa7e847 dramatically increased the number of > > > > > GFP_ATOMIC failures that were occuring within a wireless driver. Reverting > > > > > this patch seemed to help a lot even though it was pointed out that the > > > > > congestion changes were very far away from high-order atomic allocations. > > > > > > > > > > The key to why the revert makes such a big difference is down to timing and > > > > > how long direct reclaimers wait versus kswapd. With the patch reverted, > > > > > the congestion_wait() is on the SYNC queue instead of the ASYNC. As a > > > > > significant part of the workload involved reads, it makes sense that the > > > > > SYNC list is what was truely congested and with the revert processes were > > > > > waiting on congestion as expected. Hence, direct reclaimers stalled > > > > > properly and kswapd was able to do its job with fewer stalls. > > > > > > > > > > This patch aims to fix the congestion_wait() behaviour for SYNC and ASYNC > > > > > for direct reclaimers. Instead of making the congestion_wait() on the SYNC > > > > > queue which would only fix a particular type of workload, this patch adds a > > > > > third type of congestion_wait - BLK_RW_BOTH which first waits on the ASYNC > > > > > and then the SYNC queue if the timeout has not been reached. In tests, this > > > > > counter-intuitively results in kswapd stalling less and freeing up pages > > > > > resulting in fewer allocation failures and fewer direct-reclaim-orientated > > > > > stalls. > > > > > > > > Honestly, I don't like this patch. page allocator is not related to > > > > sync block queue. vmscan doesn't make read operation. > > > > This patch makes nearly same effect of s/congestion_wait/io_schedule_timeout/. > > > > > > > > Please don't make mysterious heuristic code. > > > > > > > > > > > > Sidenode: I doubt this regression was caused from page allocator. > > > > Probably not. As noted, the major change is really in how long callers > > are waiting on congestion_wait. The tarball includes graphs from an > > instrumented kernel that shows how long callers are waiting due to > > congestion_wait(). This has changed significantly. > > > > I'll queue up tests over the weekend that test without dm-crypt being involved. > > > > > > Probably we need to confirm caller change.... > > > > > > See the email from Chris from yesterday, he nicely explains why this > > > change made a difference with dm-crypt. > > > > Indeed. > > > > But bear in mind that it also possible that direct reclaimers are also > > congesting the queue due to swap-in. 
> > Are you speculating, or has this been observed? While I don't contest > that that could happen, it's also not a new thing. And it should be an > unlikely event. > > > > dm-crypt needs fixing, not a hack like this added. > > > > > > > As noted by Chris in the same mail, dm-crypt has not changed. What has > > changed is how long callers wait in congestion_wait. > > Right dm-crypt didn't change, it WAS ALREADY BUGGY. > On a different note, I tried without dm-crypt and found that by far the greatest difference to page allocator success or failure was the value of the low_latency tunable.

low_latency == 0
2.6.32-rc6-0000000-force-highorder            Elapsed:17:50.935(stddev:002.535) Failures:0
2.6.32-rc6-0000006-dm-crypt-unplug            Elapsed:18:44.610(stddev:002.236) Failures:1
2.6.32-rc6-0000012-pgalloc-2.6.30             Elapsed:17:18.330(stddev:002.258) Failures:2
2.6.32-rc6-0000123-congestion-both            Elapsed:16:17.370(stddev:002.167) Failures:1
2.6.32-rc6-0001234-kswapd-quick-recheck       Elapsed:15:54.880(stddev:002.234) Failures:0
2.6.32-rc6-0012345-kswapd-stay-awake-when-min Elapsed:19:26.417(stddev:002.237) Failures:1
2.6.32-rc6-0123456-dm-crypt-unplug            Elapsed:20:19.135(stddev:001.516) Failures:0

low_latency == 1
2.6.32-rc6-0000000-force-highorder            Elapsed:20:04.755(stddev:078.551) Failures:22
2.6.32-rc6-0000006-dm-crypt-unplug            Elapsed:25:05.608(stddev:053.224) Failures:12
2.6.32-rc6-0000012-pgalloc-2.6.30             Elapsed:19:01.530(stddev:002.146) Failures:14
2.6.32-rc6-0000123-congestion-both            Elapsed:18:01.938(stddev:002.171) Failures:2
2.6.32-rc6-0001234-kswapd-quick-recheck       Elapsed:19:52.833(stddev:064.168) Failures:7
2.6.32-rc6-0012345-kswapd-stay-awake-when-min Elapsed:26:25.767(stddev:050.827) Failures:1
2.6.32-rc6-0123456-dm-crypt-unplug            Elapsed:22:56.850(stddev:053.914) Failures:1

Enabling low_latency both regresses the performance of the test and causes a boatload of allocation failures. Note also the deviations: with low_latency == 0, gitk performs predictably, +/- around 2 seconds; with low_latency == 1, there are huge variances of +/- about a minute. Sampling writeback, I found that the number of pages in writeback with low_latency == 1 was higher for longer. Any theories as to why enabling low_latency has such a negative impact? Should it be disabled by default? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep 2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman ` (2 preceding siblings ...) 2009-11-12 19:30 ` [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim Mel Gorman @ 2009-11-12 19:30 ` Mel Gorman 2009-11-13 10:43 ` KOSAKI Motohiro 2009-11-12 19:30 ` [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble Mel Gorman [not found] ` <1258054235-3208-1-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 5 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw) To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando> Cc: linux-kernel, linux-mm@kvack.org", KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman After kswapd balances all zones in a pgdat, it goes to sleep. In the event of no IO congestion, kswapd can go to sleep very shortly after the high watermark was reached. If there are a constant stream of allocations from parallel processes, it can mean that kswapd went to sleep too quickly and the high watermark is not being maintained for sufficient length time. This patch makes kswapd go to sleep as a two-stage process. It first tries to sleep for HZ/10. If it is woken up by another process or the high watermark is no longer met, it's considered a premature sleep and kswapd continues work. Otherwise it goes fully to sleep. This adds more counters to distinguish between fast and slow breaches of watermarks. A "fast" premature sleep is one where the low watermark was hit in a very short time after kswapd going to sleep. A "slow" premature sleep indicates that the high watermark was breached after a very short interval. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/vmstat.h | 1 + mm/vmscan.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- mm/vmstat.c | 2 ++ 3 files changed, 45 insertions(+), 2 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 2d0f222..9716003 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -40,6 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGSCAN_ZONE_RECLAIM_FAILED, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, + KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, diff --git a/mm/vmscan.c b/mm/vmscan.c index 190bae1..ffa1766 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1904,6 +1904,24 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, } #endif +/* is kswapd sleeping prematurely? */ +static int sleeping_prematurely(int order, long remaining) +{ + struct zone *zone; + + /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ + if (remaining) + return 1; + + /* If after HZ/10, a zone is below the high mark, it's premature */ + for_each_populated_zone(zone) + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), + 0, 0)) + return 1; + + return 0; +} + /* * For kswapd, balance_pgdat() will work across all this node's zones until * they are all at high_wmark_pages(zone). 
@@ -2184,8 +2202,30 @@ static int kswapd(void *p) */ order = new_order; } else { - if (!freezing(current)) - schedule(); + if (!freezing(current)) { + long remaining = 0; + + /* Try to sleep for a short interval */ + if (!sleeping_prematurely(order, remaining)) { + remaining = schedule_timeout(HZ/10); + finish_wait(&pgdat->kswapd_wait, &wait); + prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); + } + + /* + * After a short sleep, check if it was a + * premature sleep. If not, then go fully + * to sleep until explicitly woken up + */ + if (!sleeping_prematurely(order, remaining)) + schedule(); + else { + if (remaining) + count_vm_event(KSWAPD_PREMATURE_FAST); + else + count_vm_event(KSWAPD_PREMATURE_SLOW); + } + } order = pgdat->kswapd_max_order; } diff --git a/mm/vmstat.c b/mm/vmstat.c index c81321f..90b11e4 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -683,6 +683,8 @@ static const char * const vmstat_text[] = { "slabs_scanned", "kswapd_steal", "kswapd_inodesteal", + "kswapd_slept_prematurely_fast", + "kswapd_slept_prematurely_slow", "pageoutrun", "allocstall", -- 1.6.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 57+ messages in thread
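With this patch applied, the new counters show up as fields in /proc/vmstat. The following is a minimal userspace sketch (not part of the patch) for watching them while reproducing an allocation failure; it assumes the kernel has vmstat event counters enabled, and the one-second polling interval is arbitrary.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Minimal sketch: poll /proc/vmstat once a second and print every
 * kswapd_* counter, which with this patch includes the two new
 * premature-sleep counters (renamed later in this thread to
 * kswapd_low_wmark_hit_quickly and kswapd_high_wmark_hit_quickly).
 */
int main(void)
{
	char line[256];

	for (;;) {
		FILE *fp = fopen("/proc/vmstat", "r");

		if (!fp) {
			perror("/proc/vmstat");
			return 1;
		}
		while (fgets(line, sizeof(line), fp))
			if (strncmp(line, "kswapd_", 7) == 0)
				fputs(line, stdout);
		fclose(fp);
		putchar('\n');
		sleep(1);
	}
}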
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep 2009-11-12 19:30 ` [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep Mel Gorman @ 2009-11-13 10:43 ` KOSAKI Motohiro [not found] ` <20091113142558.33B6.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 10:43 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > of no IO congestion, kswapd can go to sleep very shortly after the high > watermark was reached. If there are a constant stream of allocations from > parallel processes, it can mean that kswapd went to sleep too quickly and > the high watermark is not being maintained for sufficient length time. > > This patch makes kswapd go to sleep as a two-stage process. It first > tries to sleep for HZ/10. If it is woken up by another process or the > high watermark is no longer met, it's considered a premature sleep and > kswapd continues work. Otherwise it goes fully to sleep. > > This adds more counters to distinguish between fast and slow breaches of > watermarks. A "fast" premature sleep is one where the low watermark was > hit in a very short time after kswapd going to sleep. A "slow" premature > sleep indicates that the high watermark was breached after a very short > interval. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Why do you submit this patch to mainline? this is debugging patch no more and no less. > --- > include/linux/vmstat.h | 1 + > mm/vmscan.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- > mm/vmstat.c | 2 ++ > 3 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > index 2d0f222..9716003 100644 > --- a/include/linux/vmstat.h > +++ b/include/linux/vmstat.h > @@ -40,6 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > PGSCAN_ZONE_RECLAIM_FAILED, > #endif > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > + KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > PAGEOUTRUN, ALLOCSTALL, PGROTATED, Please don't use the word of "premature" and "fast". it is too hard to understand the meanings. Plus, please use per-zone stastics (like NUMA_HIT). > > #ifdef CONFIG_HUGETLB_PAGE > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 190bae1..ffa1766 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1904,6 +1904,24 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, > } > #endif > > +/* is kswapd sleeping prematurely? */ > +static int sleeping_prematurely(int order, long remaining) > +{ > + struct zone *zone; > + > + /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ > + if (remaining) > + return 1; > + > + /* If after HZ/10, a zone is below the high mark, it's premature */ > + for_each_populated_zone(zone) > + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), > + 0, 0)) > + return 1; for_each_populated_zone() iterate all populated zone. but kswapd shuld't see another node. 
> + > + return 0; > +} > + > /* > * For kswapd, balance_pgdat() will work across all this node's zones until > * they are all at high_wmark_pages(zone). > @@ -2184,8 +2202,30 @@ static int kswapd(void *p) > */ > order = new_order; > } else { > - if (!freezing(current)) > - schedule(); > + if (!freezing(current)) { > + long remaining = 0; > + > + /* Try to sleep for a short interval */ > + if (!sleeping_prematurely(order, remaining)) { > + remaining = schedule_timeout(HZ/10); > + finish_wait(&pgdat->kswapd_wait, &wait); > + prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); > + } > + > + /* > + * After a short sleep, check if it was a > + * premature sleep. If not, then go fully > + * to sleep until explicitly woken up > + */ > + if (!sleeping_prematurely(order, remaining)) > + schedule(); > + else { > + if (remaining) > + count_vm_event(KSWAPD_PREMATURE_FAST); > + else > + count_vm_event(KSWAPD_PREMATURE_SLOW); > + } > + } > > order = pgdat->kswapd_max_order; > } > diff --git a/mm/vmstat.c b/mm/vmstat.c > index c81321f..90b11e4 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -683,6 +683,8 @@ static const char * const vmstat_text[] = { > "slabs_scanned", > "kswapd_steal", > "kswapd_inodesteal", > + "kswapd_slept_prematurely_fast", > + "kswapd_slept_prematurely_slow", > "pageoutrun", > "allocstall", > > -- > 1.6.5 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091113142558.33B6.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 14:13 ` Mel Gorman 2009-11-13 18:00 ` KOSAKI Motohiro 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-13 14:13 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > of no IO congestion, kswapd can go to sleep very shortly after the high > > watermark was reached. If there are a constant stream of allocations from > > parallel processes, it can mean that kswapd went to sleep too quickly and > > the high watermark is not being maintained for sufficient length time. > > > > This patch makes kswapd go to sleep as a two-stage process. It first > > tries to sleep for HZ/10. If it is woken up by another process or the > > high watermark is no longer met, it's considered a premature sleep and > > kswapd continues work. Otherwise it goes fully to sleep. > > > > This adds more counters to distinguish between fast and slow breaches of > > watermarks. A "fast" premature sleep is one where the low watermark was > > hit in a very short time after kswapd going to sleep. A "slow" premature > > sleep indicates that the high watermark was breached after a very short > > interval. > > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > Why do you submit this patch to mainline? this is debugging patch > no more and no less. > Do you mean the stats part? The stats are included until such time as the page allocator failure reports stop or are significantly reduced. In the event a report is received, the value of the counters help determine if kswapd was struggling or not. They should be removed once this mess is ironed out. If there is a preference, I can split out the stats part and send it to people with page allocator failure reports for retesting. > > > --- > > include/linux/vmstat.h | 1 + > > mm/vmscan.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- > > mm/vmstat.c | 2 ++ > > 3 files changed, 45 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > > index 2d0f222..9716003 100644 > > --- a/include/linux/vmstat.h > > +++ b/include/linux/vmstat.h > > @@ -40,6 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > > PGSCAN_ZONE_RECLAIM_FAILED, > > #endif > > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > > + KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > > PAGEOUTRUN, ALLOCSTALL, PGROTATED, > > Please don't use the word of "premature" and "fast". it is too hard to understand the meanings. How about KSWAPD_LOW_WMARK_HIT_QUICKLY and KSWAPD_HIGH_WMARK_HIT_QUICKLY? > Plus, please use per-zone stastics (like NUMA_HIT). > Is that not overkill? The intention is just to track in general terms if kswapd is struggling or not. I hadn't been considering the problem on a per-zone basis to date. 
> > > > #ifdef CONFIG_HUGETLB_PAGE > > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 190bae1..ffa1766 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1904,6 +1904,24 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, > > } > > #endif > > > > +/* is kswapd sleeping prematurely? */ > > +static int sleeping_prematurely(int order, long remaining) > > +{ > > + struct zone *zone; > > + > > + /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ > > + if (remaining) > > + return 1; > > + > > + /* If after HZ/10, a zone is below the high mark, it's premature */ > > + for_each_populated_zone(zone) > > + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), > > + 0, 0)) > > + return 1; > > for_each_populated_zone() iterate all populated zone. but kswapd shuld't see another node. > Good spot. How about the following? ==== CUT HERE ==== vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 This patch is a fix and a claritifacation to the patch "vmscan: Have kswapd sleep for a short interval and double check it should be asleep". The fix is for kswapd to only check zones in the node it is responsible for. The clarification is to rename two counters to better explain what is being counted. Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> --- include/linux/vmstat.h | 2 +- mm/vmscan.c | 20 +++++++++++++------- mm/vmstat.c | 4 ++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 7d66695..0591a48 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGSCAN_ZONE_RECLAIM_FAILED, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, KSWAPD_NO_CONGESTION_WAIT, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE diff --git a/mm/vmscan.c b/mm/vmscan.c index 70967e1..5557555 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, #endif /* is kswapd sleeping prematurely? */ -static int sleeping_prematurely(int order, long remaining) +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) { - struct zone *zone; + int i; /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ if (remaining) return 1; /* If after HZ/10, a zone is below the high mark, it's premature */ - for_each_populated_zone(zone) + for (i = 0; i < pgdat->nr_zones; i++) { + struct zone *zone = pgdat->node_zones + i; + + if (!populated_zone(zone)) + continue; + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), 0, 0)) return 1; + } return 0; } @@ -2221,7 +2227,7 @@ static int kswapd(void *p) long remaining = 0; /* Try to sleep for a short interval */ - if (!sleeping_prematurely(order, remaining)) { + if (!sleeping_prematurely(pgdat, order, remaining)) { remaining = schedule_timeout(HZ/10); finish_wait(&pgdat->kswapd_wait, &wait); prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); @@ -2232,13 +2238,13 @@ static int kswapd(void *p) * premature sleep. 
If not, then go fully * to sleep until explicitly woken up */ - if (!sleeping_prematurely(order, remaining)) + if (!sleeping_prematurely(pgdat, order, remaining)) schedule(); else { if (remaining) - count_vm_event(KSWAPD_PREMATURE_FAST); + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); else - count_vm_event(KSWAPD_PREMATURE_SLOW); + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); } } diff --git a/mm/vmstat.c b/mm/vmstat.c index bc09547..6cc8dc6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { "slabs_scanned", "kswapd_steal", "kswapd_inodesteal", - "kswapd_slept_prematurely_fast", - "kswapd_slept_prematurely_slow", + "kswapd_low_wmark_hit_quickly", + "kswapd_high_wmark_hit_quickly", "kswapd_no_congestion_wait", "pageoutrun", "allocstall", ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep 2009-11-13 14:13 ` Mel Gorman @ 2009-11-13 18:00 ` KOSAKI Motohiro [not found] ` <20091114023901.3DA8.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 18:00 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > watermark was reached. If there are a constant stream of allocations from > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > the high watermark is not being maintained for sufficient length time. > > > > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > high watermark is no longer met, it's considered a premature sleep and > > > kswapd continues work. Otherwise it goes fully to sleep. > > > > > > This adds more counters to distinguish between fast and slow breaches of > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > sleep indicates that the high watermark was breached after a very short > > > interval. > > > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > Why do you submit this patch to mainline? this is debugging patch > > no more and no less. > > > > Do you mean the stats part? The stats are included until such time as the page > allocator failure reports stop or are significantly reduced. In the event a > report is received, the value of the counters help determine if kswapd was > struggling or not. They should be removed once this mess is ironed out. > > If there is a preference, I can split out the stats part and send it to > people with page allocator failure reports for retesting. I'm sorry my last mail didn't have enough explanation. This stats help to solve this issue. I agreed. but after solving this issue, I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? Or, Can LKML folk make any advise to admin? if kernel doesn't have any bug, kswapd wakeup rate is not so worth information imho. following your additional code itself looks good to me. but... > ==== CUT HERE ==== > vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > > This patch is a fix and a claritifacation to the patch "vmscan: Have > kswapd sleep for a short interval and double check it should be asleep". > The fix is for kswapd to only check zones in the node it is responsible > for. The clarification is to rename two counters to better explain what is > being counted. 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > include/linux/vmstat.h | 2 +- > mm/vmscan.c | 20 +++++++++++++------- > mm/vmstat.c | 4 ++-- > 3 files changed, 16 insertions(+), 10 deletions(-) > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > index 7d66695..0591a48 100644 > --- a/include/linux/vmstat.h > +++ b/include/linux/vmstat.h > @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > PGSCAN_ZONE_RECLAIM_FAILED, > #endif > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, > KSWAPD_NO_CONGESTION_WAIT, > PAGEOUTRUN, ALLOCSTALL, PGROTATED, > #ifdef CONFIG_HUGETLB_PAGE > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 70967e1..5557555 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, > #endif > > /* is kswapd sleeping prematurely? */ > -static int sleeping_prematurely(int order, long remaining) > +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) > { > - struct zone *zone; > + int i; > > /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ > if (remaining) > return 1; > > /* If after HZ/10, a zone is below the high mark, it's premature */ > - for_each_populated_zone(zone) > + for (i = 0; i < pgdat->nr_zones; i++) { > + struct zone *zone = pgdat->node_zones + i; > + > + if (!populated_zone(zone)) > + continue; > + > if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), > 0, 0)) > return 1; > + } > > return 0; > } > @@ -2221,7 +2227,7 @@ static int kswapd(void *p) > long remaining = 0; > > /* Try to sleep for a short interval */ > - if (!sleeping_prematurely(order, remaining)) { > + if (!sleeping_prematurely(pgdat, order, remaining)) { > remaining = schedule_timeout(HZ/10); > finish_wait(&pgdat->kswapd_wait, &wait); > prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); > @@ -2232,13 +2238,13 @@ static int kswapd(void *p) > * premature sleep. If not, then go fully > * to sleep until explicitly woken up > */ > - if (!sleeping_prematurely(order, remaining)) > + if (!sleeping_prematurely(pgdat, order, remaining)) > schedule(); > else { > if (remaining) > - count_vm_event(KSWAPD_PREMATURE_FAST); > + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); > else > - count_vm_event(KSWAPD_PREMATURE_SLOW); > + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); > } > } > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index bc09547..6cc8dc6 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { > "slabs_scanned", > "kswapd_steal", > "kswapd_inodesteal", > - "kswapd_slept_prematurely_fast", > - "kswapd_slept_prematurely_slow", > + "kswapd_low_wmark_hit_quickly", > + "kswapd_high_wmark_hit_quickly", > "kswapd_no_congestion_wait", > "pageoutrun", > "allocstall", -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091114023901.3DA8.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 18:17 ` Mel Gorman [not found] ` <20091113181740.GN29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-13 18:17 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > > watermark was reached. If there are a constant stream of allocations from > > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > > the high watermark is not being maintained for sufficient length time. > > > > > > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > > high watermark is no longer met, it's considered a premature sleep and > > > > kswapd continues work. Otherwise it goes fully to sleep. > > > > > > > > This adds more counters to distinguish between fast and slow breaches of > > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > > sleep indicates that the high watermark was breached after a very short > > > > interval. > > > > > > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > > > > > Why do you submit this patch to mainline? this is debugging patch > > > no more and no less. > > > > > > > Do you mean the stats part? The stats are included until such time as the page > > allocator failure reports stop or are significantly reduced. In the event a > > report is received, the value of the counters help determine if kswapd was > > struggling or not. They should be removed once this mess is ironed out. > > > > If there is a preference, I can split out the stats part and send it to > > people with page allocator failure reports for retesting. > > I'm sorry my last mail didn't have enough explanation. > This stats help to solve this issue. I agreed. but after solving this issue, > I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? One possible workaround would be to raise min_free_kbytes while a fix is being worked on. > Or, Can LKML folk make any advise to admin? > Work with them to fix the bug :/ > if kernel doesn't have any bug, kswapd wakeup rate is not so worth information imho. > following your additional code itself looks good to me. but... > > > > ==== CUT HERE ==== > > vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > > > > This patch is a fix and a claritifacation to the patch "vmscan: Have > > kswapd sleep for a short interval and double check it should be asleep". > > The fix is for kswapd to only check zones in the node it is responsible > > for. 
The clarification is to rename two counters to better explain what is > > being counted. > > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > --- > > include/linux/vmstat.h | 2 +- > > mm/vmscan.c | 20 +++++++++++++------- > > mm/vmstat.c | 4 ++-- > > 3 files changed, 16 insertions(+), 10 deletions(-) > > > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > > index 7d66695..0591a48 100644 > > --- a/include/linux/vmstat.h > > +++ b/include/linux/vmstat.h > > @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > > PGSCAN_ZONE_RECLAIM_FAILED, > > #endif > > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > > - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > > + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, > > KSWAPD_NO_CONGESTION_WAIT, > > PAGEOUTRUN, ALLOCSTALL, PGROTATED, > > #ifdef CONFIG_HUGETLB_PAGE > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 70967e1..5557555 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, > > #endif > > > > /* is kswapd sleeping prematurely? */ > > -static int sleeping_prematurely(int order, long remaining) > > +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) > > { > > - struct zone *zone; > > + int i; > > > > /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ > > if (remaining) > > return 1; > > > > /* If after HZ/10, a zone is below the high mark, it's premature */ > > - for_each_populated_zone(zone) > > + for (i = 0; i < pgdat->nr_zones; i++) { > > + struct zone *zone = pgdat->node_zones + i; > > + > > + if (!populated_zone(zone)) > > + continue; > > + > > if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), > > 0, 0)) > > return 1; > > + } > > > > return 0; > > } > > @@ -2221,7 +2227,7 @@ static int kswapd(void *p) > > long remaining = 0; > > > > /* Try to sleep for a short interval */ > > - if (!sleeping_prematurely(order, remaining)) { > > + if (!sleeping_prematurely(pgdat, order, remaining)) { > > remaining = schedule_timeout(HZ/10); > > finish_wait(&pgdat->kswapd_wait, &wait); > > prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); > > @@ -2232,13 +2238,13 @@ static int kswapd(void *p) > > * premature sleep. If not, then go fully > > * to sleep until explicitly woken up > > */ > > - if (!sleeping_prematurely(order, remaining)) > > + if (!sleeping_prematurely(pgdat, order, remaining)) > > schedule(); > > else { > > if (remaining) > > - count_vm_event(KSWAPD_PREMATURE_FAST); > > + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); > > else > > - count_vm_event(KSWAPD_PREMATURE_SLOW); > > + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); > > } > > } > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index bc09547..6cc8dc6 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { > > "slabs_scanned", > > "kswapd_steal", > > "kswapd_inodesteal", > > - "kswapd_slept_prematurely_fast", > > - "kswapd_slept_prematurely_slow", > > + "kswapd_low_wmark_hit_quickly", > > + "kswapd_high_wmark_hit_quickly", > > "kswapd_no_congestion_wait", > > "pageoutrun", > > "allocstall", > > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
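For reference, the min_free_kbytes workaround mentioned above is a runtime tunable under /proc/sys/vm/. A minimal sketch of raising it follows; the value 65536 is only an example and the same effect can be had from a shell with sysctl -w vm.min_free_kbytes=65536.

#include <stdio.h>

/*
 * Sketch of the workaround discussed above, not a fix: raise
 * vm.min_free_kbytes so kswapd starts reclaiming earlier and a larger
 * reserve is kept for atomic allocations. The value is only an example;
 * pick one appropriate for the amount of RAM and drop it again once the
 * underlying bug is fixed.
 */
int main(void)
{
	FILE *fp = fopen("/proc/sys/vm/min_free_kbytes", "w");

	if (!fp) {
		perror("/proc/sys/vm/min_free_kbytes");
		return 1;
	}
	fprintf(fp, "%d\n", 65536);
	return fclose(fp) ? 1 : 0;
}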
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091113181740.GN29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-14 9:34 ` KOSAKI Motohiro [not found] ` <2f11576a0911140134u21eafa83t9642bb25ccd953de-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-14 9:34 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List 2009/11/14 Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>: > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high >> > > > watermark was reached. If there are a constant stream of allocations from >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and >> > > > the high watermark is not being maintained for sufficient length time. >> > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first >> > > > tries to sleep for HZ/10. If it is woken up by another process or the >> > > > high watermark is no longer met, it's considered a premature sleep and >> > > > kswapd continues work. Otherwise it goes fully to sleep. >> > > > >> > > > This adds more counters to distinguish between fast and slow breaches of >> > > > watermarks. A "fast" premature sleep is one where the low watermark was >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature >> > > > sleep indicates that the high watermark was breached after a very short >> > > > interval. >> > > > >> > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> >> > > >> > > Why do you submit this patch to mainline? this is debugging patch >> > > no more and no less. >> > > >> > >> > Do you mean the stats part? The stats are included until such time as the page >> > allocator failure reports stop or are significantly reduced. In the event a >> > report is received, the value of the counters help determine if kswapd was >> > struggling or not. They should be removed once this mess is ironed out. >> > >> > If there is a preference, I can split out the stats part and send it to >> > people with page allocator failure reports for retesting. >> >> I'm sorry my last mail didn't have enough explanation. >> This stats help to solve this issue. I agreed. but after solving this issue, >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > One possible workaround would be to raise min_free_kbytes while a fix is > being worked on. Please correct me, if I said wrong thing. if I was admin, I don't watch this stats because kswapd frequently wakeup doesn't mean any trouble. instead I watch number of allocation failure. [see include/linux/vmstat.h] umm... Why don't we have nr-allocation-failure vmstat? I don't remember it. GFP_NOWARN failure doesn't make syslog error logging. but frequently GFP_NOWARN failure imply the system is under memory pressure or under heavy fragmentation. 
It is good opportunity to change min_free_kbytes. >> Or, Can LKML folk make any advise to admin? >> > > Work with them to fix the bug :/ > >> if kernel doesn't have any bug, kswapd wakeup rate is not so worth information imho. >> following your additional code itself looks good to me. but... >> >> >> > ==== CUT HERE ==== >> > vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 >> > >> > This patch is a fix and a claritifacation to the patch "vmscan: Have >> > kswapd sleep for a short interval and double check it should be asleep". >> > The fix is for kswapd to only check zones in the node it is responsible >> > for. The clarification is to rename two counters to better explain what is >> > being counted. >> > >> > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> >> > --- >> > include/linux/vmstat.h | 2 +- >> > mm/vmscan.c | 20 +++++++++++++------- >> > mm/vmstat.c | 4 ++-- >> > 3 files changed, 16 insertions(+), 10 deletions(-) >> > >> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h >> > index 7d66695..0591a48 100644 >> > --- a/include/linux/vmstat.h >> > +++ b/include/linux/vmstat.h >> > @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, >> > PGSCAN_ZONE_RECLAIM_FAILED, >> > #endif >> > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, >> > - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, >> > + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, >> > KSWAPD_NO_CONGESTION_WAIT, >> > PAGEOUTRUN, ALLOCSTALL, PGROTATED, >> > #ifdef CONFIG_HUGETLB_PAGE >> > diff --git a/mm/vmscan.c b/mm/vmscan.c >> > index 70967e1..5557555 100644 >> > --- a/mm/vmscan.c >> > +++ b/mm/vmscan.c >> > @@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, >> > #endif >> > >> > /* is kswapd sleeping prematurely? */ >> > -static int sleeping_prematurely(int order, long remaining) >> > +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) >> > { >> > - struct zone *zone; >> > + int i; >> > >> > /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ >> > if (remaining) >> > return 1; >> > >> > /* If after HZ/10, a zone is below the high mark, it's premature */ >> > - for_each_populated_zone(zone) >> > + for (i = 0; i < pgdat->nr_zones; i++) { >> > + struct zone *zone = pgdat->node_zones + i; >> > + >> > + if (!populated_zone(zone)) >> > + continue; >> > + >> > if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), >> > 0, 0)) >> > return 1; >> > + } >> > >> > return 0; >> > } >> > @@ -2221,7 +2227,7 @@ static int kswapd(void *p) >> > long remaining = 0; >> > >> > /* Try to sleep for a short interval */ >> > - if (!sleeping_prematurely(order, remaining)) { >> > + if (!sleeping_prematurely(pgdat, order, remaining)) { >> > remaining = schedule_timeout(HZ/10); >> > finish_wait(&pgdat->kswapd_wait, &wait); >> > prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); >> > @@ -2232,13 +2238,13 @@ static int kswapd(void *p) >> > * premature sleep. 
If not, then go fully >> > * to sleep until explicitly woken up >> > */ >> > - if (!sleeping_prematurely(order, remaining)) >> > + if (!sleeping_prematurely(pgdat, order, remaining)) >> > schedule(); >> > else { >> > if (remaining) >> > - count_vm_event(KSWAPD_PREMATURE_FAST); >> > + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); >> > else >> > - count_vm_event(KSWAPD_PREMATURE_SLOW); >> > + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); >> > } >> > } >> > >> > diff --git a/mm/vmstat.c b/mm/vmstat.c >> > index bc09547..6cc8dc6 100644 >> > --- a/mm/vmstat.c >> > +++ b/mm/vmstat.c >> > @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { >> > "slabs_scanned", >> > "kswapd_steal", >> > "kswapd_inodesteal", >> > - "kswapd_slept_prematurely_fast", >> > - "kswapd_slept_prematurely_slow", >> > + "kswapd_low_wmark_hit_quickly", >> > + "kswapd_high_wmark_hit_quickly", >> > "kswapd_no_congestion_wait", >> > "pageoutrun", >> > "allocstall", >> >> >> > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a> > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <2f11576a0911140134u21eafa83t9642bb25ccd953de-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-11-14 15:46 ` Mel Gorman [not found] ` <20091114154636.GR29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 2009-11-17 10:34 ` [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 Mel Gorman 1 sibling, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-14 15:46 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > 2009/11/14 Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>: > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > >> > > > watermark was reached. If there are a constant stream of allocations from > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > >> > > > the high watermark is not being maintained for sufficient length time. > >> > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > >> > > > high watermark is no longer met, it's considered a premature sleep and > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > >> > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > >> > > > sleep indicates that the high watermark was breached after a very short > >> > > > interval. > >> > > > > >> > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > >> > > > >> > > Why do you submit this patch to mainline? this is debugging patch > >> > > no more and no less. > >> > > > >> > > >> > Do you mean the stats part? The stats are included until such time as the page > >> > allocator failure reports stop or are significantly reduced. In the event a > >> > report is received, the value of the counters help determine if kswapd was > >> > struggling or not. They should be removed once this mess is ironed out. > >> > > >> > If there is a preference, I can split out the stats part and send it to > >> > people with page allocator failure reports for retesting. > >> > >> I'm sorry my last mail didn't have enough explanation. > >> This stats help to solve this issue. I agreed. but after solving this issue, > >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > being worked on. > > Please correct me, if I said wrong thing. > You didn't. > if I was admin, I don't watch this stats because kswapd frequently > wakeup doesn't mean any trouble. 
instead I watch number of allocation > failure. > The stats are not tracking when kswapd wakes up. It helps track how quickly the high or low watermarks are going under once kswapd tries to go back to sleep. > [see include/linux/vmstat.h] > > umm... > Why don't we have nr-allocation-failure vmstat? There isn't a stat for allocation failures, but there are tracepoints that can be used to find out that information if necessary. For the cases I'm looking at the moment though, GFP_NOWARN is not a factor so just looking through the kernel log has been sufficient. > I don't remember it. > GFP_NOWARN failure doesn't make syslog error logging. but frequently > GFP_NOWARN failure imply the system is under memory pressure or under > heavy fragmentation. It is good opportunity to change min_free_kbytes. > > > > >> Or, Can LKML folk make any advise to admin? > >> > > > > Work with them to fix the bug :/ > > > >> if kernel doesn't have any bug, kswapd wakeup rate is not so worth information imho. > >> following your additional code itself looks good to me. but... > >> > >> > >> > ==== CUT HERE ==== > >> > vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > >> > > >> > This patch is a fix and a claritifacation to the patch "vmscan: Have > >> > kswapd sleep for a short interval and double check it should be asleep". > >> > The fix is for kswapd to only check zones in the node it is responsible > >> > for. The clarification is to rename two counters to better explain what is > >> > being counted. > >> > > >> > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > >> > --- > >> > include/linux/vmstat.h | 2 +- > >> > mm/vmscan.c | 20 +++++++++++++------- > >> > mm/vmstat.c | 4 ++-- > >> > 3 files changed, 16 insertions(+), 10 deletions(-) > >> > > >> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > >> > index 7d66695..0591a48 100644 > >> > --- a/include/linux/vmstat.h > >> > +++ b/include/linux/vmstat.h > >> > @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > >> > PGSCAN_ZONE_RECLAIM_FAILED, > >> > #endif > >> > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > >> > - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > >> > + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, > >> > KSWAPD_NO_CONGESTION_WAIT, > >> > PAGEOUTRUN, ALLOCSTALL, PGROTATED, > >> > #ifdef CONFIG_HUGETLB_PAGE > >> > diff --git a/mm/vmscan.c b/mm/vmscan.c > >> > index 70967e1..5557555 100644 > >> > --- a/mm/vmscan.c > >> > +++ b/mm/vmscan.c > >> > @@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, > >> > #endif > >> > > >> > /* is kswapd sleeping prematurely? 
*/ > >> > -static int sleeping_prematurely(int order, long remaining) > >> > +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) > >> > { > >> > - struct zone *zone; > >> > + int i; > >> > > >> > /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ > >> > if (remaining) > >> > return 1; > >> > > >> > /* If after HZ/10, a zone is below the high mark, it's premature */ > >> > - for_each_populated_zone(zone) > >> > + for (i = 0; i < pgdat->nr_zones; i++) { > >> > + struct zone *zone = pgdat->node_zones + i; > >> > + > >> > + if (!populated_zone(zone)) > >> > + continue; > >> > + > >> > if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), > >> > 0, 0)) > >> > return 1; > >> > + } > >> > > >> > return 0; > >> > } > >> > @@ -2221,7 +2227,7 @@ static int kswapd(void *p) > >> > long remaining = 0; > >> > > >> > /* Try to sleep for a short interval */ > >> > - if (!sleeping_prematurely(order, remaining)) { > >> > + if (!sleeping_prematurely(pgdat, order, remaining)) { > >> > remaining = schedule_timeout(HZ/10); > >> > finish_wait(&pgdat->kswapd_wait, &wait); > >> > prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); > >> > @@ -2232,13 +2238,13 @@ static int kswapd(void *p) > >> > * premature sleep. If not, then go fully > >> > * to sleep until explicitly woken up > >> > */ > >> > - if (!sleeping_prematurely(order, remaining)) > >> > + if (!sleeping_prematurely(pgdat, order, remaining)) > >> > schedule(); > >> > else { > >> > if (remaining) > >> > - count_vm_event(KSWAPD_PREMATURE_FAST); > >> > + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); > >> > else > >> > - count_vm_event(KSWAPD_PREMATURE_SLOW); > >> > + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); > >> > } > >> > } > >> > > >> > diff --git a/mm/vmstat.c b/mm/vmstat.c > >> > index bc09547..6cc8dc6 100644 > >> > --- a/mm/vmstat.c > >> > +++ b/mm/vmstat.c > >> > @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { > >> > "slabs_scanned", > >> > "kswapd_steal", > >> > "kswapd_inodesteal", > >> > - "kswapd_slept_prematurely_fast", > >> > - "kswapd_slept_prematurely_slow", > >> > + "kswapd_low_wmark_hit_quickly", > >> > + "kswapd_high_wmark_hit_quickly", > >> > "kswapd_no_congestion_wait", > >> > "pageoutrun", > >> > "allocstall", > >> > >> > >> > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091114154636.GR29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-17 11:03 ` KOSAKI Motohiro [not found] ` <20091117141638.3DCB.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-17 11:03 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List I'm sorry for the long delay. > On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > > 2009/11/14 Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>: > > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > >> > > > watermark was reached. If there are a constant stream of allocations from > > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > >> > > > the high watermark is not being maintained for sufficient length time. > > >> > > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > > >> > > > high watermark is no longer met, it's considered a premature sleep and > > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > > >> > > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > >> > > > sleep indicates that the high watermark was breached after a very short > > >> > > > interval. > > >> > > > > > >> > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > >> > > > > >> > > Why do you submit this patch to mainline? this is debugging patch > > >> > > no more and no less. > > >> > > > > >> > > > >> > Do you mean the stats part? The stats are included until such time as the page > > >> > allocator failure reports stop or are significantly reduced. In the event a > > >> > report is received, the value of the counters help determine if kswapd was > > >> > struggling or not. They should be removed once this mess is ironed out. > > >> > > > >> > If there is a preference, I can split out the stats part and send it to > > >> > people with page allocator failure reports for retesting. > > >> > > >> I'm sorry my last mail didn't have enough explanation. > > >> This stats help to solve this issue. I agreed. but after solving this issue, > > >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > > being worked on. > > > > Please correct me, if I said wrong thing. > > You didn't. > > > if I was admin, I don't watch this stats because kswapd frequently > > wakeup doesn't mean any trouble. 
instead I watch number of allocation > > failure. > > The stats are not tracking when kswapd wakes up. It helps track how > quickly the high or low watermarks are going under once kswapd tries to > go back to sleep. Umm, honestly I'm still puzzled. We probably need to go back one step. kswapd wakes up when the amount of free memory is less than the low watermark and sleeps when it is more than the high watermark. We need a sign of impending GFP_ATOMIC failure. My point is that a kswapd wakeup only happens after kswapd has been sleeping, but if the system is under heavy pressure and the amount of free memory bounces up and down between the low and high watermarks, these stats don't increase at all. IOW, these stats are strongly tied to the high watermark. Probably the min or low watermark is more useful for us. The number of wake_all_kswapd() calls is related to the low watermark, and it increases continuously while the system is under strong memory pressure; I'm ok with that. KSWAPD_NO_CONGESTION_WAIT is related to the min watermark; I'm ok with that too. The number of page allocation failures is related to the min watermark too; also ok. IOW, my only objection is that this stat stops increasing under strong memory pressure (per the explanation above). Can you please tell me why you think the time kswapd slept is so important? ^ permalink raw reply [flat|nested] 57+ messages in thread
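For readers following the watermark discussion: each zone has three watermarks, min < low < high, all scaled from min_free_kbytes. kswapd is woken when a zone's free pages drop below the low watermark, tries to go back to sleep once they climb above the high watermark, and GFP_ATOMIC allocations may dip below low but should not breach min. The sketch below is only an approximation of what setup_per_zone_wmarks() does (it ignores the highmem special case and lowmem_reserve, and the numbers plugged in are invented for illustration); it is meant to show the ordering and why raising min_free_kbytes, suggested earlier as a workaround, lifts all three marks.

/*
 * Rough model of how per-zone watermarks scale from min_free_kbytes.
 * Approximation only: real kernels special-case highmem and apply
 * lowmem_reserve; the input values here are invented for illustration.
 */
#include <stdio.h>

int main(void)
{
	unsigned long min_free_kbytes = 2884;	/* illustrative value for a ~512MB box */
	unsigned long zone_pages = 126976;	/* illustrative zone size in 4K pages */
	unsigned long total_pages = 126976;	/* all managed pages (single-zone case) */

	/* global pages_min, then scaled by the zone's share of memory */
	unsigned long pages_min = min_free_kbytes / 4;	/* 4K pages */
	unsigned long wmark_min = pages_min * zone_pages / total_pages;
	unsigned long wmark_low = wmark_min + (wmark_min >> 2);	/* min + 25% */
	unsigned long wmark_high = wmark_min + (wmark_min >> 1);	/* min + 50% */

	printf("min=%lu low=%lu high=%lu (pages)\n",
	       wmark_min, wmark_low, wmark_high);
	return 0;
}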
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091117141638.3DCB.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-17 11:44 ` Mel Gorman 2009-11-17 12:18 ` KOSAKI Motohiro 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-17 11:44 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Tue, Nov 17, 2009 at 08:03:21PM +0900, KOSAKI Motohiro wrote: > I'm sorry for the long delay. > > > On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > > > 2009/11/14 Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>: > > > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > >> > > > watermark was reached. If there are a constant stream of allocations from > > > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > >> > > > the high watermark is not being maintained for sufficient length time. > > > >> > > > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > >> > > > high watermark is no longer met, it's considered a premature sleep and > > > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > > > >> > > > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > > > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > >> > > > sleep indicates that the high watermark was breached after a very short > > > >> > > > interval. > > > >> > > > > > > >> > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > > >> > > > > > >> > > Why do you submit this patch to mainline? this is debugging patch > > > >> > > no more and no less. > > > >> > > > > > >> > > > > >> > Do you mean the stats part? The stats are included until such time as the page > > > >> > allocator failure reports stop or are significantly reduced. In the event a > > > >> > report is received, the value of the counters help determine if kswapd was > > > >> > struggling or not. They should be removed once this mess is ironed out. > > > >> > > > > >> > If there is a preference, I can split out the stats part and send it to > > > >> > people with page allocator failure reports for retesting. > > > >> > > > >> I'm sorry my last mail didn't have enough explanation. > > > >> This stats help to solve this issue. I agreed. but after solving this issue, > > > >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > > > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > > > > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > > > being worked on. > > > > > > Please correct me, if I said wrong thing. > > > > You didn't. 
> > > > > if I was admin, I don't watch this stats because kswapd frequently > > > wakeup doesn't mean any trouble. instead I watch number of allocation > > > failure. > > > > The stats are not tracking when kswapd wakes up. It helps track how > > quickly the high or low watermarks are going under once kswapd tries to > > go back to sleep. > > Umm, honestly I'm still puzlled. probably we need go back one step at once. > kswapd wake up when memory amount less than low watermark and sleep > when memory amount much than high watermask. We need to know > GFP_ATOMIC failure sign. > > My point is, kswapd wakeup only happen after kswapd sleeping. but if the system is > under heavy pressure and memory amount go up and down between low watermark > and high watermark, this stats don't increase at all. IOW, this stats is strong related to > high watermark. > Yes, this is true but as long as kswapd is awake and doing its job, it will continue taking direction on what order it should be reclaiming from processes that failed the low_watermark test. The GFP_ATOMIC allocations will be allowed to go under this low watermark but will have informed kswapd what order it should be reclaiming at so it stays working. A stat that increases between the low and high watermark would indicate that memory pressure is there or that the reclaim algorithm is not working as expected but that's checking for a different problem. What I was looking at was kswapd going to sleep and the low or min watermarks being hit very quickly after that so that kswapd pre-emptively kicks in before allocations start failing again. > Probaby, min watermark or low watermark are more useful for us. > Why? kswapd is awake between those points. > # of called wake_all_kswapd() is related to low watermark. and It's conteniously > increase although the system have strong memroy pressure. I'm ok. > KSWAPD_NO_CONGESTION_WAIT is related to min watermark. I'm ok too.. > # of page allocation failure is related to min watermark too. I'm ok too. > > IOW, I only dislike this stat stop increase strong memory pressure (above explanation). > Can you please tell me why you think kswapd slept time is so important? > I don't think the amount of time it has slept is important. I think it's important to know if the system is getting back into watermark trouble very shortly after kswapd reached the high watermark. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
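To make the "informed kswapd what order it should be reclaiming at" point concrete, this is roughly what the wakeup path looked like in 2.6.31/2.6.32-era mm/vmscan.c (paraphrased from memory, so treat it as a sketch rather than the exact source). Note that kswapd_max_order is bumped before the waitqueue check, so a caller that fails the low-watermark test raises the reclaim order even when kswapd is already awake.

/* Paraphrased sketch of wakeup_kswapd() from this era, not a verbatim copy */
void wakeup_kswapd(struct zone *zone, int order)
{
	pg_data_t *pgdat;

	if (!populated_zone(zone))
		return;

	pgdat = zone->zone_pgdat;

	/* Nothing to do if this zone is still above its low watermark */
	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
		return;

	/* Record the highest order anyone asked for, even if kswapd is awake */
	if (pgdat->kswapd_max_order < order)
		pgdat->kswapd_max_order = order;

	/* Only deliver the wakeup if kswapd is actually sleeping */
	if (!waitqueue_active(&pgdat->kswapd_wait))
		return;
	wake_up_interruptible(&pgdat->kswapd_wait);
}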
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep 2009-11-17 11:44 ` Mel Gorman @ 2009-11-17 12:18 ` KOSAKI Motohiro 2009-11-17 12:25 ` Mel Gorman 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-17 12:18 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > On Tue, Nov 17, 2009 at 08:03:21PM +0900, KOSAKI Motohiro wrote: > > I'm sorry for the long delay. > > > > > On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > > > > 2009/11/14 Mel Gorman <mel@csn.ul.ie>: > > > > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > > > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > > >> > > > watermark was reached. If there are a constant stream of allocations from > > > > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > > >> > > > the high watermark is not being maintained for sufficient length time. > > > > >> > > > > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > > >> > > > high watermark is no longer met, it's considered a premature sleep and > > > > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > > > > >> > > > > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > > > > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > > >> > > > sleep indicates that the high watermark was breached after a very short > > > > >> > > > interval. > > > > >> > > > > > > > >> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > >> > > > > > > >> > > Why do you submit this patch to mainline? this is debugging patch > > > > >> > > no more and no less. > > > > >> > > > > > > >> > > > > > >> > Do you mean the stats part? The stats are included until such time as the page > > > > >> > allocator failure reports stop or are significantly reduced. In the event a > > > > >> > report is received, the value of the counters help determine if kswapd was > > > > >> > struggling or not. They should be removed once this mess is ironed out. > > > > >> > > > > > >> > If there is a preference, I can split out the stats part and send it to > > > > >> > people with page allocator failure reports for retesting. > > > > >> > > > > >> I'm sorry my last mail didn't have enough explanation. > > > > >> This stats help to solve this issue. I agreed. but after solving this issue, > > > > >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > > > > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > > > > > > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > > > > being worked on. > > > > > > > > Please correct me, if I said wrong thing. > > > > > > You didn't. 
> > > > > > > if I was admin, I don't watch this stats because kswapd frequently > > > > wakeup doesn't mean any trouble. instead I watch number of allocation > > > > failure. > > > > > > The stats are not tracking when kswapd wakes up. It helps track how > > > quickly the high or low watermarks are going under once kswapd tries to > > > go back to sleep. > > > > Umm, honestly I'm still puzlled. probably we need go back one step at once. > > kswapd wake up when memory amount less than low watermark and sleep > > when memory amount much than high watermask. We need to know > > GFP_ATOMIC failure sign. > > > > My point is, kswapd wakeup only happen after kswapd sleeping. but if the system is > > under heavy pressure and memory amount go up and down between low watermark > > and high watermark, this stats don't increase at all. IOW, this stats is strong related to > > high watermark. > > > > Yes, this is true but as long as kswapd is awake and doing its job, it > will continue taking direction on what order it should be reclaiming from > processes that failed the low_watermark test. The GFP_ATOMIC allocations > will be allowed to go under this low watermark but will have informed kswapd > what order it should be reclaiming at so it stays working. > > A stat that increases between the low and high watermark would indicate > that memory pressure is there or that the reclaim algorithm is not > working as expected but that's checking for a different problem. > > What I was looking at was kswapd going to sleep and the low or min watermarks > being hit very quickly after that so that kswapd pre-emptively kicks in > before allocations start failing again. > > > Probaby, min watermark or low watermark are more useful for us. > > Why? kswapd is awake between those points. What's difference below (1) and (2)? 1. kswapd() run 100ms and sleep 10ms and run 100ms. 2. kswapd() run 200ms (1) represent amount memory go beyond high-watermark very shortly and go below low-watermark right after . (2) represent amount memory neared high-watermark closely, but don't touched, and probably go blow low-watermark right after. It's almost same memory pressure. but (1) increase KSWAPD_HIGH_WMARK_HIT_QUICKLY and (2) don't increase any stat. Thus, We can't think KSWAPD_HIGH_WMARK_HIT_QUICKLY indicate memory pressure. > > # of called wake_all_kswapd() is related to low watermark. and It's conteniously > > increase although the system have strong memroy pressure. I'm ok. > > KSWAPD_NO_CONGESTION_WAIT is related to min watermark. I'm ok too.. > > # of page allocation failure is related to min watermark too. I'm ok too. > > > > IOW, I only dislike this stat stop increase strong memory pressure (above explanation). > > Can you please tell me why you think kswapd slept time is so important? > > I don't think the amount of time it has slept is important. I think it's > important to know if the system is getting back into watermark trouble very > shortly after kswapd reached the high watermark. Probably, My last mail doesn't take kindly explanation. My point is, beyond high-watermark or not doesn't indicate any meaningful phenomenon. Then, I'd prefer low or min-watermark related stats. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep 2009-11-17 12:18 ` KOSAKI Motohiro @ 2009-11-17 12:25 ` Mel Gorman [not found] ` <20091117122555.GZ29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-17 12:25 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Tue, Nov 17, 2009 at 09:18:11PM +0900, KOSAKI Motohiro wrote: > > On Tue, Nov 17, 2009 at 08:03:21PM +0900, KOSAKI Motohiro wrote: > > > I'm sorry for the long delay. > > > > > > > On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > > > > > 2009/11/14 Mel Gorman <mel@csn.ul.ie>: > > > > > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > > > > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > > > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > > > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > > > >> > > > watermark was reached. If there are a constant stream of allocations from > > > > > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > > > >> > > > the high watermark is not being maintained for sufficient length time. > > > > > >> > > > > > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > > > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > > > >> > > > high watermark is no longer met, it's considered a premature sleep and > > > > > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > > > > > >> > > > > > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > > > > > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > > > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > > > >> > > > sleep indicates that the high watermark was breached after a very short > > > > > >> > > > interval. > > > > > >> > > > > > > > > >> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > > >> > > > > > > > >> > > Why do you submit this patch to mainline? this is debugging patch > > > > > >> > > no more and no less. > > > > > >> > > > > > > > >> > > > > > > >> > Do you mean the stats part? The stats are included until such time as the page > > > > > >> > allocator failure reports stop or are significantly reduced. In the event a > > > > > >> > report is received, the value of the counters help determine if kswapd was > > > > > >> > struggling or not. They should be removed once this mess is ironed out. > > > > > >> > > > > > > >> > If there is a preference, I can split out the stats part and send it to > > > > > >> > people with page allocator failure reports for retesting. > > > > > >> > > > > > >> I'm sorry my last mail didn't have enough explanation. > > > > > >> This stats help to solve this issue. I agreed. but after solving this issue, > > > > > >> I don't imagine administrator how to use this stats. if KSWAPD_PREMATURE_FAST or > > > > > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? 
> > > > > > > > > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > > > > > being worked on. > > > > > > > > > > Please correct me, if I said wrong thing. > > > > > > > > You didn't. > > > > > > > > > if I was admin, I don't watch this stats because kswapd frequently > > > > > wakeup doesn't mean any trouble. instead I watch number of allocation > > > > > failure. > > > > > > > > The stats are not tracking when kswapd wakes up. It helps track how > > > > quickly the high or low watermarks are going under once kswapd tries to > > > > go back to sleep. > > > > > > Umm, honestly I'm still puzlled. probably we need go back one step at once. > > > kswapd wake up when memory amount less than low watermark and sleep > > > when memory amount much than high watermask. We need to know > > > GFP_ATOMIC failure sign. > > > > > > My point is, kswapd wakeup only happen after kswapd sleeping. but if the system is > > > under heavy pressure and memory amount go up and down between low watermark > > > and high watermark, this stats don't increase at all. IOW, this stats is strong related to > > > high watermark. > > > > > > > Yes, this is true but as long as kswapd is awake and doing its job, it > > will continue taking direction on what order it should be reclaiming from > > processes that failed the low_watermark test. The GFP_ATOMIC allocations > > will be allowed to go under this low watermark but will have informed kswapd > > what order it should be reclaiming at so it stays working. > > > > A stat that increases between the low and high watermark would indicate > > that memory pressure is there or that the reclaim algorithm is not > > working as expected but that's checking for a different problem. > > > > What I was looking at was kswapd going to sleep and the low or min watermarks > > being hit very quickly after that so that kswapd pre-emptively kicks in > > before allocations start failing again. > > > > > Probaby, min watermark or low watermark are more useful for us. > > > > Why? kswapd is awake between those points. > > What's difference below (1) and (2)? > > 1. kswapd() run 100ms and sleep 10ms and run 100ms. > 2. kswapd() run 200ms > Because prior to the patch, once kswapd went to sleep, it would not wake again until the low watermark was reached. There appeared to be a timing issue where congestion_wait() would block kswapd just long enough before checking the high watermark to mean that it stayed awake. This wasn't happening hence the approach of briefly-sleep-after-high-watermark-is-reached-and-double-check-watermarks-are-ok. > (1) represent amount memory go beyond high-watermark very shortly and go below > low-watermark right after . (2) represent amount memory neared high-watermark closely, but don't touched, > and probably go blow low-watermark right after. > It's almost same memory pressure. but (1) increase KSWAPD_HIGH_WMARK_HIT_QUICKLY and > (2) don't increase any stat. > > Thus, We can't think KSWAPD_HIGH_WMARK_HIT_QUICKLY indicate memory pressure. > It indicates mild memory pressure because the high watermark was only reached for a short period of time. > > > # of called wake_all_kswapd() is related to low watermark. and It's conteniously > > > increase although the system have strong memroy pressure. I'm ok. > > > KSWAPD_NO_CONGESTION_WAIT is related to min watermark. I'm ok too.. > > > # of page allocation failure is related to min watermark too. I'm ok too. 
> > > > > > IOW, I only dislike this stat stop increase strong memory pressure (above explanation). > > > Can you please tell me why you think kswapd slept time is so important? > > > > I don't think the amount of time it has slept is important. I think it's > > important to know if the system is getting back into watermark trouble very > > shortly after kswapd reached the high watermark. > > Probably, My last mail doesn't take kindly explanation. > My point is, beyond high-watermark or not doesn't indicate any meaningful > phenomenon. > > Then, I'd prefer low or min-watermark related stats. > What would that have to do with the kswapd-briefly-sleep logic? i.e. the stats you are suggesting, while meaningful, are for looking at a different type of problem. If you want those stats, then I should revert the stats part of this patch altogether. Does that then mean you also want the patch that makes kswapd double check watermarks to be dropped or just have no associated stats to see what it's doing in the event of allocation failure? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep [not found] ` <20091117122555.GZ29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-18 5:20 ` KOSAKI Motohiro 0 siblings, 0 replies; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-18 5:20 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > On Tue, Nov 17, 2009 at 09:18:11PM +0900, KOSAKI Motohiro wrote: > > > On Tue, Nov 17, 2009 at 08:03:21PM +0900, KOSAKI Motohiro wrote: > > > > I'm sorry for the long delay. > > > > > > > > > On Sat, Nov 14, 2009 at 06:34:23PM +0900, KOSAKI Motohiro wrote: > > > > > > 2009/11/14 Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>: > > > > > > > On Sat, Nov 14, 2009 at 03:00:57AM +0900, KOSAKI Motohiro wrote: > > > > > > >> > On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote: > > > > > > >> > > > After kswapd balances all zones in a pgdat, it goes to sleep. In the event > > > > > > >> > > > of no IO congestion, kswapd can go to sleep very shortly after the high > > > > > > >> > > > watermark was reached. If there are a constant stream of allocations from > > > > > > >> > > > parallel processes, it can mean that kswapd went to sleep too quickly and > > > > > > >> > > > the high watermark is not being maintained for sufficient length time. > > > > > > >> > > > > > > > > > >> > > > This patch makes kswapd go to sleep as a two-stage process. It first > > > > > > >> > > > tries to sleep for HZ/10. If it is woken up by another process or the > > > > > > >> > > > high watermark is no longer met, it's considered a premature sleep and > > > > > > >> > > > kswapd continues work. Otherwise it goes fully to sleep. > > > > > > >> > > > > > > > > > >> > > > This adds more counters to distinguish between fast and slow breaches of > > > > > > >> > > > watermarks. A "fast" premature sleep is one where the low watermark was > > > > > > >> > > > hit in a very short time after kswapd going to sleep. A "slow" premature > > > > > > >> > > > sleep indicates that the high watermark was breached after a very short > > > > > > >> > > > interval. > > > > > > >> > > > > > > > > > >> > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > > > > > >> > > > > > > > > >> > > Why do you submit this patch to mainline? this is debugging patch > > > > > > >> > > no more and no less. > > > > > > >> > > > > > > > > >> > > > > > > > >> > Do you mean the stats part? The stats are included until such time as the page > > > > > > >> > allocator failure reports stop or are significantly reduced. In the event a > > > > > > >> > report is received, the value of the counters help determine if kswapd was > > > > > > >> > struggling or not. They should be removed once this mess is ironed out. > > > > > > >> > > > > > > > >> > If there is a preference, I can split out the stats part and send it to > > > > > > >> > people with page allocator failure reports for retesting. > > > > > > >> > > > > > > >> I'm sorry my last mail didn't have enough explanation. > > > > > > >> This stats help to solve this issue. I agreed. but after solving this issue, > > > > > > >> I don't imagine administrator how to use this stats. 
if KSWAPD_PREMATURE_FAST or > > > > > > >> KSWAPD_PREMATURE_SLOW significantly increased, what should admin do? > > > > > > > > > > > > > > One possible workaround would be to raise min_free_kbytes while a fix is > > > > > > > being worked on. > > > > > > > > > > > > Please correct me, if I said wrong thing. > > > > > > > > > > You didn't. > > > > > > > > > > > if I was admin, I don't watch this stats because kswapd frequently > > > > > > wakeup doesn't mean any trouble. instead I watch number of allocation > > > > > > failure. > > > > > > > > > > The stats are not tracking when kswapd wakes up. It helps track how > > > > > quickly the high or low watermarks are going under once kswapd tries to > > > > > go back to sleep. > > > > > > > > Umm, honestly I'm still puzlled. probably we need go back one step at once. > > > > kswapd wake up when memory amount less than low watermark and sleep > > > > when memory amount much than high watermask. We need to know > > > > GFP_ATOMIC failure sign. > > > > > > > > My point is, kswapd wakeup only happen after kswapd sleeping. but if the system is > > > > under heavy pressure and memory amount go up and down between low watermark > > > > and high watermark, this stats don't increase at all. IOW, this stats is strong related to > > > > high watermark. > > > > > > > > > > Yes, this is true but as long as kswapd is awake and doing its job, it > > > will continue taking direction on what order it should be reclaiming from > > > processes that failed the low_watermark test. The GFP_ATOMIC allocations > > > will be allowed to go under this low watermark but will have informed kswapd > > > what order it should be reclaiming at so it stays working. > > > > > > A stat that increases between the low and high watermark would indicate > > > that memory pressure is there or that the reclaim algorithm is not > > > working as expected but that's checking for a different problem. > > > > > > What I was looking at was kswapd going to sleep and the low or min watermarks > > > being hit very quickly after that so that kswapd pre-emptively kicks in > > > before allocations start failing again. > > > > > > > Probaby, min watermark or low watermark are more useful for us. > > > > > > Why? kswapd is awake between those points. > > > > What's difference below (1) and (2)? > > > > 1. kswapd() run 100ms and sleep 10ms and run 100ms. > > 2. kswapd() run 200ms > > Because prior to the patch, once kswapd went to sleep, it would > not wake again until the low watermark was reached. There appeared > to be a timing issue where congestion_wait() would block kswapd > just long enough before checking the high watermark to mean that > it stayed awake. This wasn't happening hence the approach of > briefly-sleep-after-high-watermark-is-reached-and-double-check-watermarks-are-ok. You are right. > > (1) represent amount memory go beyond high-watermark very shortly and go below > > low-watermark right after . (2) represent amount memory neared high-watermark closely, but don't touched, > > and probably go blow low-watermark right after. > > It's almost same memory pressure. but (1) increase KSWAPD_HIGH_WMARK_HIT_QUICKLY and > > (2) don't increase any stat. > > > > Thus, We can't think KSWAPD_HIGH_WMARK_HIT_QUICKLY indicate memory pressure. > > > > It indicates mild memory pressure because the high watermark was only > reached for a short period of time. > > > > > # of called wake_all_kswapd() is related to low watermark. 
and It's conteniously > > > > increase although the system have strong memroy pressure. I'm ok. > > > > KSWAPD_NO_CONGESTION_WAIT is related to min watermark. I'm ok too.. > > > > # of page allocation failure is related to min watermark too. I'm ok too. > > > > > > > > IOW, I only dislike this stat stop increase strong memory pressure (above explanation). > > > > Can you please tell me why you think kswapd slept time is so important? > > > > > > I don't think the amount of time it has slept is important. I think it's > > > important to know if the system is getting back into watermark trouble very > > > shortly after kswapd reached the high watermark. > > > > Probably, My last mail doesn't take kindly explanation. > > My point is, beyond high-watermark or not doesn't indicate any meaningful > > phenomenon. > > > > Then, I'd prefer low or min-watermark related stats. > > What would that have to do with the kswapd-briefly-sleep logic? i.e. the > stats you are suggesting, while meaningful, are for looking at a > different type of problem. If you want those stats, then I should revert > the stats part of this patch altogether. Does that then mean you also > want the patch that makes kswapd double check watermarks to be dropped > or just have no associated stats to see what it's doing in the event of > allocation failure? Yes, you talked right thing. After awhile thinking (and one night good sleeping), I conclude I should not enforce my personal preference any more. I'm sorry. Honestly, I haven't understand this stats usage. but the stats is not core piece in this patch concept. we sholdn't get stuck small issue. then, I'll review your latest patch soon. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 [not found] ` <2f11576a0911140134u21eafa83t9642bb25ccd953de-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2009-11-14 15:46 ` Mel Gorman @ 2009-11-17 10:34 ` Mel Gorman 2009-11-18 5:27 ` KOSAKI Motohiro 1 sibling, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-17 10:34 UTC (permalink / raw) To: KOSAKI Motohiro, Andrew Morton Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List When checking if kswapd is sleeping prematurely, all populated zones are checked instead of the zones the instance of kswapd is responsible for. The counters for kswapd going to sleep prematurely are also named poorly. This patch makes kswapd only check its own zones and renames the relevant counters. This is a fix to the patch vmscan-have-kswapd-sleep-for-a-short-interval-and-double-check-it-should-be-asleep.patch and is based on top of mmotm-2009-11-13-19-59. It would be preferable if Kosaki Motohiro signed off on it as he had comments on the patch but the discussion petered out without a solid conclusion. Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> --- include/linux/vmstat.h | 2 +- mm/vmscan.c | 20 +++++++++++++------- mm/vmstat.c | 4 ++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 70c8093..117f0dd 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGSCAN_ZONE_RECLAIM_FAILED, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, - KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, + KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY, KSWAPD_SKIP_CONGESTION_WAIT, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE diff --git a/mm/vmscan.c b/mm/vmscan.c index 81ef29b..da6cf42 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1907,19 +1907,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, #endif /* is kswapd sleeping prematurely? */ -static int sleeping_prematurely(int order, long remaining) +static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining) { - struct zone *zone; + int i; /* If a direct reclaimer woke kswapd within HZ/10, it's premature */ if (remaining) return 1; /* If after HZ/10, a zone is below the high mark, it's premature */ - for_each_populated_zone(zone) + for (i = 0; i < pgdat->nr_zones; i++) { + struct zone *zone = pgdat->node_zones + i; + + if (!populated_zone(zone)) + continue; + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone), 0, 0)) return 1; + } return 0; } @@ -2227,7 +2233,7 @@ static int kswapd(void *p) long remaining = 0; /* Try to sleep for a short interval */ - if (!sleeping_prematurely(order, remaining)) { + if (!sleeping_prematurely(pgdat, order, remaining)) { remaining = schedule_timeout(HZ/10); finish_wait(&pgdat->kswapd_wait, &wait); prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); @@ -2238,13 +2244,13 @@ static int kswapd(void *p) * premature sleep. 
If not, then go fully * to sleep until explicitly woken up */ - if (!sleeping_prematurely(order, remaining)) + if (!sleeping_prematurely(pgdat, order, remaining)) schedule(); else { if (remaining) - count_vm_event(KSWAPD_PREMATURE_FAST); + count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY); else - count_vm_event(KSWAPD_PREMATURE_SLOW); + count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY); } } diff --git a/mm/vmstat.c b/mm/vmstat.c index 889254f..6051fba 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -683,8 +683,8 @@ static const char * const vmstat_text[] = { "slabs_scanned", "kswapd_steal", "kswapd_inodesteal", - "kswapd_slept_prematurely_fast", - "kswapd_slept_prematurely_slow", + "kswapd_low_wmark_hit_quickly", + "kswapd_high_wmark_hit_quickly", "kswapd_skip_congestion_wait", "pageoutrun", "allocstall", ^ permalink raw reply related [flat|nested] 57+ messages in thread
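Pieced together from the hunks above, the sleep path in kswapd() after this fix looks roughly like the following (condensed; the surrounding order and freezer handling is omitted). A non-zero `remaining` means a waker arrived during the HZ/10 nap, i.e. the low watermark was breached almost immediately; `remaining == 0` with a failed watermark check means the high watermark did not hold for even HZ/10.

	long remaining = 0;

	prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);

	/* Stage 1: try to sleep for a short interval */
	if (!sleeping_prematurely(pgdat, order, remaining)) {
		remaining = schedule_timeout(HZ/10);
		finish_wait(&pgdat->kswapd_wait, &wait);
		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
	}

	/* Stage 2: only sleep fully if this node's zones still look healthy */
	if (!sleeping_prematurely(pgdat, order, remaining))
		schedule();
	else {
		if (remaining)
			/* woken within HZ/10: low watermark breached quickly */
			count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY);
		else
			/* timer expired but the high watermark already gave way */
			count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY);
	}

	finish_wait(&pgdat->kswapd_wait, &wait);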
* Re: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 2009-11-17 10:34 ` [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 Mel Gorman @ 2009-11-18 5:27 ` KOSAKI Motohiro 0 siblings, 0 replies; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-18 5:27 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > When checking if kswapd is sleeping prematurely, all populated zones are > checked instead of the zones the instance of kswapd is responsible for. > The counters for kswapd going to sleep prematurely are also named poorly. > This patch makes kswapd only check its own zones and renames the relevant > counters. > > This is a fix to the patch > vmscan-have-kswapd-sleep-for-a-short-interval-and-double-check-it-should-be-asleep.patch > and is based on top of mmotm-2009-11-13-19-59. It would be preferable if > Kosaki Motohiro signed off on it as he had comments on the patch but the > discussion petered out without a solid conclusion. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Looks good to me. I apologize to bother you by nit for long time. Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble 2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman ` (3 preceding siblings ...) 2009-11-12 19:30 ` [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep Mel Gorman @ 2009-11-12 19:30 ` Mel Gorman 2009-11-13 9:54 ` KOSAKI Motohiro [not found] ` <1258054235-3208-6-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> [not found] ` <1258054235-3208-1-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 5 siblings, 2 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw) To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski <karol.k.lewando> Cc: linux-kernel, linux-mm@kvack.org", KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Mel Gorman If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, on systems with large numbers of high-order atomics due to crappy network cards, it's important that kswapd keep working in parallel to save their sorry ass. This patch takes into account the order kswapd is reclaiming at before waiting on congestion. The higher the order, the longer it is before kswapd considers itself to be in trouble. The impact is that kswapd works harder in parallel rather than depending on direct reclaimers or atomic allocations to fail. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/vmscan.c | 14 ++++++++++++-- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index ffa1766..5e200f1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1946,7 +1946,7 @@ static int sleeping_prematurely(int order, long remaining) static unsigned long balance_pgdat(pg_data_t *pgdat, int order) { int all_zones_ok; - int priority; + int priority, congestion_priority; int i; unsigned long total_scanned; struct reclaim_state *reclaim_state = current->reclaim_state; @@ -1967,6 +1967,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) */ int temp_priority[MAX_NR_ZONES]; + /* + * When priority reaches congestion_priority, kswapd will sleep + * for a short time while congestion clears. The higher the + * order being reclaimed, the less likely kswapd will go to + * sleep as high-order allocations are harder to reclaim and + * stall direct reclaimers longer + */ + congestion_priority = DEF_PRIORITY - 2; + congestion_priority -= min(congestion_priority, sc.order); + loop_again: total_scanned = 0; sc.nr_reclaimed = 0; @@ -2092,7 +2102,7 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. */ - if (total_scanned && priority < DEF_PRIORITY - 2) + if (total_scanned && priority < congestion_priority) congestion_wait(BLK_RW_ASYNC, HZ/10); /* -- 1.6.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble 2009-11-12 19:30 ` [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble Mel Gorman @ 2009-11-13 9:54 ` KOSAKI Motohiro [not found] ` <20091113142608.33B9.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2009-11-13 14:38 ` [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble Minchan Kim [not found] ` <1258054235-3208-6-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 1 sibling, 2 replies; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 9:54 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. > However, on systems with large numbers of high-order atomics due to > crappy network cards, it's important that kswapd keep working in > parallel to save their sorry ass. > > This patch takes into account the order kswapd is reclaiming at before > waiting on congestion. The higher the order, the longer it is before > kswapd considers itself to be in trouble. The impact is that kswapd > works harder in parallel rather than depending on direct reclaimers or > atomic allocations to fail. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > mm/vmscan.c | 14 ++++++++++++-- > 1 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index ffa1766..5e200f1 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1946,7 +1946,7 @@ static int sleeping_prematurely(int order, long remaining) > static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > { > int all_zones_ok; > - int priority; > + int priority, congestion_priority; > int i; > unsigned long total_scanned; > struct reclaim_state *reclaim_state = current->reclaim_state; > @@ -1967,6 +1967,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > */ > int temp_priority[MAX_NR_ZONES]; > > + /* > + * When priority reaches congestion_priority, kswapd will sleep > + * for a short time while congestion clears. The higher the > + * order being reclaimed, the less likely kswapd will go to > + * sleep as high-order allocations are harder to reclaim and > + * stall direct reclaimers longer > + */ > + congestion_priority = DEF_PRIORITY - 2; > + congestion_priority -= min(congestion_priority, sc.order); This calculation mean sc.order congestion_priority scan-pages --------------------------------------------------------- 0 10 1/1024 * zone-mem 1 9 1/512 * zone-mem 2 8 1/256 * zone-mem 3 7 1/128 * zone-mem 4 6 1/64 * zone-mem 5 5 1/32 * zone-mem 6 4 1/16 * zone-mem 7 3 1/8 * zone-mem 8 2 1/4 * zone-mem 9 1 1/2 * zone-mem 10 0 1 * zone-mem 11+ 0 1 * zone-mem I feel this is too agressive. The intention of this congestion_wait() is to prevent kswapd use 100% cpu time. but the above promotion seems break it. example, ia64 have 256MB hugepage (i.e. order=14). it mean kswapd never sleep. example2, order-3 (i.e. PAGE_ALLOC_COSTLY_ORDER) makes one of most inefficent reclaim, because it doesn't use lumpy recliam. I've seen 128GB size zone, it mean 1/128 = 1GB. oh well, kswapd definitely waste cpu time 100%. 
> + > loop_again: > total_scanned = 0; > sc.nr_reclaimed = 0; > @@ -2092,7 +2102,7 @@ loop_again: > * OK, kswapd is getting into trouble. Take a nap, then take > * another pass across the zones. > */ > - if (total_scanned && priority < DEF_PRIORITY - 2) > + if (total_scanned && priority < congestion_priority) > congestion_wait(BLK_RW_ASYNC, HZ/10); Instead, How about this? --- mm/vmscan.c | 13 ++++++++++++- 1 files changed, 12 insertions(+), 1 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 64e4388..937e90d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1938,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) * free_pages == high_wmark_pages(zone). */ int temp_priority[MAX_NR_ZONES]; + int has_under_min_watermark_zone = 0; loop_again: total_scanned = 0; @@ -2057,6 +2058,15 @@ loop_again: if (total_scanned > SWAP_CLUSTER_MAX * 2 && total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; + + /* + * We are still under min water mark. it mean we have + * GFP_ATOMIC allocation failure risk. Hurry up! + */ + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), + end_zone, 0)) + has_under_min_watermark_zone = 1; + } if (all_zones_ok) break; /* kswapd: all done */ @@ -2064,7 +2074,8 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. */ - if (total_scanned && priority < DEF_PRIORITY - 2) + if (total_scanned && (priority < DEF_PRIORITY - 2) && + !has_under_min_watermark_zone) congestion_wait(BLK_RW_ASYNC, HZ/10); /* -- 1.6.2.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble [not found] ` <20091113142608.33B9.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 13:54 ` Mel Gorman [not found] ` <20091113135443.GF29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 2009-11-13 18:00 ` KOSAKI Motohiro 0 siblings, 2 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-13 13:54 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 06:54:29PM +0900, KOSAKI Motohiro wrote: > > If reclaim fails to make sufficient progress, the priority is raised. > > Once the priority is higher, kswapd starts waiting on congestion. > > However, on systems with large numbers of high-order atomics due to > > crappy network cards, it's important that kswapd keep working in > > parallel to save their sorry ass. > > > > This patch takes into account the order kswapd is reclaiming at before > > waiting on congestion. The higher the order, the longer it is before > > kswapd considers itself to be in trouble. The impact is that kswapd > > works harder in parallel rather than depending on direct reclaimers or > > atomic allocations to fail. > > > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> > > --- > > mm/vmscan.c | 14 ++++++++++++-- > > 1 files changed, 12 insertions(+), 2 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index ffa1766..5e200f1 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1946,7 +1946,7 @@ static int sleeping_prematurely(int order, long remaining) > > static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > > { > > int all_zones_ok; > > - int priority; > > + int priority, congestion_priority; > > int i; > > unsigned long total_scanned; > > struct reclaim_state *reclaim_state = current->reclaim_state; > > @@ -1967,6 +1967,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > > */ > > int temp_priority[MAX_NR_ZONES]; > > > > + /* > > + * When priority reaches congestion_priority, kswapd will sleep > > + * for a short time while congestion clears. The higher the > > + * order being reclaimed, the less likely kswapd will go to > > + * sleep as high-order allocations are harder to reclaim and > > + * stall direct reclaimers longer > > + */ > > + congestion_priority = DEF_PRIORITY - 2; > > + congestion_priority -= min(congestion_priority, sc.order); > > This calculation mean > > sc.order congestion_priority scan-pages > --------------------------------------------------------- > 0 10 1/1024 * zone-mem > 1 9 1/512 * zone-mem > 2 8 1/256 * zone-mem > 3 7 1/128 * zone-mem > 4 6 1/64 * zone-mem > 5 5 1/32 * zone-mem > 6 4 1/16 * zone-mem > 7 3 1/8 * zone-mem > 8 2 1/4 * zone-mem > 9 1 1/2 * zone-mem > 10 0 1 * zone-mem > 11+ 0 1 * zone-mem > > I feel this is too agressive. The intention of this congestion_wait() > is to prevent kswapd use 100% cpu time. Ok, I thought the intention might be to avoid dumping too many pages on the queue but it was already waiting on congestion elsewhere. > but the above promotion seems > break it. > > example, > ia64 have 256MB hugepage (i.e. order=14). it mean kswapd never sleep. > > example2, > order-3 (i.e. 
PAGE_ALLOC_COSTLY_ORDER) makes one of most inefficent > reclaim, because it doesn't use lumpy recliam. > I've seen 128GB size zone, it mean 1/128 = 1GB. oh well, kswapd definitely > waste cpu time 100%. > > > > + > > loop_again: > > total_scanned = 0; > > sc.nr_reclaimed = 0; > > @@ -2092,7 +2102,7 @@ loop_again: > > * OK, kswapd is getting into trouble. Take a nap, then take > > * another pass across the zones. > > */ > > - if (total_scanned && priority < DEF_PRIORITY - 2) > > + if (total_scanned && priority < congestion_priority) > > congestion_wait(BLK_RW_ASYNC, HZ/10); > > Instead, How about this? > This makes a lot of sense. Tests look good and I added stats to make sure the logic was triggering. On X86, kswapd avoided a congestion_wait 11723 times and X86-64 avoided it 5084 times. I think we should hold onto the stats temporarily until all these bugs are ironed out. Would you like to sign off the following? If you are ok to sign off, this patch should replace my patch 5 in the series. ==== CUT HERE ==== vmscan: Stop kswapd waiting on congestion when the min watermark is not being met If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org: Add stats to track how relevant the logic is] Needs-signed-off-by-original-author diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 9716003..7d66695 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, + KSWAPD_NO_CONGESTION_WAIT, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, diff --git a/mm/vmscan.c b/mm/vmscan.c index ffa1766..70967e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1966,6 +1966,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) * free_pages == high_wmark_pages(zone). */ int temp_priority[MAX_NR_ZONES]; + int has_under_min_watermark_zone = 0; loop_again: total_scanned = 0; @@ -2085,6 +2086,15 @@ loop_again: if (total_scanned > SWAP_CLUSTER_MAX * 2 && total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; + + /* + * We are still under min water mark. it mean we have + * GFP_ATOMIC allocation failure risk. Hurry up! + */ + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), + end_zone, 0)) + has_under_min_watermark_zone = 1; + } if (all_zones_ok) break; /* kswapd: all done */ @@ -2092,8 +2102,13 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. 
*/ - if (total_scanned && priority < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ/10); + if (total_scanned && (priority < DEF_PRIORITY - 2)) { + + if (!has_under_min_watermark_zone) + count_vm_event(KSWAPD_NO_CONGESTION_WAIT); + else + congestion_wait(BLK_RW_ASYNC, HZ/10); + } /* * We do this so kswapd doesn't build up large priorities for diff --git a/mm/vmstat.c b/mm/vmstat.c index 90b11e4..bc09547 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -685,6 +685,7 @@ static const char * const vmstat_text[] = { "kswapd_inodesteal", "kswapd_slept_prematurely_fast", "kswapd_slept_prematurely_slow", + "kswapd_no_congestion_wait", "pageoutrun", "allocstall", ^ permalink raw reply related [flat|nested] 57+ messages in thread
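For anyone retesting with these patches applied, the counters discussed here appear in /proc/vmstat next to the existing kswapd statistics (the exact names, e.g. kswapd_low_wmark_hit_quickly or kswapd_no_congestion_wait, depend on which revision of the series is applied). Sampling them before and after a test run and diffing the values is usually enough; a trivial user-space helper such as the sketch below prints just the relevant lines.

/* Tiny helper: print kswapd-related counters from /proc/vmstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *fp = fopen("/proc/vmstat", "r");
	char line[256];

	if (!fp) {
		perror("/proc/vmstat");
		return 1;
	}

	while (fgets(line, sizeof(line), fp)) {
		/* counter names come from the patches in this thread */
		if (strncmp(line, "kswapd_", 7) == 0 ||
		    strncmp(line, "allocstall", 10) == 0 ||
		    strncmp(line, "pageoutrun", 10) == 0)
			fputs(line, stdout);
	}

	fclose(fp);
	return 0;
}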
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble [not found] ` <20091113135443.GF29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 14:48 ` Minchan Kim 0 siblings, 0 replies; 57+ messages in thread From: Minchan Kim @ 2009-11-13 14:48 UTC (permalink / raw) To: Mel Gorman Cc: KOSAKI Motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 10:54 PM, Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote: > On Fri, Nov 13, 2009 at 06:54:29PM +0900, KOSAKI Motohiro wrote: >> > If reclaim fails to make sufficient progress, the priority is raised. >> > Once the priority is higher, kswapd starts waiting on congestion. >> > However, on systems with large numbers of high-order atomics due to >> > crappy network cards, it's important that kswapd keep working in >> > parallel to save their sorry ass. >> > >> > This patch takes into account the order kswapd is reclaiming at before >> > waiting on congestion. The higher the order, the longer it is before >> > kswapd considers itself to be in trouble. The impact is that kswapd >> > works harder in parallel rather than depending on direct reclaimers or >> > atomic allocations to fail. >> > >> > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> >> > --- >> > mm/vmscan.c | 14 ++++++++++++-- >> > 1 files changed, 12 insertions(+), 2 deletions(-) >> > >> > diff --git a/mm/vmscan.c b/mm/vmscan.c >> > index ffa1766..5e200f1 100644 >> > --- a/mm/vmscan.c >> > +++ b/mm/vmscan.c >> > @@ -1946,7 +1946,7 @@ static int sleeping_prematurely(int order, long remaining) >> > static unsigned long balance_pgdat(pg_data_t *pgdat, int order) >> > { >> > int all_zones_ok; >> > - int priority; >> > + int priority, congestion_priority; >> > int i; >> > unsigned long total_scanned; >> > struct reclaim_state *reclaim_state = current->reclaim_state; >> > @@ -1967,6 +1967,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) >> > */ >> > int temp_priority[MAX_NR_ZONES]; >> > >> > + /* >> > + * When priority reaches congestion_priority, kswapd will sleep >> > + * for a short time while congestion clears. The higher the >> > + * order being reclaimed, the less likely kswapd will go to >> > + * sleep as high-order allocations are harder to reclaim and >> > + * stall direct reclaimers longer >> > + */ >> > + congestion_priority = DEF_PRIORITY - 2; >> > + congestion_priority -= min(congestion_priority, sc.order); >> >> This calculation mean >> >> sc.order congestion_priority scan-pages >> --------------------------------------------------------- >> 0 10 1/1024 * zone-mem >> 1 9 1/512 * zone-mem >> 2 8 1/256 * zone-mem >> 3 7 1/128 * zone-mem >> 4 6 1/64 * zone-mem >> 5 5 1/32 * zone-mem >> 6 4 1/16 * zone-mem >> 7 3 1/8 * zone-mem >> 8 2 1/4 * zone-mem >> 9 1 1/2 * zone-mem >> 10 0 1 * zone-mem >> 11+ 0 1 * zone-mem >> >> I feel this is too agressive. The intention of this congestion_wait() >> is to prevent kswapd use 100% cpu time. As I said in reply of kosaki's patch, I can't understand point. > Ok, I thought the intention might be to avoid dumping too many pages on > the queue but it was already waiting on congestion elsewhere. > >> but the above promotion seems >> break it. >> >> example, >> ia64 have 256MB hugepage (i.e. 
order=14). it mean kswapd never sleep. >> example2, But, This is a true problem missed in my review. Thanks, Kosaki. >> order-3 (i.e. PAGE_ALLOC_COSTLY_ORDER) makes one of most inefficent >> reclaim, because it doesn't use lumpy recliam. >> I've seen 128GB size zone, it mean 1/128 = 1GB. oh well, kswapd definitely >> waste cpu time 100%. >> >> >> > + >> > loop_again: >> > total_scanned = 0; >> > sc.nr_reclaimed = 0; >> > @@ -2092,7 +2102,7 @@ loop_again: >> > * OK, kswapd is getting into trouble. Take a nap, then take >> > * another pass across the zones. >> > */ >> > - if (total_scanned && priority < DEF_PRIORITY - 2) >> > + if (total_scanned && priority < congestion_priority) >> > congestion_wait(BLK_RW_ASYNC, HZ/10); >> >> Instead, How about this? >> > > This makes a lot of sense. Tests look good and I added stats to make sure > the logic was triggering. On X86, kswapd avoided a congestion_wait 11723 > times and X86-64 avoided it 5084 times. I think we should hold onto the > stats temporarily until all these bugs are ironed out. > > Would you like to sign off the following? > > If you are ok to sign off, this patch should replace my patch 5 in > the series. I agree Kosaki's patch is more strightforward. You can add my review sign, too. Thanks for good patch, Kosaki. :) -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 57+ messages in thread
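The table KOSAKI gives above follows directly from the two added lines plus the fact that, at priority p, kswapd scans roughly zone_size >> p pages per pass (which is where the 1/1024, 1/512, ... figures come from). A throwaway userspace sketch that reproduces it, assuming DEF_PRIORITY is 12 as in mm/vmscan.c of this era and open-coding min():

#include <stdio.h>

#define DEF_PRIORITY 12	/* value in mm/vmscan.c for 2.6.31/2.6.32 */

static int min_int(int a, int b) { return a < b ? a : b; }

int main(void)
{
	int order;

	for (order = 0; order <= 14; order++) {
		/* the two lines added by the patch */
		int cp = DEF_PRIORITY - 2;
		cp -= min_int(cp, order);

		/*
		 * priority < cp is the sleep condition, so cp is the
		 * priority at which kswapd first naps; scanning at that
		 * priority covers about 1/2^cp of the zone.  cp == 0
		 * means the condition can never be true.
		 */
		printf("order %2d: congestion_priority %2d (~1/%d of zone)\n",
		       order, cp, 1 << cp);
	}
	return 0;
}

Its output matches the table, including the ia64 order-14 case where congestion_priority collapses to 0 and kswapd never sleeps.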
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble 2009-11-13 13:54 ` Mel Gorman [not found] ` <20091113135443.GF29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 18:00 ` KOSAKI Motohiro [not found] ` <20091114023138.3DA5.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 1 sibling, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 18:00 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > This makes a lot of sense. Tests look good and I added stats to make sure > the logic was triggering. On X86, kswapd avoided a congestion_wait 11723 > times and X86-64 avoided it 5084 times. I think we should hold onto the > stats temporarily until all these bugs are ironed out. > > Would you like to sign off the following? > > If you are ok to sign off, this patch should replace my patch 5 in > the series. I'm sorry, I found my bug. Please see below. > > ==== CUT HERE ==== > > vmscan: Stop kswapd waiting on congestion when the min watermark is not being met > > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. However, > if the zone is below the min watermark then kswapd needs to continue working > without delay as there is a danger of an increased rate of GFP_ATOMIC > allocation failure. > > This patch changes the conditions under which kswapd waits on > congestion by only going to sleep if the min watermarks are being met. > > [mel@csn.ul.ie: Add stats to track how relevant the logic is] > Needs-signed-off-by-original-author > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > index 9716003..7d66695 100644 > --- a/include/linux/vmstat.h > +++ b/include/linux/vmstat.h > @@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > #endif > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, > KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, > + KSWAPD_NO_CONGESTION_WAIT, > PAGEOUTRUN, ALLOCSTALL, PGROTATED, > #ifdef CONFIG_HUGETLB_PAGE > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, > diff --git a/mm/vmscan.c b/mm/vmscan.c > index ffa1766..70967e1 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1966,6 +1966,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > * free_pages == high_wmark_pages(zone). > */ > int temp_priority[MAX_NR_ZONES]; > + int has_under_min_watermark_zone = 0; This is wrong declaration place. It must change to for (priority = DEF_PRIORITY; priority >= 0; priority--) { int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ unsigned long lru_pages = 0; + int has_under_min_watermark_zone = 0; because, has_under_min_watermark_zone should be initialized every priority. > loop_again: > total_scanned = 0; > @@ -2085,6 +2086,15 @@ loop_again: > if (total_scanned > SWAP_CLUSTER_MAX * 2 && > total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) > sc.may_writepage = 1; > + > + /* > + * We are still under min water mark. it mean we have > + * GFP_ATOMIC allocation failure risk. Hurry up! > + */ > + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), > + end_zone, 0)) > + has_under_min_watermark_zone = 1; > + > } > if (all_zones_ok) > break; /* kswapd: all done */ > @@ -2092,8 +2102,13 @@ loop_again: > * OK, kswapd is getting into trouble. 
Take a nap, then take > * another pass across the zones. > */ > - if (total_scanned && priority < DEF_PRIORITY - 2) > - congestion_wait(BLK_RW_ASYNC, HZ/10); > + if (total_scanned && (priority < DEF_PRIORITY - 2)) { > + This blank line is unnecesary. > + if (!has_under_min_watermark_zone) Probably "if (has_under_min_watermark_zone)" is correct. > + count_vm_event(KSWAPD_NO_CONGESTION_WAIT); > + else > + congestion_wait(BLK_RW_ASYNC, HZ/10); > + } Otherthing looks pretty good to me. please feel free to add my s-o-b or reviewed-by. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
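Folding KOSAKI's corrections (declaration reset every priority, stray blank line dropped, negation removed) into one place, the intended shape of the loop in balance_pgdat() is roughly the following; this is a condensed, non-compilable sketch of the hunks quoted above, with the zone scan abbreviated:

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		int has_under_min_watermark_zone = 0;	/* reset every priority */

		for (i = 0; i <= end_zone; i++) {
			struct zone *zone = pgdat->node_zones + i;

			/* ... scan and reclaim this zone ... */

			/* still below the min watermark: GFP_ATOMIC is at risk */
			if (!zone_watermark_ok(zone, order, min_wmark_pages(zone),
					       end_zone, 0))
				has_under_min_watermark_zone = 1;
		}

		if (all_zones_ok)
			break;	/* kswapd: all done */

		if (total_scanned && priority < DEF_PRIORITY - 2) {
			if (has_under_min_watermark_zone)	/* note: no "!" */
				count_vm_event(KSWAPD_NO_CONGESTION_WAIT);
			else
				congestion_wait(BLK_RW_ASYNC, HZ/10);
		}
	}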
* [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met [not found] ` <20091114023138.3DA5.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 18:15 ` Mel Gorman 2009-11-13 18:26 ` Frans Pop ` (2 more replies) 0 siblings, 3 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-13 18:15 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. This patch replaces vmscan-take-order-into-consideration-when-deciding-if-kswapd-is-in-trouble.patch . [mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org: Add stats to track how relevant the logic is] From: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> --- include/linux/vmstat.h | 1 + mm/vmscan.c | 18 ++++++++++++++++-- mm/vmstat.c | 1 + 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 9716003..7d66695 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, + KSWAPD_NO_CONGESTION_WAIT, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, diff --git a/mm/vmscan.c b/mm/vmscan.c index ffa1766..70a2322 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1979,6 +1979,7 @@ loop_again: for (priority = DEF_PRIORITY; priority >= 0; priority--) { int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ unsigned long lru_pages = 0; + int has_under_min_watermark_zone = 0; /* The swap token gets in the way of swapout... */ if (!priority) @@ -2085,6 +2086,15 @@ loop_again: if (total_scanned > SWAP_CLUSTER_MAX * 2 && total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; + + /* + * We are still under min water mark. it mean we have + * GFP_ATOMIC allocation failure risk. Hurry up! + */ + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), + end_zone, 0)) + has_under_min_watermark_zone = 1; + } if (all_zones_ok) break; /* kswapd: all done */ @@ -2092,8 +2102,12 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. 
*/ - if (total_scanned && priority < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ/10); + if (total_scanned && (priority < DEF_PRIORITY - 2)) { + if (!has_under_min_watermark_zone) + count_vm_event(KSWAPD_NO_CONGESTION_WAIT); + else + congestion_wait(BLK_RW_ASYNC, HZ/10); + } /* * We do this so kswapd doesn't build up large priorities for diff --git a/mm/vmstat.c b/mm/vmstat.c index 90b11e4..bc09547 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -685,6 +685,7 @@ static const char * const vmstat_text[] = { "kswapd_inodesteal", "kswapd_slept_prematurely_fast", "kswapd_slept_prematurely_slow", + "kswapd_no_congestion_wait", "pageoutrun", "allocstall", ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met 2009-11-13 18:15 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met Mel Gorman @ 2009-11-13 18:26 ` Frans Pop [not found] ` <20091113181557.GM29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 2009-11-13 18:36 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met Rik van Riel 2 siblings, 0 replies; 57+ messages in thread From: Frans Pop @ 2009-11-13 18:26 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, KOSAKI Motohiro, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Friday 13 November 2009, Mel Gorman wrote: > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. > However, if the zone is below the min watermark then kswapd needs to > continue working without delay as there is a danger of an increased rate > of GFP_ATOMIC allocation failure. > > This patch changes the conditions under which kswapd waits on > congestion by only going to sleep if the min watermarks are being met. > > This patch replaces > vmscan-take-order-into-consideration-when-deciding-if-kswapd-is-in-troub >le.patch . > > [mel@csn.ul.ie: Add stats to track how relevant the logic is] > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> For this to work with git-am, the From: line has to be _above_ the patch description (must be the first line of the mail even). AFAIK at least. > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cheers, FJP -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
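For anyone not familiar with the convention Frans refers to: git-am (via git mailinfo) takes authorship from a From: line at the very start of the mail body, falling back to the mail's own From: header if none is there, which is why the attribution has to sit above the changelog text. Schematically, a forwarded patch mail would be laid out something like this (addresses and text elided, not the actual mail):

Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met

From: KOSAKI Motohiro <...>

(patch description)

Signed-off-by: KOSAKI Motohiro <...>
Signed-off-by: Mel Gorman <...>
---
 (diffstat and diff)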
* Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met [not found] ` <20091113181557.GM29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 18:33 ` KOSAKI Motohiro [not found] ` <2f11576a0911131033w4a9e6042k3349f0be290a167e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: KOSAKI Motohiro @ 2009-11-13 18:33 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List > @@ -2092,8 +2102,12 @@ loop_again: > * OK, kswapd is getting into trouble. Take a nap, then take > * another pass across the zones. > */ > - if (total_scanned && priority < DEF_PRIORITY - 2) > - congestion_wait(BLK_RW_ASYNC, HZ/10); > + if (total_scanned && (priority < DEF_PRIORITY - 2)) { > + if (!has_under_min_watermark_zone) if I am correct, we must to remove "!". > + count_vm_event(KSWAPD_NO_CONGESTION_WAIT); > + else > + congestion_wait(BLK_RW_ASYNC, HZ/10); > + } ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 [not found] ` <2f11576a0911131033w4a9e6042k3349f0be290a167e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-11-13 20:03 ` Mel Gorman [not found] ` <20091113200357.GO29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-11-13 20:03 UTC (permalink / raw) To: Andrew Morton Cc: KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List The last version has a stupid bug in it. Sorry. Changelog since V1 o Fix incorrect negation o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as suggested by Rik If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org: Add stats to track how relevant the logic is] From: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> Reviewed-by: Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- include/linux/vmstat.h | 1 + mm/vmscan.c | 18 ++++++++++++++++-- mm/vmstat.c | 1 + 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 9716003..7d66695 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #endif PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW, + KSWAPD_SKIP_CONGESTION_WAIT, PAGEOUTRUN, ALLOCSTALL, PGROTATED, #ifdef CONFIG_HUGETLB_PAGE HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, diff --git a/mm/vmscan.c b/mm/vmscan.c index ffa1766..70a2322 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1979,6 +1979,7 @@ loop_again: for (priority = DEF_PRIORITY; priority >= 0; priority--) { int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ unsigned long lru_pages = 0; + int has_under_min_watermark_zone = 0; /* The swap token gets in the way of swapout... */ if (!priority) @ -2085,6 +2086,15 @@ loop_again: if (total_scanned > SWAP_CLUSTER_MAX * 2 && total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; + + /* + * We are still under min water mark. it mean we have + * GFP_ATOMIC allocation failure risk. Hurry up! + */ + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), + end_zone, 0)) + has_under_min_watermark_zone = 1; + } if (all_zones_ok) break; /* kswapd: all done */ @@ -2092,8 +2102,12 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. 
*/ - if (total_scanned && priority < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ/10); + if (total_scanned && (priority < DEF_PRIORITY - 2)) { + if (has_under_min_watermark_zone) + count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT); + else + congestion_wait(BLK_RW_ASYNC, HZ/10); + } /* * We do this so kswapd doesn't build up large priorities for diff --git a/mm/vmstat.c b/mm/vmstat.c index 90b11e4..bc09547 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -685,6 +685,7 @@ static const char * const vmstat_text[] = { "kswapd_inodesteal", "kswapd_slept_prematurely_fast", "kswapd_slept_prematurely_slow", + "kswapd_skip_congestion_wait", "pageoutrun", "allocstall", ^ permalink raw reply related [flat|nested] 57+ messages in thread
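With V2 applied, the effect of the new branch can be watched directly: the counter appears in /proc/vmstat under the name added to vmstat_text above, so a periodic grep of /proc/vmstat is enough. For completeness, a throwaway userspace poller that does the same thing (the one-second interval is an arbitrary choice):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[256];

	for (;;) {
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "kswapd_skip_congestion_wait", 27))
				fputs(line, stdout);
		fclose(f);
		fflush(stdout);
		sleep(1);
	}
}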
* Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 [not found] ` <20091113200357.GO29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-26 14:45 ` Tobias Oetiker [not found] ` <alpine.DEB.2.00.0911261542500.21450-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Tobias Oetiker @ 2009-11-26 14:45 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Mel, Nov 13 Mel Gorman wrote: > The last version has a stupid bug in it. Sorry. > > Changelog since V1 > o Fix incorrect negation > o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as > suggested by Rik > > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. However, > if the zone is below the min watermark then kswapd needs to continue working > without delay as there is a danger of an increased rate of GFP_ATOMIC > allocation failure. > > This patch changes the conditions under which kswapd waits on > congestion by only going to sleep if the min watermarks are being met. I finally got around to test this together with the whole series on 2.6.31.6. after running it for a day I have not yet seen a single order:5 allocation problem ... (while I had several an hour before) for the record, my kernel is now running with the following patches: patch1:Date: Thu, 12 Nov 2009 19:30:31 +0000 patch1:Subject: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed patch2:Date: Thu, 12 Nov 2009 19:30:32 +0000 patch2:Subject: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER patch3:Date: Thu, 12 Nov 2009 19:30:33 +0000 patch3:Subject: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim patch4:Date: Thu, 12 Nov 2009 19:30:34 +0000 patch4:Subject: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep patch5:Date: Fri, 13 Nov 2009 20:03:57 +0000 patch5:Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 patch6:Date: Tue, 17 Nov 2009 10:34:21 +0000 patch6:Subject: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 ^ permalink raw reply [flat|nested] 57+ messages in thread
* still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) [not found] ` <alpine.DEB.2.00.0911261542500.21450-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> @ 2009-11-29 7:42 ` Tobi Oetiker 2009-12-02 11:32 ` Mel Gorman 0 siblings, 1 reply; 57+ messages in thread From: Tobi Oetiker @ 2009-11-29 7:42 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Mel, Thursday Tobias Oetiker wrote: > Hi Mel, > > Nov 13 Mel Gorman wrote: > > > The last version has a stupid bug in it. Sorry. > > > > Changelog since V1 > > o Fix incorrect negation > > o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as > > suggested by Rik > > > > If reclaim fails to make sufficient progress, the priority is raised. > > Once the priority is higher, kswapd starts waiting on congestion. However, > > if the zone is below the min watermark then kswapd needs to continue working > > without delay as there is a danger of an increased rate of GFP_ATOMIC > > allocation failure. > > > > This patch changes the conditions under which kswapd waits on > > congestion by only going to sleep if the min watermarks are being met. > > I finally got around to test this together with the whole series on > 2.6.31.6. after running it for a day I have not yet seen a single > order:5 allocation problem ... (while I had several an hour before) > for the record, my kernel is now running with the following > patches: > > patch1:Date: Thu, 12 Nov 2009 19:30:31 +0000 > patch1:Subject: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed > > patch2:Date: Thu, 12 Nov 2009 19:30:32 +0000 > patch2:Subject: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER > > patch3:Date: Thu, 12 Nov 2009 19:30:33 +0000 > patch3:Subject: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim > > patch4:Date: Thu, 12 Nov 2009 19:30:34 +0000 > patch4:Subject: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep > > patch5:Date: Fri, 13 Nov 2009 20:03:57 +0000 > patch5:Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 > > patch6:Date: Tue, 17 Nov 2009 10:34:21 +0000 > patch6:Subject: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > I have now been running the new kernel for a few days and I am sorry to report that about a day after booting the allocation failures started showing again. More order:4 instead of order:5 ... Nov 29 07:16:17 johan kernel: [261565.598627] nfsd: page allocation failure. order:4, mode:0x4020 [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598638] Pid: 6956, comm: nfsd Tainted: G D 2.6.31.6-oep #1 [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598641] Call Trace: [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598646] <IRQ> [<ffffffff810cb730>] __alloc_pages_nodemask+0x570/0x690 [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598672] [<ffffffff8142aba5>] ? 
tcp_tso_segment+0x265/0x2e0 [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598680] [<ffffffff810faf08>] kmalloc_large_node+0x68/0xc0 [kern.warning] Nov 29 07:16:17 johan kernel: [261565.598692] [<ffffffff810febfa>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning] the fact that there were no problems right after booting might give some indication as to where the problem is. This is on a machine with 24 GB of memory, as you can see from the 'top' headers, it has plenty of room to spare and should never run into allocation problems anyway. The only thing that would have changed after running it for about a day is that more memory is used for caching ... # cat /proc/meminfo MemTotal: 24746268 kB MemFree: 128012 kB Buffers: 6255988 kB Cached: 10600780 kB SwapCached: 23668 kB Active: 13078944 kB Inactive: 9229172 kB Active(anon): 5455164 kB Inactive(anon): 508616 kB Active(file): 7623780 kB Inactive(file): 8720556 kB Unevictable: 291768 kB Mlocked: 291768 kB SwapTotal: 83886056 kB SwapFree: 83846520 kB Dirty: 2028 kB Writeback: 0 kB AnonPages: 5723748 kB Mapped: 152240 kB Slab: 1339176 kB SReclaimable: 1270156 kB SUnreclaim: 69020 kB PageTables: 31208 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 101208440 kB Committed_AS: 8146696 kB VmallocTotal: 34359738367 kB VmallocUsed: 115084 kB VmallocChunk: 34359518011 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 10176 kB DirectMap2M: 25155584 kB hth tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 ^ permalink raw reply [flat|nested] 57+ messages in thread
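One thing to keep in mind when reading the meminfo above: for an order-4 GFP_ATOMIC request it is not enough that plenty of memory is free or reclaimable in total, because the watermark check also discounts free pages that sit in blocks smaller than the request. A simplified userspace model of that check follows; it is an assumption on my part that this mirrors the era's __zone_watermark_ok(), and it deliberately ignores lowmem_reserve and the ALLOC_HIGH/ALLOC_HARDER adjustments:

#include <stdbool.h>
#include <stdio.h>

/*
 * free_pages: total free pages in the zone
 * nr_free[o]: free blocks of order o (the columns of /proc/buddyinfo)
 * order:      requested order
 * min:        min watermark in pages
 */
static bool watermark_ok_model(long free_pages, const long nr_free[],
			       int order, long min)
{
	int o;

	free_pages -= (1L << order) - 1;
	if (free_pages <= min)
		return false;
	for (o = 0; o < order; o++) {
		/* blocks too small to satisfy the request don't count */
		free_pages -= nr_free[o] << o;
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

int main(void)
{
	/* hypothetical fragmented zone: lots free, nearly all of it order 0-1 */
	long nr_free[11] = { 20000, 4000, 60, 8, 1, 0, 0, 0, 0, 0, 0 };
	long free_pages = 0;
	int o;

	for (o = 0; o < 11; o++)
		free_pages += nr_free[o] << o;

	printf("free: %ld pages, order-4 ok: %d\n",
	       free_pages, watermark_ok_model(free_pages, nr_free, 4, 2048));
	return 0;
}

With these made-up numbers the zone has roughly 110MB free yet still fails the order-4 check, which is the same shape of situation as a 24GB machine that is mostly page cache but short on contiguous blocks.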
* Re: still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) 2009-11-29 7:42 ` still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) Tobi Oetiker @ 2009-12-02 11:32 ` Mel Gorman 2009-12-02 21:30 ` Tobias Oetiker 0 siblings, 1 reply; 57+ messages in thread From: Mel Gorman @ 2009-12-02 11:32 UTC (permalink / raw) To: Tobi Oetiker Cc: Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Sun, Nov 29, 2009 at 08:42:09AM +0100, Tobi Oetiker wrote: > Hi Mel, > > Thursday Tobias Oetiker wrote: > > Hi Mel, > > > > Nov 13 Mel Gorman wrote: > > > > > The last version has a stupid bug in it. Sorry. > > > > > > Changelog since V1 > > > o Fix incorrect negation > > > o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as > > > suggested by Rik > > > > > > If reclaim fails to make sufficient progress, the priority is raised. > > > Once the priority is higher, kswapd starts waiting on congestion. However, > > > if the zone is below the min watermark then kswapd needs to continue working > > > without delay as there is a danger of an increased rate of GFP_ATOMIC > > > allocation failure. > > > > > > This patch changes the conditions under which kswapd waits on > > > congestion by only going to sleep if the min watermarks are being met. > > > > I finally got around to test this together with the whole series on > > 2.6.31.6. after running it for a day I have not yet seen a single > > order:5 allocation problem ... (while I had several an hour before) > > > for the record, my kernel is now running with the following > > patches: > > > > patch1:Date: Thu, 12 Nov 2009 19:30:31 +0000 > > patch1:Subject: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed > > > > patch2:Date: Thu, 12 Nov 2009 19:30:32 +0000 > > patch2:Subject: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER > > > > patch3:Date: Thu, 12 Nov 2009 19:30:33 +0000 > > patch3:Subject: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim > > > > patch4:Date: Thu, 12 Nov 2009 19:30:34 +0000 > > patch4:Subject: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep > > > > patch5:Date: Fri, 13 Nov 2009 20:03:57 +0000 > > patch5:Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 > > > > patch6:Date: Tue, 17 Nov 2009 10:34:21 +0000 > > patch6:Subject: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > > > I have now been running the new kernel for a few days and I am > sorry to report that about a day after booting the allocation > failures started showing again. More order:4 instead of order:5 ... > Why has the order changed? Also, what allocator were you using in 2.6.30 and 2.6.31.6, SLAB or SLUB? Did you happen to change them when upgrading the kernel? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) 2009-12-02 11:32 ` Mel Gorman @ 2009-12-02 21:30 ` Tobias Oetiker [not found] ` <alpine.DEB.2.00.0912022210220.30023-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Tobias Oetiker @ 2009-12-02 21:30 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Mel, Today Mel Gorman wrote: > On Sun, Nov 29, 2009 at 08:42:09AM +0100, Tobi Oetiker wrote: > > Hi Mel, > > > > Thursday Tobias Oetiker wrote: > > > Hi Mel, > > > > > > Nov 13 Mel Gorman wrote: > > > > > > > The last version has a stupid bug in it. Sorry. > > > > > > > > Changelog since V1 > > > > o Fix incorrect negation > > > > o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as > > > > suggested by Rik > > > > > > > > If reclaim fails to make sufficient progress, the priority is raised. > > > > Once the priority is higher, kswapd starts waiting on congestion. However, > > > > if the zone is below the min watermark then kswapd needs to continue working > > > > without delay as there is a danger of an increased rate of GFP_ATOMIC > > > > allocation failure. > > > > > > > > This patch changes the conditions under which kswapd waits on > > > > congestion by only going to sleep if the min watermarks are being met. > > > > > > I finally got around to test this together with the whole series on > > > 2.6.31.6. after running it for a day I have not yet seen a single > > > order:5 allocation problem ... (while I had several an hour before) > > > > > for the record, my kernel is now running with the following > > > patches: > > > > > > patch1:Date: Thu, 12 Nov 2009 19:30:31 +0000 > > > patch1:Subject: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed > > > > > > patch2:Date: Thu, 12 Nov 2009 19:30:32 +0000 > > > patch2:Subject: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER > > > > > > patch3:Date: Thu, 12 Nov 2009 19:30:33 +0000 > > > patch3:Subject: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim > > > > > > patch4:Date: Thu, 12 Nov 2009 19:30:34 +0000 > > > patch4:Subject: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep > > > > > > patch5:Date: Fri, 13 Nov 2009 20:03:57 +0000 > > > patch5:Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 > > > > > > patch6:Date: Tue, 17 Nov 2009 10:34:21 +0000 > > > patch6:Subject: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 > > > > > I have now been running the new kernel for a few days and I am > > sorry to report that about a day after booting the allocation > > failures started showing again. More order:4 instead of order:5 ... > > > > Why has the order changed? ? no idea ... the order has changed after applying the patches cited above. > Also, what allocator were you using in 2.6.30 and 2.6.31.6, SLAB or > SLUB? Did you happen to change them when upgrading the kernel? I have been and still am using SLUB ... 
cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) [not found] ` <alpine.DEB.2.00.0912022210220.30023-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> @ 2009-12-03 20:26 ` Corrado Zoccolo [not found] ` <4e5e476b0912031226i5b0e6cf9hdfd5519182ccdefa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Corrado Zoccolo @ 2009-12-03 20:26 UTC (permalink / raw) To: Tobias Oetiker Cc: Mel Gorman, Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Tobias, does the patch in http://lkml.org/lkml/2009/11/30/301 help with your high order allocation problems? It seems that you have lot of memory, but high order pages do not show up. The patch should make them more likely to appear. On my machine (that has much less ram than yours), with the patch, I always have order-10 pages available. Corrado On Wed, Dec 2, 2009 at 10:30 PM, Tobias Oetiker <tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org> wrote: > Hi Mel, > > Today Mel Gorman wrote: > >> On Sun, Nov 29, 2009 at 08:42:09AM +0100, Tobi Oetiker wrote: >> > Hi Mel, >> > >> > Thursday Tobias Oetiker wrote: >> > > Hi Mel, >> > > >> > > Nov 13 Mel Gorman wrote: >> > > >> > > > The last version has a stupid bug in it. Sorry. >> > > > >> > > > Changelog since V1 >> > > > o Fix incorrect negation >> > > > o Rename kswapd_no_congestion_wait to kswapd_skip_congestion_wait as >> > > > suggested by Rik >> > > > >> > > > If reclaim fails to make sufficient progress, the priority is raised. >> > > > Once the priority is higher, kswapd starts waiting on congestion. However, >> > > > if the zone is below the min watermark then kswapd needs to continue working >> > > > without delay as there is a danger of an increased rate of GFP_ATOMIC >> > > > allocation failure. >> > > > >> > > > This patch changes the conditions under which kswapd waits on >> > > > congestion by only going to sleep if the min watermarks are being met. >> > > >> > > I finally got around to test this together with the whole series on >> > > 2.6.31.6. after running it for a day I have not yet seen a single >> > > order:5 allocation problem ... 
(while I had several an hour before) >> > >> > > for the record, my kernel is now running with the following >> > > patches: >> > > >> > > patch1:Date: Thu, 12 Nov 2009 19:30:31 +0000 >> > > patch1:Subject: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed >> > > >> > > patch2:Date: Thu, 12 Nov 2009 19:30:32 +0000 >> > > patch2:Subject: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER >> > > >> > > patch3:Date: Thu, 12 Nov 2009 19:30:33 +0000 >> > > patch3:Subject: [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim >> > > >> > > patch4:Date: Thu, 12 Nov 2009 19:30:34 +0000 >> > > patch4:Subject: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep >> > > >> > > patch5:Date: Fri, 13 Nov 2009 20:03:57 +0000 >> > > patch5:Subject: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 >> > > >> > > patch6:Date: Tue, 17 Nov 2009 10:34:21 +0000 >> > > patch6:Subject: [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 >> > > >> > I have now been running the new kernel for a few days and I am >> > sorry to report that about a day after booting the allocation >> > failures started showing again. More order:4 instead of order:5 ... >> > >> >> Why has the order changed? > > ? no idea ... the order has changed after applying the patches > cited above. > >> Also, what allocator were you using in 2.6.30 and 2.6.31.6, SLAB or >> SLUB? Did you happen to change them when upgrading the kernel? > > I have been and still am using SLUB ... > > cheers > tobi > > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- __________________________________________________________________________ dott. Corrado Zoccolo mailto:czoccolo-Re5JQEeQqe8@public.gmane.orgm PhD - Department of Computer Science - University of Pisa, Italy -------------------------------------------------------------------------- The self-confidence of a warrior is not the self-confidence of the average man. The average man seeks certainty in the eyes of the onlooker and calls that self-confidence. The warrior seeks impeccability in his own eyes and calls that humbleness. Tales of Power - C. Castaneda ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) [not found] ` <4e5e476b0912031226i5b0e6cf9hdfd5519182ccdefa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-12-14 5:59 ` Tobias Oetiker [not found] ` <alpine.DEB.2.00.0912140646550.12657-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> 0 siblings, 1 reply; 57+ messages in thread From: Tobias Oetiker @ 2009-12-14 5:59 UTC (permalink / raw) To: Corrado Zoccolo Cc: Mel Gorman, Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Corrado, Dec 3 Corrado Zoccolo wrote: > Hi Tobias, > does the patch in http://lkml.org/lkml/2009/11/30/301 help with your > high order allocation problems? > It seems that you have lot of memory, but high order pages do not show up. > The patch should make them more likely to appear. > On my machine (that has much less ram than yours), with the patch, I > always have order-10 pages available. I have tried it and ... it does not work, the page allocation failure still shows. BUT while testing it on two machines I found that it only shows on on machine. The workload on the two machines is similar (they both run virtualbox) and also the available memory. Could it be caused by a hardware driver ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) [not found] ` <alpine.DEB.2.00.0912140646550.12657-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org> @ 2009-12-14 8:49 ` Corrado Zoccolo 0 siblings, 0 replies; 57+ messages in thread From: Corrado Zoccolo @ 2009-12-14 8:49 UTC (permalink / raw) To: Tobias Oetiker Cc: Mel Gorman, Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi Tobi, On Mon, Dec 14, 2009 at 6:59 AM, Tobias Oetiker <tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org> wrote: > Hi Corrado, > > Dec 3 Corrado Zoccolo wrote: > >> Hi Tobias, >> does the patch in http://lkml.org/lkml/2009/11/30/301 help with your >> high order allocation problems? >> It seems that you have lot of memory, but high order pages do not show up. >> The patch should make them more likely to appear. >> On my machine (that has much less ram than yours), with the patch, I >> always have order-10 pages available. > > I have tried it and ... it does not work, the  page allocation > failure still shows. BUT while testing it on two machines I found that it > only shows on on machine. The workload on the two machines is > similar (they both run virtualbox) and also the available memory. Where those both failing before the patch? Did the order of failure change? > Could it be caused by a hardware driver ? It should be something that is taking more time to release pages, but I don't know what can it be. What happens if you drop the caches when you are getting failures? Does the failure rate drops as if you had just rebooted? Can you log at regular intervals the content of /proc/buddyinfo, and try correlating when the number of pages of the requested order are becoming scarce with some other event? Thanks, Corrado > > cheers > tobi > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 > -- __________________________________________________________________________ dott. Corrado Zoccolo mailto:czoccolo-Re5JQEeQqe8@public.gmane.orgm PhD - Department of Computer Science - University of Pisa, Italy -------------------------------------------------------------------------- The self-confidence of a warrior is not the self-confidence of the average man. The average man seeks certainty in the eyes of the onlooker and calls that self-confidence. The warrior seeks impeccability in his own eyes and calls that humbleness. Tales of Power - C. Castaneda ^ permalink raw reply [flat|nested] 57+ messages in thread
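In case it is useful for the logging suggested above, a minimal standalone logger that timestamps /proc/buddyinfo at a fixed interval (the 60-second period is only a guess at something reasonable):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	size_t n;

	for (;;) {
		FILE *f = fopen("/proc/buddyinfo", "r");
		time_t now = time(NULL);

		if (!f)
			return 1;
		printf("=== %s", ctime(&now));	/* ctime() supplies the newline */
		while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
			fwrite(buf, 1, n, stdout);
		fclose(f);
		fflush(stdout);
		sleep(60);
	}
}

Run it redirected to a file next to kern.log so the per-order free counts can be lined up against the timestamps of the allocation failures.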
* Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met 2009-11-13 18:15 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met Mel Gorman 2009-11-13 18:26 ` Frans Pop [not found] ` <20091113181557.GM29804-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 18:36 ` Rik van Riel 2 siblings, 0 replies; 57+ messages in thread From: Rik van Riel @ 2009-11-13 18:36 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, KOSAKI Motohiro, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org, Pekka Enberg, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On 11/13/2009 01:15 PM, Mel Gorman wrote: > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. However, > if the zone is below the min watermark then kswapd needs to continue working > without delay as there is a danger of an increased rate of GFP_ATOMIC > allocation failure. > > This patch changes the conditions under which kswapd waits on > congestion by only going to sleep if the min watermarks are being met. > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 90b11e4..bc09547 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -685,6 +685,7 @@ static const char * const vmstat_text[] = { > "kswapd_inodesteal", > "kswapd_slept_prematurely_fast", > "kswapd_slept_prematurely_slow", > + "kswapd_no_congestion_wait", > > Perhaps better named "kswapd_skip_congestion_wait" ? Other than that, the patch looks good to me. Reviewed-by: Rik van Riel <riel@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble 2009-11-13 9:54 ` KOSAKI Motohiro [not found] ` <20091113142608.33B9.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2009-11-13 14:38 ` Minchan Kim 1 sibling, 0 replies; 57+ messages in thread From: Minchan Kim @ 2009-11-13 14:38 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm@kvack.org", Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List Hi, Koskai. I missed this patch. I noticed this after Mel reply your patch. On Fri, Nov 13, 2009 at 6:54 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> If reclaim fails to make sufficient progress, the priority is raised. >> Once the priority is higher, kswapd starts waiting on congestion. >> However, on systems with large numbers of high-order atomics due to >> crappy network cards, it's important that kswapd keep working in >> parallel to save their sorry ass. >> >> This patch takes into account the order kswapd is reclaiming at before >> waiting on congestion. The higher the order, the longer it is before >> kswapd considers itself to be in trouble. The impact is that kswapd >> works harder in parallel rather than depending on direct reclaimers or >> atomic allocations to fail. >> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie> >> --- >> mm/vmscan.c | 14 ++++++++++++-- >> 1 files changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index ffa1766..5e200f1 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1946,7 +1946,7 @@ static int sleeping_prematurely(int order, long remaining) >> static unsigned long balance_pgdat(pg_data_t *pgdat, int order) >> { >> int all_zones_ok; >> - int priority; >> + int priority, congestion_priority; >> int i; >> unsigned long total_scanned; >> struct reclaim_state *reclaim_state = current->reclaim_state; >> @@ -1967,6 +1967,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) >> */ >> int temp_priority[MAX_NR_ZONES]; >> >> + /* >> + * When priority reaches congestion_priority, kswapd will sleep >> + * for a short time while congestion clears. The higher the >> + * order being reclaimed, the less likely kswapd will go to >> + * sleep as high-order allocations are harder to reclaim and >> + * stall direct reclaimers longer >> + */ >> + congestion_priority = DEF_PRIORITY - 2; >> + congestion_priority -= min(congestion_priority, sc.order); > > This calculation mean > > sc.order congestion_priority scan-pages > --------------------------------------------------------- > 0 10 1/1024 * zone-mem > 1 9 1/512 * zone-mem > 2 8 1/256 * zone-mem > 3 7 1/128 * zone-mem > 4 6 1/64 * zone-mem > 5 5 1/32 * zone-mem > 6 4 1/16 * zone-mem > 7 3 1/8 * zone-mem > 8 2 1/4 * zone-mem > 9 1 1/2 * zone-mem > 10 0 1 * zone-mem > 11+ 0 1 * zone-mem > > I feel this is too agressive. The intention of this congestion_wait() > is to prevent kswapd use 100% cpu time. but the above promotion seems > break it. I can't understand your point. Mel didn't change the number of scan pages. It denpends on priority. He just added another one to prevent frequent contestion_wait. Still, shrink_zone is called with priority, not congestion_priority. > example, > ia64 have 256MB hugepage (i.e. order=14). it mean kswapd never sleep. Indeed. Good catch. > example2, > order-3 (i.e. 
PAGE_ALLOC_COSTLY_ORDER) makes one of most inefficent > reclaim, because it doesn't use lumpy recliam. > I've seen 128GB size zone, it mean 1/128 = 1GB. oh well, kswapd definitely > waste cpu time 100%. Above I said, It depends on priority, not congestion_priority. > >> + >> loop_again: >> total_scanned = 0; >> sc.nr_reclaimed = 0; >> @@ -2092,7 +2102,7 @@ loop_again: >> * OK, kswapd is getting into trouble. Take a nap, then take >> * another pass across the zones. >> */ >> - if (total_scanned && priority < DEF_PRIORITY - 2) >> + if (total_scanned && priority < congestion_priority) >> congestion_wait(BLK_RW_ASYNC, HZ/10); > > Instead, How about this? > > > > --- > mm/vmscan.c | 13 ++++++++++++- > 1 files changed, 12 insertions(+), 1 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 64e4388..937e90d 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1938,6 +1938,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order) > * free_pages == high_wmark_pages(zone). > */ > int temp_priority[MAX_NR_ZONES]; > + int has_under_min_watermark_zone = 0; Let's make the shorter. How about "under_min_watermark"? > > loop_again: > total_scanned = 0; > @@ -2057,6 +2058,15 @@ loop_again: > if (total_scanned > SWAP_CLUSTER_MAX * 2 && > total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) > sc.may_writepage = 1; > + > + /* > + * We are still under min water mark. it mean we have > + * GFP_ATOMIC allocation failure risk. Hurry up! > + */ > + if (!zone_watermark_ok(zone, order, min_wmark_pages(zone), > + end_zone, 0)) > + has_under_min_watermark_zone = 1; > + > } > if (all_zones_ok) > break; /* kswapd: all done */ > @@ -2064,7 +2074,8 @@ loop_again: > * OK, kswapd is getting into trouble. Take a nap, then take > * another pass across the zones. > */ > - if (total_scanned && priority < DEF_PRIORITY - 2) > + if (total_scanned && (priority < DEF_PRIORITY - 2) && > + !has_under_min_watermark_zone) > congestion_wait(BLK_RW_ASYNC, HZ/10); > > /* > -- > 1.6.2.5 Anyway, Looks good to me. It's more straightforward than Mel's one, I think. Reviewed-by: Minchan Kim <minchan.kim@gmail.com> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble [not found] ` <1258054235-3208-6-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 12:41 ` Minchan Kim 0 siblings, 0 replies; 57+ messages in thread From: Minchan Kim @ 2009-11-13 12:41 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org", KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List On Fri, Nov 13, 2009 at 4:30 AM, Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote: > If reclaim fails to make sufficient progress, the priority is raised. > Once the priority is higher, kswapd starts waiting on congestion. > However, on systems with large numbers of high-order atomics due to > crappy network cards, it's important that kswapd keep working in > parallel to save their sorry ass. > > This patch takes into account the order kswapd is reclaiming at before > waiting on congestion. The higher the order, the longer it is before > kswapd considers itself to be in trouble. The impact is that kswapd > works harder in parallel rather than depending on direct reclaimers or > atomic allocations to fail. > > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> It's make sense to me. It can help high order atomic allocation which is a big problem of allocator. :) Thanks Mel. Reviewed-by: Minchan Kim <minchan.kim-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 [not found] ` <1258054235-3208-1-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> @ 2009-11-13 9:04 ` Frans Pop [not found] ` <200911131004.25293.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org> 2009-11-13 12:47 ` Tobias Oetiker 2009-11-15 12:07 ` Karol Lewandowski 2 siblings, 1 reply; 57+ messages in thread From: Frans Pop @ 2009-11-13 9:04 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Chris Mason On Thursday 12 November 2009, Mel Gorman wrote: > Changelog since V2 > o Dropped the kswapd-quickly-notice-high-order patch. In more detailed > testing, it made latencies even worse as kswapd slept more on > high-order congestion causing order-0 direct reclaims. > o Added changes to how congestion_wait() works > o Added a number of new patches altering the behaviour of reclaim I have tested this series on top of .32-rc7. First impression is that it does seem to improve my test case, but does not yet completely solve it. My last gitk instance now loads more smoothly for most of the time it takes to complete, but I still see a choke point where things freeze for a while and where I get SKB allocation errors from my wireless. However, that choke point does seem to happen later and to be shorter than without the patches. I'll try to do additional tests (with .31). If you'd like me to run this set with your instrumentation patch for congestion_wait, then please let me know. Chris Mason's analysis regarding dm-crypt workqueues in reply to your other mail looks very interesting. Cheers, FJP ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 [not found] ` <200911131004.25293.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org> @ 2009-11-16 17:57 ` Mel Gorman 0 siblings, 0 replies; 57+ messages in thread From: Mel Gorman @ 2009-11-16 17:57 UTC (permalink / raw) To: Frans Pop Cc: Andrew Morton, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List, Chris Mason On Fri, Nov 13, 2009 at 10:04:21AM +0100, Frans Pop wrote: > On Thursday 12 November 2009, Mel Gorman wrote: > > Changelog since V2 > > o Dropped the kswapd-quickly-notice-high-order patch. In more detailed > > testing, it made latencies even worse as kswapd slept more on > > high-order congestion causing order-0 direct reclaims. > > o Added changes to how congestion_wait() works > > o Added a number of new patches altering the behaviour of reclaim > > I have tested this series on top of .32-rc7. First impression is that it > does seem to improve my test case, but does not yet completely solve it. > > My last gitk instance now loads more smoothly for most of the time it takes > to complete, but I still see a choke point where things freeze for a while > and where I get SKB allocation errors from my wireless. > However, that choke point does seem to happen later and to be shorter than > without the patches. > I haven't fully figured out why this makes a difference yet, but with .32-rc7 and these patches, could you retry the test except beforehand do cd /sys for SYS in `find -name low_latency`; do echo 0 > $SYS done I believe the low_latency logic might be interfering with the number of clean pages available for kswapd to reclaim. Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  [not found] ` <1258054235-3208-1-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>
  2009-11-13  9:04 ` [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Frans Pop
@ 2009-11-13 12:47 ` Tobias Oetiker
  [not found]   ` <alpine.DEB.2.00.0911131346560.22447-ZxOFkyelvri73kmZFr1Urg@public.gmane.org>
  2009-11-15 12:07 ` Karol Lewandowski
  2 siblings, 1 reply; 57+ messages in thread
From: Tobias Oetiker @ 2009-11-13 12:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

Hi Mel,

Yesterday Mel Gorman wrote:

> Sorry for the long delay in posting another version. Testing is extremely
> time-consuming and I wasn't getting to work on this as much as I'd have liked.
>
> Changelog since V2
> o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
>   testing, it made latencies even worse as kswapd slept more on high-order
>   congestion causing order-0 direct reclaims.
> o Added changes to how congestion_wait() works
> o Added a number of new patches altering the behaviour of reclaim

So is there anything promising for the order-5 allocation problems in
this set?

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org
++41 62 775 9902 / sb: -9900

^ permalink raw reply	[flat|nested] 57+ messages in thread
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  [not found]   ` <alpine.DEB.2.00.0911131346560.22447-ZxOFkyelvri73kmZFr1Urg@public.gmane.org>
@ 2009-11-13 13:37 ` Mel Gorman
  0 siblings, 0 replies; 57+ messages in thread
From: Mel Gorman @ 2009-11-13 13:37 UTC (permalink / raw)
  To: Tobias Oetiker
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Fri, Nov 13, 2009 at 01:47:35PM +0100, Tobias Oetiker wrote:
> Hi Mel,
>
> Yesterday Mel Gorman wrote:
>
> > Sorry for the long delay in posting another version. Testing is extremely
> > time-consuming and I wasn't getting to work on this as much as I'd have liked.
> >
> > Changelog since V2
> > o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
> >   testing, it made latencies even worse as kswapd slept more on high-order
> >   congestion causing order-0 direct reclaims.
> > o Added changes to how congestion_wait() works
> > o Added a number of new patches altering the behaviour of reclaim
>
> So is there anything promising for the order-5 allocation problems
> in this set?
>

Yes. While the change in timing of direct reclaimers might be less
important when dm-crypt is not involved, kswapd is more pro-active about
maintaining the watermarks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 57+ messages in thread
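As an aside, whether kswapd is actually holding the zones above their
watermarks can be eyeballed from /proc/zoneinfo while a test runs. A rough
sketch, assuming the 2.6.32-era field layout rather than anything posted in
this thread:

	# Print each zone's free page count next to its min/low/high watermarks.
	awk '
	  /^Node/      { zone = $4 }	# zone name, e.g. DMA or Normal
	  /pages free/ { free = $3 }
	  $1 == "min"  { min = $2 }
	  $1 == "low"  { low = $2 }
	  $1 == "high" { printf "%-8s free=%6d min=%6d low=%6d high=%6d\n",
	                        zone, free, min, low, $2 }
	' /proc/zoneinfo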
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  [not found] ` <1258054235-3208-1-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>
  2009-11-13  9:04 ` [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Frans Pop
  2009-11-13 12:47 ` Tobias Oetiker
@ 2009-11-15 12:07 ` Karol Lewandowski
  [not found]   ` <20091115120721.GA7557-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>
  2 siblings, 1 reply; 57+ messages in thread
From: Karol Lewandowski @ 2009-11-15 12:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
      Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Thu, Nov 12, 2009 at 07:30:30PM +0000, Mel Gorman wrote:

> [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
>
> Patches 1-3 should be tested first. The testing I've done shows that the
> page allocator and behaviour of congestion_wait() is more in line with
> 2.6.30 than the vanilla kernels.
>
> It'd be nice to have 2 more tests, applying each patch on top noting any
> behaviour change. i.e. ideally there would be results for
>
> o patches 1+2+3
> o patches 1+2+3+4
> o patches 1+2+3+4+5
>
> Of course, any tests results are welcome. The rest of the mail is the
> results of my own tests.

I've tried testing 3+4+5 against 2.6.32-rc7 (1+2 seem to be in
mainline) and got failure. I've noticed something strange (I think).
I was unable to trigger failures when system was under heavy memory
pressure (i.e. my testing - gitk, firefoxes, etc.). When I killed
almost all memory hogs, put system into sleep and resumed -- it
failed. free(1) showed:

             total       used       free     shared    buffers     cached
Mem:        255240     194052      61188          0       4040      49364
-/+ buffers/cache:     140648     114592
Swap:       514040      72712     441328

Is that ok? Wild guess -- maybe kswapd doesn't take fragmentation (or
other factors) into account as hard as it used to in 2.6.30?

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread
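On the fragmentation guess above: whether any blocks large enough for an
order-5 request (128kB, i.e. 32 contiguous pages) are free at all can be
read straight out of /proc/buddyinfo. A hedged helper, mine rather than
something used in this thread:

	# Each /proc/buddyinfo row lists free block counts for orders 0..10
	# after "Node N, zone NAME"; sum the counts at order 5 and above.
	awk '{
		o5plus = 0
		for (order = 5; order <= 10; order++)
			o5plus += $(5 + order)	# order-N count sits in column 5+N
		printf "zone %-8s blocks of 128kB or larger: %d\n", $4, o5plus
	}' /proc/buddyinfo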
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  [not found]   ` <20091115120721.GA7557-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>
@ 2009-11-16 9:52 ` Mel Gorman
  2009-11-16 12:08   ` Karol Lewandowski
  0 siblings, 1 reply; 57+ messages in thread
From: Mel Gorman @ 2009-11-16 9:52 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus, Tobias Oetiker,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Sun, Nov 15, 2009 at 01:07:21PM +0100, Karol Lewandowski wrote:
>              total       used       free     shared    buffers     cached
> Mem:        255240     194052      61188          0       4040      49364
> -/+ buffers/cache:     140648     114592
> Swap:       514040      72712     441328
>
> Is that ok? Wild guess -- maybe kswapd doesn't take fragmentation (or
> other factors) into account as hard as it used to in 2.6.30?

That's a lot of memory free. I take it the order-5 GFP_ATOMIC allocation
failed. What was the dmesg for it please?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 57+ messages in thread
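One small practical note, not from the thread: the full failure reports are
easiest to grab straight after the resume, before the ring buffer wraps,
for example along these lines (the context line counts are a guess):

	# Keep a copy of the allocation-failure reports and the free(1) view
	# taken at roughly the same time.
	free > after-resume-free.txt
	dmesg | grep -B2 -A60 "page allocation failure" > after-resume-failures.txt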
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-16 9:52 ` Mel Gorman
@ 2009-11-16 12:08 ` Karol Lewandowski
  [not found]   ` <20091116120845.GA10115-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>
  0 siblings, 1 reply; 57+ messages in thread
From: Karol Lewandowski @ 2009-11-16 12:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Karol Lewandowski, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
      Tobias Oetiker, linux-kernel, linux-mm@kvack.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Mon, Nov 16, 2009 at 09:52:58AM +0000, Mel Gorman wrote:
> On Sun, Nov 15, 2009 at 01:07:21PM +0100, Karol Lewandowski wrote:
> >              total       used       free     shared    buffers     cached
> > Mem:        255240     194052      61188          0       4040      49364
> > -/+ buffers/cache:     140648     114592
> > Swap:       514040      72712     441328
> >
> > Is that ok? Wild guess -- maybe kswapd doesn't take fragmentation (or
> > other factors) into account as hard as it used to in 2.6.30?
>
> That's a lot of memory free. I take it the order-5 GFP_ATOMIC allocation
> failed. What was the dmesg for it please?

Sure, it's attached below.

Hmm, "Normal free"/"DMA free" are much lower than 61MB as shown above.
free(1) output was collected right after resume successfully finished.

Thanks.

e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
e100 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 9 (level, low) -> IRQ 9
e100 0000:00:03.0: PME# disabled
e100: eth0: e100_probe: addr 0xe8120000, irq 9, MAC addr 00:10:a4:89:e8:84
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 7339, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0373362>] ? do_page_fault+0x33b/0x351
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  89
active_anon:15632 inactive_anon:17534 isolated_anon:0
 active_file:12373 inactive_file:12884 isolated_file:0
 unevictable:0 dirty:15 writeback:62 unstable:0
 free:2072 slab_reclaimable:849 slab_unreclaimable:1080
 mapped:7343 shmem:233 pagetables:505 bounce:0
DMA free:1440kB min:120kB low:148kB high:180kB active_anon:224kB inactive_anon:1516kB active_file:3440kB inactive_file:5140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:404kB shmem:44kB slab_reclaimable:16kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:6848kB min:1912kB low:2388kB high:2868kB active_anon:62304kB inactive_anon:68620kB active_file:46052kB inactive_file:46396kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:60kB writeback:248kB mapped:28968kB shmem:888kB slab_reclaimable:3380kB slab_unreclaimable:4304kB kernel_stack:464kB pagetables:2020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 4*4kB 8*8kB 33*16kB 22*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1440kB
Normal: 486*4kB 271*8kB 139*16kB 12*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6848kB
29040 total pagecache pages
3548 pages in swap cache
Swap cache stats: add 72385, delete 68837, find 70772/74308
Free swap = 483200kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
20970 pages shared
50976 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 7339, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0189370>] ? vfs_write+0xf4/0x105
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  82
active_anon:15563 inactive_anon:17593 isolated_anon:0
 active_file:12294 inactive_file:12787 isolated_file:0
 unevictable:0 dirty:18 writeback:155 unstable:0
 free:2267 slab_reclaimable:847 slab_unreclaimable:1080
 mapped:7271 shmem:229 pagetables:505 bounce:0
DMA free:1440kB min:120kB low:148kB high:180kB active_anon:224kB inactive_anon:1516kB active_file:3440kB inactive_file:5140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:404kB shmem:44kB slab_reclaimable:16kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:7628kB min:1912kB low:2388kB high:2868kB active_anon:62028kB inactive_anon:68856kB active_file:45736kB inactive_file:46008kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:72kB writeback:620kB mapped:28680kB shmem:872kB slab_reclaimable:3372kB slab_unreclaimable:4304kB kernel_stack:464kB pagetables:2020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 4*4kB 8*8kB 33*16kB 22*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1440kB
Normal: 559*4kB 290*8kB 152*16kB 16*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7628kB
29016 total pagecache pages
3702 pages in swap cache
Swap cache stats: add 72550, delete 68848, find 70774/74310
Free swap = 482540kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
20826 pages shared
50895 pages non-shared
e100 0000:00:03.0: firmware: requesting e100/d101s_ucode.bin
e100: eth0 NIC Link is Up 100 Mbps Full Duplex
crashreporter[9491]: segfault at b67701a0 ip b67701a0 sp bfa0fc2c error 4 in libnss_files-2.7.so[b67b6000+a000]

^ permalink raw reply	[flat|nested] 57+ messages in thread
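A note on reading the dumps above: for an order-5 request the per-order
lines (for example the Normal zone's "486*4kB 271*8kB ... 1*128kB ... =
6848kB") matter far more than the totals, since the request needs a single
128kB block. In the first report only one such block, and nothing larger,
was free in the Normal zone despite several megabytes being free overall.
A throwaway sketch (mine, not from the thread) for decoding such a line
into one entry per block size:

	echo "486*4kB 271*8kB 139*16kB 12*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB" |
	tr ' ' '\n' |
	awk -F'[*kB]' '$1 > 0 { printf "%6skB blocks free: %s\n", $2, $1 }'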
* Re: [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  [not found]   ` <20091116120845.GA10115-nLtalAL5mPp2RxbNQum0x1nzlInOXLuq@public.gmane.org>
@ 2009-11-16 14:32 ` Karol Lewandowski
  0 siblings, 0 replies; 57+ messages in thread
From: Karol Lewandowski @ 2009-11-16 14:32 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
      Tobias Oetiker, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, KOSAKI Motohiro,
      Pekka Enberg, Rik van Riel, Christoph Lameter,
      Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Mon, Nov 16, 2009 at 01:08:45PM +0100, Karol Lewandowski wrote:
> On Mon, Nov 16, 2009 at 09:52:58AM +0000, Mel Gorman wrote:
> > On Sun, Nov 15, 2009 at 01:07:21PM +0100, Karol Lewandowski wrote:
> > >              total       used       free     shared    buffers     cached
> > > Mem:        255240     194052      61188          0       4040      49364
> > > -/+ buffers/cache:     140648     114592
> > > Swap:       514040      72712     441328
> > >
> > > Is that ok? Wild guess -- maybe kswapd doesn't take fragmentation (or
> > > other factors) into account as hard as it used to in 2.6.30?
> >
> > That's a lot of memory free. I take it the order-5 GFP_ATOMIC allocation
> > failed. What was the dmesg for it please?
>
> Sure, it's attached below.
>
> Hmm, "Normal free"/"DMA free" are much lower than 61MB as shown above.
> free(1) output was collected right after resume successfully finished.

Replying to myself, but well, it happened again:

$ free
             total       used       free     shared    buffers     cached
Mem:        255240     189400      65840          0       3968      58916
-/+ buffers/cache:     126516     128724
Swap:       514040      98460     415580

$ dmesg
e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
e100 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 9 (level, low) -> IRQ 9
e100 0000:00:03.0: PME# disabled
e100: eth0: e100_probe: addr 0xe8120000, irq 9, MAC addr 00:10:a4:89:e8:84
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 10402, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0373362>] ? do_page_fault+0x33b/0x351
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  81
active_anon:10734 inactive_anon:19229 isolated_anon:0
 active_file:14360 inactive_file:14515 isolated_file:0
 unevictable:0 dirty:7 writeback:53 unstable:0
 free:1101 slab_reclaimable:1334 slab_unreclaimable:1143
 mapped:2686 shmem:44 pagetables:522 bounce:0
DMA free:1496kB min:120kB low:148kB high:180kB active_anon:724kB inactive_anon:1520kB active_file:3752kB inactive_file:3964kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:0kB slab_reclaimable:216kB slab_unreclaimable:36kB kernel_stack:4kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:2908kB min:1912kB low:2388kB high:2868kB active_anon:42212kB inactive_anon:75396kB active_file:53688kB inactive_file:54096kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:28kB writeback:212kB mapped:10420kB shmem:176kB slab_reclaimable:5120kB slab_unreclaimable:4536kB kernel_stack:444kB pagetables:2072kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 26*4kB 48*8kB 53*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1496kB
Normal: 275*4kB 58*8kB 50*16kB 13*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2908kB
35091 total pagecache pages
6168 pages in swap cache
Swap cache stats: add 367741, delete 361573, find 144185/185303
Free swap = 438964kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
13452 pages shared
53168 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 10402, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0189370>] ? vfs_write+0xf4/0x105
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  93
active_anon:10715 inactive_anon:19236 isolated_anon:0
 active_file:14312 inactive_file:14473 isolated_file:0
 unevictable:0 dirty:7 writeback:52 unstable:0
 free:1191 slab_reclaimable:1334 slab_unreclaimable:1144
 mapped:2686 shmem:44 pagetables:522 bounce:0
DMA free:1496kB min:120kB low:148kB high:180kB active_anon:724kB inactive_anon:1520kB active_file:3752kB inactive_file:3964kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:0kB slab_reclaimable:216kB slab_unreclaimable:36kB kernel_stack:4kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:3268kB min:1912kB low:2388kB high:2868kB active_anon:42136kB inactive_anon:75424kB active_file:53496kB inactive_file:53928kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:28kB writeback:208kB mapped:10420kB shmem:176kB slab_reclaimable:5120kB slab_unreclaimable:4540kB kernel_stack:444kB pagetables:2072kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 26*4kB 48*8kB 53*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1496kB
Normal: 291*4kB 73*8kB 63*16kB 12*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3268kB
35039 total pagecache pages
6209 pages in swap cache
Swap cache stats: add 367794, delete 361585, find 144185/185304
Free swap = 438756kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
13445 pages shared
53086 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 10440, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0373362>] ? do_page_fault+0x33b/0x351
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  27
active_anon:9587 inactive_anon:19792 isolated_anon:0
 active_file:12298 inactive_file:12256 isolated_file:0
 unevictable:0 dirty:13 writeback:442 unstable:0
 free:6098 slab_reclaimable:1287 slab_unreclaimable:1142
 mapped:2436 shmem:37 pagetables:519 bounce:0
DMA free:1948kB min:120kB low:148kB high:180kB active_anon:724kB inactive_anon:1520kB active_file:3752kB inactive_file:3524kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:0kB slab_reclaimable:208kB slab_unreclaimable:32kB kernel_stack:4kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:22444kB min:1912kB low:2388kB high:2868kB active_anon:37624kB inactive_anon:77648kB active_file:45440kB inactive_file:45500kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:52kB writeback:1768kB mapped:9420kB shmem:148kB slab_reclaimable:4940kB slab_unreclaimable:4536kB kernel_stack:444kB pagetables:2060kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 115*4kB 54*8kB 56*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1948kB
Normal: 2005*4kB 829*8kB 379*16kB 50*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22444kB
32732 total pagecache pages
8128 pages in swap cache
Swap cache stats: add 370349, delete 362221, find 144309/185439
Free swap = 428812kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
10771 pages shared
50488 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 10440, comm: ifconfig Not tainted 2.6.32-rc7-dirty #3
Call Trace:
 [<c0168704>] ? __alloc_pages_nodemask+0x45d/0x4eb
 [<c01050b3>] ? dma_generic_alloc_coherent+0x4a/0xa7
 [<c0105069>] ? dma_generic_alloc_coherent+0x0/0xa7
 [<d0935ba0>] ? e100_alloc_cbs+0xc2/0x16f [e100]
 [<d0936d23>] ? e100_up+0x1b/0xf5 [e100]
 [<d0936e14>] ? e100_open+0x17/0x41 [e100]
 [<c03245cc>] ? dev_open+0x8f/0xc5
 [<c0323d54>] ? dev_change_flags+0xa2/0x155
 [<c03576b0>] ? devinet_ioctl+0x22a/0x51e
 [<c0316ebc>] ? sock_ioctl+0x1b6/0x1da
 [<c0316d06>] ? sock_ioctl+0x0/0x1da
 [<c0193f9b>] ? vfs_ioctl+0x1c/0x5f
 [<c0194565>] ? do_vfs_ioctl+0x4bf/0x4fb
 [<c0101683>] ? __switch_to+0xf/0x14c
 [<c011d24b>] ? finish_task_switch+0x2e/0x7d
 [<c0370423>] ? schedule+0x4cc/0x4f3
 [<c01945cd>] ? sys_ioctl+0x2c/0x42
 [<c0102834>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  30
active_anon:9587 inactive_anon:19792 isolated_anon:0
 active_file:12298 inactive_file:12270 isolated_file:0
 unevictable:0 dirty:13 writeback:434 unstable:0
 free:6083 slab_reclaimable:1287 slab_unreclaimable:1143
 mapped:2449 shmem:37 pagetables:519 bounce:0
DMA free:1948kB min:120kB low:148kB high:180kB active_anon:724kB inactive_anon:1520kB active_file:3752kB inactive_file:3524kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15808kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:0kB slab_reclaimable:208kB slab_unreclaimable:32kB kernel_stack:4kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:22384kB min:1912kB low:2388kB high:2868kB active_anon:37624kB inactive_anon:77648kB active_file:45440kB inactive_file:45556kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243776kB mlocked:0kB dirty:52kB writeback:1724kB mapped:9472kB shmem:148kB slab_reclaimable:4940kB slab_unreclaimable:4540kB kernel_stack:444kB pagetables:2060kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 115*4kB 54*8kB 56*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1948kB
Normal: 1990*4kB 829*8kB 379*16kB 50*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22384kB
32740 total pagecache pages
8128 pages in swap cache
Swap cache stats: add 370349, delete 362221, find 144309/185439
Free swap = 428812kB
Total swap = 514040kB
65520 pages RAM
1726 pages reserved
10844 pages shared
50429 pages non-shared

^ permalink raw reply	[flat|nested] 57+ messages in thread