* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
       [not found] <g0ia38-jj6.ln1@ppp121-45-136-118.lns11.adl6.internode.on.net>
@ 2011-02-22  7:37 ` Clemens Ladisch
  2011-02-22  7:46   ` Arthur Marsh
  2011-02-22 13:40   ` Andrea Arcangeli
  0 siblings, 2 replies; 31+ messages in thread
From: Clemens Ladisch @ 2011-02-22  7:37 UTC (permalink / raw)
  To: Arthur Marsh; +Cc: Andrea Arcangeli, alsa-user, linux-kernel

Arthur Marsh wrote:
> I'm experiencing MIDI playback slow-downs when I'm observing kswapd0
> active (a few percent of cpu in top output) in recent kernels.
>
> I git-bisected the problem down to:
>
> commit 5a03b051ed87e72b959f32a86054e1142ac4cf55
> Author: Andrea Arcangeli <aarcange@redhat.com>
> Date:   Thu Jan 13 15:47:11 2011 -0800
>
>     thp: use compaction in kswapd for GFP_ATOMIC order > 0
>
>     This takes advantage of memory compaction to properly generate pages of
>     order > 0 if regular page reclaim fails and priority level becomes more
>     severe and we don't reach the proper watermarks.
>
>     Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> I ran git-bisect over the weekend, building and installing ALSA 1.0.24
> with each kernel. After identifying the above commit, I rebuilt the 2.6
> head with that commit reverted and verified that the problem was no
> longer present.

Apparently, huge page compaction disables interrupts for much too long.

> MIDI playback was via aplaymidi -p 17:0 to a Soundblaster Audigy 2 ZS
> (SB0350) wavetable.

The ALSA sequencer uses either the system timer or an HR timer at 1 kHz
to deliver MIDI commands (notes); the wavetable driver requires its own
interrupts in regular 5.3 ms intervals.


Regards,
Clemens

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22  7:37 ` [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 Clemens Ladisch
@ 2011-02-22  7:46   ` Arthur Marsh
  2011-02-22 13:40   ` Andrea Arcangeli
  1 sibling, 0 replies; 31+ messages in thread
From: Arthur Marsh @ 2011-02-22  7:46 UTC (permalink / raw)
  To: Clemens Ladisch; +Cc: Andrea Arcangeli, alsa-user, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

Clemens Ladisch wrote, on 22/02/11 18:07:
> Arthur Marsh wrote:
>> I'm experiencing MIDI playback slow-downs when I'm observing kswapd0
>> active (a few percent of cpu in top output) in recent kernels.
>>
>> I git-bisected the problem down to:
>>
>> commit 5a03b051ed87e72b959f32a86054e1142ac4cf55
>> Author: Andrea Arcangeli <aarcange@redhat.com>
>> Date:   Thu Jan 13 15:47:11 2011 -0800
>>
>>     thp: use compaction in kswapd for GFP_ATOMIC order > 0
>>
>>     This takes advantage of memory compaction to properly generate pages of
>>     order > 0 if regular page reclaim fails and priority level becomes more
>>     severe and we don't reach the proper watermarks.
>>
>>     Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>
>> I ran git-bisect over the weekend, building and installing ALSA 1.0.24
>> with each kernel. After identifying the above commit, I rebuilt the 2.6
>> head with that commit reverted and verified that the problem was no
>> longer present.
>
> Apparently, huge page compaction disables interrupts for much too long.
>
>> MIDI playback was via aplaymidi -p 17:0 to a Soundblaster Audigy 2 ZS
>> (SB0350) wavetable.
>
> The ALSA sequencer uses either the system timer or an HR timer at 1 kHz
> to deliver MIDI commands (notes); the wavetable driver requires its own
> interrupts in regular 5.3 ms intervals.
>
> Regards,
> Clemens

Hi, Andrea Arcangeli's "z1" patch (attached) solved the problem for me,
even with significant swap activity.

Regards,

Arthur.

[-- Attachment #2: z1 --]
[-- Type: text/plain, Size: 1412 bytes --]

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 17497d0..5718bca 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2385,7 +2385,6 @@ loop_again:
 		 * cause too much scanning of the lower zones.
 		 */
 		for (i = 0; i <= end_zone; i++) {
-			int compaction;
 			struct zone *zone = pgdat->node_zones + i;
 			int nr_slab;
 
@@ -2408,7 +2407,7 @@ loop_again:
 			 * zone has way too many pages free already.
 			 */
 			if (!zone_watermark_ok_safe(zone, order,
-					8*high_wmark_pages(zone), end_zone, 0))
+					high_wmark_pages(zone), end_zone, 0))
 				shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
@@ -2416,24 +2415,23 @@ loop_again:
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_scanned += sc.nr_scanned;
 
-			compaction = 0;
+#if 0
 			if (order &&
 			    zone_watermark_ok(zone, 0,
 					      high_wmark_pages(zone),
 					      end_zone, 0) &&
 			    !zone_watermark_ok(zone, order,
 					       high_wmark_pages(zone),
-					       end_zone, 0)) {
+					       end_zone, 0))
 				compact_zone_order(zone,
 						   order,
 						   sc.gfp_mask, false,
 						   COMPACT_MODE_KSWAPD);
-				compaction = 1;
-			}
+#endif
 
 			if (zone->all_unreclaimable)
 				continue;
-			if (!compaction && nr_slab == 0 &&
+			if (nr_slab == 0 &&
 			    !zone_reclaimable(zone))
 				zone->all_unreclaimable = 1;
 			/*
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22  7:37 ` [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 Clemens Ladisch
  2011-02-22  7:46   ` Arthur Marsh
@ 2011-02-22 13:40   ` Andrea Arcangeli
  2011-02-22 16:15     ` Andrea Arcangeli
  1 sibling, 1 reply; 31+ messages in thread
From: Andrea Arcangeli @ 2011-02-22 13:40 UTC (permalink / raw)
  To: Clemens Ladisch; +Cc: Arthur Marsh, alsa-user, linux-kernel, Mel Gorman

On Tue, Feb 22, 2011 at 08:37:23AM +0100, Clemens Ladisch wrote:
> Arthur Marsh wrote:
> > I'm experiencing MIDI playback slow-downs when I'm observing kswapd0
> > active (a few percent of cpu in top output) in recent kernels.
> >
> > I git-bisected the problem down to:
> >
> > commit 5a03b051ed87e72b959f32a86054e1142ac4cf55
> > Author: Andrea Arcangeli <aarcange@redhat.com>
> > Date:   Thu Jan 13 15:47:11 2011 -0800
> >
> >     thp: use compaction in kswapd for GFP_ATOMIC order > 0
> >
> >     This takes advantage of memory compaction to properly generate pages of
> >     order > 0 if regular page reclaim fails and priority level becomes more
> >     severe and we don't reach the proper watermarks.
> >
> >     Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> >     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >
> > I ran git-bisect over the weekend, building and installing ALSA 1.0.24
> > with each kernel. After identifying the above commit, I rebuilt the 2.6
> > head with that commit reverted and verified that the problem was no
> > longer present.
>
> Apparently, huge page compaction disables interrupts for much too long.
>
> > MIDI playback was via aplaymidi -p 17:0 to a Soundblaster Audigy 2 ZS
> > (SB0350) wavetable.
>
> The ALSA sequencer uses either the system timer or an HR timer at 1 kHz
> to deliver MIDI commands (notes); the wavetable driver requires its own
> interrupts in regular 5.3 ms intervals.

I asked Arthur to test this fix. We already know the attached z1 patch
fixed the problem 100%, but that was a debugging patch not meant for
merging; if the below works, I think we're done.

This is the same approach I'm also going to test in another benchmark
that also showed increased latencies, one that isn't related to
multimedia or midi but to pure nfs load with jumbo frames. The problem
is the same!

The below is untested, please let me know if it helps, because it may
not be enough.

=========
Subject: compaction: fix high latencies created by kswapd

From: Andrea Arcangeli <aarcange@redhat.com>

We need to cond_resched in the compaction loop to avoid hurting
latencies, and to stop being too aggressive in kswapd and let it go
into all_unreclaimable if needed (this is not order-aware logic, but
it's not worth being too aggressive with expensive compaction).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
diff --git a/mm/compaction.c b/mm/compaction.c
index 8be430b..fef06c4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -274,6 +274,9 @@ static unsigned long isolate_migratepages(struct zone *zone,
 	spin_lock_irq(&zone->lru_lock);
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
+
+		cond_resched();
+
 		if (!pfn_valid_within(low_pfn))
 			continue;
 		nr_scanned++;
@@ -413,15 +416,6 @@ static int compact_finished(struct zone *zone,
 	if (cc->order == -1)
 		return COMPACT_CONTINUE;
 
-	/*
-	 * Generating only one page of the right order is not enough
-	 * for kswapd, we must continue until we're above the high
-	 * watermark as a pool for high order GFP_ATOMIC allocations
-	 * too.
-	 */
-	if (cc->compact_mode == COMPACT_MODE_KSWAPD)
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		/* Job done if page is free of the right migratetype */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 17497d0..564771c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2385,7 +2385,6 @@ loop_again:
 		 * cause too much scanning of the lower zones.
 		 */
 		for (i = 0; i <= end_zone; i++) {
-			int compaction;
 			struct zone *zone = pgdat->node_zones + i;
 			int nr_slab;
 
@@ -2408,7 +2407,7 @@ loop_again:
 			 * zone has way too many pages free already.
 			 */
 			if (!zone_watermark_ok_safe(zone, order,
-					8*high_wmark_pages(zone), end_zone, 0))
+					high_wmark_pages(zone), end_zone, 0))
 				shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
@@ -2416,24 +2415,21 @@ loop_again:
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_scanned += sc.nr_scanned;
 
-			compaction = 0;
 			if (order &&
 			    zone_watermark_ok(zone, 0,
 					      high_wmark_pages(zone),
 					      end_zone, 0) &&
 			    !zone_watermark_ok(zone, order,
 					       high_wmark_pages(zone),
-					       end_zone, 0)) {
+					       end_zone, 0))
 				compact_zone_order(zone,
 						   order,
 						   sc.gfp_mask, false,
 						   COMPACT_MODE_KSWAPD);
-				compaction = 1;
-			}
 
 			if (zone->all_unreclaimable)
 				continue;
-			if (!compaction && nr_slab == 0 &&
+			if (nr_slab == 0 &&
 			    !zone_reclaimable(zone))
 				zone->all_unreclaimable = 1;
 			/*
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 13:40 ` Andrea Arcangeli
@ 2011-02-22 16:15   ` Andrea Arcangeli
  2011-02-22 16:59     ` Mel Gorman
  2011-02-22 17:47     ` Arthur Marsh
  0 siblings, 2 replies; 31+ messages in thread
From: Andrea Arcangeli @ 2011-02-22 16:15 UTC (permalink / raw)
  To: Clemens Ladisch; +Cc: Arthur Marsh, alsa-user, linux-kernel, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

On Tue, Feb 22, 2011 at 02:40:47PM +0100, Andrea Arcangeli wrote:
>  	spin_lock_irq(&zone->lru_lock);
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;
> +
> +		cond_resched();
> +

My bad: see the above cond_resched() inside spin_lock_irq... oops.

I attached two replacement patches to apply in order (both of them
should be applied at the same time on top of git upstream, and they
shouldn't lock up this time).

[-- Attachment #2: kswapd-high_wmark --]
[-- Type: text/plain, Size: 1853 bytes --]

Subject: vmscan: kswapd must not free more than high_wmark pages

From: Andrea Arcangeli <aarcange@redhat.com>

When min_free_kbytes is set with "hugeadm --set-recommended-min_free_kbytes",
or with THP enabled (which runs the equivalent of "hugeadm
--set-recommended-min_free_kbytes" at boot to activate anti-frag at full
effectiveness automatically), the high wmark of some zone is as high as
~88M. 88M free on a 4G system isn't horrible, but 88M*8 = 704M free on a
4G system is definitely unbearable. This only tends to be visible on 4G
systems with a tiny over-4g zone, where kswapd insists on reaching the
high wmark on the over-4g zone, but in doing so it shrinks up to 704M
from the normal zone by mistake.

Even for the trivial case, where kswapd isn't woken until all zones hit
the low wmark and there is no concurrency between the allocator and
kswapd freeing, rotating the tiny above-4g lru more than "high-low"
pages, despite only "high-low" cache having been allocated into it,
doesn't sound obviously right either. A bigger gap looks to me like it
will do more harm than good, and if we need a real guarantee of
balancing we should rotate the allocations across the zones (a bigger
lru in a zone will require that zone to be hit more frequently because
it'll rotate slower than the other zones; the bias should not even
depend on the zone size but on the lru size).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2408,7 +2408,7 @@ loop_again:
 			 * zone has way too many pages free already.
 			 */
 			if (!zone_watermark_ok_safe(zone, order,
-					8*high_wmark_pages(zone), end_zone, 0))
+					high_wmark_pages(zone), end_zone, 0))
 				shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,

[-- Attachment #3: compaction-kswapd --]
[-- Type: text/plain, Size: 2230 bytes --]

---
 mm/compaction.c |   19 ++++++++++---------
 mm/vmscan.c     |    8 ++------
 2 files changed, 12 insertions(+), 15 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -271,9 +271,19 @@ static unsigned long isolate_migratepage
 	}
 
 	/* Time to isolate some pages for migration */
+	cond_resched();
 	spin_lock_irq(&zone->lru_lock);
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
+
+		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
+			if (fatal_signal_pending(current))
+				break;
+			spin_unlock_irq(&zone->lru_lock);
+			cond_resched();
+			spin_lock_irq(&zone->lru_lock);
+		}
+
 		if (!pfn_valid_within(low_pfn))
 			continue;
 		nr_scanned++;
@@ -413,15 +423,6 @@ static int compact_finished(struct zone
 	if (cc->order == -1)
 		return COMPACT_CONTINUE;
 
-	/*
-	 * Generating only one page of the right order is not enough
-	 * for kswapd, we must continue until we're above the high
-	 * watermark as a pool for high order GFP_ATOMIC allocations
-	 * too.
-	 */
-	if (cc->compact_mode == COMPACT_MODE_KSWAPD)
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		/* Job done if page is free of the right migratetype */
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2385,7 +2385,6 @@ loop_again:
 		 * cause too much scanning of the lower zones.
 		 */
 		for (i = 0; i <= end_zone; i++) {
-			int compaction;
 			struct zone *zone = pgdat->node_zones + i;
 			int nr_slab;
 
@@ -2416,24 +2415,21 @@ loop_again:
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_scanned += sc.nr_scanned;
 
-			compaction = 0;
 			if (order &&
 			    zone_watermark_ok(zone, 0,
 					      high_wmark_pages(zone),
 					      end_zone, 0) &&
 			    !zone_watermark_ok(zone, order,
 					       high_wmark_pages(zone),
-					       end_zone, 0)) {
+					       end_zone, 0))
 				compact_zone_order(zone,
 						   order,
 						   sc.gfp_mask, false,
 						   COMPACT_MODE_KSWAPD);
-				compaction = 1;
-			}
 
 			if (zone->all_unreclaimable)
 				continue;
-			if (!compaction && nr_slab == 0 &&
+			if (nr_slab == 0 &&
 			    !zone_reclaimable(zone))
 				zone->all_unreclaimable = 1;
 			/*
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 16:15 ` Andrea Arcangeli
@ 2011-02-22 16:59   ` Mel Gorman
  2011-02-22 17:08     ` Andrea Arcangeli
  1 sibling, 1 reply; 31+ messages in thread
From: Mel Gorman @ 2011-02-22 16:59 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Clemens Ladisch, Arthur Marsh, alsa-user, linux-kernel

On Tue, Feb 22, 2011 at 05:15:13PM +0100, Andrea Arcangeli wrote:
> ---
>  mm/compaction.c |   19 ++++++++++---------
>  mm/vmscan.c     |    8 ++------
>  2 files changed, 12 insertions(+), 15 deletions(-)
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -271,9 +271,19 @@ static unsigned long isolate_migratepage
>  	}
>  
>  	/* Time to isolate some pages for migration */
> +	cond_resched();
>  	spin_lock_irq(&zone->lru_lock);
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;
> +
> +		if (need_resched() || spin_is_contended(&zone->lru_lock)) {
> +			if (fatal_signal_pending(current))
> +				break;
> +			spin_unlock_irq(&zone->lru_lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +

There is a small chance that if the lock is contended, the current CPU
will simply reacquire the lock. Any idea how likely that is? The
need_resched() check itself seems reasonable and should reduce the
length of time interrupts are disabled.

>  		if (!pfn_valid_within(low_pfn))
>  			continue;
>  		nr_scanned++;
> @@ -413,15 +423,6 @@ static int compact_finished(struct zone
>  	if (cc->order == -1)
>  		return COMPACT_CONTINUE;
>  
> -	/*
> -	 * Generating only one page of the right order is not enough
> -	 * for kswapd, we must continue until we're above the high
> -	 * watermark as a pool for high order GFP_ATOMIC allocations
> -	 * too.
> -	 */
> -	if (cc->compact_mode == COMPACT_MODE_KSWAPD)
> -		return COMPACT_CONTINUE;
> -

Why is this change necessary? kswapd may go to sleep sooner as a result
of this change but it doesn't affect the length of time interrupts are
disabled. Some other latency problem you've found?

>  	/* Direct compactor: Is a suitable page free? */
>  	for (order = cc->order; order < MAX_ORDER; order++) {
>  		/* Job done if page is free of the right migratetype */
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2385,7 +2385,6 @@ loop_again:
>  		 * cause too much scanning of the lower zones.
>  		 */
>  		for (i = 0; i <= end_zone; i++) {
> -			int compaction;
>  			struct zone *zone = pgdat->node_zones + i;
>  			int nr_slab;
>  
> @@ -2416,24 +2415,21 @@ loop_again:
>  			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
>  			total_scanned += sc.nr_scanned;
>  
> -			compaction = 0;
>  			if (order &&
>  			    zone_watermark_ok(zone, 0,
>  					      high_wmark_pages(zone),
>  					      end_zone, 0) &&
>  			    !zone_watermark_ok(zone, order,
>  					       high_wmark_pages(zone),
> -					       end_zone, 0)) {
> +					       end_zone, 0))
>  				compact_zone_order(zone,
>  						   order,
>  						   sc.gfp_mask, false,
>  						   COMPACT_MODE_KSWAPD);
> -				compaction = 1;
> -			}
>  
>  			if (zone->all_unreclaimable)
>  				continue;
> -			if (!compaction && nr_slab == 0 &&
> +			if (nr_slab == 0 &&
>  			    !zone_reclaimable(zone))
>  				zone->all_unreclaimable = 1;

I'm not seeing how this change is related to interrupts either. The
intention of the current code is that, after compaction, a zone should
not be considered all_unreclaimable. The reason is that there was
enough free memory before compaction started, but compaction takes some
time, during which kswapd is not reclaiming pages at all. The view of
the zone before and after compaction is not directly related to
all_unreclaimable, so all_unreclaimable should only be set after
shrinking a zone when there is insufficient free memory to meet
watermarks.

-- 
Mel Gorman
SUSE Labs
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 16:59 ` Mel Gorman
@ 2011-02-22 17:08   ` Andrea Arcangeli
  2011-02-22 17:37     ` Mel Gorman
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Arcangeli @ 2011-02-22 17:08 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Clemens Ladisch, Arthur Marsh, alsa-user, linux-kernel

On Tue, Feb 22, 2011 at 04:59:45PM +0000, Mel Gorman wrote:
> There is a small chance that if the lock is contended, the current CPU
> will simply reacquire the lock. Any idea how likely that is? The
> need_resched() check itself seems reasonable and should reduce the
> length of time interrupts are disabled.

If the loop is short, the contention probability should be small. I
mostly added it because that's the way cond_resched_lock() does it; I
thought it was better anyway.

> Why is this change necessary? kswapd may go to sleep sooner as a result
> of this change but it doesn't affect the length of time interrupts are
> disabled. Some other latency problem you've found?

It's not. But I don't want to run more than one loop; otherwise I'm
afraid that kswapd will generate too high a load.

> I'm not seeing how this change is related to interrupts either. The intention
> of the current code is that after compaction, a zone should not be considered
> all_unreclaimable. The reason is that there was enough free memory
> before compaction started but compaction takes some time during which
> kswapd is not reclaiming pages at all. The view of the zone before and
> after compaction is not directly related to all_unreclaimable so
> all_unreclaimable should only be set after shrinking a zone and there is
> insufficient free memory to meet watermarks.

There is not just the interrupt issue. There's also the problem that
kswapd is generating too high a load. And I'm afraid what can happen is
that kswapd should go into the all_unreclaimable state and it doesn't,
because there was also a high order allocation in the mix. So I prefer
to obey the order=0 all_unreclaimable logic with higher priority. The
free-at-most-one-page change above is also there to run at most one
scan over all pfns before putting kswapd into the all_unreclaimable
state. The probability that a GFP_ATOMIC allocation improves
performance, thanks to being "jumbo", by more than one entire scan of
the pfns in the system sounds quite small. If all goes well, kswapd
will generate more than one atomic page. Also, it's good to keep the
COMPACT_MODE_KSWAPD mode to differentiate the low/high wmark (with
kswapd checking the high one if not even a page of the right order is
available).
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 17:08 ` Andrea Arcangeli
@ 2011-02-22 17:37   ` Mel Gorman
  0 siblings, 0 replies; 31+ messages in thread
From: Mel Gorman @ 2011-02-22 17:37 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Clemens Ladisch, Arthur Marsh, alsa-user, linux-kernel

On Tue, Feb 22, 2011 at 06:08:50PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 22, 2011 at 04:59:45PM +0000, Mel Gorman wrote:
> > There is a small chance that if the lock is contended, the current CPU
> > will simply reacquire the lock. Any idea how likely that is? The
> > need_resched() check itself seems reasonable and should reduce the
> > length of time interrupts are disabled.
>
> If the loop is short the contention probability should be small. I
> mostly added it because that's the way cond_resched_lock does it. I
> thought it was better anyway.

Ok.

> > Why is this change necessary? kswapd may go to sleep sooner as a result
> > of this change but it doesn't affect the length of time interrupts are
> > disabled. Some other latency problem you've found?
>
> It's not. But I don't want to run more than 1 loop. Otherwise I'm
> afraid that kswapd will generate a too high load.

It's a possibility. The intention was to keep compacting for high-order
GFP_ATOMIC allocations but granted, this is not a strong justification.
It occurred to me as well that while kswapd is doing this, no pages are
being reclaimed. This could result in direct reclaimers being more
frequent. I don't have data on how much this helps GFP_ATOMIC
allocations, but it's easier to imagine how it could increase latencies
due to increased direct reclaim.

> > I'm not seeing how this change is related to interrupts either. The intention
> > of the current code is that after compaction, a zone should not be considered
> > all_unreclaimable. The reason is that there was enough free memory
> > before compaction started but compaction takes some time during which
> > kswapd is not reclaiming pages at all. The view of the zone before and
> > after compaction is not directly related to all_unreclaimable so
> > all_unreclaimable should only be set after shrinking a zone and there is
> > insufficient free memory to meet watermarks.
>
> There is not just the interrupt issue. There's also a problem that
> kswapd is generating a too high load. And I'm afraid what can happen
> is that kswapd should go in all unreclaimable state and it doesn't
> because there was also a high order allocation in the mix.

Why should it go into an all_unreclaimable state after compaction when
it hasn't been reclaiming pages, though? A side-effect of
all_unreclaimable is that the zone is considered balanced, and so
kswapd will consume less CPU by going to sleep because "all zones are
balanced", but it feels like accidental behaviour.

> So I prefer to obey the order=0 all unreclaimable logic with higher
> priority. The freeing-max-one-page change above is also to run max 1
> scan over all pfns before putting kswapd in all unreclaimable state.
> The probability that a GFP_ATOMIC allocation improves performance
> thanks to being "jumbo" more than one entire scan of the pfns in the
> system sounds quite small. If all goes well kswapd will generate more
> than one atomic page. Also it's good to keep the COMPACT_MODE_KSWAPD
> mode to differentiate the low/high wmark (with kswapd checking the
> high one if not even a page of the right order is available).

Making kswapd more aggressive in compaction was intended to help
high-order GFP_ATOMIC allocations. If their success is no longer a big
issue and failures are infrequent and tolerated, then it's ok to allow
kswapd to sleep earlier. Unfortunately, I don't have any testcases that
exercise these types of allocations, but it'd be nice if those tests
could be rerun.

So, of the three changes in the patch (which hopefully will be three
patches eventually):

Change 1 reduces the time interrupts are disabled. Hard to argue with
that - the new behaviour is reasonable.

Change 2 makes kswapd give up compaction earlier and go back to
reclaiming pages. Potentially kswapd will go to sleep sooner and
consume less CPU. At worst, high-order GFP_ATOMIC allocations may fail
more frequently. It'd be nice to test the relevant workloads again to
make sure they are not impaired. If they are not, then kswapd going
back to sleep sooner is desirable and the change makes sense.

Change 3 potentially puts kswapd to sleep sooner, but it's marking a
zone all_unreclaimable when it's not necessarily in that state.
Potentially, kswapd for order-0 will later skip over that zone and
reclaim no pages from it until a page is freed in that zone, resetting
the flag. Doesn't seem right :(

-- 
Mel Gorman
SUSE Labs
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 16:15 ` Andrea Arcangeli
  2011-02-22 16:59   ` Mel Gorman
@ 2011-02-22 17:47   ` Arthur Marsh
  2011-02-22 19:43     ` Andrea Arcangeli
  2011-02-23 16:24     ` Andrea Arcangeli
  1 sibling, 2 replies; 31+ messages in thread
From: Arthur Marsh @ 2011-02-22 17:47 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman

Andrea Arcangeli wrote, on 23/02/11 02:45:
> On Tue, Feb 22, 2011 at 02:40:47PM +0100, Andrea Arcangeli wrote:
>>  	spin_lock_irq(&zone->lru_lock);
>>  	for (; low_pfn < end_pfn; low_pfn++) {
>>  		struct page *page;
>> +
>> +		cond_resched();
>> +
>
> my bad, see the above spin_lock_irq oops...
>
> I attached two replacement patches to apply in order (both of them
> should be applied at the same time on top of git upstream, and they
> shouldn't lockup this time).

OK, these patches applied together against upstream didn't cause a
crash, but I did observe:

- significant slowdowns of MIDI playback (more so than in previous
  cases, and with less than 20 Meg of swap file in use);

- kswapd0 sharing equal top place in CPU usage at times (e.g. 20
  percent).

If I should try only one of the patches or something else entirely,
please let me know.

Regards,

Arthur.
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-22 17:47 ` Arthur Marsh
@ 2011-02-22 19:43   ` Andrea Arcangeli
  2011-02-23  9:15     ` Mel Gorman
  1 sibling, 1 reply; 31+ messages in thread
From: Andrea Arcangeli @ 2011-02-22 19:43 UTC (permalink / raw)
  To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman

On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote:
> OK, these patches applied together against upstream didn't cause a crash
> but I did observe:
>
> significant slowdowns of MIDI playback (moreso than in previous cases,
> and with less than 20 Meg of swap file in use);
>
> kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent).
>
> If I should try only one of the patches or something else entirely,
> please let me know.

For Mel: with z1, kswapd used only 0.1-3.9 percent of CPU while he
loaded other applications.

We may need a way to put kswapd into an all-uncompactable mode to solve
this; logic 3, which merely tries not to disable the all_unreclaimable
logic, seems not enough. I.e. if compact_zone_order() doesn't return
COMPACT_COMPLETE, stop the compaction loop in kswapd. Then we can put
back the "COMPACT_MODE_KSWAPD returns COMPACT_CONTINUE" check in
compact_finished(), as the caller will throttle it (and it won't run
more than one scan before putting kswapd to sleep in all-uncompactable
mode).
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-22 19:43 ` Andrea Arcangeli @ 2011-02-23 9:15 ` Mel Gorman 2011-02-23 11:41 ` Arthur Marsh 0 siblings, 1 reply; 31+ messages in thread From: Mel Gorman @ 2011-02-23 9:15 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel On Tue, Feb 22, 2011 at 08:43:25PM +0100, Andrea Arcangeli wrote: > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > > OK, these patches applied together against upstream didn't cause a crash > > but I did observe: > > > > significant slowdowns of MIDI playback (moreso than in previous cases, > > and with less than 20 Meg of swap file in use); > > Can this be replicated with on-board audio hardware or is there something specific about the card? e.g. those typically driven by snd-intel8x0 or snd_hda_intel Unfortunately the original mail is a bit light on details on how this was reproduced and I didn't find a thread with more details. It looks like it's simply playing a midi file while the system is under load but less clear on what the symptoms are (audio skipping maybe?). I'll start with using irqs-off tracer to see can I replicate a similar style of issue without depending on sound. > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > > > If I should try only one of the patches or something else entirely, > > please let me know. > > For Mel: with z1, kswapd used only 0.1-3.9 percent of CPU while he > loaded other applications. > What's the usage otherwise? As that patch has been NAKd by Rik on the grounds it eliminates the "balance gap" entirely, can it be checked if the patch below keeps the CPU usage low as well please? > We may need a way to put kswapd in all uncompactable mode to solve > this, logic 3 just trying not to disable the all unreclaimable logic > seems not enough. I.e. 
if compact_zone_order doesn't return > COMPACT_COMPLETE, stop the compaction loop in kswapd. Then we can put > back in the COMPACT_MODE_KSWAPD return COMPACT_CONTINUE in > compact_finished as the caller will throttle it (and it won't run more > than one scan before putting kswapd to sleep in all uncompactable mode). > Should be possible. I'm not going to develop such a patch now though and instead have a stab at replicating some of the exhibited problems (high kswapd CPU usage, long interrupt disabled times etc). Can the following patch be tested as a potential replacement for z1 please? ==== CUT HERE ==== vmscan: kswapd should not free an excessive number of pages when balancing small zones When reclaiming for order-0 pages, kswapd requires that all zones be balanced. Each cycle through balance_pgdat() does background ageing on all zones if necessary and applies equal pressure on the inactive zone unless a lot of pages are free already. A "lot of free pages" is defined as a "balance gap" above the high watermark which is currently 7*high_watermark. Historically this was reasonable as min_free_kbytes was small. However, on systems using huge pages, it is recommended that min_free_kbytes is higher and it is tuned with hugeadm --set-recommended-min_free_kbytes. With the introduction of transparent huge page support, this recommended value is also applied. On X86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would expect around 68M of memory to be free. The Normal zone is approximately 35000 pages so under even normal memory pressure such as copying a large file, it gets exhausted quickly. As it is getting exhausted, kswapd applies pressure equally to all zones, including the DMA32 zone. DMA32 is approximately 700,000 pages with a high watermark of around 23,000 pages. In this situation, kswapd will reclaim around (23000*8 where 8 is the high watermark + balance gap of 7 * high watermark) pages or 718M of pages before the zone is ignored. 
What the user sees is that free memory is far higher than it should be. To avoid an excessive number of pages being reclaimed from the larger zones, this patch explicitly defines the "balance gap" to be either 1% of the zone or the low watermark for the zone, whichever is smaller. While kswapd will check all zones to apply pressure, it'll ignore zones that meet the (high_wmark + balance_gap) watermark. To test this, 80G were copied from a partition and the amount of memory being used was recorded. A comparison of a patched and unpatched kernel can be seen at http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps and shows that kswapd is not reclaiming as much memory with the patch applied. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/swap.h | 9 +++++++++ mm/vmscan.c | 16 +++++++++++++--- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 4d55932..a57c6e7 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -155,6 +155,15 @@ enum { #define SWAP_CLUSTER_MAX 32 #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX +/* + * Ratio between the present memory in the zone and the "gap" that + * we're allowing kswapd to shrink in addition to the per-zone high + * wmark, even for zones that already have the high wmark satisfied, + * in order to provide better per-zone lru behavior. We are ok to + * spend not more than 1% of the memory for this zone balancing "gap".
+ */ +#define KSWAPD_ZONE_BALANCE_GAP_RATIO 100 + #define SWAP_MAP_MAX 0x3e /* Max duplication count, in first swap_map */ #define SWAP_MAP_BAD 0x3f /* Note pageblock is bad, in first swap_map */ #define SWAP_HAS_CACHE 0x40 /* Flag page is cached, in first swap_map */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 17497d0..0c83530 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2388,6 +2388,7 @@ loop_again: int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; + unsigned long balance_gap; if (!populated_zone(zone)) continue; @@ -2404,11 +2405,20 @@ loop_again: mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask); /* - * We put equal pressure on every zone, unless one - * zone has way too many pages free already. + * We put equal pressure on every zone, unless + * one zone has way too many pages free + * already. The "too many pages" is defined + * as the high wmark plus a "gap" where the + * gap is either the low watermark or 1% + * of the zone, whichever is smaller. */ + balance_gap = min(low_wmark_pages(zone), + (zone->present_pages + + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) / + KSWAPD_ZONE_BALANCE_GAP_RATIO); if (!zone_watermark_ok_safe(zone, order, - 8*high_wmark_pages(zone), end_zone, 0)) + high_wmark_pages(zone) + balance_gap, + end_zone, 0)) shrink_zone(priority, zone, &sc); reclaim_state->reclaimed_slab = 0; nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 9:15 ` Mel Gorman @ 2011-02-23 11:41 ` Arthur Marsh 2011-02-23 13:50 ` Clemens Ladisch 2011-02-23 17:01 ` Mel Gorman 0 siblings, 2 replies; 31+ messages in thread From: Arthur Marsh @ 2011-02-23 11:41 UTC (permalink / raw) To: Mel Gorman; +Cc: Andrea Arcangeli, Clemens Ladisch, alsa-user, linux-kernel Mel Gorman wrote, on 23/02/11 19:45: > On Tue, Feb 22, 2011 at 08:43:25PM +0100, Andrea Arcangeli wrote: >> On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: >>> OK, these patches applied together against upstream didn't cause a crash >>> but I did observe: >>> >>> significant slowdowns of MIDI playback (moreso than in previous cases, >>> and with less than 20 Meg of swap file in use); >>> > > Can this be replicated with on-board audio hardware or is there something > specific about the card? e.g. those typically driven by snd-intel8x0 or > snd_hda_intel This card (Soundblaster Audigy 2 ZS - SB0350) has an on-board wavetable synthesiser. To perform a test with sound hardware included on a motherboard, one would need to connect the PC to an external synthesiser to play the MIDI signal from the sound hardware on the motherboard. To quote an earlier posting from Clemens Ladisch: "The ALSA sequencer uses either the system timer or an HR timer at 1 kHz to deliver MIDI commands (notes); the wavetable driver requires its own interrupts in regular 5.3 ms intervals." The wavetable interrupts are apparently being lost. I am not getting lost MIDI events (i.e. notes going missing or being stuck on), but a slowdown in playback. > > Unfortunately the original mail is a bit light on details of how this was > reproduced and I didn't find a thread with more details.
It looks like it's > simply playing a MIDI file while the system is under load, but it is less > clear what the symptoms are (audio skipping maybe?). I'll start by using > the irqs-off tracer to see whether I can replicate a similar style of issue without > depending on sound. Clemens, would this work to identify the problem without relying on a device such as a sound card with a wavetable synthesiser or external synthesiser receiving MIDI signals from the PC? > >>> kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). >>> >>> If I should try only one of the patches or something else entirely, >>> please let me know. >> >> For Mel: with z1, kswapd used only 0.1-3.9 percent of CPU while he >> loaded other applications. >> > > What's the usage otherwise? > > As that patch has been NAKd by Rik on the grounds it eliminates the "balance > gap" entirely, can you check whether the patch below keeps the CPU usage low > as well, please? Unfortunately, when I loaded KDE 3.5.X, konversation, aptitude -u, and icedove with Mel's patch in the last few minutes, kswapd0 ran up to 17.5 percent CPU usage and the playback using aplaymidi slowed down. Load was around 6 and swap usage had barely started when I noticed the slowdown in MIDI playback. By contrast, with the "z1" patch I could load the PC as above and also start iceweasel with a few dozen tabs open, and play the MIDI file using the Qt application Rosegarden instead of aplaymidi, without any slowdown, even though the swap usage was up around 200 MB and load was over 10.0. > >> We may need a way to put kswapd in all uncompactable mode to solve >> this, logic 3 just trying not to disable the all unreclaimable logic >> seems not enough. I.e. if compact_zone_order doesn't return >> COMPACT_COMPLETE, stop the compaction loop in kswapd.
Then we can put >> back in the COMPACT_MODE_KSWAPD return COMPACT_CONTINUE in >> compact_finished as the caller will throttle it (and it won't run more >> than one scan before putting kswapd to sleep in all uncompactable mode). >> > > Should be possible. I'm not going to develop such a patch now though and > instead have a stab at replicating some of the exhibited problems (high > kswapd CPU usage, long interrupt disabled times etc). > > Can the following patch be tested as a potential replacement for z1 > please? > > ==== CUT HERE ==== > vmscan: kswapd should not free an excessive number of pages when balancing small zones > > When reclaiming for order-0 pages, kswapd requires that all zones be > balanced. Each cycle through balance_pgdat() does background ageing on all > zones if necessary and applies equal pressure on the inactive zone unless > a lot of pages are free already. > > A "lot of free pages" is defined as a "balance gap" above the high watermark > which is currently 7*high_watermark. Historically this was reasonable as > min_free_kbytes was small. However, on systems using huge pages, it is > recommended that min_free_kbytes is higher and it is tuned with hugeadm > --set-recommended-min_free_kbytes. With the introduction of transparent > huge page support, this recommended value is also applied. On X86-64 with > 4G of memory, min_free_kbytes becomes 67584 so one would expect around 68M > of memory to be free. The Normal zone is approximately 35000 pages so under > even normal memory pressure such as copying a large file, it gets exhausted > quickly. As it is getting exhausted, kswapd applies pressure equally to all > zones, including the DMA32 zone. DMA32 is approximately 700,000 pages with > a high watermark of around 23,000 pages. In this situation, kswapd will > reclaim around (23000*8 where 8 is the high watermark + balance gap of 7 * > high watermark) pages or 718M of pages before the zone is ignored. 
What > the user sees is that free memory is far higher than it should be. > > To avoid an excessive number of pages being reclaimed from the larger zones, > this patch explicitly defines the "balance gap" to be either 1% of the zone or the > low watermark for the zone, whichever is smaller. While kswapd will check > all zones to apply pressure, it'll ignore zones that meet the (high_wmark + > balance_gap) watermark. > > To test this, 80G were copied from a partition and the amount of memory > being used was recorded. A comparison of a patched and unpatched kernel > can be seen at > http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps > and shows that kswapd is not reclaiming as much memory with the patch > applied. > > Signed-off-by: Andrea Arcangeli<aarcange@redhat.com> > Signed-off-by: Mel Gorman<mel@csn.ul.ie> > --- > include/linux/swap.h | 9 +++++++++ > mm/vmscan.c | 16 +++++++++++++--- > 2 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index 4d55932..a57c6e7 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -155,6 +155,15 @@ enum { > #define SWAP_CLUSTER_MAX 32 > #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX > > +/* > + * Ratio between the present memory in the zone and the "gap" that > + * we're allowing kswapd to shrink in addition to the per-zone high > + * wmark, even for zones that already have the high wmark satisfied, > + * in order to provide better per-zone lru behavior. We are ok to > + * spend not more than 1% of the memory for this zone balancing "gap".
> + */ > +#define KSWAPD_ZONE_BALANCE_GAP_RATIO 100 > + > #define SWAP_MAP_MAX 0x3e /* Max duplication count, in first swap_map */ > #define SWAP_MAP_BAD 0x3f /* Note pageblock is bad, in first swap_map */ > #define SWAP_HAS_CACHE 0x40 /* Flag page is cached, in first swap_map */ > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 17497d0..0c83530 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2388,6 +2388,7 @@ loop_again: > int compaction; > struct zone *zone = pgdat->node_zones + i; > int nr_slab; > + unsigned long balance_gap; > > if (!populated_zone(zone)) > continue; > @@ -2404,11 +2405,20 @@ loop_again: > mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask); > > /* > - * We put equal pressure on every zone, unless one > - * zone has way too many pages free already. > + * We put equal pressure on every zone, unless > + * one zone has way too many pages free > + * already. The "too many pages" is defined > + * as the high wmark plus a "gap" where the > + * gap is either the low watermark or 1% > + * of the zone, whichever is smaller. > */ > + balance_gap = min(low_wmark_pages(zone), > + (zone->present_pages + > + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) / > + KSWAPD_ZONE_BALANCE_GAP_RATIO); > if (!zone_watermark_ok_safe(zone, order, > - 8*high_wmark_pages(zone), end_zone, 0)) > + high_wmark_pages(zone) + balance_gap, > + end_zone, 0)) > shrink_zone(priority, zone,&sc); > reclaim_state->reclaimed_slab = 0; > nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, > Regards, Arthur. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 11:41 ` Arthur Marsh @ 2011-02-23 13:50 ` Clemens Ladisch 2011-02-23 17:01 ` Mel Gorman 1 sibling, 0 replies; 31+ messages in thread From: Clemens Ladisch @ 2011-02-23 13:50 UTC (permalink / raw) To: Arthur Marsh, Mel Gorman; +Cc: Andrea Arcangeli, alsa-user, linux-kernel Arthur Marsh wrote: > Mel Gorman wrote, on 23/02/11 19:45: > > Unfortunately the original mail is a bit light on details of how this was > > reproduced and I didn't find a thread with more details. It looks like it's > > simply playing a MIDI file while the system is under load, but it is less > > clear what the symptoms are (audio skipping maybe?). The synthesizer hardware continues to generate sound even if the computer does not update its settings; it's just that the timings are off. Even a few milliseconds of jitter are audible. > > I'll start by using the irqs-off tracer to see whether I can replicate > > a similar style of issue without depending on sound. > > Clemens, would this work to identify the problem without relying on a > device such as a sound card with a wavetable synthesiser or external > synthesiser receiving MIDI signals from the PC? Yes (assuming that it is indeed delayed interrupts that cause the slowdowns, but I see no other mechanism by which memory compaction could affect the sequencer timer). Any other job with realtime constraints should be similarly affected, but the tracer measures the problem directly at its source. Regards, Clemens ^ permalink raw reply [flat|nested] 31+ messages in thread
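For readers who want to reproduce this kind of measurement themselves, the stock irqsoff latency tracer can record the single worst interrupts-disabled path without any sound hardware. A rough session might look like the following; it assumes a kernel built with CONFIG_IRQSOFF_TRACER, root privileges, and debugfs mounted at /sys/kernel/debug (the workload line is a placeholder, not a command):

```
# cd /sys/kernel/debug/tracing
# echo 0 > tracing_max_latency    # reset the recorded worst case
# echo irqsoff > current_tracer   # trace the longest irqs-off section
# echo 1 > tracing_on
  ... play the MIDI file while generating memory pressure ...
# echo 0 > tracing_on
# cat trace                       # worst irqs-off path and its length in us
```

The trace output names the functions that disabled and re-enabled interrupts, which is how a long-running offender such as a compaction scan can be identified directly at its source.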
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 11:41 ` Arthur Marsh 2011-02-23 13:50 ` Clemens Ladisch @ 2011-02-23 17:01 ` Mel Gorman 2011-02-23 17:40 ` Andrea Arcangeli 1 sibling, 1 reply; 31+ messages in thread From: Mel Gorman @ 2011-02-23 17:01 UTC (permalink / raw) To: Arthur Marsh; +Cc: Andrea Arcangeli, Clemens Ladisch, alsa-user, linux-kernel On Wed, Feb 23, 2011 at 10:11:33PM +1030, Arthur Marsh wrote: > > > Mel Gorman wrote, on 23/02/11 19:45: >> On Tue, Feb 22, 2011 at 08:43:25PM +0100, Andrea Arcangeli wrote: >>> On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: >>>> OK, these patches applied together against upstream didn't cause a crash >>>> but I did observe: >>>> >>>> significant slowdowns of MIDI playback (moreso than in previous cases, >>>> and with less than 20 Meg of swap file in use); >>>> >> >> Can this be replicated with on-board audio hardware or is there something >> specific about the card? e.g. those typically driven by snd-intel8x0 or >> snd_hda_intel > > This card (Soundblaster Audigy 2 ZS - SB0350) has an on-board wavetable > synthesiser. To perform a test with sound hardware included on a > motherboard, one would need to connect the PC to an external synthesiser > to play the MIDI signal from the sound hardware on the motherboard. > Nuts. Nonetheless, I wrote some tools to track the worst IRQ-disabled latencies over time and identify the top offenders. Sure enough, in my own very basic tests, compaction was showing up as disabling IRQs for a massive length of time. It wasn't a common occurrence in my tests, but they are very basic, and it's not necessarily the *only* source of IRQs being disabled for too long; it was just the first one I found. I regret that yet more patches are being blasted around, but I hope the included figures will convince you to run yet-another-test.
I don't know how you are measuring how long IRQs are disabled or when it becomes a problem, but that is what I'm more interested in right now than the kswapd CPU usage, for which I'm still in the process of gathering data to analyse. ==== CUT HERE ==== mm: compaction: Minimise the time IRQs are disabled while isolating free pages compaction_alloc() isolates free pages to be used as migration targets. While it is scanning, IRQs are disabled on the mistaken assumption that the scanning would be short. Analysis showed that IRQs were in fact being disabled for a substantial length of time. A simple test was run using large anonymous mappings with transparent hugepage support enabled to trigger frequent compactions. A monitor sampled what the worst IRQs-off latencies were and a post-processing tool found the following: Event compaction_alloc..compaction_alloc 3014 us count 1 Event compaction_alloc..compaction_alloc 2954 us count 1 Event compaction_alloc..compaction_alloc 1803 us count 1 Event compaction_alloc..compaction_alloc 1303 us count 1 Event compaction_alloc..compaction_alloc 1291 us count 1 Event compaction_alloc..compaction_alloc 911 us count 1 Event compaction_alloc..compaction_alloc 753 us count 1 Event compaction_alloc..compaction_alloc 747 us count 1 Event compaction_alloc..compaction_alloc 610 us count 1 Event split_huge_page..add_to_swap 583 us count 1 Event split_huge_page..add_to_swap 531 us count 1 Event split_huge_page..add_to_swap 262 us count 1 Event split_huge_page..add_to_swap 258 us count 1 Event split_huge_page..add_to_swap 256 us count 1 Event split_huge_page..add_to_swap 252 us count 1 Event split_huge_page..add_to_swap 248 us count 1 Event split_huge_page..add_to_swap 247 us count 2 Event compaction_alloc..compaction_alloc 134 us count 1 Event shrink_inactive_list..shrink_zone 85 us count 1 Event ftrace_module_notify..ftrace_module_notify 80 us count 1 Event shrink_inactive_list..shrink_zone 76 us count 1 Event shrink_inactive_list..shrink_zone 47 us count 1
Event save_args..restore_args 40 us count 1 Event save_args..restore_args 39 us count 1 Event save_args..call_softirq 38 us count 1 Event save_args..call_softirq 36 us count 2 Event save_args..call_softirq 35 us count 2 Event save_args..call_softirq 34 us count 2 Event drain_array..cache_reap 31 us count 1 Event cfq_kick_queue..__blk_run_queue 29 us count 1 Event save_args..restore_args 28 us count 1 Event save_args..call_softirq 28 us count 3 Event save_args..call_softirq 27 us count 2 Event save_args..restore_args 27 us count 1 Event mempool_free_slab..mempool_free_slab 26 us count 1 Event drain_array..cache_reap 26 us count 2 Event save_args..call_softirq 26 us count 10 Event mempool_free_slab..mempool_free_slab 25 us count 1 Event save_args..call_softirq 25 us count 4 Event save_args..call_softirq 23 us count 2 Event ____pagevec_lru_add..__lru_cache_add 2 us count 1 i.e. compaction is disabled IRQs for a prolonged period of time - 3ms in one instance. The full report generated by the tool can be found at http://www.csn.ul.ie/~mel/postings/irqs-disabled-2.6.38-rc6.txt . This patch reduces the time IRQs are disabled by simply disabling IRQs at the last possible minute. 
An updated IRQs-off summary report then looks like; Event shrink_inactive_list..shrink_zone 2075 us count 1 Event shrink_inactive_list..shrink_zone 686 us count 1 Event split_huge_page..add_to_swap 535 us count 1 Event split_huge_page..add_to_swap 269 us count 1 Event split_huge_page..add_to_swap 265 us count 1 Event split_huge_page..add_to_swap 264 us count 1 Event shrink_inactive_list..shrink_zone 261 us count 1 Event split_huge_page..add_to_swap 255 us count 1 Event split_huge_page..add_to_swap 253 us count 1 Event split_huge_page..add_to_swap 252 us count 1 Event split_huge_page..add_to_swap 250 us count 1 Event split_huge_page..add_to_swap 239 us count 1 Event split_huge_page..add_to_swap 237 us count 1 Event shrink_inactive_list..shrink_zone 205 us count 1 Event compact_zone..compact_zone_order 162 us count 1 Event compact_zone..compact_zone_order 136 us count 1 Event compact_zone..compact_zone_order 134 us count 1 Event compact_zone..compact_zone_order 114 us count 1 Event shrink_inactive_list..shrink_zone 114 us count 1 Event compact_zone..compact_zone_order 111 us count 1 Event shrink_inactive_list..shrink_zone 95 us count 1 Event shrink_inactive_list..shrink_zone 91 us count 2 Event shrink_inactive_list..shrink_zone 90 us count 1 Event ftrace_module_notify..ftrace_module_notify 89 us count 1 Event shrink_inactive_list..shrink_zone 89 us count 1 Event shrink_inactive_list..shrink_zone 88 us count 2 Event shrink_inactive_list..shrink_zone 86 us count 1 Event shrink_inactive_list..shrink_zone 84 us count 2 Event shrink_inactive_list..shrink_zone 70 us count 1 Event save_args..restore_args 46 us count 1 Event mempool_free_slab..mempool_free_slab 43 us count 1 Event mempool_free_slab..mempool_free_slab 38 us count 1 Event save_args..restore_args 37 us count 1 Event mempool_free_slab..mempool_free_slab 37 us count 1 Event save_args..call_softirq 36 us count 2 Event save_args..call_softirq 35 us count 1 Event save_args..restore_args 35 us count 1 Event 
mempool_free_slab..mempool_free_slab 34 us count 1 Event save_args..call_softirq 34 us count 1 Event save_args..restore_args 33 us count 1 Event save_args..call_softirq 33 us count 2 Event save_args..call_softirq 32 us count 1 Event save_args..restore_args 30 us count 1 Event save_args..call_softirq 30 us count 1 Event drain_array..cache_reap 30 us count 1 Event save_args..restore_args 29 us count 1 Event scsi_request_fn..process_one_work 29 us count 1 Event save_args..call_softirq 29 us count 1 Event save_args..call_softirq 28 us count 2 Event save_args..call_softirq 27 us count 5 Event drain_array..cache_reap 26 us count 1 Event save_args..call_softirq 26 us count 8 Event save_args..call_softirq 25 us count 6 Event save_args..call_softirq 24 us count 2 Event __wake_up..__wake_up 2 us count 1 A full report is again available at http://www.csn.ul.ie/~mel/postings/irqs-disabled-2.6.38-rc6-compactirq-v1r1.txt . As should be obvious, IRQ-disabled latencies due to compaction are almost eliminated for this particular test. As a bonus, the test also completed far faster. MMTests Statistics: duration vanilla compactirq-v1r1 User/Sys Time Running Test (seconds) 28.63 18.68 Total Elapsed Time (seconds) 162.81 85.83 Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/compaction.c | 17 ++++++++++++----- 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 8be430b8..f47de94 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -155,7 +155,6 @@ static void isolate_freepages(struct zone *zone, * pages on cc->migratepages. We stop searching if the migrate * and free page scanners meet or enough free pages are isolated.
*/ - spin_lock_irqsave(&zone->lock, flags); for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages; pfn -= pageblock_nr_pages) { unsigned long isolated; @@ -178,9 +177,18 @@ static void isolate_freepages(struct zone *zone, if (!suitable_migration_target(page)) continue; - /* Found a block suitable for isolating free pages from */ - isolated = isolate_freepages_block(zone, pfn, freelist); - nr_freepages += isolated; + /* + * Found a block suitable for isolating free pages from. Now + * we disabled interrupts, double check things are ok and + * isolate the pages. This is to minimise the time IRQs + * are disabled + */ + spin_lock_irqsave(&zone->lock, flags); + if (suitable_migration_target(page)) { + isolated = isolate_freepages_block(zone, pfn, freelist); + nr_freepages += isolated; + } + spin_unlock_irqrestore(&zone->lock, flags); /* * Record the highest PFN we isolated pages from. When next @@ -190,7 +198,6 @@ static void isolate_freepages(struct zone *zone, if (isolated) high_pfn = max(high_pfn, pfn); } - spin_unlock_irqrestore(&zone->lock, flags); /* split_free_page does not map the pages */ list_for_each_entry(page, freelist, lru) { ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 17:01 ` Mel Gorman @ 2011-02-23 17:40 ` Andrea Arcangeli 0 siblings, 0 replies; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 17:40 UTC (permalink / raw) To: Mel Gorman; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel On Wed, Feb 23, 2011 at 05:01:17PM +0000, Mel Gorman wrote: > I regret that yet more patches are being blasted around but I hope the included > figures will convince you to run yet-another-test. I don't know how you are > measuring how long IRQs are being disabled and when it's a problem but it's > what I'm more interested in right now than the kswapd CPU usage which I'm > still in the process of gathering data to analyse. Arthur, I suggest applying this patch (just like the kswapd-high-wmark one) before any of my patches when testing them. It's orthogonal, so it can be applied at the same time. Acked-by: Andrea Arcangeli <aarcange@redhat.com> Thanks, Andrea ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-22 17:47 ` Arthur Marsh 2011-02-22 19:43 ` Andrea Arcangeli @ 2011-02-23 16:24 ` Andrea Arcangeli 2011-02-23 16:36 ` Andrea Arcangeli ` (2 more replies) 1 sibling, 3 replies; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 16:24 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman [-- Attachment #1: Type: text/plain, Size: 8134 bytes --] On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > OK, these patches applied together against upstream didn't cause a crash > but I did observe: > > significant slowdowns of MIDI playback (moreso than in previous cases, > and with less than 20 Meg of swap file in use); > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > If I should try only one of the patches or something else entirely, > please let me know. Yes, with irqs off, schedule won't run and need_resched won't get set. So let's try this. If this doesn't fix it, I'll definitely give up on compaction in kswapd, as GFP_ATOMIC will likely not get a huge benefit out of compaction anyway, because 1) GFP_ATOMIC allocations are likely short-lived, and 2) the cost of the compaction scan loop + migration may be higher than the benefit we get from jumbo frames. In that case I'll post a definitive fix that removes compaction from kswapd (but leaves it enabled for direct reclaim for all order sizes, including the kernel stack). I already verified that this solves not just the MIDI latency issue but also the server benchmark issue that I'm dealing with. Then we can think about compaction+kswapd later for 2.6.39, but I think we need to close this kswapd issue in time for 2.6.38. So if the patch below isn't enough, the next patch should be applied.
I'll get those two patches tested under the server load too, and I'll let you know the results when it finishes. Please also apply the attached "kswapd-high-wmark" patch before the one below. ==== Subject: compaction: fix high latencies created by kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need proper spin_unlock_irq/cond_resched handling in the compaction loop to avoid hurting latencies. We must also stop insisting too aggressively on compaction within kswapd if compaction doesn't make progress in every zone. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- include/linux/compaction.h | 6 +++++ mm/compaction.c | 53 +++++++++++++++++++++++++++++++++++++-------- mm/vmscan.c | 39 +++++++++++++++++++-------------- 3 files changed, 73 insertions(+), 25 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -271,9 +271,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (fatal_signal_pending(current)) + break; + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -436,6 +454,28 @@ static int compact_finished(struct zone return COMPACT_CONTINUE; } +static unsigned long compaction_watermark(struct zone *zone, int order) +{ + /* + * Watermarks for order-0 must be met for compaction. Note the 2UL.
+ * This is because during migration, copies of pages need to be + * allocated and for a short time, the footprint is higher + */ + return low_wmark_pages(zone) + (2UL << order); +} + +static int __compaction_need_reclaim(struct zone *zone, int order, + unsigned long watermark) +{ + return !zone_watermark_ok(zone, 0, watermark, 0, 0); +} + +int compaction_need_reclaim(struct zone *zone, int order) +{ + return __compaction_need_reclaim(zone, order, + compaction_watermark(zone, order)); +} + /* * compaction_suitable: Is this suitable to run compaction on this zone now? * Returns @@ -449,21 +489,16 @@ unsigned long compaction_suitable(struct unsigned long watermark; /* - * Watermarks for order-0 must be met for compaction. Note the 2UL. - * This is because during migration, copies of pages need to be - * allocated and for a short time, the footprint is higher - */ - watermark = low_wmark_pages(zone) + (2UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) - return COMPACT_SKIPPED; - - /* * order == -1 is expected when compacting via * /proc/sys/vm/compact_memory */ if (order == -1) return COMPACT_CONTINUE; + watermark = compaction_watermark(zone, order); + if (__compaction_need_reclaim(zone, order, watermark)) + return COMPACT_SKIPPED; + /* * fragmentation index determines if allocation failures are due to * low memory or external fragmentation --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2385,9 +2385,9 @@ loop_again: * cause too much scanning of the lower zones. */ for (i = 0; i <= end_zone; i++) { - int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; + int local_order; if (!populated_zone(zone)) continue; @@ -2407,20 +2407,21 @@ loop_again: * We put equal pressure on every zone, unless one * zone has way too many pages free already. 
*/ - if (!zone_watermark_ok_safe(zone, order, - high_wmark_pages(zone), end_zone, 0)) + if (!zone_watermark_ok_safe(zone, 0, + high_wmark_pages(zone), end_zone, 0) || + compaction_need_reclaim(zone, order)) { shrink_zone(priority, zone, &sc); - reclaim_state->reclaimed_slab = 0; - nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, - lru_pages); - sc.nr_reclaimed += reclaim_state->reclaimed_slab; - total_scanned += sc.nr_scanned; + reclaim_state->reclaimed_slab = 0; + nr_slab = shrink_slab(sc.nr_scanned, + GFP_KERNEL, + lru_pages); + sc.nr_reclaimed += reclaim_state->reclaimed_slab; + total_scanned += sc.nr_scanned; + } - compaction = 0; + local_order = order; if (order && - zone_watermark_ok(zone, 0, - high_wmark_pages(zone), - end_zone, 0) && + !compaction_need_reclaim(zone, order) && !zone_watermark_ok(zone, order, high_wmark_pages(zone), end_zone, 0)) { @@ -2428,12 +2429,18 @@ loop_again: order, sc.gfp_mask, false, COMPACT_MODE_KSWAPD); - compaction = 1; + /* + * Let kswapd stop immediately even if + * compaction didn't succeed to + * generate "high_wmark_pages" of the + * max order wanted in every zone. + */ + local_order = 0; } if (zone->all_unreclaimable) continue; - if (!compaction && nr_slab == 0 && + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; /* @@ -2445,7 +2452,7 @@ loop_again: total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, high_wmark_pages(zone), end_zone, 0)) { all_zones_ok = 0; /* @@ -2453,7 +2460,7 @@ loop_again: * means that we have a GFP_ATOMIC allocation * failure risk. Hurry up! 
*/ - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, min_wmark_pages(zone), end_zone, 0)) has_under_min_watermark_zone = 1; } else { --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -26,6 +26,7 @@ extern int fragmentation_index(struct zo extern unsigned long try_to_compact_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask, bool sync); +extern int compaction_need_reclaim(struct zone *zone, int order); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, @@ -68,6 +69,11 @@ static inline unsigned long try_to_compa return COMPACT_CONTINUE; } +static inline int compaction_need_reclaim(struct zone *zone, int order) +{ + return 0; +} + static inline unsigned long compaction_suitable(struct zone *zone, int order) { return COMPACT_SKIPPED; [-- Attachment #2: kswapd-high_wmark --] [-- Type: text/plain, Size: 1853 bytes --] Subject: vmscan: kswapd must not free more than high_wmark pages From: Andrea Arcangeli <aarcange@redhat.com> When the min_free_kbytes is set with `hugeadm --set-recommended-min_free_kbytes" or with THP enabled (which runs the equivalent of "hugeadm --set-recommended-min_free_kbytes" to activate anti-frag at full effectiveness automatically at boot) the high wmark of some zone is as high as ~88M. 88M free on a 4G system isn't horrible, but 88M*8 = 704M free on a 4G system is definitely unbearable. This only tends to be visible on 4G systems with tiny over-4g zone where kswapd insists to reach the high wmark on the over-4g zone but doing so it shrunk up to 704M from the normal zone by mistake. 
For the trivial case where kswapd isn't waken until all zones hit the low wmark and there is no concurrency between allocator and kswapd freeing, rotating more the tiny above4g lru than "high-low" despite we only allocated "high-low" cache into it doesn't sound obviously right either. Bigger gap to me looks like will do more harm than good and if we need a real guarantee of balancing we should rotate the allocations across the zones (bigger lru in a zone will require it to be hit more frequently because it'll rotate slower than the other zones, the bias should not even dependent on the zone size but on the lru size). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2408,7 +2408,7 @@ loop_again: * zone has way too many pages free already. */ if (!zone_watermark_ok_safe(zone, order, - 8*high_wmark_pages(zone), end_zone, 0)) + high_wmark_pages(zone), end_zone, 0)) shrink_zone(priority, zone, &sc); reclaim_state->reclaimed_slab = 0; nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:24 ` Andrea Arcangeli @ 2011-02-23 16:36 ` Andrea Arcangeli 2011-02-23 16:40 ` Andrea Arcangeli 2011-02-23 16:55 ` Andrea Arcangeli 2011-02-23 17:10 ` Mel Gorman 2 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 16:36 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman [-- Attachment #1: Type: text/plain, Size: 5574 bytes --] On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote: > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > > OK, these patches applied together against upstream didn't cause a crash > > but I did observe: > > > > significant slowdowns of MIDI playback (moreso than in previous cases, > > and with less than 20 Meg of swap file in use); > > > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > > > If I should try only one of the patches or something else entirely, > > please let me know. > > Yes, with irq off, schedule won't run and need_resched won't get set. > > So let's try this. > > In case this doesn't fix I definitely give it up with compaction in > kswapd as GFP_ATOMIC will likely not get an huge benefit out of > compaction anyway because 1) the allocations from GFP_ATOMIC are > likely short lived, 2) the cost of the compaction scan loop + > migration may be higher than the benefit we get from jumbo frames > > So if this doesn't fix it, I'll already post a definitive fix that > removes compaction from kswapd (but leaves it enabled for direct > reclaim for all order sizes including kernel stack). I already > verified that this solves not just the midi latency issue but the > other server benchmark that I'm dealing with. 
Then we can think at > compaction+kswapd later for 2.6.39 but I think we need to close this > kswapd issue in time for 2.6.38. So if the below isn't enough the next > patch should be applied. I'll get those two patches tested in the > server load too, and I'll let you know how the results are when it > finishes. > > Please apply also the attached "kswapd-high-wmark" before the below > one. If the previous patch please test the below after the attached patch (as usual). If the previous patch (last attempt for 2.6.38 to add compaction in kswapd) fails this is the way to go for 2.6.38. === Subject: compaction: fix high compaction latencies and remove compaction-kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need to proper spin_unlock_irq/cond_resched in the compaction loop to avoid hurting latencies. We must also stop calling compaction from kswapd as that creates too high load during memory pressure that can't be offseted by the improved performance of hugepage allocations. NOTE: this is not related to THP as all THP allocations uses __GFP_NO_KSWAPD, this is only related to usually small order allocations like the kernel stack that make kswapd go wild with compaction. 
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- mm/compaction.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -271,9 +271,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (fatal_signal_pending(current)) + break; + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -397,10 +415,7 @@ static int compact_finished(struct zone return COMPACT_COMPLETE; /* Compaction run is not finished if the watermark is not met */ - if (cc->compact_mode != COMPACT_MODE_KSWAPD) - watermark = low_wmark_pages(zone); - else - watermark = high_wmark_pages(zone); + watermark = low_wmark_pages(zone); watermark += (1 << cc->order); if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0)) @@ -413,15 +428,6 @@ static int compact_finished(struct zone if (cc->order == -1) return COMPACT_CONTINUE; - /* - * Generating only one page of the right order is not enough - * for kswapd, we must continue until we're above the high - * watermark as a pool for high order GFP_ATOMIC allocations - * too. - */ - if (cc->compact_mode == COMPACT_MODE_KSWAPD) - return COMPACT_CONTINUE; - /* Direct compactor: Is a suitable page free? 
*/ for (order = cc->order; order < MAX_ORDER; order++) { /* Job done if page is free of the right migratetype */ @@ -543,8 +549,7 @@ static int compact_zone(struct zone *zon unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, - bool sync, - int compact_mode) + bool sync) { struct compact_control cc = { .nr_freepages = 0, @@ -553,7 +558,6 @@ unsigned long compact_zone_order(struct .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .compact_mode = compact_mode, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); @@ -599,8 +603,7 @@ unsigned long try_to_compact_pages(struc nodemask) { int status; - status = compact_zone_order(zone, order, gfp_mask, sync, - COMPACT_MODE_DIRECT_RECLAIM); + status = compact_zone_order(zone, order, gfp_mask, sync); rc = max(status, rc); /* If a normal allocation would succeed, stop compacting */ @@ -631,7 +634,6 @@ static int compact_node(int nid) .nr_freepages = 0, .nr_migratepages = 0, .order = -1, - .compact_mode = COMPACT_MODE_DIRECT_RECLAIM, }; zone = &pgdat->node_zones[zoneid]; [-- Attachment #2: kswapd-high_wmark --] [-- Type: text/plain, Size: 1853 bytes --] Subject: vmscan: kswapd must not free more than high_wmark pages From: Andrea Arcangeli <aarcange@redhat.com> When the min_free_kbytes is set with `hugeadm --set-recommended-min_free_kbytes" or with THP enabled (which runs the equivalent of "hugeadm --set-recommended-min_free_kbytes" to activate anti-frag at full effectiveness automatically at boot) the high wmark of some zone is as high as ~88M. 88M free on a 4G system isn't horrible, but 88M*8 = 704M free on a 4G system is definitely unbearable. This only tends to be visible on 4G systems with tiny over-4g zone where kswapd insists to reach the high wmark on the over-4g zone but doing so it shrunk up to 704M from the normal zone by mistake. 
For the trivial case where kswapd isn't waken until all zones hit the low wmark and there is no concurrency between allocator and kswapd freeing, rotating more the tiny above4g lru than "high-low" despite we only allocated "high-low" cache into it doesn't sound obviously right either. Bigger gap to me looks like will do more harm than good and if we need a real guarantee of balancing we should rotate the allocations across the zones (bigger lru in a zone will require it to be hit more frequently because it'll rotate slower than the other zones, the bias should not even dependent on the zone size but on the lru size). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2408,7 +2408,7 @@ loop_again: * zone has way too many pages free already. */ if (!zone_watermark_ok_safe(zone, order, - 8*high_wmark_pages(zone), end_zone, 0)) + high_wmark_pages(zone), end_zone, 0)) shrink_zone(priority, zone, &sc); reclaim_state->reclaimed_slab = 0; nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:36 ` Andrea Arcangeli @ 2011-02-23 16:40 ` Andrea Arcangeli 2011-02-23 16:47 ` Andrea Arcangeli 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 16:40 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Wed, Feb 23, 2011 at 05:36:39PM +0100, Andrea Arcangeli wrote: > + if (need_resched() || spin_is_contended(&zone->lru_lock)) { > + if (fatal_signal_pending(current)) > + break; I noticed a buglet in this break... need to repost sorry. compaction-no-kswapd-2. === Subject: compaction: fix high compaction latencies and remove compaction-kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need to proper spin_unlock_irq/cond_resched in the compaction loop to avoid hurting latencies. We must also stop calling compaction from kswapd as that creates too high load during memory pressure that can't be offseted by the improved performance of hugepage allocations. NOTE: this is not related to THP as all THP allocations uses __GFP_NO_KSWAPD, this is only related to usually small order allocations like the kernel stack that make kswapd go wild with compaction. 
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- mm/compaction.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -271,9 +271,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + if (fatal_signal_pending(current)) + break; + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -397,10 +415,7 @@ static int compact_finished(struct zone return COMPACT_COMPLETE; /* Compaction run is not finished if the watermark is not met */ - if (cc->compact_mode != COMPACT_MODE_KSWAPD) - watermark = low_wmark_pages(zone); - else - watermark = high_wmark_pages(zone); + watermark = low_wmark_pages(zone); watermark += (1 << cc->order); if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0)) @@ -413,15 +428,6 @@ static int compact_finished(struct zone if (cc->order == -1) return COMPACT_CONTINUE; - /* - * Generating only one page of the right order is not enough - * for kswapd, we must continue until we're above the high - * watermark as a pool for high order GFP_ATOMIC allocations - * too. - */ - if (cc->compact_mode == COMPACT_MODE_KSWAPD) - return COMPACT_CONTINUE; - /* Direct compactor: Is a suitable page free? 
*/ for (order = cc->order; order < MAX_ORDER; order++) { /* Job done if page is free of the right migratetype */ @@ -543,8 +549,7 @@ static int compact_zone(struct zone *zon unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, - bool sync, - int compact_mode) + bool sync) { struct compact_control cc = { .nr_freepages = 0, @@ -553,7 +558,6 @@ unsigned long compact_zone_order(struct .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .compact_mode = compact_mode, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); @@ -599,8 +603,7 @@ unsigned long try_to_compact_pages(struc nodemask) { int status; - status = compact_zone_order(zone, order, gfp_mask, sync, - COMPACT_MODE_DIRECT_RECLAIM); + status = compact_zone_order(zone, order, gfp_mask, sync); rc = max(status, rc); /* If a normal allocation would succeed, stop compacting */ @@ -631,7 +634,6 @@ static int compact_node(int nid) .nr_freepages = 0, .nr_migratepages = 0, .order = -1, - .compact_mode = COMPACT_MODE_DIRECT_RECLAIM, }; zone = &pgdat->node_zones[zoneid]; ^ permalink raw reply [flat|nested] 31+ messages in thread
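The "buglet in this break" that motivated compaction-no-kswapd-2 is a lock-balance issue: in the first version, `fatal_signal_pending()` was checked before the unlock/`cond_resched()`/relock sequence, so on the iterations where the periodic drop had already released `lru_lock`, the loop could `break` while not holding the lock, and the unconditional unlock after the loop would then unbalance it. The v2 ordering always breaks with the lock held. A userspace simulation with a hold-depth counter (all names and the always-true `need_resched()` stand-in are assumptions for illustration; a correct exit must leave the final depth at 0):

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32

static int depth;                      /* simulated lru_lock hold depth */
static void lock(void)   { depth++; }
static void unlock(void) { depth--; }

/* Run the loop body of isolate_migratepages() plus the unconditional
 * unlock that follows the loop; return the final lock depth.
 * v2 selects the fixed placement of the fatal-signal check;
 * fatal_at is the pfn at which we pretend a fatal signal is pending. */
static int run(bool v2, unsigned long fatal_at)
{
    unsigned long pfn;

    depth = 0;
    lock();                            /* spin_lock_irq(&zone->lru_lock) */
    for (pfn = 0; pfn < 1000; pfn++) {
        int unlocked = 0;

        /* periodic drop: give irqs a chance every SWAP_CLUSTER_MAX pfns */
        if (!((pfn + 1) % SWAP_CLUSTER_MAX)) {
            unlock();
            unlocked = 1;
        }
        /* pretend need_resched() is always true to exercise this path */
        if (!v2 && pfn >= fatal_at)
            break;                     /* v1: may break while unlocked */
        if (!unlocked)
            unlock();
        /* cond_resched() would run here */
        lock();
        if (v2 && pfn >= fatal_at)
            break;                     /* v2: always breaks holding the lock */
    }
    unlock();                          /* spin_unlock_irq() after the loop */
    return depth;
}
```

Note v1 only misbehaves when the break lands on a drop iteration (pfn + 1 a multiple of SWAP_CLUSTER_MAX); on other iterations both versions balance, which is exactly the kind of intermittent bug that is easy to miss in review.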
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:40 ` Andrea Arcangeli @ 2011-02-23 16:47 ` Andrea Arcangeli 0 siblings, 0 replies; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 16:47 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Wed, Feb 23, 2011 at 05:40:01PM +0100, Andrea Arcangeli wrote: > I noticed a buglet in this break... need to repost sorry. compaction-no-kswapd-2. and one part of the diff went missing during quilt ref... no luck. Try #3. So here a compaction-no-kswapd-3. Apologies for the flood. === Subject: compaction: fix high compaction latencies and remove compaction-kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need to proper spin_unlock_irq/cond_resched in the compaction loop to avoid hurting latencies. We must also stop calling compaction from kswapd as that creates too high load during memory pressure that can't be offseted by the improved performance of hugepage allocations. NOTE: this is not related to THP as all THP allocations uses __GFP_NO_KSWAPD, this is only related to usually small order allocations like the kernel stack that make kswapd go wild with compaction. 
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- mm/compaction.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -271,9 +271,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + if (fatal_signal_pending(current)) + break; + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -397,10 +415,7 @@ static int compact_finished(struct zone return COMPACT_COMPLETE; /* Compaction run is not finished if the watermark is not met */ - if (cc->compact_mode != COMPACT_MODE_KSWAPD) - watermark = low_wmark_pages(zone); - else - watermark = high_wmark_pages(zone); + watermark = low_wmark_pages(zone); watermark += (1 << cc->order); if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0)) @@ -413,15 +428,6 @@ static int compact_finished(struct zone if (cc->order == -1) return COMPACT_CONTINUE; - /* - * Generating only one page of the right order is not enough - * for kswapd, we must continue until we're above the high - * watermark as a pool for high order GFP_ATOMIC allocations - * too. - */ - if (cc->compact_mode == COMPACT_MODE_KSWAPD) - return COMPACT_CONTINUE; - /* Direct compactor: Is a suitable page free? 
*/ for (order = cc->order; order < MAX_ORDER; order++) { /* Job done if page is free of the right migratetype */ @@ -543,8 +549,7 @@ static int compact_zone(struct zone *zon unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, - bool sync, - int compact_mode) + bool sync) { struct compact_control cc = { .nr_freepages = 0, @@ -553,7 +558,6 @@ unsigned long compact_zone_order(struct .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .compact_mode = compact_mode, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); @@ -599,8 +603,7 @@ unsigned long try_to_compact_pages(struc nodemask) { int status; - status = compact_zone_order(zone, order, gfp_mask, sync, - COMPACT_MODE_DIRECT_RECLAIM); + status = compact_zone_order(zone, order, gfp_mask, sync); rc = max(status, rc); /* If a normal allocation would succeed, stop compacting */ @@ -631,7 +634,6 @@ static int compact_node(int nid) .nr_freepages = 0, .nr_migratepages = 0, .order = -1, - .compact_mode = COMPACT_MODE_DIRECT_RECLAIM, }; zone = &pgdat->node_zones[zoneid]; diff --git a/include/linux/compaction.h b/include/linux/compaction.h index dfa2ed4..cc9f7a4 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -11,9 +11,6 @@ /* The full zone was compacted */ #define COMPACT_COMPLETE 3 -#define COMPACT_MODE_DIRECT_RECLAIM 0 -#define COMPACT_MODE_KSWAPD 1 - #ifdef CONFIG_COMPACTION extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, @@ -28,8 +25,7 @@ extern unsigned long try_to_compact_pages(struct zonelist *zonelist, bool sync); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask, bool sync, - int compact_mode); + gfp_t gfp_mask, bool sync); /* Do not skip compaction more than 64 times */ #define COMPACT_MAX_DEFER_SHIFT 6 @@ -74,8 +70,7 @@ static inline unsigned long 
compaction_suitable(struct zone *zone, int order) } static inline unsigned long compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask, bool sync, - int compact_mode) + gfp_t gfp_mask, bool sync) { return COMPACT_CONTINUE; } diff --git a/mm/vmscan.c b/mm/vmscan.c index 17497d0..0e7121d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2385,7 +2385,6 @@ loop_again: * cause too much scanning of the lower zones. */ for (i = 0; i <= end_zone; i++) { - int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; @@ -2416,24 +2415,9 @@ loop_again: sc.nr_reclaimed += reclaim_state->reclaimed_slab; total_scanned += sc.nr_scanned; - compaction = 0; - if (order && - zone_watermark_ok(zone, 0, - high_wmark_pages(zone), - end_zone, 0) && - !zone_watermark_ok(zone, order, - high_wmark_pages(zone), - end_zone, 0)) { - compact_zone_order(zone, - order, - sc.gfp_mask, false, - COMPACT_MODE_KSWAPD); - compaction = 1; - } - if (zone->all_unreclaimable) continue; - if (!compaction && nr_slab == 0 && + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; /* ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:24 ` Andrea Arcangeli 2011-02-23 16:36 ` Andrea Arcangeli @ 2011-02-23 16:55 ` Andrea Arcangeli 2011-02-23 20:07 ` Arthur Marsh 2011-02-23 17:10 ` Mel Gorman 2 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 16:55 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote: > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > > OK, these patches applied together against upstream didn't cause a crash > > but I did observe: > > > > significant slowdowns of MIDI playback (moreso than in previous cases, > > and with less than 20 Meg of swap file in use); > > > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > > > If I should try only one of the patches or something else entirely, > > please let me know. > > Yes, with irq off, schedule won't run and need_resched won't get set. > > So let's try this. > > In case this doesn't fix I definitely give it up with compaction in > kswapd as GFP_ATOMIC will likely not get an huge benefit out of > compaction anyway because 1) the allocations from GFP_ATOMIC are > likely short lived, 2) the cost of the compaction scan loop + > migration may be higher than the benefit we get from jumbo frames > > So if this doesn't fix it, I'll already post a definitive fix that > removes compaction from kswapd (but leaves it enabled for direct > reclaim for all order sizes including kernel stack). I already > verified that this solves not just the midi latency issue but the > other server benchmark that I'm dealing with. Then we can think at > compaction+kswapd later for 2.6.39 but I think we need to close this > kswapd issue in time for 2.6.38. 
So if the below isn't enough the next > patch should be applied. I'll get those two patches tested in the > server load too, and I'll let you know how the results are when it > finishes. > > Please apply also the attached "kswapd-high-wmark" before the below > one. > + if (need_resched() || spin_is_contended(&zone->lru_lock)) { > + if (fatal_signal_pending(current)) > + break; this is compaction-kswapd-2, to fix the buglet same as in the other patch. In short (to avoid confusion) it'll be helpful if you can test compaction-kswapd-2 vs compaction-no-kswapd-3 and let us know if both are ok or if only compaction-no-kswapd-3 is ok. compaction-no-kswapd-3 should help, compaction-kswapd-2 I'm unsure. And as usual I suggest to apply kswapd-high-wmark (attached to previous emails) _before_ applying both patches. === Subject: compaction: fix high latencies created by kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need to proper spin_unlock_irq/cond_resched in the compaction loop to avoid hurting latencies. We must also stop being too aggressive insisting with compaction within kswapd if compaction don't make progress in every zone. 
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- include/linux/compaction.h | 6 +++++ mm/compaction.c | 53 +++++++++++++++++++++++++++++++++++++-------- mm/vmscan.c | 39 +++++++++++++++++++-------------- 3 files changed, 73 insertions(+), 25 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -271,9 +271,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + if (fatal_signal_pending(current)) + break; + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -436,6 +454,28 @@ static int compact_finished(struct zone return COMPACT_CONTINUE; } +static unsigned long compaction_watermark(struct zone *zone, int order) +{ + /* + * Watermarks for order-0 must be met for compaction. Note the 2UL. + * This is because during migration, copies of pages need to be + * allocated and for a short time, the footprint is higher + */ + return low_wmark_pages(zone) + (2UL << order); +} + +static int __compaction_need_reclaim(struct zone *zone, int order, + unsigned long watermark) +{ + return !zone_watermark_ok(zone, 0, watermark, 0, 0); +} + +int compaction_need_reclaim(struct zone *zone, int order) +{ + return __compaction_need_reclaim(zone, order, + compaction_watermark(zone, order)); +} + /* * compaction_suitable: Is this suitable to run compaction on this zone now? 
* Returns @@ -449,21 +489,16 @@ unsigned long compaction_suitable(struct unsigned long watermark; /* - * Watermarks for order-0 must be met for compaction. Note the 2UL. - * This is because during migration, copies of pages need to be - * allocated and for a short time, the footprint is higher - */ - watermark = low_wmark_pages(zone) + (2UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) - return COMPACT_SKIPPED; - - /* * order == -1 is expected when compacting via * /proc/sys/vm/compact_memory */ if (order == -1) return COMPACT_CONTINUE; + watermark = compaction_watermark(zone, order); + if (__compaction_need_reclaim(zone, order, watermark)) + return COMPACT_SKIPPED; + /* * fragmentation index determines if allocation failures are due to * low memory or external fragmentation --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2385,9 +2385,9 @@ loop_again: * cause too much scanning of the lower zones. */ for (i = 0; i <= end_zone; i++) { - int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; + int local_order; if (!populated_zone(zone)) continue; @@ -2407,20 +2407,21 @@ loop_again: * We put equal pressure on every zone, unless one * zone has way too many pages free already. 
*/ - if (!zone_watermark_ok_safe(zone, order, - high_wmark_pages(zone), end_zone, 0)) + if (!zone_watermark_ok_safe(zone, 0, + high_wmark_pages(zone), end_zone, 0) || + compaction_need_reclaim(zone, order)) { shrink_zone(priority, zone, &sc); - reclaim_state->reclaimed_slab = 0; - nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, - lru_pages); - sc.nr_reclaimed += reclaim_state->reclaimed_slab; - total_scanned += sc.nr_scanned; + reclaim_state->reclaimed_slab = 0; + nr_slab = shrink_slab(sc.nr_scanned, + GFP_KERNEL, + lru_pages); + sc.nr_reclaimed += reclaim_state->reclaimed_slab; + total_scanned += sc.nr_scanned; + } - compaction = 0; + local_order = order; if (order && - zone_watermark_ok(zone, 0, - high_wmark_pages(zone), - end_zone, 0) && + !compaction_need_reclaim(zone, order) && !zone_watermark_ok(zone, order, high_wmark_pages(zone), end_zone, 0)) { @@ -2428,12 +2429,18 @@ loop_again: order, sc.gfp_mask, false, COMPACT_MODE_KSWAPD); - compaction = 1; + /* + * Let kswapd stop immediately even if + * compaction didn't succeed to + * generate "high_wmark_pages" of the + * max order wanted in every zone. + */ + local_order = 0; } if (zone->all_unreclaimable) continue; - if (!compaction && nr_slab == 0 && + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; /* @@ -2445,7 +2452,7 @@ loop_again: total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, high_wmark_pages(zone), end_zone, 0)) { all_zones_ok = 0; /* @@ -2453,7 +2460,7 @@ loop_again: * means that we have a GFP_ATOMIC allocation * failure risk. Hurry up! 
*/ - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, min_wmark_pages(zone), end_zone, 0)) has_under_min_watermark_zone = 1; } else { --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -26,6 +26,7 @@ extern int fragmentation_index(struct zo extern unsigned long try_to_compact_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask, bool sync); +extern int compaction_need_reclaim(struct zone *zone, int order); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, @@ -68,6 +69,11 @@ static inline unsigned long try_to_compa return COMPACT_CONTINUE; } +static inline int compaction_need_reclaim(struct zone *zone, int order) +{ + return 0; +} + static inline unsigned long compaction_suitable(struct zone *zone, int order) { return COMPACT_SKIPPED; ^ permalink raw reply [flat|nested] 31+ messages in thread
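The `compaction_watermark()` helper that compaction-kswapd-2 factors out is plain arithmetic: order-0 free pages must exceed the low watermark by `2UL << order` because, during migration, both the source and destination copies of each page exist briefly, doubling the transient footprint. A userspace mirror of that computation (LOW_WMARK_PAGES is a made-up stand-in for `low_wmark_pages(zone)`):

```c
#include <assert.h>

/* Hypothetical order-0 low watermark, in pages, for illustration only. */
#define LOW_WMARK_PAGES 1000UL

/* Mirror of the patch's compaction_watermark(): the order-0 free-page
 * level that must be met before compaction is worth attempting.
 * 2UL << order == 2 * (1UL << order), i.e. twice the request size,
 * covering the temporary source+destination copies during migration. */
static unsigned long compaction_watermark(int order)
{
    return LOW_WMARK_PAGES + (2UL << order);
}
```

So for an order-9 request (a 2M hugepage with 4K pages) the slack is 1024 pages, and `compaction_need_reclaim()` in the patch simply asks whether order-0 free memory is still below this level, i.e. whether kswapd should keep reclaiming before compaction can run.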
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:55 ` Andrea Arcangeli @ 2011-02-23 20:07 ` Arthur Marsh 2011-02-23 21:25 ` Andrea Arcangeli 0 siblings, 1 reply; 31+ messages in thread From: Arthur Marsh @ 2011-02-23 20:07 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman Andrea Arcangeli wrote, on 24/02/11 03:25: > On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote: >> On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: >>> OK, these patches applied together against upstream didn't cause a crash >>> but I did observe: >>> >>> significant slowdowns of MIDI playback (moreso than in previous cases, >>> and with less than 20 Meg of swap file in use); >>> >>> kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). >>> >>> If I should try only one of the patches or something else entirely, >>> please let me know. >> >> Yes, with irq off, schedule won't run and need_resched won't get set. >> >> So let's try this. >> >> In case this doesn't fix I definitely give it up with compaction in >> kswapd as GFP_ATOMIC will likely not get an huge benefit out of >> compaction anyway because 1) the allocations from GFP_ATOMIC are >> likely short lived, 2) the cost of the compaction scan loop + >> migration may be higher than the benefit we get from jumbo frames >> >> So if this doesn't fix it, I'll already post a definitive fix that >> removes compaction from kswapd (but leaves it enabled for direct >> reclaim for all order sizes including kernel stack). I already >> verified that this solves not just the midi latency issue but the >> other server benchmark that I'm dealing with. Then we can think at >> compaction+kswapd later for 2.6.39 but I think we need to close this >> kswapd issue in time for 2.6.38. 
So if the below isn't enough the next >> patch should be applied. I'll get those two patches tested in the >> server load too, and I'll let you know how the results are when it >> finishes. >> >> Please apply also the attached "kswapd-high-wmark" before the below >> one. >> + if (need_resched() || spin_is_contended(&zone->lru_lock)) { >> + if (fatal_signal_pending(current)) >> + break; > > this is compaction-kswapd-2, to fix the same buglet as in the other > patch. > > In short (to avoid confusion) it'll be helpful if you can test > compaction-kswapd-2 vs compaction-no-kswapd-3 and let us know if both > are ok or if only compaction-no-kswapd-3 is ok. compaction-no-kswapd-3 > should help, compaction-kswapd-2 I'm unsure. And as usual I suggest > applying kswapd-high-wmark (attached to previous emails) _before_ > applying both patches. kswapd-high_wmark + compaction-kswapd-2 - kswapd0 CPU up to 11 percent and slightly less pronounced slowdowns of MIDI playback compared to previous patches. kswapd-high_wmark + compaction-no-kswapd-3 - kswapd0 CPU up to 2.3 percent and no noticeable slowdown of MIDI playback. Mel Gorman's mm/compaction.c patch - kswapd0 CPU up to 20 percent and no noticeable slowdown of MIDI playback. Thanks, everyone, for the help with this. Arthur. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 20:07 ` Arthur Marsh @ 2011-02-23 21:25 ` Andrea Arcangeli 2011-02-23 21:55 ` Arthur Marsh 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 21:25 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Thu, Feb 24, 2011 at 06:37:58AM +1030, Arthur Marsh wrote: > kswapd-high_wmark + compaction-kswapd-2 - kswapd0 CPU up to 11 percent > and slightly less pronounced slowdowns of MIDI playback compared to > previous patches. > > kswapd-high_wmark + compaction-no-kswapd-3 - kswapd0 CPU up to 2.3 > percent and no noticeable slowdown of MIDI playback. > > Mel Gorman's mm/compaction.c patch - kswapd0 CPU up to 20 percent and no > noticeable slowdown of MIDI playback. > > Thanks everyone for the help with this. Ok then I think it's safer to go with compaction-no-kswapd-3. I also created a git tree in case anybody else wants to test in an easier way. git clone --reference linux-2.6.git git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git (or "git clone git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git") cd aa git checkout -f 3d74399aaece29047beba13a81650538af8db67a (compaction-kswapd+compaction_alloc_lowlat) git checkout -f f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 (compaction-no-kswapd+rest+compaction_alloc_lowlat) git checkout -f 859e705548f7377b1803b05b2904bae77495fd1e (only compaction_alloc_lowlat) http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=3d74399aaece29047beba13a81650538af8db67a http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=859e705548f7377b1803b05b2904bae77495fd1e Arthur, could you give one more spin to the 3d74399aaece29047beba13a81650538af8db67a tree?
(or do you prefer an updated patch compaction-kswapd-3?) I'd like to get 3d74399aaece29047beba13a81650538af8db67a tested again because I did one more modification included in the git version compared to the patch version (nr_slab was not always initialized and it could lead to slightly higher kswapd CPU load than intended). I doubt it will help though (just in case). Mel, what do you think? If f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 is still the only one that shows no regression, I think it's safe to apply it to 2.6.38 and revert the compaction in kswapd feature. Then we can add compaction to kswapd later with no hurry. Thanks a lot for the help and quick feedback! Andrea ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 21:25 ` Andrea Arcangeli @ 2011-02-23 21:55 ` Arthur Marsh 2011-02-23 23:59 ` Andrea Arcangeli 0 siblings, 1 reply; 31+ messages in thread From: Arthur Marsh @ 2011-02-23 21:55 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman Andrea Arcangeli wrote, on 24/02/11 07:55: > On Thu, Feb 24, 2011 at 06:37:58AM +1030, Arthur Marsh wrote: >> kswapd-high_wmark + compaction-kswapd-2 - kswapd0 CPU up to 11 percent >> and slightly less pronounced slowdowns of MIDI playback compared to >> previous patches. >> >> kswapd-high_wmark + compaction-no-kswapd-3 - kswapd0 CPU up to 2.3 >> percent and no noticeable slowdown of MIDI playback. >> >> Mel Gorman's mm/compaction.c patch - kswapd0 CPU up to 20 percent and no >> noticeable slowdown of MIDI playback. >> >> Thanks everyone for the help with this. > > Ok then I think it's safer to go with compaction-no-kswapd-3. One more combination I tried: Mel Gorman's mm/compaction.c patch with Andrea Arcangeli's kswapd-high_wmark + compaction-no-kswapd-3 patches - kswapd0 CPU less than 2 percent and no noticeable slowdown of MIDI playback. This may be better performing than compaction-no-kswapd-3 alone but is not conclusive. > > I also created a git tree in case anybody else wants to test in an easier > way.
> > git clone --reference linux-2.6.git git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git > (or "git clone git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git") > cd aa > git checkout -f 3d74399aaece29047beba13a81650538af8db67a > (compaction-kswapd+compaction_alloc_lowlat) > git checkout -f f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 > (compaction-no-kswapd+rest+compaction_alloc_lowlat) > git checkout -f 859e705548f7377b1803b05b2904bae77495fd1e (only > compaction_alloc_lowlat) > > http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=3d74399aaece29047beba13a81650538af8db67a > http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 > http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog;h=859e705548f7377b1803b05b2904bae77495fd1e > > Arthur could you give one more spin to the > 3d74399aaece29047beba13a81650538af8db67a tree? (or do you prefer an > updated patch compaction-kswapd-3?) If you can send me an updated patch compaction-no-kswapd-3 (I presume that kswapd-high_wmark is still needed) it would be easier for me to apply. > > I'd like to get 3d74399aaece29047beba13a81650538af8db67a tested again > because I did one more modification included in the git version > compared to the patch version (nr_slab was not always initialized and > it could lead to slightly higher kswapd cpu load than intended). I > doubt it will help though (just in case). > > Mel what you think? If f048d0082bd3c6a1cc3f5e67aa5ef83d1561ec27 is > still the only one that shows no regression, I think it's safe to > apply it to 2.6.38 and revert the compaction in kswapd feature. Then > we can add compaction to kswapd later with no hurry. > > Thanks a lot for the help and quick feedback! > Andrea > Arthur. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 21:55 ` Arthur Marsh @ 2011-02-23 23:59 ` Andrea Arcangeli 2011-02-24 1:40 ` Arthur Marsh 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 23:59 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Thu, Feb 24, 2011 at 08:25:43AM +1030, Arthur Marsh wrote: > One more combination I tried: > > Mel Gorman's mm/compaction.c patch with Andrea Arcangeli's > kswapd-high_wmark + compaction-no-kswapd-3 patches - kswapd0 CPU less > than 2 percent and no noticeable slowdown of MIDI playback. Applying Mel's patch on top should decrease latency more. > If you can send me an updated patch compaction-no-kswapd-3 (I presume > that kswapd-high_wmark is still needed) it would be easier for me to apply. It's a compaction-kswapd-3. It's likely going to work the same as the previous compaction-kswapd-2 (not as good as compaction-no-kswapd). It's better to apply both the kswapd-high_wmark and Mel's patch too (not only this one) during testing. Thanks, Andrea === Subject: compaction: fix high latencies created by kswapd From: Andrea Arcangeli <aarcange@redhat.com> We need proper spin_unlock_irq/cond_resched handling in the compaction loop to avoid hurting latencies. We must also stop insisting too aggressively on compaction within kswapd if compaction doesn't make progress in every zone.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- include/linux/compaction.h | 6 +++++ mm/compaction.c | 53 +++++++++++++++++++++++++++++++++++++-------- mm/vmscan.c | 40 ++++++++++++++++++++------------- 3 files changed, 74 insertions(+), 25 deletions(-) --- a/mm/compaction.c +++ b/mm/compaction.c @@ -278,9 +278,27 @@ static unsigned long isolate_migratepage } /* Time to isolate some pages for migration */ + cond_resched(); spin_lock_irq(&zone->lru_lock); for (; low_pfn < end_pfn; low_pfn++) { struct page *page; + int unlocked = 0; + + /* give a chance to irqs before checking need_resched() */ + if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { + spin_unlock_irq(&zone->lru_lock); + unlocked = 1; + } + if (need_resched() || spin_is_contended(&zone->lru_lock)) { + if (!unlocked) + spin_unlock_irq(&zone->lru_lock); + cond_resched(); + spin_lock_irq(&zone->lru_lock); + if (fatal_signal_pending(current)) + break; + } else if (unlocked) + spin_lock_irq(&zone->lru_lock); + if (!pfn_valid_within(low_pfn)) continue; nr_scanned++; @@ -443,6 +461,28 @@ static int compact_finished(struct zone return COMPACT_CONTINUE; } +static unsigned long compaction_watermark(struct zone *zone, int order) +{ + /* + * Watermarks for order-0 must be met for compaction. Note the 2UL. + * This is because during migration, copies of pages need to be + * allocated and for a short time, the footprint is higher + */ + return low_wmark_pages(zone) + (2UL << order); +} + +static int __compaction_need_reclaim(struct zone *zone, + unsigned long watermark) +{ + return !zone_watermark_ok(zone, 0, watermark, 0, 0); +} + +int compaction_need_reclaim(struct zone *zone, int order) +{ + return __compaction_need_reclaim(zone, + compaction_watermark(zone, order)); +} + /* * compaction_suitable: Is this suitable to run compaction on this zone now? * Returns @@ -456,21 +496,16 @@ unsigned long compaction_suitable(struct unsigned long watermark; /* - * Watermarks for order-0 must be met for compaction. 
Note the 2UL. - * This is because during migration, copies of pages need to be - * allocated and for a short time, the footprint is higher - */ - watermark = low_wmark_pages(zone) + (2UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) - return COMPACT_SKIPPED; - - /* * order == -1 is expected when compacting via * /proc/sys/vm/compact_memory */ if (order == -1) return COMPACT_CONTINUE; + watermark = compaction_watermark(zone, order); + if (__compaction_need_reclaim(zone, watermark)) + return COMPACT_SKIPPED; + /* * fragmentation index determines if allocation failures are due to * low memory or external fragmentation --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2385,9 +2385,9 @@ loop_again: * cause too much scanning of the lower zones. */ for (i = 0; i <= end_zone; i++) { - int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; + int local_order; if (!populated_zone(zone)) continue; @@ -2407,20 +2407,22 @@ loop_again: * We put equal pressure on every zone, unless one * zone has way too many pages free already. 
*/ - if (!zone_watermark_ok_safe(zone, order, - high_wmark_pages(zone), end_zone, 0)) + nr_slab = 0; + if (!zone_watermark_ok_safe(zone, 0, + high_wmark_pages(zone), end_zone, 0) || + compaction_need_reclaim(zone, order)) { shrink_zone(priority, zone, &sc); - reclaim_state->reclaimed_slab = 0; - nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, - lru_pages); - sc.nr_reclaimed += reclaim_state->reclaimed_slab; - total_scanned += sc.nr_scanned; + reclaim_state->reclaimed_slab = 0; + nr_slab = shrink_slab(sc.nr_scanned, + GFP_KERNEL, + lru_pages); + sc.nr_reclaimed += reclaim_state->reclaimed_slab; + total_scanned += sc.nr_scanned; + } - compaction = 0; + local_order = order; if (order && - zone_watermark_ok(zone, 0, - high_wmark_pages(zone), - end_zone, 0) && + !compaction_need_reclaim(zone, order) && !zone_watermark_ok(zone, order, high_wmark_pages(zone), end_zone, 0)) { @@ -2428,12 +2430,18 @@ loop_again: order, sc.gfp_mask, false, COMPACT_MODE_KSWAPD); - compaction = 1; + /* + * Let kswapd stop immediately even if + * compaction didn't succeed to + * generate "high_wmark_pages" of the + * max order wanted in every zone. + */ + local_order = 0; } if (zone->all_unreclaimable) continue; - if (!compaction && nr_slab == 0 && + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; /* @@ -2445,7 +2453,7 @@ loop_again: total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2) sc.may_writepage = 1; - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, high_wmark_pages(zone), end_zone, 0)) { all_zones_ok = 0; /* @@ -2453,7 +2461,7 @@ loop_again: * means that we have a GFP_ATOMIC allocation * failure risk. Hurry up! 
*/ - if (!zone_watermark_ok_safe(zone, order, + if (!zone_watermark_ok_safe(zone, local_order, min_wmark_pages(zone), end_zone, 0)) has_under_min_watermark_zone = 1; } else { --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -26,6 +26,7 @@ extern int fragmentation_index(struct zo extern unsigned long try_to_compact_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask, bool sync); +extern int compaction_need_reclaim(struct zone *zone, int order); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, @@ -68,6 +69,11 @@ static inline unsigned long try_to_compa return COMPACT_CONTINUE; } +static inline int compaction_need_reclaim(struct zone *zone, int order) +{ + return 0; +} + static inline unsigned long compaction_suitable(struct zone *zone, int order) { return COMPACT_SKIPPED; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 23:59 ` Andrea Arcangeli @ 2011-02-24 1:40 ` Arthur Marsh 2011-02-24 1:54 ` Andrea Arcangeli 0 siblings, 1 reply; 31+ messages in thread From: Arthur Marsh @ 2011-02-24 1:40 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman Andrea Arcangeli wrote, on 24/02/11 10:29: > On Thu, Feb 24, 2011 at 08:25:43AM +1030, Arthur Marsh wrote: >> One more combination I tried: >> >> Mel Gorman's mm/compaction.c patch with Andrea Archangeli's >> kswapd-high_wmark + compaction-no-kswapd-3 patches - kswapd0 CPU less >> than 2 percent and no noticable slowdown of MIDI playback. > > Applying Mel's patch on top should decrease latency more. > >> If you can send me an updated patch compaction-no-kswapd-3 (I presume >> that kswapd-high_wmark is still needed) it would be easier for me to apply. > > It's a compaction-kswapd-3. It's likely going to work the same as the > previous compaction-kswapd-2 (not as good as > compaction-no-kswapd). It's better to apply both the kswapd-high_wmark > and Mel's patch too (not only this one) during testing. OK, with kswapd-high_wmark + compaction-kswapd-3 + Mel's patch (with the compaction initialisation fix), MIDI playback is fine. kswapd0 CPU can very occasionally hit equal highest (17 percent was the highest I noticed, but is generally below the top 4-5 processes and less than 10 percent when working, dropping to around 0.3 percent when swap activity has subsided). This was with loading KDE 3.5.10, konversation, aptitude -u, icedove and iceweasel with several dozen tabs in addition to aplaymidi. Regards, Arthur. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-24 1:40 ` Arthur Marsh @ 2011-02-24 1:54 ` Andrea Arcangeli 2011-02-26 6:43 ` Andrea Arcangeli 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-24 1:54 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman Hi Arthur, On Thu, Feb 24, 2011 at 12:10:45PM +1030, Arthur Marsh wrote: > OK, with kswapd-high_wmark + compaction-kswapd-3 + Mel's patch (with the > compaction initialisation fix), MIDI playback is fine. > kswapd0 CPU can very occasionally hit equal highest (17 percent was the > highest I noticed, but is generally below the top 4-5 processes and less > than 10 percent when working, dropping to around 0.3 percent when swap > activity has subsided). This was with loading KDE 3.5.10, konversation, > aptitude -u, icedove and iceweasel with several dozen tabs in addition > to aplaymidi. That sounds good... so maybe we don't have to back compaction out of kswapd and the new kswapd+compaction logic is working fine without overloading the system like the older logic did. Ok so tomorrow I'll get all results on these 3 kernels ( compaction-kswapd-3+compaction_alloc_lowlat-2 vs compaction-no-kswapd-3+compaction_alloc_lowlat-2 vs compaction_alloc_lowlat2) on network server load, where throughput is measured in addition to latency. Then we'll have a better picture to decide. If throughput is decreased in any measurable way I still prefer compaction-no-kswapd and to rethink this later without hurry. The network load with jumbo frames from e1000 is very good at stressing compaction+kswapd so there should be no surprises if throughput and latency are ok with the new compaction kswapd logic.
I suggest you keep running with this combination (compaction-kswapd-3 + Mel's patch with the initialization fix that I sent you + kswapd-high_wmark) if you can, to be sure it works fine for you. Thanks a lot for now! Andrea ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-24 1:54 ` Andrea Arcangeli @ 2011-02-26 6:43 ` Andrea Arcangeli 2011-02-27 8:48 ` Arthur Marsh 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-26 6:43 UTC (permalink / raw) To: Arthur Marsh; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman On Thu, Feb 24, 2011 at 02:54:05AM +0100, Andrea Arcangeli wrote: > Ok so tomorrow I'll get all results on these 3 kernels ( > compaction-kswapd-3+compaction_alloc_lowlat-2 vs > compaction-no-kswapd-3+compaction_alloc_lowlat-2 vs > compaction_alloc_lowlat2) on network server load, where throughput is > measured in addition to latency. > Then we'll have a better picture to Latency is still lowest with compaction-no-kswapd-3. compaction_alloc still is at the top of the profiling with compaction-kswapd-3+compaction_alloc_lowlat-2. However compaction-kswapd-3 reduced the overhead somewhat but not enough to be as fast as with compaction-no-kswapd-3 (even if much better than before). So we should apply compaction-no-kswapd-3 to 2.6.38 I think. ==== Subject: compaction: fix high compaction latencies and remove compaction-kswapd From: Andrea Arcangeli <aarcange@redhat.com> It's safer to stop calling compaction from kswapd as that creates too high a load during memory pressure that can't be offset by the improved performance of compound allocations. NOTE: this is not related to THP (THP allocations use __GFP_NO_KSWAPD), this is only related to frequent and small order allocations that make kswapd go wild with compaction.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- --- a/mm/compaction.c +++ b/mm/compaction.c @@ -405,10 +423,7 @@ static int compact_finished(struct zone return COMPACT_COMPLETE; /* Compaction run is not finished if the watermark is not met */ - if (cc->compact_mode != COMPACT_MODE_KSWAPD) - watermark = low_wmark_pages(zone); - else - watermark = high_wmark_pages(zone); + watermark = low_wmark_pages(zone); watermark += (1 << cc->order); if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0)) @@ -421,15 +436,6 @@ static int compact_finished(struct zone if (cc->order == -1) return COMPACT_CONTINUE; - /* - * Generating only one page of the right order is not enough - * for kswapd, we must continue until we're above the high - * watermark as a pool for high order GFP_ATOMIC allocations - * too. - */ - if (cc->compact_mode == COMPACT_MODE_KSWAPD) - return COMPACT_CONTINUE; - /* Direct compactor: Is a suitable page free? */ for (order = cc->order; order < MAX_ORDER; order++) { /* Job done if page is free of the right migratetype */ @@ -551,8 +557,7 @@ static int compact_zone(struct zone *zon unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, - bool sync, - int compact_mode) + bool sync) { struct compact_control cc = { .nr_freepages = 0, @@ -561,7 +566,6 @@ unsigned long compact_zone_order(struct .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .compact_mode = compact_mode, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); @@ -607,8 +611,7 @@ unsigned long try_to_compact_pages(struc nodemask) { int status; - status = compact_zone_order(zone, order, gfp_mask, sync, - COMPACT_MODE_DIRECT_RECLAIM); + status = compact_zone_order(zone, order, gfp_mask, sync); rc = max(status, rc); /* If a normal allocation would succeed, stop compacting */ @@ -639,7 +642,6 @@ static int compact_node(int nid) .nr_freepages = 0, .nr_migratepages = 0, .order = -1, - .compact_mode = 
COMPACT_MODE_DIRECT_RECLAIM, }; zone = &pgdat->node_zones[zoneid]; --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -11,9 +11,6 @@ /* The full zone was compacted */ #define COMPACT_COMPLETE 3 -#define COMPACT_MODE_DIRECT_RECLAIM 0 -#define COMPACT_MODE_KSWAPD 1 - #ifdef CONFIG_COMPACTION extern int sysctl_compact_memory; extern int sysctl_compaction_handler(struct ctl_table *table, int write, @@ -28,8 +25,7 @@ extern unsigned long try_to_compact_page bool sync); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask, bool sync, - int compact_mode); + gfp_t gfp_mask, bool sync); /* Do not skip compaction more than 64 times */ #define COMPACT_MAX_DEFER_SHIFT 6 @@ -74,8 +70,7 @@ static inline unsigned long compaction_s } static inline unsigned long compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask, bool sync, - int compact_mode) + gfp_t gfp_mask, bool sync) { return COMPACT_CONTINUE; } --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2385,7 +2385,6 @@ loop_again: * cause too much scanning of the lower zones. */ for (i = 0; i <= end_zone; i++) { - int compaction; struct zone *zone = pgdat->node_zones + i; int nr_slab; unsigned long balance_gap; @@ -2426,24 +2425,9 @@ loop_again: sc.nr_reclaimed += reclaim_state->reclaimed_slab; total_scanned += sc.nr_scanned; - compaction = 0; - if (order && - zone_watermark_ok(zone, 0, - high_wmark_pages(zone), - end_zone, 0) && - !zone_watermark_ok(zone, order, - high_wmark_pages(zone), - end_zone, 0)) { - compact_zone_order(zone, - order, - sc.gfp_mask, false, - COMPACT_MODE_KSWAPD); - compaction = 1; - } - if (zone->all_unreclaimable) continue; - if (!compaction && nr_slab == 0 && + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; /* ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-26 6:43 ` Andrea Arcangeli @ 2011-02-27 8:48 ` Arthur Marsh 0 siblings, 0 replies; 31+ messages in thread From: Arthur Marsh @ 2011-02-27 8:48 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Clemens Ladisch, alsa-user, linux-kernel, Mel Gorman Andrea Arcangeli wrote, on 26/02/11 17:13: > On Thu, Feb 24, 2011 at 02:54:05AM +0100, Andrea Arcangeli wrote: >> Ok so tomorrow I'll get all results on these 3 kernels ( >> compaction-kswapd-3+compaction_alloc_lowlat-2 vs >> compaction-no-kswapd-3+compaction_alloc_lowlat-2 vs >> compaction_alloc_lowlat2) on network server load, where throughput is >> measured in addition to latency. Then we'll have a better picture to > > Latency is still lowest with compaction-no-kswapd-3. compaction_alloc > still is at the top of the profiling with > compaction-kswapd-3+compaction_alloc_lowlat-2. However > compaction-kswapd-3 reduced the overhead somewhat but not enough to be > as fast as with compaction-no-kswapd-3 (even if much better than > before). > > So we should apply compaction-no-kswapd-3 to 2.6.38 I think. I built a kernel based on the git head of around 0800 UTC Saturday 2011-02-26 plus compaction-no-kswapd-3 (Andrea's) and compaction_alloc_lowlat-2 (Mel's) patches. The latency performance for MIDI playback is excellent and kswapd0 has reported 100 seconds of CPU time in 22 hours of the kernel running. However, I am now battling against the kernel freeing up memory when using icedove/thunderbird. top is reporting 74 MiB resident, 418 MiB virtual, and as I pause typing to watch top output, free RAM has gone up to 50 MiB and I'm stuck waiting for parts of icedove to swap back in so that I can get a response from my input (either scrolling a message when reading it or composing a message, where I am left waiting for the text that I typed to appear).
Normally on this machine with only 384 MiB RAM and (say) 2.6.35-2.6.37 kernels, free RAM will hover around the 5 MiB mark when lots of applications are open. I may wait when switching from one running application to another for it to respond, but once I am using that particular application it is sufficiently responsive not to be annoying. The aggressive reclamation of RAM by the kernel I built around 0800 UTC Saturday is working against the application currently in focus. I will try Andrea's and Mel's patches built against 2.6.38-rc6 over the next several hours to see how that compares. Arthur. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 16:24 ` Andrea Arcangeli 2011-02-23 16:36 ` Andrea Arcangeli 2011-02-23 16:55 ` Andrea Arcangeli @ 2011-02-23 17:10 ` Mel Gorman 2011-02-23 17:27 ` Andrea Arcangeli 2 siblings, 1 reply; 31+ messages in thread From: Mel Gorman @ 2011-02-23 17:10 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote: > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > > OK, these patches applied together against upstream didn't cause a crash > > but I did observe: > > > > significant slowdowns of MIDI playback (moreso than in previous cases, > > and with less than 20 Meg of swap file in use); > > > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > > > If I should try only one of the patches or something else entirely, > > please let me know. > > Yes, with irq off, schedule won't run and need_resched won't get set. > Stepping back a little, how did you determine that isolate_migratepages was the major problem? In my initial tests using the irqsoff tracer (sampled for the duration of the test every few seconds and resetting the max latency each time), compaction_alloc() was a far worse source of problems and isolate_migratepages didn't even register. It might be that I'm not testing on large enough machines though. > So let's try this.
> > In case this doesn't fix I definitely give it up with compaction in > kswapd as GFP_ATOMIC will likely not get a huge benefit out of > compaction anyway because 1) the allocations from GFP_ATOMIC are > likely short lived, 2) the cost of the compaction scan loop + > migration may be higher than the benefit we get from jumbo frames > In another mail, I posted a patch that dealt with compaction_alloc after finding that IRQs were being disabled for millisecond lengths of time. That length of time for IRQs being disabled could account for the performance loss on the network load. Can you test the network load with it applied? > <SNIP> -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 2011-02-23 17:10 ` Mel Gorman @ 2011-02-23 17:27 ` Andrea Arcangeli 2011-02-23 17:44 ` Mel Gorman 0 siblings, 1 reply; 31+ messages in thread From: Andrea Arcangeli @ 2011-02-23 17:27 UTC (permalink / raw) To: Mel Gorman; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel On Wed, Feb 23, 2011 at 05:10:47PM +0000, Mel Gorman wrote: > On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote: > > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote: > > > OK, these patches applied together against upstream didn't cause a crash > > > but I did observe: > > > > > > significant slowdowns of MIDI playback (moreso than in previous cases, > > > and with less than 20 Meg of swap file in use); > > > > > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent). > > > > > > If I should try only one of the patches or something else entirely, > > > please let me know. > > > > Yes, with irq off, schedule won't run and need_resched won't get set. > > > > Stepping back a little, how did you determine that isolate_migratepages was the > major problem? In my initial tests using the irqsoff tracer (sampled for > the duration of the test every few seconds and resetting the max latency > each time), compaction_alloc() was a far worse source of problems and > isolate_migratepages didn't even register. It might be that I'm not testing > on large enough machines though. I think you're right compaction_alloc is a bigger problem. Your patch to isolate_freepages is a must-have and in the right direction. However I think having large areas set as PageBuddy may be common too, the irq latency source in isolate_migratepages I think needs fixing too. We must be guaranteed to release irqs after max N pages (where N is SWAP_CLUSTER_MAX in my last two patches).
> In another mail, I posted a patch that dealt with compaction_alloc after > finding that IRQs were being disabled for millisecond lengths of time. > That length of time for IRQs being disabled could account for the performance > loss on the network load. Can you test the network load with it applied? kswapd was also running at 100% on all CPUs in that test. The z1 that doesn't fix the latency source in compaction but that removes compaction from kswapd (a light/hackish version of compaction-no-kswapd-3 that I just posted) fixes the problem completely for the network load too. So clearly it's not only a problem we can fix in compaction, the irq latency will improve for sure, but we still get an overload from kswapd which is not ok I think. What I am planning to test on the network load is high-wmark+compaction_alloc_lowlat+compaction-kswapd-3 vs high-wmark+compaction_alloc_lowlat+compaction-no-kswapd-2. Is this ok? If you want, I can also test high-wmark+compaction_alloc_lowlat without compaction-kswapd-3/compaction-no-kswapd-2 but I think the irq-latency source in isolate_migratepages in the presence of large PageBuddy regions (after any large application started at boot quits) isn't ok. Also I think having kswapd at 100% CPU load isn't ok. So I doubt we should stop at compaction_alloc_lowlat. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-23 17:27               ` Andrea Arcangeli
@ 2011-02-23 17:44                 ` Mel Gorman
  2011-02-23 18:14                   ` Andrea Arcangeli
  0 siblings, 1 reply; 31+ messages in thread
From: Mel Gorman @ 2011-02-23 17:44 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel

On Wed, Feb 23, 2011 at 06:27:34PM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 23, 2011 at 05:10:47PM +0000, Mel Gorman wrote:
> > On Wed, Feb 23, 2011 at 05:24:32PM +0100, Andrea Arcangeli wrote:
> > > On Wed, Feb 23, 2011 at 04:17:44AM +1030, Arthur Marsh wrote:
> > > > OK, these patches applied together against upstream didn't cause a crash
> > > > but I did observe:
> > > >
> > > > significant slowdowns of MIDI playback (more so than in previous cases,
> > > > and with less than 20 Meg of swap file in use);
> > > >
> > > > kswapd0 sharing equal top place in CPU usage at times (e.g. 20 percent).
> > > >
> > > > If I should try only one of the patches or something else entirely,
> > > > please let me know.
> > >
> > > Yes, with irqs off, schedule won't run and need_resched won't get set.
> >
> > Stepping back a little, how did you determine that isolate_migratepages was
> > the major problem? In my initial tests using the irqsoff tracer (sampled for
> > the duration of the test every few seconds and resetting the max latency
> > each time), compaction_alloc() was a far worse source of problems and
> > isolate_migratepages didn't even register. It might be that I'm not testing
> > on large enough machines though.
>
> I think you're right that compaction_alloc is a bigger problem. Your patch
> to isolate_freepages is a must-have and in the right direction.
>

Nice one.

> However I think having large areas set as PageBuddy may be common too,
> the irq latency source in isolate_migratepages I think needs fixing
> too. We must be guaranteed to release irqs after max N pages (where N
> is SWAP_CLUSTER_MAX in my last two patches).
>

Your logic makes sense and I can see why it might not necessarily show
up in my tests. I was simply wondering if you spotted the problem
directly or from looking at the source.

> > In another mail, I posted a patch that dealt with compaction_alloc after
> > finding that IRQs were being disabled for millisecond lengths of time.
> > That length of time for IRQs being disabled could account for the
> > performance loss on the network load. Can you test the network load with
> > it applied?
>
> kswapd was also running at 100% on all CPUs in that test.
>

On the plus side, the patch I posted also reduces kswapd CPU time.
Graphing CPU usage over time, I saw the following;

http://www.csn.ul.ie/~mel/postings/compaction-20110223/kswapdcpu-smooth-hydra.ps

i.e. CPU usage of kswapd is also reduced. The graph is smoothed because
the raw figures are so jagged as to be almost impossible to read. The z1
patches and others could also further reduce it (I haven't measured it yet)
but I thought it was interesting that IRQs being disabled for long periods
also contributed so heavily to kswapd CPU usage.

> The z1 that doesn't fix the latency source in compaction but that
> removes compaction from kswapd (a light/hackish version of
> compaction-no-kswapd-3 that I just posted) fixes the problem
> completely for the network load too.
>

Ok. If necessary we can disable it entirely for this cycle but as I'm
seeing large sources of IRQ-disabled latency in compaction and
shrink_inactive_list, it'd be nice to get that ironed out while the
problem is obvious too.

> So clearly it's not only a problem we can fix in compaction, the irq
> latency will improve for sure, but we still get an overload from
> kswapd which is not ok I think.
>

Indeed not.
> What I am planning to test on the network load is
> high-wmark+compaction_alloc_lowlat+compaction-kswapd-3 vs
> high-wmark+compaction_alloc_lowlat+compaction-no-kswapd-2.
>
> Is this ok?

Sure, to see what the results are. I'm still hoping we can prove the
high-wmark unnecessary due to Rik's naks. His reasoning about the corner
cases it potentially introduces is hard, if not impossible, to disprove.

> If you want I can also test
> high-wmark+compaction_alloc_lowlat without
> compaction-kswapd-3/compaction-no-kswapd-2 but I think the irq-latency
> source in isolate_migratepages in the presence of large PageBuddy regions
> (after any large application started at boot quits) isn't ok.

Can you ditch all these patches in a directory somewhere because I'm
getting confused as to which patch is which exactly :)

> Also I think having kswapd at 100% cpu load isn't ok. So I doubt we
> should stop at compaction_alloc_lowlat.
>

kswapd at 100% CPU is certainly unsuitable, but I'd like to be sure we
are getting it down the right way without reintroducing the problems
this 8*high_wmark check fixed.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0
  2011-02-23 17:44                 ` Mel Gorman
@ 2011-02-23 18:14                   ` Andrea Arcangeli
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Arcangeli @ 2011-02-23 18:14 UTC (permalink / raw)
To: Mel Gorman; +Cc: Arthur Marsh, Clemens Ladisch, alsa-user, linux-kernel

On Wed, Feb 23, 2011 at 05:44:37PM +0000, Mel Gorman wrote:
> Your logic makes sense and I can see why it might not necessarily show
> up in my tests. I was simply wondering if you spotted the problem
> directly or from looking at the source.

I looked at the profiling and then at the source, but compaction_alloc is
on top, so it matches your findings.

This is with z1:

 Samples  % of Total  Cum. Samples  Cum. % of Total  module:function
 --------------------------------------------------------------------------
  177786    6.178        177786        6.178         sunrpc:svc_recv
  128779    4.475        306565       10.654         sunrpc:svc_xprt_enqueue
   80786    2.807        387351       13.462         vmlinux:__d_lookup
   62272    2.164        449623       15.626         ext4:ext4_htree_store_dirent
   55896    1.942        505519       17.569         jbd2:journal_clean_one_cp_list
   43868    1.524        549387       19.093         vmlinux:task_rq_lock
   43572    1.514        592959       20.608         vmlinux:kfree
   37620    1.307        630579       21.915         vmlinux:mwait_idle
   36169    1.257        666748       23.172         vmlinux:schedule
   34037    1.182        700785       24.355         e1000:e1000_clean
   31945    1.110        732730       25.465         vmlinux:find_busiest_group
   31491    1.094        764221       26.560         qla2xxx:qla24xx_intr_handler
   30681    1.066        794902       27.626         vmlinux:_atomic_dec_and_lock
    7425    0.258        xxxxxx       xxxxxx         vmlinux:get_page_from_freelist

This is with the current compaction logic in kswapd:

 Samples  % of Total  Cum. Samples  Cum. % of Total  module:function
 --------------------------------------------------------------------------
 1182928   17.358       1182928       17.358         vmlinux:get_page_from_freelist
  657802    9.652       1840730       27.011         vmlinux:free_pcppages_bulk
  579976    8.510       2420706       35.522         sunrpc:svc_xprt_enqueue
  508953    7.468       2929659       42.991         sunrpc:svc_recv
  490538    7.198       3420197       50.189         vmlinux:compaction_alloc
  188620    2.767       3608817       52.957         vmlinux:tg_shares_up
   97527    1.431       3706344       54.388         vmlinux:__d_lookup
   85670    1.257       3792014       55.646         jbd2:journal_clean_one_cp_list
   71738    1.052       3863752       56.698         vmlinux:mutex_spin_on_owner
   71037    1.042       3934789       57.741         vmlinux:kfree

So clearly your patch may increase performance too (because of less
contention on the spinlock), but it's unlikely to make compaction_alloc go
away from the profiling. This isn't measuring irq latency, just the time
the CPU spent in each function, but the two issues are connected (the more
we call into that function, the higher the probability of running into the
high-latency loop once in a while).

> On the plus side, the patch I posted also reduces kswapd CPU time.
> Graphing CPU usage over time, I saw the following;
>
> http://www.csn.ul.ie/~mel/postings/compaction-20110223/kswapdcpu-smooth-hydra.ps
>
> i.e. CPU usage of kswapd is also reduced. The graph is smoothed because
> the raw figures are so jagged as to be almost impossible to read. The z1
> patches and others could also further reduce it (I haven't measured it yet)
> but I thought it was interesting that IRQs being disabled for long periods
> also contributed so heavily to kswapd CPU usage.

I think lower contention on the heavily used zone lock may have contributed
to decreasing the overall system load if it's a large SMP; I'm not sure why
kswapd usage went down though. No problem, so I will also test a third
kernel with your patch alone.

> Ok.
> If necessary we can disable it entirely for this cycle but as I'm
> seeing large sources of IRQ-disabled latency in compaction and
> shrink_inactive_list, it'd be nice to get that ironed out while the
> problem is obvious too.

Sure. The current kswapd code helps to find any latency issue in
compaction ;). In fact they went totally unnoticed until we enabled it in
kswapd.

> Sure, to see what the results are. I'm still hoping we can prove the
> high-wmark unnecessary due to Rik's naks. His reasoning about the corner
> cases it potentially introduces is hard, if not impossible, to disprove.

In my evaluation, shrinking more on the small lists was worse for overall
zone lru balancing; that's the side effect of that change. But I'm not
against changing it to high+min like he suggested, this was just simpler
for now. I've seen your patch too, and that's ok with me, but because I
don't see exactly the rationale for why it's a problem, I don't like
things that I'm uncertain about and I find the removal of the *8 simpler.

> Can you ditch all these patches in a directory somewhere because I'm
> getting confused as to which patch is which exactly :)

Ok.... Let me finish sending the 3 kernels to test.

> kswapd at 100% CPU is certainly unsuitable but I'd like to be sure we
> are getting it down the right way without reintroducing the problems
> this 8*high_wmark check fixed.

Well, the 8*high was never related to high kswapd load; it simply has the
effect that more memory is free when kswapd stops. It's very quick at
reaching 700m free, then it behaves identically to when only ~100m are
free (like now without the *8).

About kswapd: the current logic is clearly not ok in certain workloads (my
fault), so my attempt at fixing it is compaction-kswapd-3. I think the
primary problem is that kswapd won't stop after the first invocation of
compaction if there's any fragmentation in any zone (it could even be a
tiny dma zone). So this will fix it.
But it'll still cause one compaction invocation for every new order > 0
allocation (no big deal for the dma zone as it's small). If you check the
vmscan.c change of compaction-kswapd-2, I think it has a better chance to
work now. (I also noticed __compaction_need_reclaim doesn't need the "int
order" parameter, but you can ignore it; it's harmless and not worth
fixing until we know if this helps.)

If even this fails, that means calling compaction even a single time for
each kswapd wakeup (in addition to direct compaction) is too much. The
next step would be kswapd.max_order-- until it reaches zero, so compaction
stops being called unless direct compaction is invoked too. But we can try
this later.

compaction-no-kswapd-3 + your compaction_alloc_lowlat should fix the
problem, and it's a good thing kswapd misbehaved so we noticed the latency
issues.

^ permalink raw reply	[flat|nested] 31+ messages in thread
end of thread, other threads:[~2011-02-27 8:48 UTC | newest]
Thread overview: 31+ messages
-- links below jump to the message on this page --
[not found] <g0ia38-jj6.ln1@ppp121-45-136-118.lns11.adl6.internode.on.net>
2011-02-22 7:37 ` [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0 Clemens Ladisch
2011-02-22 7:46 ` Arthur Marsh
2011-02-22 13:40 ` Andrea Arcangeli
2011-02-22 16:15 ` Andrea Arcangeli
2011-02-22 16:59 ` Mel Gorman
2011-02-22 17:08 ` Andrea Arcangeli
2011-02-22 17:37 ` Mel Gorman
2011-02-22 17:47 ` Arthur Marsh
2011-02-22 19:43 ` Andrea Arcangeli
2011-02-23 9:15 ` Mel Gorman
2011-02-23 11:41 ` Arthur Marsh
2011-02-23 13:50 ` Clemens Ladisch
2011-02-23 17:01 ` Mel Gorman
2011-02-23 17:40 ` Andrea Arcangeli
2011-02-23 16:24 ` Andrea Arcangeli
2011-02-23 16:36 ` Andrea Arcangeli
2011-02-23 16:40 ` Andrea Arcangeli
2011-02-23 16:47 ` Andrea Arcangeli
2011-02-23 16:55 ` Andrea Arcangeli
2011-02-23 20:07 ` Arthur Marsh
2011-02-23 21:25 ` Andrea Arcangeli
2011-02-23 21:55 ` Arthur Marsh
2011-02-23 23:59 ` Andrea Arcangeli
2011-02-24 1:40 ` Arthur Marsh
2011-02-24 1:54 ` Andrea Arcangeli
2011-02-26 6:43 ` Andrea Arcangeli
2011-02-27 8:48 ` Arthur Marsh
2011-02-23 17:10 ` Mel Gorman
2011-02-23 17:27 ` Andrea Arcangeli
2011-02-23 17:44 ` Mel Gorman
2011-02-23 18:14 ` Andrea Arcangeli