* Root-causing kswapd spinning on Sandy Bridge laptops?
From: Andrew Lutomirski @ 2011-06-24 6:22 UTC
To: Minchan Kim, linux-mm
I'm back :-/
I just triggered the kswapd bug on 2.6.39.1, which has the
cond_resched in shrink_slab. This time my system's still usable (I'm
typing this email on it), but kswapd0 is taking 100% cpu. It *does*
schedule (tested by setting its affinity the same as another CPU hog
and confirming that each one gets 50%).
It appears to be calling i915_gem_inactive_shrink in a loop. I have
probes on entry and return of i915_gem_inactive_shrink and on return
of shrink_slab. I see:
kswapd0 47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
kswapd0 47 [000] 59599.956575: shrink_zone: (ffffffff810c848c) priority=12 zone=ffff8801005fe000
kswapd0 47 [000] 59599.956576: shrink_zone_return: (ffffffff810c848c <- ffffffff810c96c6) arg1=0
kswapd0 47 [000] 59599.956578: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956589: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
kswapd0 47 [000] 59599.956589: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956592: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956602: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
kswapd0 47 [000] 59599.956603: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956605: shrink_zone: (ffffffff810c848c) priority=12 zone=ffff8801005fee00
kswapd0 47 [000] 59599.956606: shrink_zone_return: (ffffffff810c848c <- ffffffff810c96c6) arg1=0
kswapd0 47 [000] 59599.956608: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956609: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
kswapd0 47 [000] 59599.956610: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956611: mm_vmscan_kswapd_wake: nid=0 order=0
kswapd0 47 [000] 59599.956612: shrink_zone: (ffffffff810c848c) priority=12 zone=ffff8801005fe000
kswapd0 47 [000] 59599.956614: shrink_zone_return: (ffffffff810c848c <- ffffffff810c96c6) arg1=0
kswapd0 47 [000] 59599.956616: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956617: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
kswapd0 47 [000] 59599.956618: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956620: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956621: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
kswapd0 47 [000] 59599.956621: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956623: shrink_zone: (ffffffff810c848c) priority=12 zone=ffff8801005fee00
kswapd0 47 [000] 59599.956624: shrink_zone_return: (ffffffff810c848c <- ffffffff810c96c6) arg1=0
kswapd0 47 [000] 59599.956626: i915_gem_inactive_shrink: (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
kswapd0 47 [000] 59599.956627: shrink_return: (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
kswapd0 47 [000] 59599.956628: shrink_slab_return: (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
kswapd0 47 [000] 59599.956629: mm_vmscan_kswapd_wake: nid=0 order=0
The command was:
perf record -g -aR -p 47 \
    -e probe:i915_gem_inactive_shrink -e probe:shrink_return \
    -e probe:shrink_slab_return -e probe:shrink_zone \
    -e probe:shrink_zone_return -e probe:kswapd_try_to_sleep \
    -e vmscan:mm_vmscan_kswapd_sleep -e vmscan:mm_vmscan_kswapd_wake \
    -e vmscan:mm_vmscan_wakeup_kswapd -e vmscan:mm_vmscan_lru_shrink_inactive \
    -e probe:wakeup_kswapd; perf script
(shrink_return is the probe on i915_gem_inactive_shrink's return. Sorry, badly named.)
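The thread doesn't say how the probe: events were defined. A minimal sketch of equivalent perf probe definitions, assuming a current perf, kernel debuginfo (needed for named arguments such as nr_to_scan and priority), and the 2.6.39 parameter names visible in the trace. The PERF variable is a hypothetical hook so the commands can be printed instead of run.

```sh
#!/bin/sh
# Sketch: recreate the probe: events used in the perf record command.
# Assumes a current perf, kernel debuginfo, and root privileges.
# PERF is a hypothetical override, e.g. PERF="echo perf" for a dry run.
PERF=${PERF:-perf}

define_kswapd_probes() {
    # Entry probe on the i915 shrinker callback (it lives in the i915 module).
    $PERF probe -m i915 -a 'i915_gem_inactive_shrink gfp_mask nr_to_scan'
    # Return probe on the same function, named shrink_return as in the trace.
    $PERF probe -m i915 -a 'shrink_return=i915_gem_inactive_shrink%return $retval'
    # Core reclaim functions in vmlinux.
    $PERF probe -a 'shrink_zone priority zone'
    $PERF probe -a 'shrink_zone_return=shrink_zone%return'
    $PERF probe -a 'shrink_slab_return=shrink_slab%return $retval'
    $PERF probe -a kswapd_try_to_sleep
    $PERF probe -a wakeup_kswapd
}
```

Source the file and run define_kswapd_probes once before perf record; perf probe -d can remove the events afterwards.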
It looks like kswapd_try_to_sleep is not getting called.
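To check a longer recording for the same pattern, the perf script output can be tallied per event. A rough sketch, assuming the "COMM PID [CPU] TIMESTAMP: EVENT: ARGS" layout shown in the trace (lines that don't match it are skipped); summarize_kswapd_trace is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: count events in perf script output (stdin) and report whether
# kswapd ever reached its sleep path.
summarize_kswapd_trace() {
    awk '
        # Event lines have a [CPU] tag in field 3 and a "TIMESTAMP:" in field 4.
        $3 ~ /^\[[0-9]+\]$/ && $4 ~ /:$/ {
            ev = $5
            sub(/:$/, "", ev)        # drop the trailing colon
            sub(/^[^:]*:/, "", ev)   # drop a group prefix such as probe:
            count[ev]++
        }
        END {
            for (ev in count)
                printf "%-32s %d\n", ev, count[ev]
            if (!("kswapd_try_to_sleep" in count) && !("mm_vmscan_kswapd_sleep" in count))
                print "no kswapd sleep events seen"
        }
    '
}
```

Usage: perf script | summarize_kswapd_trace | sort (awk's for-in order is unspecified, hence the sort).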
I do not know how to reproduce this, but I'll leave it running overnight.
--Andy
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Minchan Kim @ 2011-06-24 9:27 UTC
To: Andrew Lutomirski; +Cc: linux-mm, Mel Gorman, Pádraig Brady
Hi Andrew,
Sorry, but right now I don't have time to dive into this.
But it seems to be similar to the problem Mel is looking at.
Cced him.
Also, Pádraig Brady seems to have a reproducible scenario.
I will look when I have time.
I hope I will be back sooner or later.
--
Kind regards,
Minchan Kim
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Minchan Kim @ 2011-06-24 9:38 UTC
To: Andrew Lutomirski; +Cc: linux-mm, Mel Gorman, Pádraig Brady
On Fri, Jun 24, 2011 at 6:27 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Jun 24, 2011 at 3:22 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> [...]
>>
>> It looks like kswapd_try_to_sleep is not getting called.
>>
>> I do not know how to reproduce this, but I'll leave it running overnight.
If the problem happens again, could you probe wakeup_kswapd, too?
I think it can help us.
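A hedged sketch of such a recording. wakeup_kswapd is normally called from the context of the task doing the allocation rather than from kswapd itself, so a recording limited to kswapd's PID (the -p 47 in the command above) would probably miss it; recording on all CPUs avoids that. It assumes probe:wakeup_kswapd was already created with perf probe; the 10-second default window and the PERF override are arbitrary choices for illustration.

```sh
#!/bin/sh
# Sketch: record kswapd wakeups from every task for a fixed window.
# Assumes probe:wakeup_kswapd exists (perf probe -a wakeup_kswapd).
# PERF is a hypothetical override, e.g. PERF="echo perf" for a dry run.
PERF=${PERF:-perf}

record_kswapd_wakeups() {
    seconds=${1:-10}   # arbitrary default window
    # -a: all CPUs, so wakeups issued by allocating tasks are included.
    # -g: keep call chains to see which allocation path woke kswapd.
    $PERF record -a -g \
        -e probe:wakeup_kswapd \
        -e vmscan:mm_vmscan_wakeup_kswapd \
        -- sleep "$seconds"
    $PERF script
}
```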
Thanks.
--
Kind regards,
Minchan Kim
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Pádraig Brady @ 2011-06-24 10:24 UTC
To: Minchan Kim; +Cc: Andrew Lutomirski, linux-mm, Mel Gorman
On 24/06/11 10:27, Minchan Kim wrote:
> [...]
> Also, Pádraig Brady seems to have a reproducible scenario.
My reproducer is (I've 3GB RAM, 1.5G swap):
dd bs=1M count=3000 if=/dev/zero of=spin.test
To stop it spinning I just have to uncache the data,
the handiest way being:
rm spin.test
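The reproducer can be wrapped in a small script. A sketch under stated assumptions: COUNT (in MiB), FILE and WATCH (seconds to pause while kswapd0 is observed, e.g. in top) are hypothetical knobs, and the 3000 MiB default matches the 3GB RAM machine above; other machines would presumably need a size close to their own RAM.

```sh
#!/bin/sh
# Sketch of the reproducer: fill the page cache with a large file,
# pause so kswapd0 can be watched (e.g. in top), then delete the file.
# COUNT (MiB), FILE and WATCH (seconds) are hypothetical knobs.
COUNT=${COUNT:-3000}
FILE=${FILE:-spin.test}
WATCH=${WATCH:-30}

run_spin_test() {
    # Write COUNT MiB of zeros; the written data stays in the page cache.
    dd bs=1M count="$COUNT" if=/dev/zero of="$FILE" 2>/dev/null || return 1
    # Print kswapd0's PID, if visible, so it is easy to find in top.
    pgrep -x kswapd0 2>/dev/null || true
    sleep "$WATCH"
    # Deleting the file uncaches its data, which stopped the spinning.
    rm -f "$FILE"
}
```

Source the file and call run_spin_test; the file is removed at the end either way.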
To confirm, the top of the profile I posted is:
i915_gem_object_bind_to_gtt
shrink_slab
cheers,
Pádraig.
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Pádraig Brady @ 2011-06-24 12:15 UTC
To: Minchan Kim; +Cc: Andrew Lutomirski, linux-mm, Mel Gorman, KOSAKI Motohiro
On 24/06/11 11:24, Pádraig Brady wrote:
> On 24/06/11 10:27, Minchan Kim wrote:
>> [...]
>
> My reproducer is (I've 3GB RAM, 1.5G swap):
> dd bs=1M count=3000 if=/dev/zero of=spin.test
>
> To stop it spinning I just have to uncache the data,
> the handiest way being:
> rm spin.test
>
> To confirm, the top of the profile I posted is:
> i915_gem_object_bind_to_gtt
> shrink_slab
BTW I just tried this patch,
but it didn't change anything.
http://marc.info/?l=linux-kernel&m=130890263124399&w=2
cheers,
Pádraig.
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Mel Gorman @ 2011-06-24 12:51 UTC
To: Pádraig Brady; +Cc: Minchan Kim, Andrew Lutomirski, linux-mm
On Fri, Jun 24, 2011 at 11:24:24AM +0100, Pádraig Brady wrote:
> On 24/06/11 10:27, Minchan Kim wrote:
> > [...]
>
> My reproducer is (I've 3GB RAM, 1.5G swap):
> dd bs=1M count=3000 if=/dev/zero of=spin.test
>
> To stop it spinning I just have to uncache the data,
> the handiest way being:
> rm spin.test
>
> To confirm, the top of the profile I posted is:
> i915_gem_object_bind_to_gtt
> shrink_slab
>
I don't think it's an i915 bug. There is another candidate fix in the other
thread that Padraig started.
--
Mel Gorman
SUSE Labs
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Andrew Lutomirski @ 2011-06-24 13:32 UTC
To: Mel Gorman; +Cc: Pádraig Brady, Minchan Kim, linux-mm
On Fri, Jun 24, 2011 at 8:51 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Fri, Jun 24, 2011 at 11:24:24AM +0100, Pádraig Brady wrote:
>> On 24/06/11 10:27, Minchan Kim wrote:
>> > [...]
>>
>> My reproducer is (I've 3GB RAM, 1.5G swap):
>> dd bs=1M count=3000 if=/dev/zero of=spin.test
>>
>> To stop it spinning I just have to uncache the data,
>> the handiest way being:
>> rm spin.test
>>
>> To confirm, the top of the profile I posted is:
>> i915_gem_object_bind_to_gtt
>> shrink_slab
>>
>
> I don't think it's an i915 bug. There is another candidate fix in the other
> thread that Padraig started.
I bet you're right. I do indeed have a tiny high zone. (No clue why
-- I have 2G of RAM right now.)
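One way to check whether a machine has a similarly tiny highest zone is to read the present page counts from /proc/zoneinfo. A sketch, assuming the "Node N, zone NAME" headers and "present" lines that file uses; the 4096-page (16 MiB with 4 KiB pages) threshold is an arbitrary illustration, not a kernel constant, and list_zone_sizes is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: list each zone's present pages and flag a small highest zone.
# Reads /proc/zoneinfo by default; pass a file to parse a saved copy.
# The 4096-page threshold is arbitrary.
list_zone_sizes() {
    awk -v small=4096 '
        /^Node/ { node = $2; sub(/,$/, "", node); zone = $4 }
        $1 == "present" { n++; nd[n] = node; zn[n] = zone; pg[n] = $2 + 0 }
        END {
            # Per node, remember the last listed zone that has any pages.
            for (i = 1; i <= n; i++)
                if (pg[i] > 0) hi[nd[i]] = i
            for (i = 1; i <= n; i++) {
                flag = (i == hi[nd[i]] && pg[i] < small) ? "  <- small highest zone" : ""
                printf "node %s %-8s %10d pages%s\n", nd[i], zn[i], pg[i], flag
            }
        }
    ' "${1:-/proc/zoneinfo}"
}
```

Zones with zero present pages (such as an empty Movable zone on newer kernels) are skipped when picking the highest populated zone.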
I won't be a reliable tester because I don't have a good way to
reproduce this bug.
--Andy
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Andi Kleen @ 2011-06-24 18:44 UTC
To: Andrew Lutomirski; +Cc: Minchan Kim, linux-mm, intel-gfx
Andrew Lutomirski <luto@mit.edu> writes:
[Putting the Intel graphics driver developers in cc.]
> [...]
--
ak@linux.intel.com -- Speaking for myself only
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Andrew Lutomirski @ 2011-06-24 18:48 UTC
To: Andi Kleen; +Cc: Minchan Kim, linux-mm, intel-gfx
On Fri, Jun 24, 2011 at 12:44 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Andrew Lutomirski <luto@mit.edu> writes:
>
> [Putting the Intel graphics driver developers in cc.]
My Sandy Bridge laptop is to blame; the graphics aren't the culprit. It's this:
BIOS-e820: 0000000100000000 - 0000000100600000 (usable)
The kernel can't handle the tiny bit of memory above 4G. Mel's
patches work so far.
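For scale, the size of that e820 range can be worked out directly. A sketch, assuming the old boot-log format where the end address is exclusive; on x86-64 memory above 4 GiB belongs to ZONE_NORMAL, so on this 2G machine the range is presumably the whole highest zone. e820_range_size is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: size of an e820 range given as "start - end" (end exclusive).
# Shell arithmetic accepts the 0x-prefixed hex addresses directly.
e820_range_size() {
    bytes=$(( $2 - $1 ))
    echo "$bytes bytes = $(( bytes / 1048576 )) MiB = $(( bytes / 4096 )) pages of 4 KiB"
}
```

For the line above, e820_range_size 0x0000000100000000 0x0000000100600000 prints "6291456 bytes = 6 MiB = 1536 pages of 4 KiB".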
--Andy
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Chris Mason @ 2011-06-24 18:54 UTC
To: Andi Kleen; +Cc: Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx
Excerpts from Andi Kleen's message of 2011-06-24 14:44:12 -0400:
> Andrew Lutomirski <luto@mit.edu> writes:
>
> [Putting the Intel graphics driver developers in cc.]
>
> > I'm back :-/
> >
> > I just triggered the kswapd bug on 2.6.39.1, which has the
> > cond_resched in shrink_slab. This time my system's still usable (I'm
> > typing this email on it), but kswapd0 is taking 100% cpu. It *does*
> > schedule (tested by setting its affinity the same as another CPU hog
> > and confirming that each one gets 50%).
> >
> > It appears to be calling i915_gem_inactive_shrink in a loop. I have
> > probes on entry and return of i915_gem_inactive_shrink and on return
> > of shrink_slab. I see:
> >
> > kswapd0 47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
> > kswapd0 47 [000] 59599.956575: shrink_zone:
> > (ffffffff810c848c) priority=12 zone=ffff8801005fe000
> > kswapd0 47 [000] 59599.956576: shrink_zone_return:
> > (ffffffff810c848c <- ffffffff810c96c6) arg1=0
> > kswapd0 47 [000] 59599.956578: i915_gem_inactive_shrink:
A similar trace came up a bunch of times in Jejb's thread about NMI soft
lockups and kswapd consuming the machine. That one was tracked down to
SLUB high-order allocations.
I'm sure that one is burned into Mel's memory, but after a while the
individual traces fell out of the thread, and I'm not sure the i915 part
stuck out.
-chris
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Andi Kleen @ 2011-06-24 19:13 UTC
To: Andrew Lutomirski; +Cc: Andi Kleen, Minchan Kim, linux-mm, intel-gfx
On Fri, Jun 24, 2011 at 12:48:20PM -0600, Andrew Lutomirski wrote:
> On Fri, Jun 24, 2011 at 12:44 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > Andrew Lutomirski <luto@mit.edu> writes:
> >
> > [Putting the Intel graphics driver developers in cc.]
>
> My Sandy Bridge laptop is to blame; the graphics aren't the culprit. It's this:
>
> BIOS-e820: 0000000100000000 - 0000000100600000 (usable)
>
> The kernel can't handle the tiny bit of memory above 4G. Mel's
> patches work so far.
Maybe the graphics driver could still be nicer to the VM and perhaps
be more aggressive in the callback?
But my attempt to cc them failed anyway, because the graphics developers
run a closed list. Never mind.
-Andi
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
From: Christoph Hellwig @ 2011-06-24 19:17 UTC
To: Andi Kleen; +Cc: Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx
On Fri, Jun 24, 2011 at 09:13:34PM +0200, Andi Kleen wrote:
> Maybe the graphics driver could still be nicer to the VM and perhaps
> be more aggressive in the callback?
Or just fix the nasty bugs in there, e.g. apply
[PATCH] i915: slab shrinker have to return -1 if it can't shrink any objects
which was sent to lkml today. Also, the first three patches from Dave
Chinner's per-sb shrinker series, which fix two bugs in the core shrinker
code and add tracing to it, should help a lot.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
2011-06-24 18:54 ` Chris Mason
@ 2011-06-27 11:03 ` Mel Gorman
2011-06-27 20:18 ` James Bottomley
From: Mel Gorman @ 2011-06-27 11:03 UTC
To: Chris Mason
Cc: Andi Kleen, Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx
On Fri, Jun 24, 2011 at 02:54:11PM -0400, Chris Mason wrote:
> Excerpts from Andi Kleen's message of 2011-06-24 14:44:12 -0400:
> > Andrew Lutomirski <luto@mit.edu> writes:
> >
> > [Putting the Intel graphics driver developers in cc.]
> >
> > > I'm back :-/
> > >
> > > I just triggered the kswapd bug on 2.6.39.1, which has the
> > > cond_resched in shrink_slab. This time my system's still usable (I'm
> > > typing this email on it), but kswapd0 is taking 100% cpu. It *does*
> > > schedule (tested by setting its affinity the same as another CPU hog
> > > and confirming that each one gets 50%).
> > >
> > > It appears to be calling i915_gem_inactive_shrink in a loop. I have
> > > probes on entry and return of i915_gem_inactive_shrink and on return
> > > of shrink_slab. I see:
> > >
> > > kswapd0 47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
> > > kswapd0 47 [000] 59599.956575: shrink_zone:
> > > (ffffffff810c848c) priority=12 zone=ffff8801005fe000
> > > kswapd0 47 [000] 59599.956576: shrink_zone_return:
> > > (ffffffff810c848c <- ffffffff810c96c6) arg1=0
> > > kswapd0 47 [000] 59599.956578: i915_gem_inactive_shrink:
>
> A similar trace came up a bunch of times in Jejb's NMI softlockup/kswapd
> consumes the machine thread. That one was tracked down to slub high
> order allocations.
>
> I'm sure that one is burned into Mel's memory, but after a while the
> individual traces fell out of the thread, and I'm not sure the i915 part
> stuck out.
>
I expect that Jejb's lockup is also fixed by "Stop kswapd consuming
100% CPU when highest zone is small". i915 didn't help, but at the end
of the day, kswapd shouldn't have been shrinking slab so aggressively.
--
Mel Gorman
SUSE Labs
* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
2011-06-27 11:03 ` Mel Gorman
@ 2011-06-27 20:18 ` James Bottomley
From: James Bottomley @ 2011-06-27 20:18 UTC
To: Mel Gorman
Cc: Chris Mason, Andi Kleen, Andrew Lutomirski, Minchan Kim, linux-mm,
intel-gfx
On Mon, 2011-06-27 at 12:03 +0100, Mel Gorman wrote:
> I expect that Jejb's lockup is also fixed by "Stop kswapd consuming
> 100% CPU when highest zone is small". i915 didn't help but at the end
> of the day, kswapd shouldn't have been shrinking slab so aggressively.
It will be a while before I can try this out, I'm afraid ... the laptop
is currently on tour in Europe with its owner. I finally just
downgraded it to FC13, which made most of the issues go away.
James
end of thread (newest message: 2011-06-27 20:18 UTC)
Thread overview: 14+ messages
2011-06-24 6:22 Root-causing kswapd spinning on Sandy Bridge laptops? Andrew Lutomirski
2011-06-24 9:27 ` Minchan Kim
2011-06-24 9:38 ` Minchan Kim
2011-06-24 10:24 ` Pádraig Brady
2011-06-24 12:15 ` Pádraig Brady
2011-06-24 12:51 ` Mel Gorman
2011-06-24 13:32 ` Andrew Lutomirski
2011-06-24 18:44 ` Andi Kleen
2011-06-24 18:48 ` Andrew Lutomirski
2011-06-24 19:13 ` Andi Kleen
2011-06-24 19:17 ` Christoph Hellwig
2011-06-24 18:54 ` Chris Mason
2011-06-27 11:03 ` Mel Gorman
2011-06-27 20:18 ` James Bottomley