Root-causing kswapd spinning on Sandy Bridge laptops?

* Root-causing kswapd spinning on Sandy Bridge laptops?
@ 2011-06-24  6:22 Andrew Lutomirski
  2011-06-24  9:27 ` Minchan Kim
  2011-06-24 18:44 ` Andi Kleen
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Lutomirski @ 2011-06-24  6:22 UTC (permalink / raw)
  To: Minchan Kim, linux-mm

I'm back :-/

I just triggered the kswapd bug on 2.6.39.1, which has the
cond_resched in shrink_slab.  This time my system's still usable (I'm
typing this email on it), but kswapd0 is taking 100% cpu.  It *does*
schedule (tested by setting its affinity the same as another CPU hog
and confirming that each one gets 50%).

It appears to be calling i915_gem_inactive_shrink in a loop.  I have
probes on entry and return of i915_gem_inactive_shrink and on return
of shrink_slab.  I see:

         kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
         kswapd0    47 [000] 59599.956575: shrink_zone:
(ffffffff810c848c) priority=12 zone=ffff8801005fe000
         kswapd0    47 [000] 59599.956576: shrink_zone_return:
(ffffffff810c848c <- ffffffff810c96c6) arg1=0
         kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956589: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
         kswapd0    47 [000] 59599.956589: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956592: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956602: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
         kswapd0    47 [000] 59599.956603: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956605: shrink_zone:
(ffffffff810c848c) priority=12 zone=ffff8801005fee00
         kswapd0    47 [000] 59599.956606: shrink_zone_return:
(ffffffff810c848c <- ffffffff810c96c6) arg1=0
         kswapd0    47 [000] 59599.956608: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956609: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
         kswapd0    47 [000] 59599.956610: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956611: mm_vmscan_kswapd_wake: nid=0 order=0
         kswapd0    47 [000] 59599.956612: shrink_zone:
(ffffffff810c848c) priority=12 zone=ffff8801005fe000
         kswapd0    47 [000] 59599.956614: shrink_zone_return:
(ffffffff810c848c <- ffffffff810c96c6) arg1=0
         kswapd0    47 [000] 59599.956616: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956617: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
         kswapd0    47 [000] 59599.956618: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956620: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956621: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
         kswapd0    47 [000] 59599.956621: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956623: shrink_zone:
(ffffffff810c848c) priority=12 zone=ffff8801005fee00
         kswapd0    47 [000] 59599.956624: shrink_zone_return:
(ffffffff810c848c <- ffffffff810c96c6) arg1=0
         kswapd0    47 [000] 59599.956626: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
         kswapd0    47 [000] 59599.956627: shrink_return:
(ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
         kswapd0    47 [000] 59599.956628: shrink_slab_return:
(ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
         kswapd0    47 [000] 59599.956629: mm_vmscan_kswapd_wake: nid=0 order=0

The command was:

perf record -g -aR -p 47 -e probe:i915_gem_inactive_shrink -e
probe:shrink_return -e probe:shrink_slab_return -e probe:shrink_zone
-e probe:shrink_zone_return -e probe:kswapd_try_to_sleep -e
vmscan:mm_vmscan_kswapd_sleep -e vmscan:mm_vmscan_kswapd_wake -e
vmscan:mm_vmscan_wakeup_kswapd -e vmscan:mm_vmscan_lru_shrink_inactive
-e probe:wakeup_kswapd; perf script

(shrink_return is i915_gem_inactive_shrink's return.  sorry, badly named.)

It looks like kswapd_try_to_sleep is not getting called.
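
One way to check that observation against the perf script output is to tally event names and look for the sleep probe. The following is a minimal sketch, assuming the record layout seen in the trace above (comm, pid, [cpu], timestamp, then the event name followed by a colon). The name count_events is made up for illustration and is not part of perf.

```python
import re
from collections import Counter

# Matches the head of a perf script record, e.g.
#   "kswapd0    47 [000] 59599.956575: shrink_zone:"
# The layout (comm, pid, [cpu], timestamp, event) is assumed from the trace.
EVENT_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s+\[(\d+)\]\s+(\d+\.\d+):\s+(\w+):")


def count_events(lines):
    """Return a Counter of event names seen in perf script output lines."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m:  # wrapped continuation lines "(ffffffff...) ..." do not match
            counts[m.group(5)] += 1
    return counts


sample = """\
         kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
         kswapd0    47 [000] 59599.956575: shrink_zone:
(ffffffff810c848c) priority=12 zone=ffff8801005fe000
         kswapd0    47 [000] 59599.956576: shrink_zone_return:
(ffffffff810c848c <- ffffffff810c96c6) arg1=0
         kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
(ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
""".splitlines()

counts = count_events(sample)
# Wakeups seen vs. sleep-probe hits seen in this excerpt.
print(counts["mm_vmscan_kswapd_wake"], counts["kswapd_try_to_sleep"])  # prints: 1 0
```

Run over the full capture, a non-zero wake count next to a zero kswapd_try_to_sleep count would match what the trace suggests.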

I do not know how to reproduce this, but I'll leave it running overnight.

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24  6:22 Root-causing kswapd spinning on Sandy Bridge laptops? Andrew Lutomirski
@ 2011-06-24  9:27 ` Minchan Kim
  2011-06-24  9:38   ` Minchan Kim
  2011-06-24 10:24   ` Pádraig Brady
  2011-06-24 18:44 ` Andi Kleen
  1 sibling, 2 replies; 14+ messages in thread
From: Minchan Kim @ 2011-06-24  9:27 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: linux-mm, Mel Gorman, Pádraig Brady

Hi Andrew,

Sorry, but right now I don't have time to dive into this.
But it seems to be similar to the problem Mel is looking at.
Cc'ed him.

Pádraig Brady even seems to have a reproducible scenario.
I will look when I have time.
I hope to be back sooner or later.


On Fri, Jun 24, 2011 at 3:22 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> I'm back :-/
>
> I just triggered the kswapd bug on 2.6.39.1, which has the
> cond_resched in shrink_slab.  This time my system's still usable (I'm
> tying this email on it), but kswapd0 is taking 100% cpu.  It *does*
> schedule (tested by setting its affinity the same as another CPU hog
> and confirming that each one gets 50%).
>
> It appears to be calling i915_gem_inactive_shrink in a loop.  I have
> probes on entry and return of i915_gem_inactive_shrink and on return
> of shrink_slab.  I see:
>
>         kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
>         kswapd0    47 [000] 59599.956575: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>         kswapd0    47 [000] 59599.956576: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>         kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956589: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>         kswapd0    47 [000] 59599.956589: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956592: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956602: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>         kswapd0    47 [000] 59599.956603: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956605: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>         kswapd0    47 [000] 59599.956606: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>         kswapd0    47 [000] 59599.956608: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956609: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>         kswapd0    47 [000] 59599.956610: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956611: mm_vmscan_kswapd_wake: nid=0 order=0
>         kswapd0    47 [000] 59599.956612: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>         kswapd0    47 [000] 59599.956614: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>         kswapd0    47 [000] 59599.956616: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956617: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>         kswapd0    47 [000] 59599.956618: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956620: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956621: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>         kswapd0    47 [000] 59599.956621: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956623: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>         kswapd0    47 [000] 59599.956624: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>         kswapd0    47 [000] 59599.956626: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>         kswapd0    47 [000] 59599.956627: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>         kswapd0    47 [000] 59599.956628: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>         kswapd0    47 [000] 59599.956629: mm_vmscan_kswapd_wake: nid=0 order=0
>
> The command was:
>
> perf record -g -aR -p 47 -e probe:i915_gem_inactive_shrink -e
> probe:shrink_return -e probe:shrink_slab_return -e probe:shrink_zone
> -e probe:shrink_zone_return -e probe:kswapd_try_to_sleep -e
> vmscan:mm_vmscan_kswapd_sleep -e vmscan:mm_vmscan_kswapd_wake -e
> vmscan:mm_vmscan_wakeup_kswapd -e vmscan:mm_vmscan_lru_shrink_inactive
> -e probe:wakeup_kswapd; perf script
>
> (shrink_return is i915_gem_inactive_shrink's return.  sorry, badly named.)
>
> It looks like something kswapd_try_to_sleep is not getting called.
>
> I do not know how to reproduce this, but I'll leave it running overnight.
>
> --Andy
>



-- 
Kind regards,
Minchan Kim


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24  9:27 ` Minchan Kim
@ 2011-06-24  9:38   ` Minchan Kim
  2011-06-24 10:24   ` Pádraig Brady
  1 sibling, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2011-06-24  9:38 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: linux-mm, Mel Gorman, Pádraig Brady

On Fri, Jun 24, 2011 at 6:27 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi Andrew,
>
> Sorry but right now I don't have a time to dive into this.
> But it seems to be similar to the problem Mel is looking at.
> Cced him.
>
> Even, Pádraig Brady seem to have a reproducible scenario.
> I will look when I have a time.
> I hope I will be back sooner or later.
>
>
> On Fri, Jun 24, 2011 at 3:22 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> I'm back :-/
>>
>> I just triggered the kswapd bug on 2.6.39.1, which has the
>> cond_resched in shrink_slab.  This time my system's still usable (I'm
>> tying this email on it), but kswapd0 is taking 100% cpu.  It *does*
>> schedule (tested by setting its affinity the same as another CPU hog
>> and confirming that each one gets 50%).
>>
>> It appears to be calling i915_gem_inactive_shrink in a loop.  I have
>> probes on entry and return of i915_gem_inactive_shrink and on return
>> of shrink_slab.  I see:
>>
>>         kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
>>         kswapd0    47 [000] 59599.956575: shrink_zone:
>> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>>         kswapd0    47 [000] 59599.956576: shrink_zone_return:
>> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>>         kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956589: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>>         kswapd0    47 [000] 59599.956589: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956592: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956602: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>>         kswapd0    47 [000] 59599.956603: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956605: shrink_zone:
>> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>>         kswapd0    47 [000] 59599.956606: shrink_zone_return:
>> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>>         kswapd0    47 [000] 59599.956608: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956609: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>>         kswapd0    47 [000] 59599.956610: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956611: mm_vmscan_kswapd_wake: nid=0 order=0
>>         kswapd0    47 [000] 59599.956612: shrink_zone:
>> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>>         kswapd0    47 [000] 59599.956614: shrink_zone_return:
>> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>>         kswapd0    47 [000] 59599.956616: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956617: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>>         kswapd0    47 [000] 59599.956618: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956620: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956621: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>>         kswapd0    47 [000] 59599.956621: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956623: shrink_zone:
>> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>>         kswapd0    47 [000] 59599.956624: shrink_zone_return:
>> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>>         kswapd0    47 [000] 59599.956626: i915_gem_inactive_shrink:
>> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>>         kswapd0    47 [000] 59599.956627: shrink_return:
>> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>>         kswapd0    47 [000] 59599.956628: shrink_slab_return:
>> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>>         kswapd0    47 [000] 59599.956629: mm_vmscan_kswapd_wake: nid=0 order=0
>>
>> The command was:
>>
>> perf record -g -aR -p 47 -e probe:i915_gem_inactive_shrink -e
>> probe:shrink_return -e probe:shrink_slab_return -e probe:shrink_zone
>> -e probe:shrink_zone_return -e probe:kswapd_try_to_sleep -e
>> vmscan:mm_vmscan_kswapd_sleep -e vmscan:mm_vmscan_kswapd_wake -e
>> vmscan:mm_vmscan_wakeup_kswapd -e vmscan:mm_vmscan_lru_shrink_inactive
>> -e probe:wakeup_kswapd; perf script
>>
>> (shrink_return is i915_gem_inactive_shrink's return.  sorry, badly named.)
>>
>> It looks like something kswapd_try_to_sleep is not getting called.
>>
>> I do not know how to reproduce this, but I'll leave it running overnight.

If the problem happens again, could you probe wakeup_kswapd, too?
I think it can help us.

Thanks.


-- 
Kind regards,
Minchan Kim


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24  9:27 ` Minchan Kim
  2011-06-24  9:38   ` Minchan Kim
@ 2011-06-24 10:24   ` Pádraig Brady
  2011-06-24 12:15     ` Pádraig Brady
  2011-06-24 12:51     ` Mel Gorman
  1 sibling, 2 replies; 14+ messages in thread
From: Pádraig Brady @ 2011-06-24 10:24 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Lutomirski, linux-mm, Mel Gorman

On 24/06/11 10:27, Minchan Kim wrote:
> Hi Andrew,
> 
> Sorry but right now I don't have a time to dive into this.
> But it seems to be similar to the problem Mel is looking at.
> Cced him.
> 
> Even, Pádraig Brady seem to have a reproducible scenario.
> I will look when I have a time.
> I hope I will be back sooner or later.

My reproducer is (I've 3GB RAM, 1.5G swap):
  dd bs=1M count=3000 if=/dev/zero of=spin.test

To stop it spinning I just have to uncache the data,
the handiest way being:
  rm spin.test

To confirm, the top of the profile I posted is:
  i915_gem_object_bind_to_gtt
    shrink_slab

cheers,
Pádraig.


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 10:24   ` Pádraig Brady
@ 2011-06-24 12:15     ` Pádraig Brady
  2011-06-24 12:51     ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Pádraig Brady @ 2011-06-24 12:15 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Lutomirski, linux-mm, Mel Gorman, KOSAKI Motohiro

On 24/06/11 11:24, Pádraig Brady wrote:
> On 24/06/11 10:27, Minchan Kim wrote:
>> Hi Andrew,
>>
>> Sorry but right now I don't have a time to dive into this.
>> But it seems to be similar to the problem Mel is looking at.
>> Cced him.
>>
>> Even, Pádraig Brady seem to have a reproducible scenario.
>> I will look when I have a time.
>> I hope I will be back sooner or later.
> 
> My reproducer is (I've 3GB RAM, 1.5G swap):
>   dd bs=1M count=3000 if=/dev/zero of=spin.test
> 
> To stop it spinning I just have to uncache the data,
> the handiest way being:
>   rm spin.test
> 
> To confirm, the top of the profile I posted is:
>   i915_gem_object_bind_to_gtt
>     shrink_slab

BTW I just tried this patch,
but it didn't change anything.

http://marc.info/?l=linux-kernel&m=130890263124399&w=2

cheers,
Pádraig.


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 10:24   ` Pádraig Brady
  2011-06-24 12:15     ` Pádraig Brady
@ 2011-06-24 12:51     ` Mel Gorman
  2011-06-24 13:32       ` Andrew Lutomirski
  1 sibling, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2011-06-24 12:51 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: Minchan Kim, Andrew Lutomirski, linux-mm

On Fri, Jun 24, 2011 at 11:24:24AM +0100, Pádraig Brady wrote:
> On 24/06/11 10:27, Minchan Kim wrote:
> > Hi Andrew,
> > 
> > Sorry but right now I don't have a time to dive into this.
> > But it seems to be similar to the problem Mel is looking at.
> > Cced him.
> > 
> > Even, Pádraig Brady seem to have a reproducible scenario.
> > I will look when I have a time.
> > I hope I will be back sooner or later.
> 
> My reproducer is (I've 3GB RAM, 1.5G swap):
>   dd bs=1M count=3000 if=/dev/zero of=spin.test
> 
> To stop it spinning I just have to uncache the data,
> the handiest way being:
>   rm spin.test
> 
> To confirm, the top of the profile I posted is:
>   i915_gem_object_bind_to_gtt
>     shrink_slab
> 

I don't think it's an i915 bug. There is another candidate fix in the
other thread that Pádraig started.

-- 
Mel Gorman
SUSE Labs


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 12:51     ` Mel Gorman
@ 2011-06-24 13:32       ` Andrew Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Lutomirski @ 2011-06-24 13:32 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pádraig Brady, Minchan Kim, linux-mm

On Fri, Jun 24, 2011 at 8:51 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Fri, Jun 24, 2011 at 11:24:24AM +0100, Pádraig Brady wrote:
>> On 24/06/11 10:27, Minchan Kim wrote:
>> > Hi Andrew,
>> >
>> > Sorry but right now I don't have a time to dive into this.
>> > But it seems to be similar to the problem Mel is looking at.
>> > Cced him.
>> >
>> > Even, Pádraig Brady seem to have a reproducible scenario.
>> > I will look when I have a time.
>> > I hope I will be back sooner or later.
>>
>> My reproducer is (I've 3GB RAM, 1.5G swap):
>>   dd bs=1M count=3000 if=/dev/zero of=spin.test
>>
>> To stop it spinning I just have to uncache the data,
>> the handiest way being:
>>   rm spin.test
>>
>> To confirm, the top of the profile I posted is:
>>   i915_gem_object_bind_to_gtt
>>     shrink_slab
>>
>
> I don't think it's an i915 bug. Another candidate fix in the other
> thread that Padraig started.

I bet you're right.  I do indeed have a tiny high zone.  (No clue why
-- I have 2G of ram right now.)

I won't be a reliable tester because I don't have a good way to
reproduce this bug.

--Andy


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24  6:22 Root-causing kswapd spinning on Sandy Bridge laptops? Andrew Lutomirski
  2011-06-24  9:27 ` Minchan Kim
@ 2011-06-24 18:44 ` Andi Kleen
  2011-06-24 18:48   ` Andrew Lutomirski
  2011-06-24 18:54   ` Chris Mason
  1 sibling, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2011-06-24 18:44 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: Minchan Kim, linux-mm, intel-gfx

Andrew Lutomirski <luto@mit.edu> writes:

[Putting the Intel graphics driver developers in cc.]

> I'm back :-/
>
> I just triggered the kswapd bug on 2.6.39.1, which has the
> cond_resched in shrink_slab.  This time my system's still usable (I'm
> tying this email on it), but kswapd0 is taking 100% cpu.  It *does*
> schedule (tested by setting its affinity the same as another CPU hog
> and confirming that each one gets 50%).
>
> It appears to be calling i915_gem_inactive_shrink in a loop.  I have
> probes on entry and return of i915_gem_inactive_shrink and on return
> of shrink_slab.  I see:
>
>          kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
>          kswapd0    47 [000] 59599.956575: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>          kswapd0    47 [000] 59599.956576: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>          kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956589: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>          kswapd0    47 [000] 59599.956589: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956592: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956602: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=320
>          kswapd0    47 [000] 59599.956603: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956605: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>          kswapd0    47 [000] 59599.956606: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>          kswapd0    47 [000] 59599.956608: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956609: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>          kswapd0    47 [000] 59599.956610: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956611: mm_vmscan_kswapd_wake: nid=0 order=0
>          kswapd0    47 [000] 59599.956612: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fe000
>          kswapd0    47 [000] 59599.956614: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>          kswapd0    47 [000] 59599.956616: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956617: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>          kswapd0    47 [000] 59599.956618: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956620: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956621: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>          kswapd0    47 [000] 59599.956621: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956623: shrink_zone:
> (ffffffff810c848c) priority=12 zone=ffff8801005fee00
>          kswapd0    47 [000] 59599.956624: shrink_zone_return:
> (ffffffff810c848c <- ffffffff810c96c6) arg1=0
>          kswapd0    47 [000] 59599.956626: i915_gem_inactive_shrink:
> (ffffffffa0081e48) gfp_mask=d0 nr_to_scan=0
>          kswapd0    47 [000] 59599.956627: shrink_return:
> (ffffffffa0081e48 <- ffffffff810c6a62) arg1=0
>          kswapd0    47 [000] 59599.956628: shrink_slab_return:
> (ffffffff810c69f5 <- ffffffff810c96ec) arg1=0
>          kswapd0    47 [000] 59599.956629: mm_vmscan_kswapd_wake: nid=0 order=0
>
> The command was:
>
> perf record -g -aR -p 47 -e probe:i915_gem_inactive_shrink -e
> probe:shrink_return -e probe:shrink_slab_return -e probe:shrink_zone
> -e probe:shrink_zone_return -e probe:kswapd_try_to_sleep -e
> vmscan:mm_vmscan_kswapd_sleep -e vmscan:mm_vmscan_kswapd_wake -e
> vmscan:mm_vmscan_wakeup_kswapd -e vmscan:mm_vmscan_lru_shrink_inactive
> -e probe:wakeup_kswapd; perf script
>
> (shrink_return is i915_gem_inactive_shrink's return.  sorry, badly named.)
>
> It looks like something kswapd_try_to_sleep is not getting called.
>
> I do not know how to reproduce this, but I'll leave it running overnight.
>
> --Andy
>

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 18:44 ` Andi Kleen
@ 2011-06-24 18:48   ` Andrew Lutomirski
  2011-06-24 19:13     ` Andi Kleen
  2011-06-24 18:54   ` Chris Mason
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Lutomirski @ 2011-06-24 18:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Minchan Kim, linux-mm, intel-gfx

On Fri, Jun 24, 2011 at 12:44 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Andrew Lutomirski <luto@mit.edu> writes:
>
> [Putting the Intel graphics driver developers in cc.]

My Sandy Bridge laptop is to blame; the graphics aren't the culprit.  It's this:

  BIOS-e820: 0000000100000000 - 0000000100600000 (usable)

The kernel can't handle the tiny bit of memory above 4G.  Mel's
patches work so far.
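
To put a number on "tiny": the e820 entry above starts exactly at the 4G boundary and spans 0x600000 bytes, which is 6 MiB. A minimal sketch of that arithmetic follows; the helper name e820_region is made up for illustration, and the line format is assumed from the single entry quoted above.

```python
def e820_region(line):
    """Parse a 'BIOS-e820: START - END (type)' line into (start, end, type)."""
    body = line.split("BIOS-e820:", 1)[1].strip()
    span, kind = body.rsplit("(", 1)
    start_s, end_s = (part.strip() for part in span.split("-"))
    return int(start_s, 16), int(end_s, 16), kind.rstrip(")").strip()


FOUR_GIB = 1 << 32

start, end, kind = e820_region(
    "  BIOS-e820: 0000000100000000 - 0000000100600000 (usable)"
)
size_mib = (end - start) / (1 << 20)

# Is the region at or above 4G, how large is it, and what type is it?
print(start >= FOUR_GIB, size_mib, kind)  # prints: True 6.0 usable
```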

--Andy


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 18:44 ` Andi Kleen
  2011-06-24 18:48   ` Andrew Lutomirski
@ 2011-06-24 18:54   ` Chris Mason
  2011-06-27 11:03     ` Mel Gorman
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Mason @ 2011-06-24 18:54 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx

Excerpts from Andi Kleen's message of 2011-06-24 14:44:12 -0400:
> Andrew Lutomirski <luto@mit.edu> writes:
> 
> [Putting the Intel graphics driver developers in cc.]
> 
> > I'm back :-/
> >
> > I just triggered the kswapd bug on 2.6.39.1, which has the
> > cond_resched in shrink_slab.  This time my system's still usable (I'm
> > tying this email on it), but kswapd0 is taking 100% cpu.  It *does*
> > schedule (tested by setting its affinity the same as another CPU hog
> > and confirming that each one gets 50%).
> >
> > It appears to be calling i915_gem_inactive_shrink in a loop.  I have
> > probes on entry and return of i915_gem_inactive_shrink and on return
> > of shrink_slab.  I see:
> >
> >          kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
> >          kswapd0    47 [000] 59599.956575: shrink_zone:
> > (ffffffff810c848c) priority=12 zone=ffff8801005fe000
> >          kswapd0    47 [000] 59599.956576: shrink_zone_return:
> > (ffffffff810c848c <- ffffffff810c96c6) arg1=0
> >          kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:

A similar trace came up a bunch of times in Jejb's NMI softlockup/kswapd
consumes the machine thread.  That one was tracked down to slub high
order allocations.

I'm sure that one is burned into Mel's memory, but after a while the
individual traces fell out of the thread, and I'm not sure the i915 part
stuck out.

-chris


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 18:48   ` Andrew Lutomirski
@ 2011-06-24 19:13     ` Andi Kleen
  2011-06-24 19:17       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2011-06-24 19:13 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: Andi Kleen, Minchan Kim, linux-mm, intel-gfx

On Fri, Jun 24, 2011 at 12:48:20PM -0600, Andrew Lutomirski wrote:
> On Fri, Jun 24, 2011 at 12:44 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > Andrew Lutomirski <luto@mit.edu> writes:
> >
> > [Putting the Intel graphics driver developers in cc.]
> 
> My Sandy Bridge laptop is to blame, the graphics aren't the culprit.  It's this:
> 
>   BIOS-e820: 0000000100000000 - 0000000100600000 (usable)
> 
> The kernel can't handle the tiny bit of memory above 4G.  Mel's
> patches work so far.
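
For scale, the e820 line quoted above can be turned into a size with a
little arithmetic. This is only an illustrative sketch in C; the helper
names e820_range_bytes and bytes_to_pages are made up for the example,
and the 4 KiB page size is an assumption about the usual x86-64
configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an e820 range given as [start, end). */
static uint64_t e820_range_bytes(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}

/* Number of whole pages in a range of the given size. */
static uint64_t bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return bytes / page_size;
}
```

With start 0x100000000 and end 0x100600000 this gives 0x600000 bytes,
i.e. 6 MiB, or 1536 pages of 4 KiB, sitting just above the 4G boundary.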

Maybe the graphics driver could still be nicer to the VM and perhaps
be more aggressive in the callback?

But I failed anyways because the graphics developers run a closed
list. Never mind.

-Andi


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 19:13     ` Andi Kleen
@ 2011-06-24 19:17       ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-06-24 19:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx

On Fri, Jun 24, 2011 at 09:13:34PM +0200, Andi Kleen wrote:
> Maybe the graphics driver could still be nicer to the VM and perhaps
> be more aggressive in the callback?

Or just fix the nasty bugs in there, e.g. apply

[PATCH] i915: slab shrinker have to return -1 if it can't shrink any objects

which was sent to lkml today.  Also, the first three patches from Dave
Chinner's per-sb shrinker series, which fix two bugs in the core shrinker
code and add tracing to it, should help a lot.
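
To make the -1 convention concrete, here is a minimal userspace sketch in
C. It is not the i915 code or the patch itself: toy_cache, toy_shrink and
toy_shrink_slab are hypothetical names, and the loop only loosely follows
the shape of the 2.6.39-era shrink_slab() as I understand it. A call with
nr_to_scan=0 (as in the trace) asks how many objects are freeable, and a
return of -1 tells the caller to stop because no progress is possible
right now.

```c
#include <assert.h>

/* Toy stand-in for a driver-owned cache; not real i915 state. */
struct toy_cache {
    int nr_objects;  /* objects that could be freed */
    int lock_held;   /* non-zero models "cannot take the lock right now" */
};

/*
 * Toy shrinker callback.
 *   nr_to_scan == 0: report how many objects are freeable.
 *   nr_to_scan  > 0: free up to that many, report how many remain.
 * Returns -1 when it cannot make progress at all.
 */
static int toy_shrink(struct toy_cache *c, int nr_to_scan)
{
    if (c->lock_held)
        return -1;
    if (nr_to_scan > c->nr_objects)
        nr_to_scan = c->nr_objects;
    c->nr_objects -= nr_to_scan;
    return c->nr_objects;
}

/* Toy caller: scan in batches, bail out as soon as -1 comes back. */
static int toy_shrink_slab(struct toy_cache *c, int batch)
{
    int freed = 0;
    int total;

    assert(batch > 0);
    total = toy_shrink(c, 0);          /* query pass */
    while (total > 0) {
        int before = toy_shrink(c, 0);
        int after = toy_shrink(c, batch);

        if (after == -1)
            break;                     /* shrinker cannot proceed */
        if (after < before)
            freed += before - after;
        total -= batch;
    }
    return freed;
}
```

The point of the sketch is that the caller can only tell "nothing left to
free" apart from "cannot free anything at the moment" if the callback
reports the second case as -1 rather than as an ordinary count.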


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-24 18:54   ` Chris Mason
@ 2011-06-27 11:03     ` Mel Gorman
  2011-06-27 20:18       ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2011-06-27 11:03 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andi Kleen, Andrew Lutomirski, Minchan Kim, linux-mm, intel-gfx

On Fri, Jun 24, 2011 at 02:54:11PM -0400, Chris Mason wrote:
> Excerpts from Andi Kleen's message of 2011-06-24 14:44:12 -0400:
> > Andrew Lutomirski <luto@mit.edu> writes:
> > 
> > [Putting the Intel graphics driver developers in cc.]
> > 
> > > I'm back :-/
> > >
> > > I just triggered the kswapd bug on 2.6.39.1, which has the
> > > cond_resched in shrink_slab.  This time my system's still usable (I'm
> > > > typing this email on it), but kswapd0 is taking 100% cpu.  It *does*
> > > schedule (tested by setting its affinity the same as another CPU hog
> > > and confirming that each one gets 50%).
> > >
> > > It appears to be calling i915_gem_inactive_shrink in a loop.  I have
> > > probes on entry and return of i915_gem_inactive_shrink and on return
> > > of shrink_slab.  I see:
> > >
> > >          kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
> > >          kswapd0    47 [000] 59599.956575: shrink_zone:
> > > (ffffffff810c848c) priority=12 zone=ffff8801005fe000
> > >          kswapd0    47 [000] 59599.956576: shrink_zone_return:
> > > (ffffffff810c848c <- ffffffff810c96c6) arg1=0
> > >          kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
> 
> A similar trace came up a bunch of times in Jejb's NMI softlockup/kswapd
> consumes the machine thread.  That one was tracked down to slub high
> order allocations.
> 
> I'm sure that one is burned into Mel's memory, but after a while the
> individual traces fell out of the thread, and I'm not sure the i915 part
> stuck out.
> 

I expect that Jejb's lockup is also fixed by "Stop kswapd consuming
100% CPU when highest zone is small". i915 didn't help but at the end
of the day, kswapd shouldn't have been shrinking slab so aggressively.

-- 
Mel Gorman
SUSE Labs


* Re: Root-causing kswapd spinning on Sandy Bridge laptops?
  2011-06-27 11:03     ` Mel Gorman
@ 2011-06-27 20:18       ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2011-06-27 20:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Mason, Andi Kleen, Andrew Lutomirski, Minchan Kim, linux-mm,
	intel-gfx

On Mon, 2011-06-27 at 12:03 +0100, Mel Gorman wrote:
> On Fri, Jun 24, 2011 at 02:54:11PM -0400, Chris Mason wrote:
> > Excerpts from Andi Kleen's message of 2011-06-24 14:44:12 -0400:
> > > Andrew Lutomirski <luto@mit.edu> writes:
> > > 
> > > [Putting the Intel graphics driver developers in cc.]
> > > 
> > > > I'm back :-/
> > > >
> > > > I just triggered the kswapd bug on 2.6.39.1, which has the
> > > > cond_resched in shrink_slab.  This time my system's still usable (I'm
> > > > > typing this email on it), but kswapd0 is taking 100% cpu.  It *does*
> > > > schedule (tested by setting its affinity the same as another CPU hog
> > > > and confirming that each one gets 50%).
> > > >
> > > > It appears to be calling i915_gem_inactive_shrink in a loop.  I have
> > > > probes on entry and return of i915_gem_inactive_shrink and on return
> > > > of shrink_slab.  I see:
> > > >
> > > >          kswapd0    47 [000] 59599.956573: mm_vmscan_kswapd_wake: nid=0 order=0
> > > >          kswapd0    47 [000] 59599.956575: shrink_zone:
> > > > (ffffffff810c848c) priority=12 zone=ffff8801005fe000
> > > >          kswapd0    47 [000] 59599.956576: shrink_zone_return:
> > > > (ffffffff810c848c <- ffffffff810c96c6) arg1=0
> > > >          kswapd0    47 [000] 59599.956578: i915_gem_inactive_shrink:
> > 
> > A similar trace came up a bunch of times in Jejb's NMI softlockup/kswapd
> > consumes the machine thread.  That one was tracked down to slub high
> > order allocations.
> > 
> > I'm sure that one is burned into Mel's memory, but after a while the
> > individual traces fell out of the thread, and I'm not sure the i915 part
> > stuck out.
> > 
> 
> I expect that Jejb's lockup is also fixed by "Stop kswapd consuming
> 100% CPU when highest zone is small". i915 didn't help but at the end
> of the day, kswapd shouldn't have been shrinking slab so aggressively.

It will be a while before I can try this out, I'm afraid ... the laptop
is currently on tour in Europe with its owner.  I finally just
downgraded it to FC13 which made most of the issues go away.

James




Thread overview: 14+ messages
2011-06-24  6:22 Root-causing kswapd spinning on Sandy Bridge laptops? Andrew Lutomirski
2011-06-24  9:27 ` Minchan Kim
2011-06-24  9:38   ` Minchan Kim
2011-06-24 10:24   ` Pádraig Brady
2011-06-24 12:15     ` Pádraig Brady
2011-06-24 12:51     ` Mel Gorman
2011-06-24 13:32       ` Andrew Lutomirski
2011-06-24 18:44 ` Andi Kleen
2011-06-24 18:48   ` Andrew Lutomirski
2011-06-24 19:13     ` Andi Kleen
2011-06-24 19:17       ` Christoph Hellwig
2011-06-24 18:54   ` Chris Mason
2011-06-27 11:03     ` Mel Gorman
2011-06-27 20:18       ` James Bottomley
