[Patch] Call cond_resched() at bottom of main look in balance

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
@ 2010-06-17 18:48 Larry Woodman
  2010-06-21 11:45 ` KOSAKI Motohiro
  0 siblings, 1 reply; 11+ messages in thread
From: Larry Woodman @ 2010-06-17 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm

We are seeing a problem where kswapd gets stuck and hogs the CPU on a
small single CPU system when an OOM kill should occur.  When this
happens swap space has been exhausted and the pagecache has been shrunk
to zero.  Once kswapd gets the CPU it never gives it up because at least
one zone is below high.  Adding a single cond_resched() at the end of
the main loop in balance_pgdat() fixes the problem by allowing the
watchdog and tasks to run and eventually do an OOM kill which frees up
the resources.


----------------------------------------------------------------
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..c5c46b7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2182,6 +2182,7 @@ loop_again:
                 */
                if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
                        break;
+               cond_resched();
        }
 out:
        /*

-----------------------------------------------------------------

Signed-off-by: Larry Woodman <lwoodman@redhat.com>


-----------------------------------------------------------------
BUG: soft lockup - CPU#0 stuck for 61s! [kswapd0:26]
Modules linked in: sunrpc(U) p4_clockmod(U) ipv6(U) dm_mirror(U)...

Pid: 26, comm: kswapd0 Not tainted (2.6.32-34.el6.i686 #1) HP Compaq dx2300
EIP: 0060:[<c04e84fb>] EFLAGS: 00000246 CPU: 0
EIP is at shrink_slab+0x7b/0x170
EAX: 00000040 EBX: 00000000 ECX: dbf43e54 EDX: 00000000
ESI: e01c2b30 EDI: 00000000 EBP: c0ad4600 ESP: dbc8dec0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 0805090c CR3: 108eb000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Call Trace:
 [<c04eab91>] ? kswapd+0x541/0x830
 [<c04eae80>] ? isolate_pages_global+0x0/0x220
 [<c04702d0>] ? autoremove_wake_function+0x0/0x40
 [<c043d5e0>] ? complete+0x40/0x60
 [<c04ea650>] ? kswapd+0x0/0x830
 [<c0470094>] ? kthread+0x74/0x80
 [<c0470020>] ? kthread+0x0/0x80
 [<c040a547>] ? kernel_thread_helper+0x7/0x10
-------------------------------------------------------------------
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 152
active_anon:54902 inactive_anon:54849 isolated_anon:32
 active_file:0 inactive_file:25 isolated_file:0
 unevictable:660 dirty:0 writeback:6 unstable:0
 free:1172 slab_reclaimable:1969 slab_unreclaimable:8322
 mapped:196 shmem:801 pagetables:1300 bounce:0
...
Normal free:2672kB min:2764kB low:3452kB high:4144kB 
...
21729 total pagecache pages
20240 pages in swap cache
Swap cache stats: add 468211, delete 447971, find 12560445/12560936
Free swap  = 0kB
Total swap = 1015800kB
128720 pages RAM
0 pages HighMem
3223 pages reserved
1206 pages shared
121413 pages non-shared

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-17 18:48 [Patch] Call cond_resched() at bottom of main look in balance_pgdat() Larry Woodman
@ 2010-06-21 11:45 ` KOSAKI Motohiro
  2010-06-21 14:13   ` Minchan Kim
  2010-06-22 18:21   ` Rik van Riel
  0 siblings, 2 replies; 11+ messages in thread
From: KOSAKI Motohiro @ 2010-06-21 11:45 UTC (permalink / raw)
  To: Larry Woodman; +Cc: kosaki.motohiro, linux-kernel, linux-mm

> We are seeing a problem where kswapd gets stuck and hogs the CPU on a
> small single CPU system when an OOM kill should occur.  When this
> happens swap space has been exhausted and the pagecache has been shrunk
> to zero.  Once kswapd gets the CPU it never gives it up because at least
> one zone is below high.  Adding a single cond_resched() at the end of
> the main loop in balance_pgdat() fixes the problem by allowing the
> watchdog and tasks to run and eventually do an OOM kill which frees up
> the resources.

Thank you. this seems regression.

> Mem-Info:
> DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> Normal per-cpu:
> CPU    0: hi:  186, btch:  31 usd: 152
> active_anon:54902 inactive_anon:54849 isolated_anon:32
>  active_file:0 inactive_file:25 isolated_file:0
>  unevictable:660 dirty:0 writeback:6 unstable:0
>  free:1172 slab_reclaimable:1969 slab_unreclaimable:8322
>  mapped:196 shmem:801 pagetables:1300 bounce:0
> ...
> Normal free:2672kB min:2764kB low:3452kB high:4144kB 
> ...
> 21729 total pagecache pages
> 20240 pages in swap cache
> Swap cache stats: add 468211, delete 447971, find 12560445/12560936
> Free swap  = 0kB
> Total swap = 1015800kB
> 128720 pages RAM
> 0 pages HighMem
> 3223 pages reserved
> 1206 pages shared
> 121413 pages non-shared
> 

zero free swap. then, vmscan don't try to scan anon pages. but
file pages are almost zero. then, shrink_page_list() was not called
enough frequently....

I guess it is caused following commit (by me).

	commit bb3ab596832b920c703d1aea1ce76d69c0f71fb7
	Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
	Date:   Mon Dec 14 17:58:55 2009 -0800
	    vmscan: stop kswapd waiting on congestion when the min watermark is not being met

Very thanks your effort. your analysis seems perfect.

btw, I reformat your patch a bit. your previous email is a bit akpm 
unfriendly.


=============================================================
Subject: [PATCH] Call cond_resched() at bottom of main look in balance_pgdat()
From: Larry Woodman <lwoodman@redhat.com>

We are seeing a problem where kswapd gets stuck and hogs the CPU on a
small single CPU system when an OOM kill should occur.  When this
happens swap space has been exhausted and the pagecache has been shrunk
to zero.  Once kswapd gets the CPU it never gives it up because at least
one zone is below high.  Adding a single cond_resched() at the end of
the main loop in balance_pgdat() fixes the problem by allowing the
watchdog and tasks to run and eventually do an OOM kill which frees up
the resources.

kosaki note: This seems regression caused by commit bb3ab59683
(vmscan: stop kswapd waiting on congestion when the min watermark is
 not being met)

Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..c5c46b7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2182,6 +2182,7 @@ loop_again:
 		 */
 		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
 			break;
+		cond_resched();
 	}
 out:
 	/*
-- 
1.6.5.2



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-21 11:45 ` KOSAKI Motohiro
@ 2010-06-21 14:13   ` Minchan Kim
  2010-06-22  2:24     ` KOSAKI Motohiro
  2010-06-22 18:21   ` Rik van Riel
  1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2010-06-21 14:13 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Larry Woodman, linux-kernel, linux-mm

On Mon, Jun 21, 2010 at 08:45:46PM +0900, KOSAKI Motohiro wrote:
> > We are seeing a problem where kswapd gets stuck and hogs the CPU on a
> > small single CPU system when an OOM kill should occur.  When this
> > happens swap space has been exhausted and the pagecache has been shrunk
> > to zero.  Once kswapd gets the CPU it never gives it up because at least
> > one zone is below high.  Adding a single cond_resched() at the end of
> > the main loop in balance_pgdat() fixes the problem by allowing the
> > watchdog and tasks to run and eventually do an OOM kill which frees up
> > the resources.
> 
> Thank you. this seems regression.

Yes. I waited your response. :)

> 
> > Mem-Info:
> > DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > Normal per-cpu:
> > CPU    0: hi:  186, btch:  31 usd: 152
> > active_anon:54902 inactive_anon:54849 isolated_anon:32
> >  active_file:0 inactive_file:25 isolated_file:0
> >  unevictable:660 dirty:0 writeback:6 unstable:0
> >  free:1172 slab_reclaimable:1969 slab_unreclaimable:8322
> >  mapped:196 shmem:801 pagetables:1300 bounce:0
> > ...
> > Normal free:2672kB min:2764kB low:3452kB high:4144kB 
> > ...
> > 21729 total pagecache pages
> > 20240 pages in swap cache
> > Swap cache stats: add 468211, delete 447971, find 12560445/12560936
> > Free swap  = 0kB
> > Total swap = 1015800kB
> > 128720 pages RAM
> > 0 pages HighMem
> > 3223 pages reserved
> > 1206 pages shared
> > 121413 pages non-shared
> > 
> 
> zero free swap. then, vmscan don't try to scan anon pages. but
> file pages are almost zero. then, shrink_page_list() was not called
> enough frequently....
> 
> I guess it is caused following commit (by me).
> 
> 	commit bb3ab596832b920c703d1aea1ce76d69c0f71fb7
> 	Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 	Date:   Mon Dec 14 17:58:55 2009 -0800
> 	    vmscan: stop kswapd waiting on congestion when the min watermark is not being met
> 
> Very thanks your effort. your analysis seems perfect.
> 
> btw, I reformat your patch a bit. your previous email is a bit akpm 
> unfriendly.
> 
> 
> =============================================================
> Subject: [PATCH] Call cond_resched() at bottom of main look in balance_pgdat()
> From: Larry Woodman <lwoodman@redhat.com>
> 
> We are seeing a problem where kswapd gets stuck and hogs the CPU on a
> small single CPU system when an OOM kill should occur.  When this
> happens swap space has been exhausted and the pagecache has been shrunk
> to zero.  Once kswapd gets the CPU it never gives it up because at least
> one zone is below high.  Adding a single cond_resched() at the end of
> the main loop in balance_pgdat() fixes the problem by allowing the
> watchdog and tasks to run and eventually do an OOM kill which frees up
> the resources.
> 
> kosaki note: This seems regression caused by commit bb3ab59683
> (vmscan: stop kswapd waiting on congestion when the min watermark is
>  not being met)
> 
> Signed-off-by: Larry Woodman <lwoodman@redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/vmscan.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c7e57c..c5c46b7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2182,6 +2182,7 @@ loop_again:
>  		 */
>  		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
>  			break;
> +		cond_resched();
>  	}
>  out:
>  	/*
> -- 
> 1.6.5.2

Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
min watermark to avoid failing atomic allocation.
But this patch could yield kswapd's time slice at any time. 
Doesn't the patch break your goal in bb3ab59683?


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-21 14:13   ` Minchan Kim
@ 2010-06-22  2:24     ` KOSAKI Motohiro
  2010-06-22  2:45       ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: KOSAKI Motohiro @ 2010-06-22  2:24 UTC (permalink / raw)
  To: Minchan Kim; +Cc: kosaki.motohiro, Larry Woodman, linux-kernel, linux-mm

> > =============================================================
> > Subject: [PATCH] Call cond_resched() at bottom of main look in balance_pgdat()
> > From: Larry Woodman <lwoodman@redhat.com>
> > 
> > We are seeing a problem where kswapd gets stuck and hogs the CPU on a
> > small single CPU system when an OOM kill should occur.  When this
> > happens swap space has been exhausted and the pagecache has been shrunk
> > to zero.  Once kswapd gets the CPU it never gives it up because at least
> > one zone is below high.  Adding a single cond_resched() at the end of
> > the main loop in balance_pgdat() fixes the problem by allowing the
> > watchdog and tasks to run and eventually do an OOM kill which frees up
> > the resources.
> > 
> > kosaki note: This seems regression caused by commit bb3ab59683
> > (vmscan: stop kswapd waiting on congestion when the min watermark is
> >  not being met)
> > 
> > Signed-off-by: Larry Woodman <lwoodman@redhat.com>
> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  mm/vmscan.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 9c7e57c..c5c46b7 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2182,6 +2182,7 @@ loop_again:
> >  		 */
> >  		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
> >  			break;
> > +		cond_resched();
> >  	}
> >  out:
> >  	/*
> > -- 
> > 1.6.5.2
> 
> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
> min watermark to avoid failing atomic allocation.
> But this patch could yield kswapd's time slice at any time. 
> Doesn't the patch break your goal in bb3ab59683?

No. it don't break.

Typically, kswapd periodically call shrink_page_list() and it call
cond_resched() even if bb3ab59683 case.
Larry observed very exceptional situation. his system don't have
reclaimable pages at all, then eventually shrink_page_list() was not
called very long time.
His patch only change such very rare situation, I think it's safe.

Thanks.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-22  2:24     ` KOSAKI Motohiro
@ 2010-06-22  2:45       ` Minchan Kim
  2010-06-22  3:23         ` KOSAKI Motohiro
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2010-06-22  2:45 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Larry Woodman, linux-kernel, linux-mm

On Tue, Jun 22, 2010 at 11:24 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> > =============================================================
>> > Subject: [PATCH] Call cond_resched() at bottom of main look in balance_pgdat()
>> > From: Larry Woodman <lwoodman@redhat.com>
>> >
>> > We are seeing a problem where kswapd gets stuck and hogs the CPU on a
>> > small single CPU system when an OOM kill should occur.  When this
>> > happens swap space has been exhausted and the pagecache has been shrunk
>> > to zero.  Once kswapd gets the CPU it never gives it up because at least
>> > one zone is below high.  Adding a single cond_resched() at the end of
>> > the main loop in balance_pgdat() fixes the problem by allowing the
>> > watchdog and tasks to run and eventually do an OOM kill which frees up
>> > the resources.
>> >
>> > kosaki note: This seems regression caused by commit bb3ab59683
>> > (vmscan: stop kswapd waiting on congestion when the min watermark is
>> >  not being met)
>> >
>> > Signed-off-by: Larry Woodman <lwoodman@redhat.com>
>> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> > ---
>> >  mm/vmscan.c |    1 +
>> >  1 files changed, 1 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 9c7e57c..c5c46b7 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -2182,6 +2182,7 @@ loop_again:
>> >              */
>> >             if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
>> >                     break;
>> > +           cond_resched();
>> >     }
>> >  out:
>> >     /*
>> > --
>> > 1.6.5.2
>>
>> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
>> min watermark to avoid failing atomic allocation.
>> But this patch could yield kswapd's time slice at any time.
>> Doesn't the patch break your goal in bb3ab59683?
>
> No. it don't break.
>
> Typically, kswapd periodically call shrink_page_list() and it call
> cond_resched() even if bb3ab59683 case.

Hmm. If it is, bb3ab59683 is effective really?

The bb3ab59683's goal is prevent CPU yield in case of free < min_watermark.
But shrink_page_list can yield cpu from kswapd at any time.
So I am not sure what is bb3ab59683's benefit.
Did you have any number about bb3ab59683's effectiveness?
(Of course, I know it's very hard. Just out of curiosity)

As a matter of fact, when I saw this Larry's patch, I thought it would
be better to revert bb3ab59683. Then congestion_wait could yield CPU
to other process.

What do you think about?

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in  balance_pgdat()
  2010-06-22  2:45       ` Minchan Kim
@ 2010-06-22  3:23         ` KOSAKI Motohiro
  2010-06-22  4:29           ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: KOSAKI Motohiro @ 2010-06-22  3:23 UTC (permalink / raw)
  To: Minchan Kim; +Cc: kosaki.motohiro, Larry Woodman, linux-kernel, linux-mm

> >> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
> >> min watermark to avoid failing atomic allocation.
> >> But this patch could yield kswapd's time slice at any time.
> >> Doesn't the patch break your goal in bb3ab59683?
> >
> > No. it don't break.
> >
> > Typically, kswapd periodically call shrink_page_list() and it call
> > cond_resched() even if bb3ab59683 case.
> 
> Hmm. If it is, bb3ab59683 is effective really?
> 
> The bb3ab59683's goal is prevent CPU yield in case of free < min_watermark.
> But shrink_page_list can yield cpu from kswapd at any time.
> So I am not sure what is bb3ab59683's benefit.
> Did you have any number about bb3ab59683's effectiveness?
> (Of course, I know it's very hard. Just out of curiosity)
> 
> As a matter of fact, when I saw this Larry's patch, I thought it would
> be better to revert bb3ab59683. Then congestion_wait could yield CPU
> to other process.
> 
> What do you think about?

No. The goal is not prevent CPU yield. The goal is avoid unnecessary
_long_ sleep (i.e. congestion_wait(BLK_RW_ASYNC, HZ/10)).
Anyway we can't refuse CPU yield on UP. it lead to hangup ;)

What do you mean the number? If it mean how much reduce congestion_wait(),
it was posted a lot of time. If it mean how much reduce page allocation 
failure bug report, I think it has been observable reduced since half 
years ago. 

If you have specific worried concern, can you please share it?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-22  3:23         ` KOSAKI Motohiro
@ 2010-06-22  4:29           ` Minchan Kim
  2010-06-22 21:33             ` Johannes Weiner
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2010-06-22  4:29 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Larry Woodman, linux-kernel, linux-mm

On Tue, Jun 22, 2010 at 12:23 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> >> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
>> >> min watermark to avoid failing atomic allocation.
>> >> But this patch could yield kswapd's time slice at any time.
>> >> Doesn't the patch break your goal in bb3ab59683?
>> >
>> > No. it don't break.
>> >
>> > Typically, kswapd periodically call shrink_page_list() and it call
>> > cond_resched() even if bb3ab59683 case.
>>
>> Hmm. If it is, bb3ab59683 is effective really?
>>
>> The bb3ab59683's goal is prevent CPU yield in case of free < min_watermark.
>> But shrink_page_list can yield cpu from kswapd at any time.
>> So I am not sure what is bb3ab59683's benefit.
>> Did you have any number about bb3ab59683's effectiveness?
>> (Of course, I know it's very hard. Just out of curiosity)
>>
>> As a matter of fact, when I saw this Larry's patch, I thought it would
>> be better to revert bb3ab59683. Then congestion_wait could yield CPU
>> to other process.
>>
>> What do you think about?
>
> No. The goal is not prevent CPU yield. The goal is avoid unnecessary
> _long_ sleep (i.e. congestion_wait(BLK_RW_ASYNC, HZ/10)).

I meant it.

> Anyway we can't refuse CPU yield on UP. it lead to hangup ;)
>
> What do you mean the number? If it mean how much reduce congestion_wait(),
> it was posted a lot of time. If it mean how much reduce page allocation
> failure bug report, I think it has been observable reduced since half
> years ago.

I meant second.
Hmm. I doubt it's observable since at that time, Mel had posted many
patches to reduce page allocation fail. bb3ab59683 was just one of
them.

>
> If you have specific worried concern, can you please share it?
>

My concern is that I don't want to add new band-aid on uncertain
feature to solve
regression of uncertain feature.(Sorry for calling Larry's patch as band-aid.).
If we revert bb3ab59683, congestion_wait in balance_pgdat could yield
cpu from kswapd.

If you insist on bb3ab59683's effective and have proved it at past, I
am not against it.

And If it's regression of bb3ab59683, Doesn't it make sense following as?
It could restore old behavior.

---
                 * OK, kswapd is getting into trouble.  Take a nap, then take
                 * another pass across the zones.
                 */
                if (total_scanned && (priority < DEF_PRIORITY - 2)) {
                        if (has_under_min_watermark_zone) {
                                count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
                                /* allowing CPU yield to go on
watchdog or OOMed task */
                                cond_resched();
                        }
                        else
                                congestion_wait(BLK_RW_ASYNC, HZ/10);
                }


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-22  4:29           ` Minchan Kim
@ 2010-06-22 21:33             ` Johannes Weiner
  2010-06-22 23:07               ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2010-06-22 21:33 UTC (permalink / raw)
  To: Minchan Kim; +Cc: KOSAKI Motohiro, Larry Woodman, linux-kernel, linux-mm

On Tue, Jun 22, 2010 at 01:29:17PM +0900, Minchan Kim wrote:
> On Tue, Jun 22, 2010 at 12:23 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> >> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
> >> >> min watermark to avoid failing atomic allocation.
> >> >> But this patch could yield kswapd's time slice at any time.
> >> >> Doesn't the patch break your goal in bb3ab59683?
> >> >
> >> > No. it don't break.
> >> >
> >> > Typically, kswapd periodically call shrink_page_list() and it call
> >> > cond_resched() even if bb3ab59683 case.
> >>
> >> Hmm. If it is, bb3ab59683 is effective really?
> >>
> >> The bb3ab59683's goal is prevent CPU yield in case of free < min_watermark.
> >> But shrink_page_list can yield cpu from kswapd at any time.
> >> So I am not sure what is bb3ab59683's benefit.
> >> Did you have any number about bb3ab59683's effectiveness?
> >> (Of course, I know it's very hard. Just out of curiosity)
> >>
> >> As a matter of fact, when I saw this Larry's patch, I thought it would
> >> be better to revert bb3ab59683. Then congestion_wait could yield CPU
> >> to other process.
> >>
> >> What do you think about?
> >
> > No. The goal is not prevent CPU yield. The goal is avoid unnecessary
> > _long_ sleep (i.e. congestion_wait(BLK_RW_ASYNC, HZ/10)).
> 
> I meant it.
> 
> > Anyway we can't refuse CPU yield on UP. it lead to hangup ;)
> >
> > What do you mean the number? If it mean how much reduce congestion_wait(),
> > it was posted a lot of time. If it mean how much reduce page allocation
> > failure bug report, I think it has been observable reduced since half
> > years ago.
> 
> I meant second.
> Hmm. I doubt it's observable since at that time, Mel had posted many
> patches to reduce page allocation fail. bb3ab59683 was just one of
> them.
> 
> >
> > If you have specific worried concern, can you please share it?
> >
> 
> My concern is that I don't want to add new band-aid on uncertain
> feature to solve
> regression of uncertain feature.(Sorry for calling Larry's patch as band-aid.).
> If we revert bb3ab59683, congestion_wait in balance_pgdat could yield
> cpu from kswapd.
> 
> If you insist on bb3ab59683's effective and have proved it at past, I
> am not against it.
> 
> And If it's regression of bb3ab59683, Doesn't it make sense following as?
> It could restore old behavior.
> 
> ---
>                  * OK, kswapd is getting into trouble.  Take a nap, then take
>                  * another pass across the zones.
>                  */
>                 if (total_scanned && (priority < DEF_PRIORITY - 2)) {
>                         if (has_under_min_watermark_zone) {
>                                 count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
>                                 /* allowing CPU yield to go on
> watchdog or OOMed task */
>                                 cond_resched();

We have two things here: one is waiting for some IO to complete, which
we skip if we are in a hurry.  The other thing is that we have a
potentially long-running loop with no garuanteed rescheduling point in
it.  I would rather not mix up those two and let this cond_resched()
for #2 stand on it's own and be self-explanatory.

So,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

to Larry's patch (or KOSAKI-san's version of it for that matter).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-22 21:33             ` Johannes Weiner
@ 2010-06-22 23:07               ` Minchan Kim
  2010-06-28 19:10                 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2010-06-22 23:07 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: KOSAKI Motohiro, Larry Woodman, linux-kernel, linux-mm

On Wed, Jun 23, 2010 at 6:33 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Jun 22, 2010 at 01:29:17PM +0900, Minchan Kim wrote:
>> On Tue, Jun 22, 2010 at 12:23 PM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> >> >> Kosaki's patch's goal is that kswap doesn't yield cpu if the zone doesn't meet its
>> >> >> min watermark to avoid failing atomic allocation.
>> >> >> But this patch could yield kswapd's time slice at any time.
>> >> >> Doesn't the patch break your goal in bb3ab59683?
>> >> >
>> >> > No. it don't break.
>> >> >
>> >> > Typically, kswapd periodically call shrink_page_list() and it call
>> >> > cond_resched() even if bb3ab59683 case.
>> >>
>> >> Hmm. If it is, bb3ab59683 is effective really?
>> >>
>> >> The bb3ab59683's goal is prevent CPU yield in case of free < min_watermark.
>> >> But shrink_page_list can yield cpu from kswapd at any time.
>> >> So I am not sure what is bb3ab59683's benefit.
>> >> Did you have any number about bb3ab59683's effectiveness?
>> >> (Of course, I know it's very hard. Just out of curiosity)
>> >>
>> >> As a matter of fact, when I saw this Larry's patch, I thought it would
>> >> be better to revert bb3ab59683. Then congestion_wait could yield CPU
>> >> to other process.
>> >>
>> >> What do you think about?
>> >
>> > No. The goal is not prevent CPU yield. The goal is avoid unnecessary
>> > _long_ sleep (i.e. congestion_wait(BLK_RW_ASYNC, HZ/10)).
>>
>> I meant it.
>>
>> > Anyway we can't refuse CPU yield on UP. it lead to hangup ;)
>> >
>> > What do you mean the number? If it mean how much reduce congestion_wait(),
>> > it was posted a lot of time. If it mean how much reduce page allocation
>> > failure bug report, I think it has been observable reduced since half
>> > years ago.
>>
>> I meant second.
>> Hmm. I doubt it's observable since at that time, Mel had posted many
>> patches to reduce page allocation fail. bb3ab59683 was just one of
>> them.
>>
>> >
>> > If you have specific worried concern, can you please share it?
>> >
>>
>> My concern is that I don't want to add new band-aid on uncertain
>> feature to solve
>> regression of uncertain feature.(Sorry for calling Larry's patch as band-aid.).
>> If we revert bb3ab59683, congestion_wait in balance_pgdat could yield
>> cpu from kswapd.
>>
>> If you insist on bb3ab59683's effective and have proved it at past, I
>> am not against it.
>>
>> And If it's regression of bb3ab59683, Doesn't it make sense following as?
>> It could restore old behavior.
>>
>> ---
>>                  * OK, kswapd is getting into trouble.  Take a nap, then take
>>                  * another pass across the zones.
>>                  */
>>                 if (total_scanned && (priority < DEF_PRIORITY - 2)) {
>>                         if (has_under_min_watermark_zone) {
>>                                 count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
>>                                 /* allowing CPU yield to go on
>> watchdog or OOMed task */
>>                                 cond_resched();
>
> We have two things here: one is waiting for some IO to complete, which
> we skip if we are in a hurry.  The other thing is that we have a
> potentially long-running loop with no garuanteed rescheduling point in
> it.  I would rather not mix up those two and let this cond_resched()
> for #2 stand on it's own and be self-explanatory.
>
> So,
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> to Larry's patch (or KOSAKI-san's version of it for that matter).
>

Okay. As I hear Kosaki and Hannes opinions, I was paranoid.
Thanks for good comment!, Kosaki and Hannes.
Feel free to add my sign to Kosaki's version(I like detailed description :) )

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-22 23:07               ` Minchan Kim
@ 2010-06-28 19:10                 ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2010-06-28 19:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, KOSAKI Motohiro, Larry Woodman, linux-kernel,
	linux-mm

On Wed, 23 Jun 2010 08:07:34 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

The patch is a bit sucky, isn't it?

a) the cond_resched() which Larry's patch adds is very special.  It
   _looks_ like a random preemption point but it's actually critical to
   the correct functioning of the system.  That's utterly unobvious to
   anyone who reads the code, so a comment explaining this *must* be
   included.

b) cond_resched() is a really crappy way of solving the problem
   which Larry described.  It will sit there chewing away CPU time
   until kswapd's timeslice expires.

I suppose we can live with b) although it _does_ suck and I'd suggest
that the comment include a big FIXME, so someone might fix it.

Larry, please fix a), gather the acks and reviewed-by's, update the
changelog to identify the commit which broke it and resend?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch] Call cond_resched() at bottom of main look in balance_pgdat()
  2010-06-21 11:45 ` KOSAKI Motohiro
  2010-06-21 14:13   ` Minchan Kim
@ 2010-06-22 18:21   ` Rik van Riel
  1 sibling, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2010-06-22 18:21 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Larry Woodman, linux-kernel, linux-mm

On 06/21/2010 07:45 AM, KOSAKI Motohiro wrote:

> kosaki note: This seems regression caused by commit bb3ab59683
> (vmscan: stop kswapd waiting on congestion when the min watermark is
>   not being met)
> 
> Signed-off-by: Larry Woodman<lwoodman@redhat.com>
> Reviewed-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-06-28 19:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-17 18:48 [Patch] Call cond_resched() at bottom of main look in balance_pgdat() Larry Woodman
2010-06-21 11:45 ` KOSAKI Motohiro
2010-06-21 14:13   ` Minchan Kim
2010-06-22  2:24     ` KOSAKI Motohiro
2010-06-22  2:45       ` Minchan Kim
2010-06-22  3:23         ` KOSAKI Motohiro
2010-06-22  4:29           ` Minchan Kim
2010-06-22 21:33             ` Johannes Weiner
2010-06-22 23:07               ` Minchan Kim
2010-06-28 19:10                 ` Andrew Morton
2010-06-22 18:21   ` Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).