From: "avagin@gmail.com" <avagin@gmail.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrey Vagin <avagin@openvz.org>, Mel Gorman <mel@csn.ul.ie>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable()
Date: Fri, 11 Mar 2011 09:08:32 +0300
Message-ID: <4D79BC60.1040106@gmail.com>
In-Reply-To: <AANLkTi=1695Wp9UheV_OKk5MixNUY2aHWfQ2WO1evSe2@mail.gmail.com>
On 03/11/2011 03:18 AM, Minchan Kim wrote:
> On Fri, Mar 11, 2011 at 8:58 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> On Thu, 10 Mar 2011 15:58:29 +0900
>> Minchan Kim <minchan.kim@gmail.com> wrote:
>>
>>> Hi Kame,
>>>
>>> Sorry for the late response.
>>> I only had a short time to test this issue because I am very busy these days.
>>> This issue is interesting to me, so I hope to take enough time for
>>> testing when I can. I need to find the root cause of the livelock.
>>>
>>
>> Thanks. Kosaki-san and I reproduced the bug on a swapless system.
>> Kosaki-san is now digging into it and has found some issues with the
>> scheduler boost at OOM and a lack of enough "wait" in vmscan.c.
>>
>> I made a patch myself, attached below. It works well for making
>> all_unreclaimable() return TRUE, but the livelock (deadlock?) still happens.
>
> I saw the deadlock.
> From my quick debugging it seems to be caused by the following code, but I'm
> not sure. I need to investigate further but don't have time now. :(
>
>
> * Note: this may have a chance of deadlock if it gets
> * blocked waiting for another task which itself is waiting
> * for memory. Is there a better alternative?
> */
> if (test_tsk_thread_flag(p, TIF_MEMDIE))
> return ERR_PTR(-1UL);
> It would wait forever for that task to die, without selecting another victim.
> If that's right, it's a known bug and until now we have had no alternative. Hmm.
I fixed this bug too and sent the patch "mm: skip zombie in OOM-killer":
http://groups.google.com/group/linux.kernel/browse_thread/thread/b9c6ddf34d1671ab/2941e1877ca4f626?lnk=raot&pli=1
- if (test_tsk_thread_flag(p, TIF_MEMDIE))
+ if (test_tsk_thread_flag(p, TIF_MEMDIE) && p->mm)
return ERR_PTR(-1UL);
It has not been committed yet, because David Rientjes and others are still
deciding what to do with "[patch] oom: prevent unnecessary oom kills or kernel panics".
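A minimal user-space model of the victim-selection loop may make the effect of the one-line change easier to see. Everything in it is hypothetical: `struct fake_task`, `select_victim()`, `SELECT_ABORT` and `SELECT_NONE` are illustration-only stand-ins, not kernel code. The idea modelled is that a task killed earlier which has already dropped its mm (for example, a zombie) still carries TIF_MEMDIE; the old check aborts on it every time, while the patched check skips it and picks another victim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the few task_struct fields
 * that the OOM victim-selection loop looks at. */
struct fake_task {
    int pid;
    bool memdie;           /* models test_tsk_thread_flag(p, TIF_MEMDIE) */
    bool has_mm;           /* models p->mm != NULL */
    unsigned long points;  /* models the badness score */
};

#define SELECT_ABORT (-1)  /* models returning ERR_PTR(-1UL): wait for the dying task */
#define SELECT_NONE  (-2)  /* no eligible victim found */

/*
 * Returns the index of the chosen victim, SELECT_ABORT, or SELECT_NONE.
 * skip_zombies == false models the current check;
 * skip_zombies == true models the proposed "&& p->mm" check.
 */
static int select_victim(const struct fake_task *tasks, size_t n,
                         bool skip_zombies)
{
    int chosen = SELECT_NONE;
    unsigned long best = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct fake_task *p = &tasks[i];

        /* A victim is already dying: abort and wait for it to exit. */
        if (p->memdie && (!skip_zombies || p->has_mm))
            return SELECT_ABORT;
        /* Tasks without an mm have nothing left to free. */
        if (!p->has_mm)
            continue;
        /* Otherwise keep the task with the highest score seen so far. */
        if (p->points > best) {
            best = p->points;
            chosen = (int)i;
        }
    }
    return chosen;
}
```

For example, given a TIF_MEMDIE task with no mm followed by ordinary tasks, the old check always returns SELECT_ABORT, while the new check returns the index of the highest-scoring ordinary task. A TIF_MEMDIE task that still has its mm makes both variants abort, as before.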
>
>> I suspect vmscan itself isn't the key to fixing this issue.
>
> I agree.
>
>> Then, I'd like to wait for Kosaki-san's answer ;)
>
> Me, too. :)
>
>>
>> I'm now wondering how to catch a fork-bomb and stop it (without using cgroup).
>
> Yes. Fork throttling without cgroup is very important.
> And, off-topic, the mem_notify without memcontrol that you mentioned is
> important to embedded people, I guess.
>
>> I think the problem is that the fork-bomb is faster than killall...
>
> And the deadlock problem I mentioned.
>
>>
>> Thanks,
>> -Kame
>
> Thanks for the investigation, Kame.
>
>> ==
>>
>> This is just a debug patch.
>>
>> ---
>> mm/vmscan.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 54 insertions(+), 4 deletions(-)
>>
>> Index: mmotm-0303/mm/vmscan.c
>> ===================================================================
>> --- mmotm-0303.orig/mm/vmscan.c
>> +++ mmotm-0303/mm/vmscan.c
>> @@ -1983,9 +1983,55 @@ static void shrink_zones(int priority, s
>> }
>> }
>>
>> -static bool zone_reclaimable(struct zone *zone)
>> +static bool zone_seems_empty(struct zone *zone, struct scan_control *sc)
>> {
>> - return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
>> + unsigned long nr, wmark, free, isolated, lru;
>> +
>> + /*
>> + * If scanned, zone->pages_scanned is incremented and this can
>> + * trigger OOM.
>> + */
>> + if (sc->nr_scanned)
>> + return false;
>> +
>> + free = zone_page_state(zone, NR_FREE_PAGES);
>> + isolated = zone_page_state(zone, NR_ISOLATED_FILE);
>> + if (nr_swap_pages)
>> + isolated += zone_page_state(zone, NR_ISOLATED_ANON);
>> +
>> + /* If we cannot scan, don't count LRU pages. */
>> + if (!zone->all_unreclaimable) {
>> + lru = zone_page_state(zone, NR_ACTIVE_FILE);
>> + lru += zone_page_state(zone, NR_INACTIVE_FILE);
>> + if (nr_swap_pages) {
>> + lru += zone_page_state(zone, NR_ACTIVE_ANON);
>> + lru += zone_page_state(zone, NR_INACTIVE_ANON);
>> + }
>> + } else
>> + lru = 0;
>> + nr = free + isolated + lru;
>> + wmark = min_wmark_pages(zone);
>> + wmark += zone->lowmem_reserve[gfp_zone(sc->gfp_mask)];
>> + wmark += 1 << sc->order;
>> + printk("thread %d/%ld all %d scanned %ld pages %ld/%ld/%ld/%ld/%ld/%ld\n",
>> + current->pid, sc->nr_scanned, zone->all_unreclaimable,
>> + zone->pages_scanned,
>> + nr,free,isolated,lru,
>> + zone_reclaimable_pages(zone), wmark);
>> + /*
>> + * In some cases (especially with no swap), almost all page cache is paged out
>> + * and we see that the amount of reclaimable+free pages is smaller than
>> + * zone->min. In this case, we cannot expect any recovery other
>> + * than OOM-KILL. We can't reclaim enough memory for normal tasks.
>> + */
>> +
>> + return nr <= wmark;
>> +}
>> +
>> +static bool zone_reclaimable(struct zone *zone, struct scan_control *sc)
>> +{
>> + /* zone_reclaimable_pages() can return 0, so we need <= */
>> + return zone->pages_scanned <= zone_reclaimable_pages(zone) * 6;
>> }
>>
>> /*
>> @@ -2006,11 +2052,15 @@ static bool all_unreclaimable(struct zon
>> continue;
>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>> continue;
>> - if (zone_reclaimable(zone)) {
>> + if (zone_seems_empty(zone, sc))
>> + continue;
>> + if (zone_reclaimable(zone, sc)) {
>> all_unreclaimable = false;
>> break;
>> }
>> }
>> + if (all_unreclaimable)
>> + printk("all_unreclaimable() returns TRUE\n");
>>
>> return all_unreclaimable;
>> }
>> @@ -2456,7 +2506,7 @@ loop_again:
>> if (zone->all_unreclaimable)
>> continue;
>> if (!compaction && nr_slab == 0 &&
>> - !zone_reclaimable(zone))
>> + !zone_reclaimable(zone, &sc))
>> zone->all_unreclaimable = 1;
>> /*
>> * If we've done a decent amount of scanning and
>>
>>
>
>
>
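The heuristic in Kame's debug patch quoted above can be restated as a small, self-contained C model. `struct zone_snapshot` and the `sim_*` functions are hypothetical user-space stand-ins that copy the arithmetic of zone_seems_empty(), the `<=` version of zone_reclaimable(), and the patched all_unreclaimable() loop; the populated_zone() and cpuset checks are left out, and the real code reads these counters through zone_page_state() rather than from a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the per-zone counters the debug patch reads. */
struct zone_snapshot {
    unsigned long free;            /* NR_FREE_PAGES */
    unsigned long isolated_file;   /* NR_ISOLATED_FILE */
    unsigned long isolated_anon;   /* NR_ISOLATED_ANON */
    unsigned long lru_file;        /* NR_ACTIVE_FILE + NR_INACTIVE_FILE */
    unsigned long lru_anon;        /* NR_ACTIVE_ANON + NR_INACTIVE_ANON */
    unsigned long min_wmark;       /* min_wmark_pages(zone) */
    unsigned long lowmem_reserve;  /* zone->lowmem_reserve[gfp_zone(gfp_mask)] */
    unsigned long pages_scanned;   /* zone->pages_scanned */
    unsigned long reclaimable;     /* zone_reclaimable_pages(zone) */
    bool all_unreclaimable;        /* zone->all_unreclaimable */
};

/* Same arithmetic as zone_seems_empty(): free + isolated + LRU pages,
 * compared against min watermark + lowmem reserve + 2^order. */
static bool sim_zone_seems_empty(const struct zone_snapshot *z,
                                 unsigned long nr_scanned,
                                 bool have_swap, unsigned int order)
{
    unsigned long nr, wmark, isolated, lru = 0;

    if (nr_scanned)  /* this pass scanned pages: never treated as empty */
        return false;

    /* Anon pages only count when there is swap to push them to. */
    isolated = z->isolated_file + (have_swap ? z->isolated_anon : 0);
    if (!z->all_unreclaimable)
        lru = z->lru_file + (have_swap ? z->lru_anon : 0);

    nr = z->free + isolated + lru;
    wmark = z->min_wmark + z->lowmem_reserve + (1UL << order);
    return nr <= wmark;
}

/* With <=, a zone with 0 reclaimable pages and 0 pages scanned still
 * counts as reclaimable; with the old < it never could (0 < 0 is false). */
static bool sim_zone_reclaimable(const struct zone_snapshot *z)
{
    return z->pages_scanned <= z->reclaimable * 6;
}

/* Patched loop: skip zones that seem empty; any other reclaimable
 * zone means the system is not "all unreclaimable". */
static bool sim_all_unreclaimable(const struct zone_snapshot *zones, size_t n,
                                  unsigned long nr_scanned, bool have_swap,
                                  unsigned int order)
{
    for (size_t i = 0; i < n; i++) {
        if (sim_zone_seems_empty(&zones[i], nr_scanned, have_swap, order))
            continue;
        if (sim_zone_reclaimable(&zones[i]))
            return false;
    }
    return true;
}
```

On a swapless system, a zone whose free, isolated-file and file-LRU pages add up to no more than min_wmark + lowmem_reserve + 2^order is skipped, so the loop can report TRUE even though pages_scanned is still small. With swap, the anon counters are included and the same zone may no longer look empty, leaving the decision to the pages_scanned test.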