Re: Deadlock possibly caused by too_many_isolated.


From: Minchan Kim <minchan.kim@gmail.com>
To: Torsten Kaiser <just.for.lkml@googlemail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>, Neil Brown <neilb@suse.de>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Li, Shaohua" <shaohua.li@intel.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
Date: Wed, 20 Oct 2010 23:23:26 +0900
Message-ID: <20101020142326.GA5243@barrios-desktop>
In-Reply-To: <AANLkTinC=xcgfwgXw8Tr-Q_cnxZakjj_W=HwQRV+5vkd@mail.gmail.com>

Hello

On Wed, Oct 20, 2010 at 09:25:49AM +0200, Torsten Kaiser wrote:
> On Wed, Oct 20, 2010 at 7:57 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Tue, Oct 19, 2010 at 06:06:21PM +0800, Torsten Kaiser wrote:
> >> swap_writepage() uses get_swap_bio() which uses bio_alloc() to get one
> >> bio. That bio is the submitted, but the submit path seems to get into
> >> make_request from raid1.c and that allocates a second bio from
> >> bio_alloc() via bio_clone().
> >>
> >> I am seeing this pattern (swap_writepage calling
> >> md_make_request/make_request and then getting stuck in mempool_alloc)
> >> more than 5 times in the SysRq+T output...
> >
> > I bet the root cause is the failure of pool->alloc(__GFP_NORETRY)
> > inside mempool_alloc(), which can be fixed by this patch.
> 
> No. I tested the patch (ontop of Neils fix and your patch regarding
> too_many_isolated()), but the system got stuck the same way on the
> first try to fill the tmpfs.
> I think the basic problem is, that the mempool that should guarantee
> progress is exhausted because the raid1 device is stacked between the
> pageout code and the disks and so the "use only 1 bio"-rule gets
> violated.
> 
> > Thanks,
> > Fengguang
> > ---
> >
> > concurrent direct page reclaim problem
> >
> >  __GFP_NORETRY page allocations may fail when there are many concurrent page
> >  allocating tasks, but not necessary in real short of memory. The root cause
> >  is, tasks will first run direct page reclaim to free some pages from the LRU
> >  lists and put them to the per-cpu page lists and the buddy system, and then
> >  try to get a free page from there.  However the free pages reclaimed by this
> >  task may be consumed by other tasks when the direct reclaim task is able to
> >  get the free page for itself.
> 
> I believe the facts disagree with that assumtion. My bad for not
> posting this before, but I also used SysRq+M to see whats going on,
> but each time there still was some free memory.
> Here is the SysRq+M output from the run with only Neils patch applied,
> but on each other run the same ~14Mb stayed free


What is your problem? (Sorry if you have explained it several times.)
I read the thread.
It seems Wu's patch solved the deadlock caused by holding the FS lock and by too_many_isolated.
What problem remains in your case? An unusable system because of a swap storm?
If so, I think that is expected behavior. Please see the comments below.
(If I have missed your point, please explain your problem.)

> 
> [  437.481365] SysRq : Show Memory
> [  437.490003] Mem-Info:
> [  437.491357] Node 0 DMA per-cpu:
> [  437.500032] CPU    0: hi:    0, btch:   1 usd:   0
> [  437.500032] CPU    1: hi:    0, btch:   1 usd:   0
> [  437.500032] CPU    2: hi:    0, btch:   1 usd:   0
> [  437.500032] CPU    3: hi:    0, btch:   1 usd:   0
> [  437.500032] Node 0 DMA32 per-cpu:
> [  437.500032] CPU    0: hi:  186, btch:  31 usd: 138
> [  437.500032] CPU    1: hi:  186, btch:  31 usd:  30
> [  437.500032] CPU    2: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    3: hi:  186, btch:  31 usd:   0
> [  437.500032] Node 1 DMA32 per-cpu:
> [  437.500032] CPU    0: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    1: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    2: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    3: hi:  186, btch:  31 usd:   0
> [  437.500032] Node 1 Normal per-cpu:
> [  437.500032] CPU    0: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    1: hi:  186, btch:  31 usd:   0
> [  437.500032] CPU    2: hi:  186, btch:  31 usd:  25
> [  437.500032] CPU    3: hi:  186, btch:  31 usd:  30
> [  437.500032] active_anon:2039 inactive_anon:985233 isolated_anon:682
> [  437.500032]  active_file:1667 inactive_file:1723 isolated_file:0
> [  437.500032]  unevictable:0 dirty:0 writeback:25387 unstable:0
> [  437.500032]  free:3471 slab_reclaimable:2840 slab_unreclaimable:6337
> [  437.500032]  mapped:1284 shmem:960501 pagetables:523 bounce:0
> [  437.500032] Node 0 DMA free:8008kB min:28kB low:32kB high:40kB
> active_anon:0kB inactive_anon:7596kB active_file:12kB inactive_file:0kB
> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB
> mlocked:0kB dirty:0kB writeback:404kB mapped:0kB shmem:7192kB
> slab_reclaimable:32kB slab_unreclaimable:304kB kernel_stack:0kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:118
> all_unreclaimable? no
> [  437.500032] lowmem_reserve[]: 0 2004 2004 2004

Node 0 DMA: free 8008K, but lowmem_reserve is 8016K (2004 pages).
So the page allocator can't allocate a page from this zone unless the preferred zone is DMA.
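
The reasoning above can be sketched as a small calculation. This is only a
simplified model of the order-0 watermark check, not the kernel's code: the
function name watermark_ok, the 4 kB page size, and the zone index order
(0 = DMA, 1 = DMA32, 2 = Normal) are assumptions made for illustration. The
numbers are taken from the Node 0 DMA lines of the SysRq+M output.

```python
PAGE_KB = 4  # assumed page size (4 kB pages)

def watermark_ok(free_pages, min_pages, lowmem_reserve, classzone_idx):
    """Simplified order-0 check (hypothetical helper): the zone is usable
    only if its free pages exceed the watermark plus the reserve it holds
    back from allocations that preferred the zone at classzone_idx."""
    return free_pages > min_pages + lowmem_reserve[classzone_idx]

# Node 0 DMA, values from the SysRq+M output
free_pages = 8008 // PAGE_KB          # free:8008kB
min_pages = 28 // PAGE_KB             # min:28kB
lowmem_reserve = [0, 2004, 2004, 2004]  # lowmem_reserve[], in pages

# Preferred zone is DMA itself: no reserve applies.
dma_preferred = watermark_ok(free_pages, min_pages, lowmem_reserve, 0)
# Preferred zone is DMA32 and the allocation falls back to DMA:
# the 2004-page reserve applies.
dma32_preferred = watermark_ok(free_pages, min_pages, lowmem_reserve, 1)

print("DMA preferred:", dma_preferred)
print("DMA32 preferred, falling back to DMA:", dmt32 if False else dma32_preferred)
```

Under these assumptions the zone passes the check only for a caller whose
preferred zone is DMA; a fallback from DMA32 or Normal is refused even though
about 8 MB is free.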

> [  437.500032] Node 0 DMA32 free:2980kB min:4036kB low:5044kB
> high:6052kB active_anon:2844kB inactive_anon:1918424kB
> active_file:3428kB inactive_file:3780kB
> unevictable:0kB isolated(anon):1232kB isolated(file):0kB
> present:2052320kB mlocked:0kB dirty:0kB writeback:72016kB
> mapped:2232kB shmem:1847640kB slab_reclaimable:5444kB
> slab_unreclaimable:13508kB kernel_stack:744kB pagetables:864kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> [  437.500032] lowmem_reserve[]: 0 0 0 0

Node 0 DMA32: free 2980K but min 4036K.
Few file LRU pages compared to anon LRU pages.

Normally, a page allocation from this zone could fail.
'Normally' means the caller does not call alloc_pages with __GFP_HIGH or without __GFP_WAIT.
Most call sites pass neither __GFP_HIGH nor !__GFP_WAIT in their gfp flags.
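
To illustrate why those two flags matter, here is a rough sketch of how the
min watermark is relaxed for such callers. It assumes the page allocator of
this era lowers the mark by half for __GFP_HIGH and by a further quarter for
callers that cannot wait; the helper name effective_min and the 4 kB page
size are hypothetical. The numbers are the Node 0 DMA32 values from the
SysRq+M output.

```python
PAGE_KB = 4  # assumed page size (4 kB pages)

def effective_min(min_pages, gfp_high=False, no_wait=False):
    """Hypothetical helper: relax the min watermark for privileged callers.
    Assumption: __GFP_HIGH removes half of the mark, and a caller without
    __GFP_WAIT removes a quarter of what is left."""
    mark = min_pages
    if gfp_high:
        mark -= mark // 2
    if no_wait:
        mark -= mark // 4
    return mark

# Node 0 DMA32, values from the SysRq+M output
free_pages = 2980 // PAGE_KB  # free:2980kB
min_pages = 4036 // PAGE_KB   # min:4036kB

for gfp_high in (False, True):
    for no_wait in (False, True):
        mark = effective_min(min_pages, gfp_high, no_wait)
        ok = free_pages > mark
        print(f"__GFP_HIGH={gfp_high} !__GFP_WAIT={no_wait} "
              f"mark={mark} pages free={free_pages} pages ok={ok}")
```

In this model an ordinary caller sees the zone as below its min watermark.
With these particular numbers, only the __GFP_HIGH reduction is large enough
to let the check pass; the smaller reduction for non-waiting callers alone is
not.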

> [  437.500032] Node 1 DMA32 free:2188kB min:3036kB low:3792kB
> high:4552kB active_anon:0kB inactive_anon:1555368kB active_file:0kB
> inactive_file:28kB unevictable:0kB isolated(anon):768kB
> isolated(file):0kB present:1544000kB mlocked:0kB dirty:0kB
> writeback:21160kB mapped:0kB shmem:1534960kB slab_reclaimable:3728kB
> slab_unreclaimable:7076kB kernel_stack:8kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [  437.500032] lowmem_reserve[]: 0 0 505 505

Node 1 DMA32: free 2188K but min 3036K.
This is the same situation as Node 0 DMA32.
Normally, a page allocation from this zone could fail.
Few file LRU pages compared to anon LRU pages.


> [  437.500032] Node 1 Normal free:708kB min:1016kB low:1268kB
> high:1524kB active_anon:5312kB inactive_anon:459544kB
> active_file:3228kB inactive_file:3084kB unevictable:0kB
> isolated(anon):728kB isolated(file):0kB present:517120kB mlocked:0kB
> dirty:0kB writeback:7968kB mapped:2904kB shmem:452212kB
> slab_reclaimable:2156kB slab_unreclaimable:4460kB kernel_stack:200kB
> pagetables:1228kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:9678 all_unreclaimable? no
> [  437.500032] lowmem_reserve[]: 0 0 0 0

Node 1 Normal: free 708K but min 1016K.
Normally, a page allocation from this zone could fail.
Few file LRU pages compared to anon LRU pages.

> [  437.500032] Node 0 DMA: 2*4kB 2*8kB 1*16kB 3*32kB 3*64kB 4*128kB
> 4*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8008kB
> [  437.500032] Node 0 DMA32: 27*4kB 15*8kB 8*16kB 8*32kB 7*64kB
> 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2980kB
> [  437.500032] Node 1 DMA32: 1*4kB 6*8kB 3*16kB 1*32kB 0*64kB 1*128kB
> 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2308kB
> [  437.500032] Node 1 Normal: 39*4kB 13*8kB 10*16kB 3*32kB 1*64kB
> 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 708kB
> [  437.500032] 989289 total pagecache pages
> [  437.500032] 25398 pages in swap cache
> [  437.500032] Swap cache stats: add 859204, delete 833806, find 28/39
> [  437.500032] Free swap  = 9865628kB
> [  437.500032] Total swap = 10000316kB
> [  437.500032] 1048575 pages RAM
> [  437.500032] 33809 pages reserved
> [  437.500032] 7996 pages shared
> [  437.500032] 1008521 pages non-shared
> 
No zone has enough free pages, and none has enough file LRU pages.
So I think swapout is the expected behavior.
It means your workload exceeds the DRAM available on your system.
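
As a quick check of the "few file LRU pages" observation, the global counters
from the Mem-Info block can be compared directly. This is only arithmetic on
the numbers in the SysRq+M output; the variable names are made up for the
example.

```python
# Global LRU counters (in pages) from the Mem-Info lines of the SysRq+M output
active_anon, inactive_anon = 2039, 985233
active_file, inactive_file = 1667, 1723
shmem = 960501  # tmpfs pages, which sit on the anon LRU lists

anon_pages = active_anon + inactive_anon
file_pages = active_file + inactive_file
file_share = file_pages / (anon_pages + file_pages)

print(f"anon LRU pages: {anon_pages}")
print(f"file LRU pages: {file_pages}")
print(f"file share of LRU: {file_share:.2%}")
print(f"shmem share of anon LRU: {shmem / anon_pages:.2%}")
```

With well under one percent of the LRU pages being file-backed, reclaim has
almost nothing to drop and has to push the tmpfs (shmem) pages out to swap.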

-- 
Kind regards,
Minchan Kim


