From: Minchan Kim <minchan.kim@gmail.com>
To: Torsten Kaiser <just.for.lkml@googlemail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>, Neil Brown <neilb@suse.de>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Li, Shaohua" <shaohua.li@intel.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
Date: Wed, 20 Oct 2010 23:23:26 +0900
Message-ID: <20101020142326.GA5243@barrios-desktop>
In-Reply-To: <AANLkTinC=xcgfwgXw8Tr-Q_cnxZakjj_W=HwQRV+5vkd@mail.gmail.com>
Hello
On Wed, Oct 20, 2010 at 09:25:49AM +0200, Torsten Kaiser wrote:
> On Wed, Oct 20, 2010 at 7:57 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Tue, Oct 19, 2010 at 06:06:21PM +0800, Torsten Kaiser wrote:
> >> swap_writepage() uses get_swap_bio() which uses bio_alloc() to get one
> >> bio. That bio is then submitted, but the submit path seems to get into
> >> make_request from raid1.c and that allocates a second bio from
> >> bio_alloc() via bio_clone().
> >>
> >> I am seeing this pattern (swap_writepage calling
> >> md_make_request/make_request and then getting stuck in mempool_alloc)
> >> more than 5 times in the SysRq+T output...
> >
> > I bet the root cause is the failure of pool->alloc(__GFP_NORETRY)
> > inside mempool_alloc(), which can be fixed by this patch.
>
> No. I tested the patch (on top of Neil's fix and your patch regarding
> too_many_isolated()), but the system got stuck the same way on the
> first try to fill the tmpfs.
> I think the basic problem is that the mempool that should guarantee
> progress is exhausted because the raid1 device is stacked between the
> pageout code and the disks and so the "use only 1 bio"-rule gets
> violated.
>
> > Thanks,
> > Fengguang
> > ---
> >
> > concurrent direct page reclaim problem
> >
> >  __GFP_NORETRY page allocations may fail when there are many concurrent page
> >  allocating tasks, but not necessary in real short of memory. The root cause
> >  is, tasks will first run direct page reclaim to free some pages from the LRU
> >  lists and put them to the per-cpu page lists and the buddy system, and then
> >  try to get a free page from there.  However the free pages reclaimed by this
> >  task may be consumed by other tasks when the direct reclaim task is able to
> >  get the free page for itself.
>
> I believe the facts disagree with that assumption. My bad for not
> posting this before, but I also used SysRq+M to see what's going on,
> but each time there still was some free memory.
> Here is the SysRq+M output from the run with only Neil's patch applied,
> but on each other run the same ~14 MB stayed free
What exactly is your remaining problem? (Sorry if you have already explained it several times.)
I have read the thread.
It seems Wu's patch solved the deadlock caused by FS lock holding and too_many_isolated.
What problem remains in your case? Is it a system made unusable by a swap storm?
If so, I think that is expected behavior. Please see the comments below.
(If I have missed your point, please explain your problem again.)
>
> [ 437.481365] SysRq : Show Memory
> [ 437.490003] Mem-Info:
> [ 437.491357] Node 0 DMA per-cpu:
> [ 437.500032] CPU 0: hi: 0, btch: 1 usd: 0
> [ 437.500032] CPU 1: hi: 0, btch: 1 usd: 0
> [ 437.500032] CPU 2: hi: 0, btch: 1 usd: 0
> [ 437.500032] CPU 3: hi: 0, btch: 1 usd: 0
> [ 437.500032] Node 0 DMA32 per-cpu:
> [ 437.500032] CPU 0: hi: 186, btch: 31 usd: 138
> [ 437.500032] CPU 1: hi: 186, btch: 31 usd: 30
> [ 437.500032] CPU 2: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 3: hi: 186, btch: 31 usd: 0
> [ 437.500032] Node 1 DMA32 per-cpu:
> [ 437.500032] CPU 0: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 1: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 2: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 3: hi: 186, btch: 31 usd: 0
> [ 437.500032] Node 1 Normal per-cpu:
> [ 437.500032] CPU 0: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 1: hi: 186, btch: 31 usd: 0
> [ 437.500032] CPU 2: hi: 186, btch: 31 usd: 25
> [ 437.500032] CPU 3: hi: 186, btch: 31 usd: 30
> [ 437.500032] active_anon:2039 inactive_anon:985233 isolated_anon:682
> [ 437.500032] active_file:1667 inactive_file:1723 isolated_file:0
> [ 437.500032] unevictable:0 dirty:0 writeback:25387 unstable:0
> [ 437.500032] free:3471 slab_reclaimable:2840 slab_unreclaimable:6337
> [ 437.500032] mapped:1284 shmem:960501 pagetables:523 bounce:0
> [ 437.500032] Node 0 DMA free:8008kB min:28kB low:32kB high:40kB
> active_anon:0kB inactive_anon:7596kB active_file:12kB inactive_file:0kB
> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB
> mlocked:0kB dirty:0kB writeback:404kB mapped:0kB shmem:7192kB
> slab_reclaimable:32kB slab_unreclaimable:304kB kernel_stack:0kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:118
> all_unreclaimable? no
> [ 437.500032] lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA: free is 8008K, but lowmem_reserve is 8016K (2004 pages).
So the page allocator can't take a page from this zone unless the preferred zone is DMA.
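A rough sketch of that arithmetic, assuming 4 kB pages and looking only at the lowmem_reserve part of the order-0 watermark check. The helper name zone_usable and the Python form are made up for illustration; this is not the kernel code itself.

```python
# Sketch of the lowmem_reserve part of the watermark check (order-0 only).
# Assumes 4 kB pages; zone_usable() is a hypothetical helper, not a kernel API.
PAGE_KB = 4

def zone_usable(free_kb, min_kb, lowmem_reserve, classzone_idx):
    """True if free pages exceed min plus the reserve held against classzone_idx."""
    free_pages = free_kb // PAGE_KB
    min_pages = min_kb // PAGE_KB
    return free_pages > min_pages + lowmem_reserve[classzone_idx]

# Node 0 DMA from the SysRq+M output: lowmem_reserve[]: 0 2004 2004 2004
dma_reserve = [0, 2004, 2004, 2004]  # pages, indexed by the preferred zone

print(2004 * PAGE_KB)                         # 8016 (reserve in kB)
print(zone_usable(8008, 28, dma_reserve, 0))  # True:  preferred zone is DMA
print(zone_usable(8008, 28, dma_reserve, 1))  # False: preferred zone is DMA32
```

So the 8008K that looks free in the DMA zone is effectively off limits to allocations that would normally be served from DMA32 or Normal.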
> [ 437.500032] Node 0 DMA32 free:2980kB min:4036kB low:5044kB
> high:6052kB active_anon:2844kB inactive_anon:1918424kB
> active_file:3428kB inactive_file:3780kB
> unevictable:0kB isolated(anon):1232kB isolated(file):0kB
> present:2052320kB mlocked:0kB dirty:0kB writeback:72016kB
> mapped:2232kB shmem:1847640kB slab_reclaimable:5444kB
> slab_unreclaimable:13508kB kernel_stack:744kB pagetables:864kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> [ 437.500032] lowmem_reserve[]: 0 0 0 0
Node 0 DMA32: free is 2980K but min is 4036K.
There are few file LRU pages compared to anon LRU pages.
Normally, a page allocation from this zone could fail.
'Normally' means the caller doesn't call alloc_pages with __GFP_HIGH or without __GFP_WAIT.
In general, many call sites don't pass gfp flags with __GFP_HIGH or !__GFP_WAIT.
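To illustrate why those flags matter, here is a simplified order-0 model of the min watermark test, written from my reading of zone_watermark_ok(). The Python names are invented, 4 kB pages are assumed, and the exact fractions (half for __GFP_HIGH, a further quarter for callers that cannot wait) should be checked against your kernel version.

```python
# Simplified order-0 model of the min watermark test.
# Assumption: mirrors zone_watermark_ok() as I remember it; names are invented.
PAGE_KB = 4  # assumes 4 kB pages

def watermark_ok(free_kb, min_kb, reserve_pages=0, high=False, harder=False):
    free_pages = free_kb // PAGE_KB
    min_pages = min_kb // PAGE_KB
    if high:    # __GFP_HIGH: may dip halfway into the min reserve
        min_pages -= min_pages // 2
    if harder:  # e.g. !__GFP_WAIT (atomic) callers: a further quarter
        min_pages -= min_pages // 4
    return free_pages > min_pages + reserve_pages

# Node 0 DMA32: free 2980kB, min 4036kB, lowmem_reserve 0
print(watermark_ok(2980, 4036))                          # False
print(watermark_ok(2980, 4036, high=True))               # True
print(watermark_ok(2980, 4036, high=True, harder=True))  # True
```

An ordinary caller is refused in this zone, while a caller allowed to dig into the reserve would still get a page.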
> [ 437.500032] Node 1 DMA32 free:2188kB min:3036kB low:3792kB
> high:4552kB active_anon:0kB inactive_anon:1555368kB active_file:0kB
> inactive_file:28kB unevictable:0kB isolated(anon):768kB
> isolated(file):0kB present:1544000kB mlocked:0kB dirty:0kB
> writeback:21160kB mapped:0kB shmem:1534960kB slab_reclaimable:3728kB
> slab_unreclaimable:7076kB kernel_stack:8kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [ 437.500032] lowmem_reserve[]: 0 0 505 505
Node 1 DMA32: free is 2188K but min is 3036K.
It's the same situation as Node 0 DMA32.
Normally, a page allocation from this zone could fail.
There are few file LRU pages compared to anon LRU pages.
> [ 437.500032] Node 1 Normal free:708kB min:1016kB low:1268kB
> high:1524kB active_anon:5312kB inactive_anon:459544kB
> active_file:3228kB inactive_file:3084kB unevictable:0kB
> isolated(anon):728kB isolated(file):0kB present:517120kB mlocked:0kB
> dirty:0kB writeback:7968kB mapped:2904kB shmem:452212kB
> slab_reclaimable:2156kB slab_unreclaimable:4460kB kernel_stack:200kB
> pagetables:1228kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:9678 all_unreclaimable? no
> [ 437.500032] lowmem_reserve[]: 0 0 0 0
Node 1 Normal: free is 708K but min is 1016K.
Normally, a page allocation from this zone could fail.
There are few file LRU pages compared to anon LRU pages.
> [ 437.500032] Node 0 DMA: 2*4kB 2*8kB 1*16kB 3*32kB 3*64kB 4*128kB
> 4*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8008kB
> [ 437.500032] Node 0 DMA32: 27*4kB 15*8kB 8*16kB 8*32kB 7*64kB
> 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2980kB
> [ 437.500032] Node 1 DMA32: 1*4kB 6*8kB 3*16kB 1*32kB 0*64kB 1*128kB
> 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2308kB
> [ 437.500032] Node 1 Normal: 39*4kB 13*8kB 10*16kB 3*32kB 1*64kB
> 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 708kB
> [ 437.500032] 989289 total pagecache pages
> [ 437.500032] 25398 pages in swap cache
> [ 437.500032] Swap cache stats: add 859204, delete 833806, find 28/39
> [ 437.500032] Free swap = 9865628kB
> [ 437.500032] Total swap = 10000316kB
> [ 437.500032] 1048575 pages RAM
> [ 437.500032] 33809 pages reserved
> [ 437.500032] 7996 pages shared
> [ 437.500032] 1008521 pages non-shared
>
None of the zones has enough free pages, and none has enough file LRU pages.
So I think swapout is expected behavior.
It means your workload exceeds the DRAM available on your system.
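To put a number on "not enough file LRU pages", here is a small calculation using the global counters from the Mem-Info lines above (values are in pages). It is only arithmetic on the posted output and assumes nothing about the kernel.

```python
# Counts (in pages) copied from the Mem-Info summary lines in the SysRq+M output.
lru = {
    "active_anon": 2039,
    "inactive_anon": 985233,
    "active_file": 1667,
    "inactive_file": 1723,
}

anon_pages = lru["active_anon"] + lru["inactive_anon"]
file_pages = lru["active_file"] + lru["inactive_file"]
file_share = file_pages / (anon_pages + file_pages)

print(anon_pages, file_pages)                  # 987272 3390
print(f"file share of LRU: {file_share:.2%}")  # file share of LRU: 0.34%
```

With well under 1% of the LRU being file pages, and shmem:960501 showing that almost all of the anon side is tmpfs data, reclaim has little to work with except swapping those pages out.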
--
Kind regards,
Minchan Kim