* [RFC] can we use vmalloc to alloc thread stack if compaction failed
@ 2016-07-28  7:08 Xishi Qiu
  2016-07-28  7:20 ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread

From: Xishi Qiu @ 2016-07-28 7:08 UTC
To: Tejun Heo, Ingo Molnar, Michal Hocko, Peter Zijlstra
Cc: LKML, Linux MM

Usually THREAD_SIZE_ORDER is 2, which means we need to allocate 16KB of
contiguous physical memory when forking a new process.

If the system's memory is very small (a smartphone may have only 1GB of
RAM), free memory is scarce and compaction does not always succeed in the
slowpath (__alloc_pages_slowpath), so allocating the thread stack may fail
because of memory fragmentation.

Can we use vmalloc to allocate the thread stack if compaction failed in the
slowpath? E.g. use vmalloc as a fallback if alloc_pages/kmalloc failed.

I think there may be a small performance regression; are there any other
problems?

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
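To make the sizes concrete: with 4 KB pages, THREAD_SIZE_ORDER = 2 means each new thread needs an order-2 block, i.e. 4 physically contiguous pages. The fallback being proposed can be sketched as below (a minimal illustration with hypothetical `alloc_pages`/`vmalloc` stand-ins, not actual kernel code):

```python
PAGE_SHIFT = 12                      # 4 KB pages, a typical configuration
THREAD_SIZE_ORDER = 2
PAGE_SIZE = 1 << PAGE_SHIFT
THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER
print(THREAD_SIZE)                   # 16384 bytes, i.e. 4 contiguous pages

def alloc_thread_stack(alloc_pages, vmalloc, order=THREAD_SIZE_ORDER):
    """Sketch of the proposed fallback: try a physically contiguous
    order-2 block first; only if that fails, fall back to virtually
    contiguous memory. `alloc_pages` and `vmalloc` are injected here to
    stand in for the kernel allocators."""
    stack = alloc_pages(order)
    if stack is not None:
        return stack, "contiguous"
    return vmalloc(PAGE_SIZE << order), "vmalloc"
```

When physical memory is fragmented only the vmalloc path can succeed; the cost is the extra page-table and TLB work, plus the restrictions discussed later in the thread.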
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Michal Hocko @ 2016-07-28 7:20 UTC
To: Xishi Qiu
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski

On Thu 28-07-16 15:08:26, Xishi Qiu wrote:
> Usually THREAD_SIZE_ORDER is 2, which means we need to allocate 16KB of
> contiguous physical memory when forking a new process.
>
> If the system's memory is very small (a smartphone may have only 1GB of
> RAM), free memory is scarce and compaction does not always succeed in the
> slowpath (__alloc_pages_slowpath), so allocating the thread stack may
> fail because of memory fragmentation.

Well, with the current implementation of the page allocator those requests
will not fail in most cases. The OOM killer would be invoked in order to
free up some memory.

> Can we use vmalloc to allocate the thread stack if compaction failed in
> the slowpath?

Not yet, but Andy is working on this.

> E.g. use vmalloc as a fallback if alloc_pages/kmalloc failed.
>
> I think there may be a small performance regression; are there any other
> problems?
>
> Thanks,
> Xishi Qiu

--
Michal Hocko
SUSE Labs
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Xishi Qiu @ 2016-07-28 7:41 UTC
To: Michal Hocko
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski, Yisheng Xie

On 2016/7/28 15:20, Michal Hocko wrote:
> On Thu 28-07-16 15:08:26, Xishi Qiu wrote:
>> [...]
>> If the system's memory is very small (a smartphone may have only 1GB of
>> RAM), free memory is scarce and compaction does not always succeed in
>> the slowpath (__alloc_pages_slowpath), so allocating the thread stack
>> may fail because of memory fragmentation.
>
> Well, with the current implementation of the page allocator those
> requests will not fail in most cases. The OOM killer would be invoked in
> order to free up some memory.

Hi Michal,

Yes, it succeeds in most cases, but I have seen this problem in some
stress tests.

The DMA zone had 470628kB free, yet an order-2 allocation failed while
forking a new process. There are so many memory fragments, and under the
stress test any large block may be taken by someone else soon after
compaction.
--- dmesg messages ---
07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1
07-13 08:41:51.346 <4>[309805.658142s][pid:1361,cpu5,sManagerService]CPU: 5 PID: 1361 Comm: sManagerService Tainted: G W 4.1.18-g09f547b #1
07-13 08:41:51.347 <4>[309805.658142s][pid:1361,cpu5,sManagerService]TGID: 981 Comm: system_server
07-13 08:41:51.347 <4>[309805.658172s][pid:1361,cpu5,sManagerService]Hardware name: hi3650 (DT)
07-13 08:41:51.347 <0>[309805.658172s][pid:1361,cpu5,sManagerService]Call trace:
07-13 08:41:51.347 <4>[309805.658203s][pid:1361,cpu5,sManagerService][<ffffffc00008a0a4>] dump_backtrace+0x0/0x150
07-13 08:41:51.347 <4>[309805.658203s][pid:1361,cpu5,sManagerService][<ffffffc00008a214>] show_stack+0x20/0x28
07-13 08:41:51.347 <4>[309805.658203s][pid:1361,cpu5,sManagerService][<ffffffc000fc4034>] dump_stack+0x84/0xa8
07-13 08:41:51.347 <4>[309805.658203s][pid:1361,cpu5,sManagerService][<ffffffc00018af54>] warn_alloc_failed+0x10c/0x164
07-13 08:41:51.347 <4>[309805.658233s][pid:1361,cpu5,sManagerService][<ffffffc00018e778>] __alloc_pages_nodemask+0x5b4/0x888
07-13 08:41:51.347 <4>[309805.658233s][pid:1361,cpu5,sManagerService][<ffffffc00018eb84>] alloc_kmem_pages_node+0x44/0x50
07-13 08:41:51.347 <4>[309805.658233s][pid:1361,cpu5,sManagerService][<ffffffc00009fa78>] copy_process.part.46+0x140/0x15ac
07-13 08:41:51.347 <4>[309805.658233s][pid:1361,cpu5,sManagerService][<ffffffc0000a10a0>] do_fork+0xe8/0x444
07-13 08:41:51.347 <4>[309805.658264s][pid:1361,cpu5,sManagerService][<ffffffc0000a14e8>] SyS_clone+0x3c/0x48
07-13 08:41:51.347 <4>[309805.658264s][pid:1361,cpu5,sManagerService]Mem-Info:
07-13 08:41:51.347 <4>[309805.658264s][pid:1361,cpu5,sManagerService]active_anon:491074 inactive_anon:118072 isolated_anon:0
07-13 08:41:51.347 <4>[309805.658264s] active_file:19087 inactive_file:9843 isolated_file:0
07-13 08:41:51.347 <4>[309805.658264s] unevictable:322 dirty:20 writeback:0 unstable:0
07-13 08:41:51.347 <4>[309805.658264s] slab_reclaimable:11788 slab_unreclaimable:28068
07-13 08:41:51.347 <4>[309805.658264s] mapped:20633 shmem:4038 pagetables:10865 bounce:72
07-13 08:41:51.347 <4>[309805.658264s] free:118678 free_pcp:58 free_cma:0
07-13 08:41:51.347 <4>[309805.658294s][pid:1361,cpu5,sManagerService]DMA free:470628kB min:6800kB low:29116kB high:30816kB active_anon:1868540kB inactive_anon:376100kB active_file:292kB inactive_file:240kB unevictable:1080kB isolated(anon):0kB isolated(file):0kB present:3446780kB managed:3307056kB mlocked:1080kB dirty:80kB writeback:0kB mapped:7604kB shmem:14380kB slab_reclaimable:47152kB slab_unreclaimable:112268kB kernel_stack:28224kB pagetables:43460kB unstable:0kB bounce:288kB free_pcp:204kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
07-13 08:41:51.347 <4>[309805.658294s][pid:1361,cpu5,sManagerService]lowmem_reserve[]: 0 415 415
07-13 08:41:51.347 <4>[309805.658294s][pid:1361,cpu5,sManagerService]Normal free:4084kB min:872kB low:3740kB high:3960kB active_anon:95756kB inactive_anon:96188kB active_file:76056kB inactive_file:39132kB unevictable:208kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:425480kB mlocked:208kB dirty:0kB writeback:0kB mapped:74928kB shmem:1772kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
07-13 08:41:51.347 <4>[309805.658294s][pid:1361,cpu5,sManagerService]lowmem_reserve[]: 0 0 0
07-13 08:41:51.347 <4>[309805.658325s][pid:1361,cpu5,sManagerService]DMA: 68324*4kB (UEM) 24706*8kB (UER) 2*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 470976kB
07-13 08:41:51.347 <4>[309805.658355s][pid:1361,cpu5,sManagerService]Normal: 270*4kB (UMR) 82*8kB (UMR) 48*16kB (MR) 25*32kB (R) 12*64kB (R) 2*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4584kB
07-13 08:41:51.347 <4>[309805.658386s][pid:1361,cpu5,sManagerService]38319 total pagecache pages
07-13 08:41:51.347 <4>[309805.658386s][pid:1361,cpu5,sManagerService]5384 pages in swap cache
07-13 08:41:51.347 <4>[309805.658386s][pid:1361,cpu5,sManagerService]Swap cache stats: add 628084, delete 622700, find 2187699/2264909
07-13 08:41:51.347 <4>[309805.658386s][pid:1361,cpu5,sManagerService]Free swap = 0kB
07-13 08:41:51.348 <4>[309805.658416s][pid:1361,cpu5,sManagerService]Total swap = 524284kB
07-13 08:41:51.348 <4>[309805.658416s][pid:1361,cpu5,sManagerService]992767 pages RAM
07-13 08:41:51.348 <4>[309805.658416s][pid:1361,cpu5,sManagerService]0 pages HighMem/MovableOnly
07-13 08:41:51.348 <4>[309805.658416s][pid:1361,cpu5,sManagerService]51441 pages reserved
07-13 08:41:51.348 <4>[309805.658416s][pid:1361,cpu5,sManagerService]8192 pages cma reserved
07-13 08:41:51.767 <6>[309806.068298s][pid:2247,cpu6,notification-sq][I/sensorhub] shb_release ok

>> Can we use vmalloc to allocate the thread stack if compaction failed in
>> the slowpath?
>
> Not yet, but Andy is working on this.
>
>> E.g. use vmalloc as a fallback if alloc_pages/kmalloc failed.
>>
>> I think there may be a small performance regression; are there any other
>> problems?
>>
>> Thanks,
>> Xishi Qiu
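The freelist dump above shows why the order-2 request failed despite roughly 470 MB free in the DMA zone. Summing the per-order counts from the "DMA:" line (a quick arithmetic check of the numbers in the dump) shows that almost all of the free memory sits in order-0 and order-1 blocks:

```python
# (block count, kB per block) pairs taken from the "DMA:" freelist line
dma_freelist = [(68324, 4), (24706, 8), (2, 16), (0, 32), (0, 64),
                (0, 128), (0, 256), (0, 512), (0, 1024), (0, 2048), (0, 4096)]

total_kb = sum(blocks * kb for blocks, kb in dma_freelist)
order2_plus_kb = sum(blocks * kb for blocks, kb in dma_freelist if kb >= 16)

print(total_kb)          # 470976, matching the "= 470976kB" in the dump
print(order2_plus_kb)    # 32: only two order-2 (16kB) blocks in the whole zone
```

So out of 470976 kB of free DMA-zone memory, only 32 kB is available in blocks of order 2 or higher, which illustrates exactly the fragmentation scenario described above.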
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Michal Hocko @ 2016-07-28 7:58 UTC
To: Xishi Qiu
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski, Yisheng Xie

On Thu 28-07-16 15:41:53, Xishi Qiu wrote:
> On 2016/7/28 15:20, Michal Hocko wrote:
>> [...]
>> Well, with the current implementation of the page allocator those
>> requests will not fail in most cases. The OOM killer would be invoked
>> in order to free up some memory.
>
> Hi Michal,
>
> Yes, it succeeds in most cases, but I have seen this problem in some
> stress tests.
>
> The DMA zone had 470628kB free, yet an order-2 allocation failed while
> forking a new process. There are so many memory fragments, and under the
> stress test any large block may be taken by someone else soon after
> compaction.
>
> --- dmesg messages ---
> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1

Yes, but this is a __GFP_DMA allocation. I guess you have already reported
this failure and you've been told that this is quite unexpected for the
kernel stack allocation. It is your out-of-tree patch which just makes
things worse, because DMA-restricted allocations are considered "lowmem"
and so they do not invoke the OOM killer and do not retry like regular
GFP_KERNEL allocations.
--
Michal Hocko
SUSE Labs
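Michal's point can be checked by decoding the failing `mode:0x2000d1` from the report. Using the `___GFP_*` bit values as they stood around v4.1 (an assumption based on that era's `include/linux/gfp.h`; only the bits relevant to this mask are listed), the mask is GFP_KERNEL (`__GFP_WAIT|__GFP_IO|__GFP_FS`) plus `__GFP_DMA` and `__GFP_NOTRACK`:

```python
# ___GFP_* bit values as in v4.1-era include/linux/gfp.h (assumption);
# only the subset needed to decode 0x2000d1 is included here.
GFP_BITS = {
    "__GFP_DMA": 0x01, "__GFP_HIGHMEM": 0x02, "__GFP_DMA32": 0x04,
    "__GFP_MOVABLE": 0x08, "__GFP_WAIT": 0x10, "__GFP_HIGH": 0x20,
    "__GFP_IO": 0x40, "__GFP_FS": 0x80, "__GFP_NOTRACK": 0x200000,
}

def decode(mode):
    """Return the names of the GFP bits set in `mode`."""
    return sorted(name for name, bit in GFP_BITS.items() if mode & bit)

print(decode(0x2000d1))
# ['__GFP_DMA', '__GFP_FS', '__GFP_IO', '__GFP_NOTRACK', '__GFP_WAIT']
```

The `__GFP_DMA` bit is what restricts the request to the DMA zone, which is why the lowmem rules Michal describes apply instead of the regular GFP_KERNEL retry/OOM behavior.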
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Xishi Qiu @ 2016-07-28 8:45 UTC
To: Michal Hocko
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski, Yisheng Xie

On 2016/7/28 15:58, Michal Hocko wrote:
> [...]
> Yes, but this is a __GFP_DMA allocation. I guess you have already
> reported this failure and you've been told that this is quite unexpected
> for the kernel stack allocation.
> It is your out-of-tree patch which just makes things worse, because
> DMA-restricted allocations are considered "lowmem" and so they do not
> invoke the OOM killer and do not retry like regular GFP_KERNEL
> allocations.

Hi Michal,

Yes, we add GFP_DMA, but I don't think that is the key to the problem.

If we invoke the OOM killer, maybe we will get a large block later, but
there is enough free memory before the OOM kill (although most of it is
fragmented). I wonder whether we can make the allocation succeed without
killing any process in this situation.

Maybe vmalloc is a good way, but I don't know its impact.

Thanks,
Xishi Qiu
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Michal Hocko @ 2016-07-28 9:43 UTC
To: Xishi Qiu
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski, Yisheng Xie

On Thu 28-07-16 16:45:06, Xishi Qiu wrote:
> On 2016/7/28 15:58, Michal Hocko wrote:
>> [...]
>> Yes, but this is a __GFP_DMA allocation. I guess you have already
>> reported this failure and you've been told that this is quite
>> unexpected for the kernel stack allocation.
>> It is your out-of-tree patch which just makes things worse, because
>> DMA-restricted allocations are considered "lowmem" and so they do not
>> invoke the OOM killer and do not retry like regular GFP_KERNEL
>> allocations.
>
> Hi Michal,
>
> Yes, we add GFP_DMA, but I don't think that is the key to the problem.

You are restricting the allocation request to a single zone, which is
definitely not good. Look at how many larger-order pages are available in
the Normal zone.

> If we invoke the OOM killer, maybe we will get a large block later, but
> there is enough free memory before the OOM kill (although most of it is
> fragmented).

Killing a task is of course the last-resort action. It would give you
larger-order blocks freed from the victim's threads.

> I wonder whether we can make the allocation succeed without killing any
> process in this situation.

Sure, it would be preferable to compact that memory, but that might be hard
with your restriction in place. Consider that the DMA zone would tend to be
less movable than normal zones, as users would have to pin it for DMA. Your
DMA zone is really large, so this might turn out to just happen to work,
but note that the primary problem here is that you put a zone restriction
on your allocations.

> Maybe vmalloc is a good way, but I don't know its impact.

You can have a look at the vmalloc patches posted by Andy. They are not
that trivial.

--
Michal Hocko
SUSE Labs
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Xishi Qiu @ 2016-07-28 10:51 UTC
To: Michal Hocko
Cc: Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Andy Lutomirski, Yisheng Xie

On 2016/7/28 17:43, Michal Hocko wrote:
> On Thu 28-07-16 16:45:06, Xishi Qiu wrote:
>> On 2016/7/28 15:58, Michal Hocko wrote:
>>> [...]
>>> Yes, but this is a __GFP_DMA allocation. I guess you have already
>>> reported this failure and you've been told that this is quite
>>> unexpected for the kernel stack allocation.
>>> It is your out-of-tree patch which just makes things worse, because
>>> DMA-restricted allocations are considered "lowmem" and so they do not
>>> invoke the OOM killer and do not retry like regular GFP_KERNEL
>>> allocations.
>
> [...]
>
>> Maybe vmalloc is a good way, but I don't know its impact.
>
> You can have a look at the vmalloc patches posted by Andy. They are not
> that trivial.

Hi Michal,

Thank you for your comment. Could you give me the link?

Thanks,
Xishi Qiu
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Andy Lutomirski @ 2016-07-28 15:07 UTC
To: Xishi Qiu
Cc: Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie

On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote:
> On 2016/7/28 17:43, Michal Hocko wrote:
>> [...]
>>>> Yes, but this is a __GFP_DMA allocation.
>>>> I guess you have already reported this failure and you've been told
>>>> that this is quite unexpected for the kernel stack allocation. It is
>>>> your out-of-tree patch which just makes things worse, because
>>>> DMA-restricted allocations are considered "lowmem" and so they do not
>>>> invoke the OOM killer and do not retry like regular GFP_KERNEL
>>>> allocations.
>
> [...]
>
>> You can have a look at the vmalloc patches posted by Andy. They are not
>> that trivial.
>
> Hi Michal,
>
> Thank you for your comment. Could you give me the link?

I've been keeping it mostly up to date in this branch:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack

It's currently out of sync due to a bunch of the patches being queued
elsewhere for the merge window.
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Joonsoo Kim @ 2016-07-29 3:01 UTC
To: Andy Lutomirski
Cc: Xishi Qiu, Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie

On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote:
>> [...]
>>>>>> Yes, it succeeds in most cases, but I have seen this problem in
>>>>>> some stress tests.
>>>>>>
>>>>>> The DMA zone had 470628kB free, yet an order-2 allocation failed
>>>>>> while forking a new process. There are so many memory fragments,
>>>>>> and under the stress test any large block may be taken by someone
>>>>>> else soon after compaction.
>>>>>> --- dmesg messages ---
>>>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1
>
> [...]
>
>> Hi Michal,
>>
>> Thank you for your comment. Could you give me the link?
> I've been keeping it mostly up to date in this branch:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack
>
> It's currently out of sync due to a bunch of the patches being queued
> elsewhere for the merge window.

Hello, Andy.

I have some questions about it.

IIUC, to turn on HAVE_ARCH_VMAP_STACK on a different architecture, there is
nothing to be done on the architecture side as long as the architecture
does not lazily fault in top-level paging entries for the vmalloc area. Is
my understanding correct?

Also, I'd like to know how you searched for problematic places that use the
kernel stack for DMA.

One note: a stack overflow happens on the page just below the stack area
(stacks grow down), but the guard page is placed on the page just above the
area. So this stack overflow detection depends on the previous vmalloc'ed
area having been allocated without VM_NO_GUARD. There aren't many users of
that flag, so there should be no problem; this is just a note.

Thanks.
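Joonsoo's layout note can be modeled directly. vmalloc normally leaves one guard page after each area, so for a down-growing stack the overflow off the low end is caught by the guard page trailing the *previous* vmalloc area, and only if that area was not created with VM_NO_GUARD. A simplified address-layout model (not kernel code; sizes and the layout function are illustrative):

```python
PAGE = 4096

def vmalloc_layout(sizes, no_guard=frozenset()):
    """Lay areas out back to back; each area is normally followed by one
    guard page. Indices in `no_guard` model VM_NO_GUARD and get no guard.
    Returns a (start, end, guarded) tuple per area."""
    areas, addr = [], 0
    for i, size in enumerate(sizes):
        guarded = i not in no_guard
        areas.append((addr, addr + size, guarded))
        addr += size + (PAGE if guarded else 0)
    return areas

prev, stack = vmalloc_layout([16 * 1024, 16 * 1024])   # two 16 KB stacks

# A stack that grows down overflows just below its own start address ...
overflow_addr = stack[0] - 1
# ... which lands in the guard page trailing the *previous* area,
# not in this stack's own guard page (which sits above it).
prev_guard = range(prev[1], prev[1] + PAGE)
print(overflow_addr in prev_guard)   # True
```

If the previous area had been created with VM_NO_GUARD, the two areas would be adjacent and the overflow would silently corrupt that area instead, which is exactly the dependency Joonsoo points out.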
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
From: Andy Lutomirski @ 2016-07-29 19:47 UTC
To: Joonsoo Kim
Cc: Andy Lutomirski, Xishi Qiu, Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie

> On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote:
>> [...]
>>>>>>>> Well, with the current implementation of the page allocator those
>>>>>>>> requests will not fail in most cases.
The oom killer would be invoked in > > >>>>>> order to free up some memory. > > >>>>>> > > >>>>> > > >>>>> Hi Michal, > > >>>>> > > >>>>> Yes, it success in most cases, but I did have seen this problem in some > > >>>>> stress-test. > > >>>>> > > >>>>> DMA free:470628kB, but alloc 2 order block failed during fork a new process. > > >>>>> There are so many memory fragments and the large block may be soon taken by > > >>>>> others after compact because of stress-test. > > >>>>> > > >>>>> --- dmesg messages --- > > >>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1 > > >>>> > > >>>> Yes but this is __GFP_DMA allocation. I guess you have already reported > > >>>> this failure and you've been told that this is quite unexpected for the > > >>>> kernel stack allocation. It is your out-of-tree patch which just makes > > >>>> things worse because DMA restricted allocations are considered "lowmem" > > >>>> and so they do not invoke OOM killer and do not retry like regular > > >>>> GFP_KERNEL allocations. > > >>> > > >>> Hi Michal, > > >>> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem. > > >> > > >> You are restricting the allocation request to a single zone which is > > >> definitely not good. Look at how many larger order pages are available > > >> in the Normal zone. > > >> > > >>> If we do oom-killer, maybe we will get a large block later, but there > > >>> is enough free memory before oom(although most of them are fragments). > > >> > > >> Killing a task is of course the last resort action. It would give you > > >> larger order blocks used for the victims thread. > > >> > > >>> I wonder if we can alloc success without kill any process in this situation. > > >> > > >> Sure it would be preferable to compact that memory but that might be > > >> hard with your restriction in place. 
Consider that DMA zone would tend > > >> to be less movable than normal zones as users would have to pin it for > > >> DMA. Your DMA is really large so this might turn out to just happen to > > >> work but note that the primary problem here is that you put a zone > > >> restriction for your allocations. > > >> > > >>> Maybe use vmalloc is a good way, but I don't know the influence. > > >> > > >> You can have a look at vmalloc patches posted by Andy. They are not that > > >> trivial. > > >> > > > > > > Hi Michal, > > > > > > Thank you for your comment, could you give me the link? > > > > > > > I've been keeping it mostly up to date in this branch: > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack > > > > It's currently out of sync due to a bunch of the patches being queued > > elsewhere for the merge window. > > Hello, Andy. > > I have some questions about it. > > IIUC, to turn on HAVE_ARCH_VMAP_STACK on different architecture, there > is nothing to be done in architecture side if the architecture doesn't > support lazily faults in top-level paging entries for the vmalloc > area. Is my understanding is correct? > There should be nothing fundamental that needs to be done. On the other hand, it might be good to make sure the arch code can print a clean stack trace on stack overflow. If it's helpful, I just pushed out anew > And, I'd like to know how you search problematic places using kernel > stack for DMA. > I did some searching for problematic sg_init_buf calls using Coccinelle. I'm not very good at Coccinelle, so I may have missed something. For the most part, DMA API debugging should have found the problems already. The ones I found were in drivers that didn't do real DMA: crypto users and virtio. > One note is that, stack overflow happens at the previous page of the > stack end position if stack grows down, but, guard page is placed at > the next page of the stack begin position. 
So, this stack overflow > detection depends on the fact that previous vmalloc-ed area is allocated > without VM_NO_GUARD. There isn't many users for this flag so there > would be no problem but just note. Yes, and that's a known weakness. It would be nice to improve it. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed 2016-07-29 19:47 ` Andy Lutomirski @ 2016-08-01 5:30 ` Joonsoo Kim 2016-08-10 11:59 ` Andy Lutomirski 0 siblings, 1 reply; 13+ messages in thread From: Joonsoo Kim @ 2016-08-01 5:30 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Xishi Qiu, Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote: > ---------- Forwarded message ---------- > From: "Joonsoo Kim" <iamjoonsoo.kim@lge.com> > Date: Jul 28, 2016 7:57 PM > Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed > To: "Andy Lutomirski" <luto@kernel.org> > Cc: "Xishi Qiu" <qiuxishi@huawei.com>, "Michal Hocko" > <mhocko@kernel.org>, "Tejun Heo" <tj@kernel.org>, "Ingo Molnar" > <mingo@kernel.org>, "Peter Zijlstra" <peterz@infradead.org>, "LKML" > <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>, > "Yisheng Xie" <xieyisheng1@huawei.com> > > > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote: > > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote: > > > > On 2016/7/28 17:43, Michal Hocko wrote: > > > > > > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote: > > > >>> On 2016/7/28 15:58, Michal Hocko wrote: > > > >>> > > > >>>> On Thu 28-07-16 15:41:53, Xishi Qiu wrote: > > > >>>>> On 2016/7/28 15:20, Michal Hocko wrote: > > > >>>>> > > > >>>>>> On Thu 28-07-16 15:08:26, Xishi Qiu wrote: > > > >>>>>>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb continuous > > > >>>>>>> physical memory during fork a new process. > > > >>>>>>> > > > >>>>>>> If the system's memory is very small, especially the smart phone, maybe there > > > >>>>>>> is only 1G memory. So the free memory is very small and compaction is not > > > >>>>>>> always success in slowpath(__alloc_pages_slowpath), then alloc thread stack > > > >>>>>>> may be failed for memory fragment. 
> > > >>>>>> > > > >>>>>> Well, with the current implementation of the page allocator those > > > >>>>>> requests will not fail in most cases. The oom killer would be invoked in > > > >>>>>> order to free up some memory. > > > >>>>>> > > > >>>>> > > > >>>>> Hi Michal, > > > >>>>> > > > >>>>> Yes, it success in most cases, but I did have seen this problem in some > > > >>>>> stress-test. > > > >>>>> > > > >>>>> DMA free:470628kB, but alloc 2 order block failed during fork a new process. > > > >>>>> There are so many memory fragments and the large block may be soon taken by > > > >>>>> others after compact because of stress-test. > > > >>>>> > > > >>>>> --- dmesg messages --- > > > >>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1 > > > >>>> > > > >>>> Yes but this is __GFP_DMA allocation. I guess you have already reported > > > >>>> this failure and you've been told that this is quite unexpected for the > > > >>>> kernel stack allocation. It is your out-of-tree patch which just makes > > > >>>> things worse because DMA restricted allocations are considered "lowmem" > > > >>>> and so they do not invoke OOM killer and do not retry like regular > > > >>>> GFP_KERNEL allocations. > > > >>> > > > >>> Hi Michal, > > > >>> > > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem. > > > >> > > > >> You are restricting the allocation request to a single zone which is > > > >> definitely not good. Look at how many larger order pages are available > > > >> in the Normal zone. > > > >> > > > >>> If we do oom-killer, maybe we will get a large block later, but there > > > >>> is enough free memory before oom(although most of them are fragments). > > > >> > > > >> Killing a task is of course the last resort action. It would give you > > > >> larger order blocks used for the victims thread. 
> > > >> > > > >>> I wonder if we can alloc success without kill any process in this situation. > > > >> > > > >> Sure it would be preferable to compact that memory but that might be > > > >> hard with your restriction in place. Consider that DMA zone would tend > > > >> to be less movable than normal zones as users would have to pin it for > > > >> DMA. Your DMA is really large so this might turn out to just happen to > > > >> work but note that the primary problem here is that you put a zone > > > >> restriction for your allocations. > > > >> > > > >>> Maybe use vmalloc is a good way, but I don't know the influence. > > > >> > > > >> You can have a look at vmalloc patches posted by Andy. They are not that > > > >> trivial. > > > >> > > > > > > > > Hi Michal, > > > > > > > > Thank you for your comment, could you give me the link? > > > > > > > > > > I've been keeping it mostly up to date in this branch: > > > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack > > > > > > It's currently out of sync due to a bunch of the patches being queued > > > elsewhere for the merge window. > > > > Hello, Andy. > > > > I have some questions about it. > > > > IIUC, to turn on HAVE_ARCH_VMAP_STACK on different architecture, there > > is nothing to be done in architecture side if the architecture doesn't > > support lazily faults in top-level paging entries for the vmalloc > > area. Is my understanding is correct? > > > > There should be nothing fundamental that needs to be done. On the > other hand, it might be good to make sure the arch code can print a > clean stack trace on stack overflow. > > If it's helpful, I just pushed out anew You mean that you can turn on HAVE_ARCH_VMAP_STACK on the other arch? It would be helpful. :) > > > And, I'd like to know how you search problematic places using kernel > > stack for DMA. > > > > I did some searching for problematic sg_init_buf calls using > Coccinelle. 
I'm not very good at Coccinelle, so I may have missed > something. I'm also not familiar with Coccinelle. Could you share your .cocci script? I can think of following one but there would be a better way. virtual report @stack_var depends on report@ type T1; expression E1, E2; identifier I1; @@ ( * T1 I1; ) ... ( * sg_init_one(E1, &I1, E2) | * sg_set_buf(E1, &I1, E2) ) @stack_arr depends on report@ type T1; expression E1, E2, E3; identifier I1; @@ ( * T1 I1[E1]; ) ... ( * sg_init_one(E2, I1, E3) | * sg_set_buf(E2, I1, E3) ) > For the most part, DMA API debugging should have found the problems > already. The ones I found were in drivers that didn't do real DMA: > crypto users and virtio. Ah... using stack for DMA API is already prohibited. Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed 2016-08-01 5:30 ` Joonsoo Kim @ 2016-08-10 11:59 ` Andy Lutomirski 2016-08-16 4:18 ` Joonsoo Kim 0 siblings, 1 reply; 13+ messages in thread From: Andy Lutomirski @ 2016-08-10 11:59 UTC (permalink / raw) To: Joonsoo Kim Cc: Andy Lutomirski, Xishi Qiu, Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie On Sun, Jul 31, 2016 at 10:30 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote: >> ---------- Forwarded message ---------- >> From: "Joonsoo Kim" <iamjoonsoo.kim@lge.com> >> Date: Jul 28, 2016 7:57 PM >> Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed >> To: "Andy Lutomirski" <luto@kernel.org> >> Cc: "Xishi Qiu" <qiuxishi@huawei.com>, "Michal Hocko" >> <mhocko@kernel.org>, "Tejun Heo" <tj@kernel.org>, "Ingo Molnar" >> <mingo@kernel.org>, "Peter Zijlstra" <peterz@infradead.org>, "LKML" >> <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>, >> "Yisheng Xie" <xieyisheng1@huawei.com> >> >> > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote: >> > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote: >> > > > On 2016/7/28 17:43, Michal Hocko wrote: >> > > > >> > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote: >> > > >>> On 2016/7/28 15:58, Michal Hocko wrote: >> > > >>> >> > > >>>> On Thu 28-07-16 15:41:53, Xishi Qiu wrote: >> > > >>>>> On 2016/7/28 15:20, Michal Hocko wrote: >> > > >>>>> >> > > >>>>>> On Thu 28-07-16 15:08:26, Xishi Qiu wrote: >> > > >>>>>>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb continuous >> > > >>>>>>> physical memory during fork a new process. >> > > >>>>>>> >> > > >>>>>>> If the system's memory is very small, especially the smart phone, maybe there >> > > >>>>>>> is only 1G memory. 
So the free memory is very small and compaction is not >> > > >>>>>>> always success in slowpath(__alloc_pages_slowpath), then alloc thread stack >> > > >>>>>>> may be failed for memory fragment. >> > > >>>>>> >> > > >>>>>> Well, with the current implementation of the page allocator those >> > > >>>>>> requests will not fail in most cases. The oom killer would be invoked in >> > > >>>>>> order to free up some memory. >> > > >>>>>> >> > > >>>>> >> > > >>>>> Hi Michal, >> > > >>>>> >> > > >>>>> Yes, it success in most cases, but I did have seen this problem in some >> > > >>>>> stress-test. >> > > >>>>> >> > > >>>>> DMA free:470628kB, but alloc 2 order block failed during fork a new process. >> > > >>>>> There are so many memory fragments and the large block may be soon taken by >> > > >>>>> others after compact because of stress-test. >> > > >>>>> >> > > >>>>> --- dmesg messages --- >> > > >>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1 >> > > >>>> >> > > >>>> Yes but this is __GFP_DMA allocation. I guess you have already reported >> > > >>>> this failure and you've been told that this is quite unexpected for the >> > > >>>> kernel stack allocation. It is your out-of-tree patch which just makes >> > > >>>> things worse because DMA restricted allocations are considered "lowmem" >> > > >>>> and so they do not invoke OOM killer and do not retry like regular >> > > >>>> GFP_KERNEL allocations. >> > > >>> >> > > >>> Hi Michal, >> > > >>> >> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem. >> > > >> >> > > >> You are restricting the allocation request to a single zone which is >> > > >> definitely not good. Look at how many larger order pages are available >> > > >> in the Normal zone. 
>> > > >> >> > > >>> If we do oom-killer, maybe we will get a large block later, but there >> > > >>> is enough free memory before oom(although most of them are fragments). >> > > >> >> > > >> Killing a task is of course the last resort action. It would give you >> > > >> larger order blocks used for the victims thread. >> > > >> >> > > >>> I wonder if we can alloc success without kill any process in this situation. >> > > >> >> > > >> Sure it would be preferable to compact that memory but that might be >> > > >> hard with your restriction in place. Consider that DMA zone would tend >> > > >> to be less movable than normal zones as users would have to pin it for >> > > >> DMA. Your DMA is really large so this might turn out to just happen to >> > > >> work but note that the primary problem here is that you put a zone >> > > >> restriction for your allocations. >> > > >> >> > > >>> Maybe use vmalloc is a good way, but I don't know the influence. >> > > >> >> > > >> You can have a look at vmalloc patches posted by Andy. They are not that >> > > >> trivial. >> > > >> >> > > > >> > > > Hi Michal, >> > > > >> > > > Thank you for your comment, could you give me the link? >> > > > >> > > >> > > I've been keeping it mostly up to date in this branch: >> > > >> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack >> > > >> > > It's currently out of sync due to a bunch of the patches being queued >> > > elsewhere for the merge window. >> > >> > Hello, Andy. >> > >> > I have some questions about it. >> > >> > IIUC, to turn on HAVE_ARCH_VMAP_STACK on different architecture, there >> > is nothing to be done in architecture side if the architecture doesn't >> > support lazily faults in top-level paging entries for the vmalloc >> > area. Is my understanding is correct? >> > >> >> There should be nothing fundamental that needs to be done. 
On the >> other hand, it might be good to make sure the arch code can print a >> clean stack trace on stack overflow. >> >> If it's helpful, I just pushed out anew > > You mean that you can turn on HAVE_ARCH_VMAP_STACK on the other arch? It > would be helpful. :) > >> >> > And, I'd like to know how you search problematic places using kernel >> > stack for DMA. >> > >> >> I did some searching for problematic sg_init_buf calls using >> Coccinelle. I'm not very good at Coccinelle, so I may have missed >> something. > > I'm also not familiar with Coccinelle. Could you share your .cocci > script? I can think of following one but there would be a better way. > > virtual report > > @stack_var depends on report@ > type T1; > expression E1, E2; > identifier I1; > @@ > ( > * T1 I1; > ) > ... > ( > * sg_init_one(E1, &I1, E2) > | > * sg_set_buf(E1, &I1, E2) > ) > > @stack_arr depends on report@ > type T1; > expression E1, E2, E3; > identifier I1; > @@ > ( > * T1 I1[E1]; > ) > ... > ( > * sg_init_one(E2, I1, E3) > | > * sg_set_buf(E2, I1, E3) > ) > > $ cat sgstack.cocci @@ local idexpression S; expression A, B; @@ ( * sg_init_one(A, &S, B) | * virt_to_phys(&S) ) not very inspiring. I barely understand Coccinelle syntax, and sadly I find the manual nearly incomprehensible. I can read the grammar, but that doesn't mean I know what the various declarations do. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed 2016-08-10 11:59 ` Andy Lutomirski @ 2016-08-16 4:18 ` Joonsoo Kim 0 siblings, 0 replies; 13+ messages in thread From: Joonsoo Kim @ 2016-08-16 4:18 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Xishi Qiu, Michal Hocko, Tejun Heo, Ingo Molnar, Peter Zijlstra, LKML, Linux MM, Yisheng Xie On Wed, Aug 10, 2016 at 04:59:39AM -0700, Andy Lutomirski wrote: > On Sun, Jul 31, 2016 at 10:30 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > > On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote: > >> ---------- Forwarded message ---------- > >> From: "Joonsoo Kim" <iamjoonsoo.kim@lge.com> > >> Date: Jul 28, 2016 7:57 PM > >> Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed > >> To: "Andy Lutomirski" <luto@kernel.org> > >> Cc: "Xishi Qiu" <qiuxishi@huawei.com>, "Michal Hocko" > >> <mhocko@kernel.org>, "Tejun Heo" <tj@kernel.org>, "Ingo Molnar" > >> <mingo@kernel.org>, "Peter Zijlstra" <peterz@infradead.org>, "LKML" > >> <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>, > >> "Yisheng Xie" <xieyisheng1@huawei.com> > >> > >> > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote: > >> > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu <qiuxishi@huawei.com> wrote: > >> > > > On 2016/7/28 17:43, Michal Hocko wrote: > >> > > > > >> > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote: > >> > > >>> On 2016/7/28 15:58, Michal Hocko wrote: > >> > > >>> > >> > > >>>> On Thu 28-07-16 15:41:53, Xishi Qiu wrote: > >> > > >>>>> On 2016/7/28 15:20, Michal Hocko wrote: > >> > > >>>>> > >> > > >>>>>> On Thu 28-07-16 15:08:26, Xishi Qiu wrote: > >> > > >>>>>>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb continuous > >> > > >>>>>>> physical memory during fork a new process. > >> > > >>>>>>> > >> > > >>>>>>> If the system's memory is very small, especially the smart phone, maybe there > >> > > >>>>>>> is only 1G memory. 
So the free memory is very small and compaction is not > >> > > >>>>>>> always success in slowpath(__alloc_pages_slowpath), then alloc thread stack > >> > > >>>>>>> may be failed for memory fragment. > >> > > >>>>>> > >> > > >>>>>> Well, with the current implementation of the page allocator those > >> > > >>>>>> requests will not fail in most cases. The oom killer would be invoked in > >> > > >>>>>> order to free up some memory. > >> > > >>>>>> > >> > > >>>>> > >> > > >>>>> Hi Michal, > >> > > >>>>> > >> > > >>>>> Yes, it success in most cases, but I did have seen this problem in some > >> > > >>>>> stress-test. > >> > > >>>>> > >> > > >>>>> DMA free:470628kB, but alloc 2 order block failed during fork a new process. > >> > > >>>>> There are so many memory fragments and the large block may be soon taken by > >> > > >>>>> others after compact because of stress-test. > >> > > >>>>> > >> > > >>>>> --- dmesg messages --- > >> > > >>>>> 07-13 08:41:51.341 <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: page allocation failure: order:2, mode:0x2000d1 > >> > > >>>> > >> > > >>>> Yes but this is __GFP_DMA allocation. I guess you have already reported > >> > > >>>> this failure and you've been told that this is quite unexpected for the > >> > > >>>> kernel stack allocation. It is your out-of-tree patch which just makes > >> > > >>>> things worse because DMA restricted allocations are considered "lowmem" > >> > > >>>> and so they do not invoke OOM killer and do not retry like regular > >> > > >>>> GFP_KERNEL allocations. > >> > > >>> > >> > > >>> Hi Michal, > >> > > >>> > >> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem. > >> > > >> > >> > > >> You are restricting the allocation request to a single zone which is > >> > > >> definitely not good. Look at how many larger order pages are available > >> > > >> in the Normal zone. 
> >> > > >> > >> > > >>> If we do oom-killer, maybe we will get a large block later, but there > >> > > >>> is enough free memory before oom(although most of them are fragments). > >> > > >> > >> > > >> Killing a task is of course the last resort action. It would give you > >> > > >> larger order blocks used for the victims thread. > >> > > >> > >> > > >>> I wonder if we can alloc success without kill any process in this situation. > >> > > >> > >> > > >> Sure it would be preferable to compact that memory but that might be > >> > > >> hard with your restriction in place. Consider that DMA zone would tend > >> > > >> to be less movable than normal zones as users would have to pin it for > >> > > >> DMA. Your DMA is really large so this might turn out to just happen to > >> > > >> work but note that the primary problem here is that you put a zone > >> > > >> restriction for your allocations. > >> > > >> > >> > > >>> Maybe use vmalloc is a good way, but I don't know the influence. > >> > > >> > >> > > >> You can have a look at vmalloc patches posted by Andy. They are not that > >> > > >> trivial. > >> > > >> > >> > > > > >> > > > Hi Michal, > >> > > > > >> > > > Thank you for your comment, could you give me the link? > >> > > > > >> > > > >> > > I've been keeping it mostly up to date in this branch: > >> > > > >> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vmap_stack > >> > > > >> > > It's currently out of sync due to a bunch of the patches being queued > >> > > elsewhere for the merge window. > >> > > >> > Hello, Andy. > >> > > >> > I have some questions about it. > >> > > >> > IIUC, to turn on HAVE_ARCH_VMAP_STACK on different architecture, there > >> > is nothing to be done in architecture side if the architecture doesn't > >> > support lazily faults in top-level paging entries for the vmalloc > >> > area. Is my understanding is correct? > >> > > >> > >> There should be nothing fundamental that needs to be done. 
On the > >> other hand, it might be good to make sure the arch code can print a > >> clean stack trace on stack overflow. > >> > >> If it's helpful, I just pushed out anew > > > > You mean that you can turn on HAVE_ARCH_VMAP_STACK on the other arch? It > > would be helpful. :) > > > >> > >> > And, I'd like to know how you search problematic places using kernel > >> > stack for DMA. > >> > > >> > >> I did some searching for problematic sg_init_buf calls using > >> Coccinelle. I'm not very good at Coccinelle, so I may have missed > >> something. > > > > I'm also not familiar with Coccinelle. Could you share your .cocci > > script? I can think of following one but there would be a better way. > > > > virtual report > > > > @stack_var depends on report@ > > type T1; > > expression E1, E2; > > identifier I1; > > @@ > > ( > > * T1 I1; > > ) > > ... > > ( > > * sg_init_one(E1, &I1, E2) > > | > > * sg_set_buf(E1, &I1, E2) > > ) > > > > @stack_arr depends on report@ > > type T1; > > expression E1, E2, E3; > > identifier I1; > > @@ > > ( > > * T1 I1[E1]; > > ) > > ... > > ( > > * sg_init_one(E2, I1, E3) > > | > > * sg_set_buf(E2, I1, E3) > > ) > > > > > > $ cat sgstack.cocci > @@ > local idexpression S; > expression A, B; > @@ > > ( > * sg_init_one(A, &S, B) > | > * virt_to_phys(&S) > > > not very inspiring. I barely understand Coccinelle syntax, and sadly > I find the manual nearly incomprehensible. I can read the grammar, > but that doesn't mean I know what the various declarations do. Thanks for sharing it. Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
Thread overview: 13+ messages
2016-07-28  7:08 [RFC] can we use vmalloc to alloc thread stack if compaction failed Xishi Qiu
2016-07-28  7:20 ` Michal Hocko
2016-07-28  7:41 ` Xishi Qiu
2016-07-28  7:58 ` Michal Hocko
2016-07-28  8:45 ` Xishi Qiu
2016-07-28  9:43 ` Michal Hocko
2016-07-28 10:51 ` Xishi Qiu
2016-07-28 15:07 ` Andy Lutomirski
2016-07-29  3:01 ` Joonsoo Kim
2016-07-29 19:47 ` Andy Lutomirski
2016-08-01  5:30 ` Joonsoo Kim
2016-08-10 11:59 ` Andy Lutomirski
2016-08-16  4:18 ` Joonsoo Kim