public inbox for linux-kernel@vger.kernel.org
* [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
@ 2024-06-14  4:09 Lei Liu
  2024-06-14 18:38 ` Carlos Llamas
  0 siblings, 1 reply; 9+ messages in thread
From: Lei Liu @ 2024-06-14  4:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, linux-kernel
  Cc: opensource.kernel, Lei Liu

1. In binder_alloc there is a frequent need for order-3 memory
allocations, especially on small-memory mobile devices, which can lead
to OOM and cause foreground applications to be killed, resulting in
visible app crashes. The kernel call stack after the issue occurred is
as follows:
dumpsys invoked oom-killer:
gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
oom_score_adj=-950
CPU: 6 PID: 31329 Comm: dumpsys Tainted: G        WC O
5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
Call trace:
 dump_backtrace.cfi_jt+0x0/0x8
 dump_stack_lvl+0xdc/0x138
 dump_header+0x5c/0x2ac
 oom_kill_process+0x124/0x304
 out_of_memory+0x25c/0x5e0
 __alloc_pages_slowpath+0x690/0xf6c
 __alloc_pages_nodemask+0x1f4/0x3dc
 kmalloc_order+0x54/0x338
 kmalloc_order_trace+0x34/0x1bc
 __kmalloc+0x5e8/0x9c0
 binder_alloc_mmap_handler+0x88/0x1f8
 binder_mmap+0x90/0x10c
 mmap_region+0x44c/0xc14
 do_mmap+0x518/0x680
 vm_mmap_pgoff+0x15c/0x378
 ksys_mmap_pgoff+0x80/0x108
 __arm64_sys_mmap+0x38/0x48
 el0_svc_common+0xd4/0x270
 el0_svc+0x28/0x98
 el0_sync_handler+0x8c/0xf0
 el0_sync+0x1b4/0x1c0
Mem-Info:
active_anon:47096 inactive_anon:57927 isolated_anon:100
active_file:43790 inactive_file:44434 isolated_file:0
unevictable:14693 dirty:171 writeback:0
 slab_reclaimable:21676 slab_unreclaimable:81771
 mapped:84485 shmem:4275 pagetables:33367 bounce:0
 free:3772 free_pcp:198 free_cma:11
Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
Normal free:15088kB min:8192kB low:42616kB high:46164kB
reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
active_file:174552kB inactive_file:178012kB unevictable:58772kB
writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
free_cma:44kB
Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
108356 total pagecache pages

2. Using kvcalloc to allocate this memory reduces system OOM
occurrences and decreases both the latency and the failure probability
of order-3 memory allocations. It can also improve binder throughput
(as verified with Google's binder_benchmark testing tool).

3. We conducted multiple tests on a phone with 12 GB of memory, and
kvcalloc performs better. Below is a partial excerpt of the test data.
throughput = (size * Iterations) / Time
kvcalloc->kvmalloc:
Benchmark-kvcalloc	Time	CPU	Iterations	throughput(Gb/s)
----------------------------------------------------------------
BM_sendVec_binder-4096	30926 ns	20481 ns	34457	4563.66↑
BM_sendVec_binder-8192	42667 ns	30837 ns	22631	4345.11↑
BM_sendVec_binder-16384	67586 ns	52381 ns	13318	3228.51↑
BM_sendVec_binder-32768	116496 ns	94893 ns	7416	2085.97↑
BM_sendVec_binder-65536	265482 ns	209214 ns	3530	871.40↑

kcalloc->kmalloc:
Benchmark-kcalloc	Time	CPU	Iterations	throughput(Gb/s)
----------------------------------------------------------------
BM_sendVec_binder-4096	39070 ns	24207 ns	31063	3256.56
BM_sendVec_binder-8192	49476 ns	35099 ns	18817	3115.62
BM_sendVec_binder-16384	76866 ns	58924 ns	11883	2532.86
BM_sendVec_binder-32768	134022 ns	102788 ns	6535	1597.78
BM_sendVec_binder-65536	281004 ns	220028 ns	3135	731.14

Signed-off-by: Lei Liu <liulei.rjpt@vivo.com>
---
Changelog:
v2->v3:
1. Reworded the commit message, as the v2 description was unclear.
---
 drivers/android/binder_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 2e1f261ec5c8..5dcab4a5e341 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 
 	alloc->buffer = vma->vm_start;
 
-	alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
+	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
 			       sizeof(alloc->pages[0]),
 			       GFP_KERNEL);
 	if (alloc->pages == NULL) {
@@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 	return 0;
 
 err_alloc_buf_struct_failed:
-	kfree(alloc->pages);
+	kvfree(alloc->pages);
 	alloc->pages = NULL;
 err_alloc_pages_failed:
 	alloc->buffer = 0;
@@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
 			__free_page(alloc->pages[i].page_ptr);
 			page_count++;
 		}
-		kfree(alloc->pages);
+		kvfree(alloc->pages);
 	}
 	spin_unlock(&alloc->lock);
 	if (alloc->mm)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-14  4:09 [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues Lei Liu
@ 2024-06-14 18:38 ` Carlos Llamas
  2024-06-17  4:01   ` Lei Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Llamas @ 2024-06-14 18:38 UTC (permalink / raw)
  To: Lei Liu
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel

On Fri, Jun 14, 2024 at 12:09:29PM +0800, Lei Liu wrote:
> 1.In binder_alloc, there is a frequent need for order3 memory
> allocation, especially on small-memory mobile devices, which can lead
> to OOM and cause foreground applications to be killed, resulting in
> flashbacks.The kernel call stack after the issue occurred is as follows:
> dumpsys invoked oom-killer:
> gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
> oom_score_adj=-950
> CPU: 6 PID: 31329 Comm: dumpsys Tainted: G        WC O
> 5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
> Call trace:
>  dump_backtrace.cfi_jt+0x0/0x8
>  dump_stack_lvl+0xdc/0x138
>  dump_header+0x5c/0x2ac
>  oom_kill_process+0x124/0x304
>  out_of_memory+0x25c/0x5e0
>  __alloc_pages_slowpath+0x690/0xf6c
>  __alloc_pages_nodemask+0x1f4/0x3dc
>  kmalloc_order+0x54/0x338
>  kmalloc_order_trace+0x34/0x1bc
>  __kmalloc+0x5e8/0x9c0
>  binder_alloc_mmap_handler+0x88/0x1f8
>  binder_mmap+0x90/0x10c
>  mmap_region+0x44c/0xc14
>  do_mmap+0x518/0x680
>  vm_mmap_pgoff+0x15c/0x378
>  ksys_mmap_pgoff+0x80/0x108
>  __arm64_sys_mmap+0x38/0x48
>  el0_svc_common+0xd4/0x270
>  el0_svc+0x28/0x98
>  el0_sync_handler+0x8c/0xf0
>  el0_sync+0x1b4/0x1c0
> Mem-Info:
> active_anon:47096 inactive_anon:57927 isolated_anon:100
> active_file:43790 inactive_file:44434 isolated_file:0
> unevictable:14693 dirty:171 writeback:0\x0a slab_reclaimable:21676
> slab_unreclaimable:81771\x0a mapped:84485 shmem:4275 pagetables:33367
> bounce:0\x0a free:3772 free_pcp:198 free_cma:11
> Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
> inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
> isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
> shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
> Normal free:15088kB min:8192kB low:42616kB high:46164kB
> reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
> active_file:174552kB inactive_file:178012kB unevictable:58772kB
> writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
> pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
> free_cma:44kB
> Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
> 108356 total pagecache pages

Consider indenting this stack trace. IMO, the v1 commit log was much
easier to follow.

> 
> 2.We use kvcalloc to allocate memory, which can reduce system OOM
> occurrences, as well as decrease the time and probability of failure
> for order3 memory allocations. Additionally, it can also improve the
> throughput of binder (as verified by Google's binder_benchmark testing
> tool).
> 
> 3.We have conducted multiple tests on an 12GB memory phone, and the
> performance of kvcalloc is better. Below is a partial excerpt of the
> test data.
> throughput = (size * Iterations)/Time

Huh? Do you have an explanation for this performance improvement?
Did you test this under memory pressure?

My understanding is that kvcalloc() == kcalloc() if there is enough
contiguous memory no?

I would expect the performance to be the same at best.

> kvcalloc->kvmalloc:
> Benchmark-kvcalloc	Time	CPU	Iterations	throughput(Gb/s)
> ----------------------------------------------------------------
> BM_sendVec_binder-4096	30926 ns	20481 ns	34457	4563.66↑
> BM_sendVec_binder-8192	42667 ns	30837 ns	22631	4345.11↑
> BM_sendVec_binder-16384	67586 ns	52381 ns	13318	3228.51↑
> BM_sendVec_binder-32768	116496 ns	94893 ns	7416	2085.97↑
> BM_sendVec_binder-65536	265482 ns	209214 ns	3530	871.40↑
> 
> kcalloc->kmalloc
> Benchmark-kcalloc	Time	CPU	Iterations	throughput(Gb/s)
> ----------------------------------------------------------------
> BM_sendVec_binder-4096	39070 ns	24207 ns	31063	3256.56
> BM_sendVec_binder-8192	49476 ns	35099 ns	18817	3115.62
> BM_sendVec_binder-16384	76866 ns	58924 ns	11883	2532.86
> BM_sendVec_binder-32768	134022 ns	102788 ns	6535	1597.78
> BM_sendVec_binder-65536	281004 ns	220028 ns	3135	731.14
> 
> Signed-off-by: Lei Liu <liulei.rjpt@vivo.com>
> ---
> Changelog:
> v2->v3:
> 1.Modify the commit message description as the description for the V2
>   version is unclear.

The complete history of the changelog would be better.

> ---
>  drivers/android/binder_alloc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 2e1f261ec5c8..5dcab4a5e341 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>  
>  	alloc->buffer = vma->vm_start;
>  
> -	alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
> +	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
>  			       sizeof(alloc->pages[0]),
>  			       GFP_KERNEL);

I believe Greg had asked for these to be aligned to the parenthesis.
You can double check by running checkpatch with the -strict flag.

>  	if (alloc->pages == NULL) {
> @@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>  	return 0;
>  
>  err_alloc_buf_struct_failed:
> -	kfree(alloc->pages);
> +	kvfree(alloc->pages);
>  	alloc->pages = NULL;
>  err_alloc_pages_failed:
>  	alloc->buffer = 0;
> @@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
>  			__free_page(alloc->pages[i].page_ptr);
>  			page_count++;
>  		}
> -		kfree(alloc->pages);
> +		kvfree(alloc->pages);
>  	}
>  	spin_unlock(&alloc->lock);
>  	if (alloc->mm)
> -- 
> 2.34.1
> 

I'm not so sure about the results and performance improvements that are
claimed here. However, the switch to kvcalloc() itself seems reasonable
to me.

I'll run these tests myself as the results might have some noise. I'll
get back with the results.

Thanks,
Carlos Llamas



* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-14 18:38 ` Carlos Llamas
@ 2024-06-17  4:01   ` Lei Liu
  2024-06-17 18:43     ` Carlos Llamas
  0 siblings, 1 reply; 9+ messages in thread
From: Lei Liu @ 2024-06-17  4:01 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel


On 6/15/2024 at 2:38, Carlos Llamas wrote:
> On Fri, Jun 14, 2024 at 12:09:29PM +0800, Lei Liu wrote:
>> 1.In binder_alloc, there is a frequent need for order3 memory
>> allocation, especially on small-memory mobile devices, which can lead
>> to OOM and cause foreground applications to be killed, resulting in
>> flashbacks.The kernel call stack after the issue occurred is as follows:
>> dumpsys invoked oom-killer:
>> gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
>> oom_score_adj=-950
>> CPU: 6 PID: 31329 Comm: dumpsys Tainted: G        WC O
>> 5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
>> Call trace:
>>   dump_backtrace.cfi_jt+0x0/0x8
>>   dump_stack_lvl+0xdc/0x138
>>   dump_header+0x5c/0x2ac
>>   oom_kill_process+0x124/0x304
>>   out_of_memory+0x25c/0x5e0
>>   __alloc_pages_slowpath+0x690/0xf6c
>>   __alloc_pages_nodemask+0x1f4/0x3dc
>>   kmalloc_order+0x54/0x338
>>   kmalloc_order_trace+0x34/0x1bc
>>   __kmalloc+0x5e8/0x9c0
>>   binder_alloc_mmap_handler+0x88/0x1f8
>>   binder_mmap+0x90/0x10c
>>   mmap_region+0x44c/0xc14
>>   do_mmap+0x518/0x680
>>   vm_mmap_pgoff+0x15c/0x378
>>   ksys_mmap_pgoff+0x80/0x108
>>   __arm64_sys_mmap+0x38/0x48
>>   el0_svc_common+0xd4/0x270
>>   el0_svc+0x28/0x98
>>   el0_sync_handler+0x8c/0xf0
>>   el0_sync+0x1b4/0x1c0
>> Mem-Info:
>> active_anon:47096 inactive_anon:57927 isolated_anon:100
>> active_file:43790 inactive_file:44434 isolated_file:0
>> unevictable:14693 dirty:171 writeback:0\x0a slab_reclaimable:21676
>> slab_unreclaimable:81771\x0a mapped:84485 shmem:4275 pagetables:33367
>> bounce:0\x0a free:3772 free_pcp:198 free_cma:11
>> Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
>> inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
>> isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
>> shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>> writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
>> Normal free:15088kB min:8192kB low:42616kB high:46164kB
>> reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
>> active_file:174552kB inactive_file:178012kB unevictable:58772kB
>> writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
>> pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
>> free_cma:44kB
>> Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
>> 108356 total pagecache pages
> Think about indenting this stacktrace. IMO, the v1 had a commit log that
> was much easier to follow.
Hmm, okay, that's a good suggestion. I will update the next version
accordingly and trim the stack trace.
>> 2.We use kvcalloc to allocate memory, which can reduce system OOM
>> occurrences, as well as decrease the time and probability of failure
>> for order3 memory allocations. Additionally, it can also improve the
>> throughput of binder (as verified by Google's binder_benchmark testing
>> tool).
>>
>> 3.We have conducted multiple tests on an 12GB memory phone, and the
>> performance of kvcalloc is better. Below is a partial excerpt of the
>> test data.
>> throughput = (size * Iterations)/Time
> Huh? Do you have an explanation for this performance improvement?
> Did you test this under memory pressure?
Hmm, in our mobile project, we often encounter OOM and application 
crashes under stress testing.
> My understanding is that kvcalloc() == kcalloc() if there is enough
> contiguous memory no?
>
> I would expect the performance to be the same at best.

1. The main reason is memory fragmentation, which leaves us unable to
allocate contiguous order-3 memory. In addition, with the GFP_KERNEL
allocation flags, the kernel's __alloc_pages_slowpath() retries the
allocation multiple times, and if direct reclaim and memory compaction
are unsuccessful, OOM occurs.

2. When fragmentation is severe, we observed that kvmalloc is faster
than kmalloc, as it avoids the repeated retries needed for an order-3
allocation. In such cases, falling back to order-0 pages results in
higher allocation efficiency.

3. Another crucial point is that only allocations above order 3
(PAGE_ALLOC_COSTLY_ORDER) are treated as costly by the kernel; order-3
requests are retried aggressively in __alloc_pages_slowpath(), which
explains the increased time spent on order-3 allocations in fragmented
scenarios.

In summary, under high memory pressure the system is prone to
fragmentation. Instead of waiting for an order-3 allocation, it is more
efficient to let kvmalloc automatically choose between order-0 and
order-3, reducing wait times under high memory pressure. This is also
the reason why kvmalloc can improve throughput.

>> kvcalloc->kvmalloc:
>> Benchmark-kvcalloc	Time	CPU	Iterations	throughput(Gb/s)
>> ----------------------------------------------------------------
>> BM_sendVec_binder-4096	30926 ns	20481 ns	34457	4563.66↑
>> BM_sendVec_binder-8192	42667 ns	30837 ns	22631	4345.11↑
>> BM_sendVec_binder-16384	67586 ns	52381 ns	13318	3228.51↑
>> BM_sendVec_binder-32768	116496 ns	94893 ns	7416	2085.97↑
>> BM_sendVec_binder-65536	265482 ns	209214 ns	3530	871.40↑
>>
>> kcalloc->kmalloc
>> Benchmark-kcalloc	Time	CPU	Iterations	throughput(Gb/s)
>> ----------------------------------------------------------------
>> BM_sendVec_binder-4096	39070 ns	24207 ns	31063	3256.56
>> BM_sendVec_binder-8192	49476 ns	35099 ns	18817	3115.62
>> BM_sendVec_binder-16384	76866 ns	58924 ns	11883	2532.86
>> BM_sendVec_binder-32768	134022 ns	102788 ns	6535	1597.78
>> BM_sendVec_binder-65536	281004 ns	220028 ns	3135	731.14
>>
>> Signed-off-by: Lei Liu <liulei.rjpt@vivo.com>
>> ---
>> Changelog:
>> v2->v3:
>> 1.Modify the commit message description as the description for the V2
>>    version is unclear.
> The complete history of the changelog would be better.
>
>> ---
>>   drivers/android/binder_alloc.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
>> index 2e1f261ec5c8..5dcab4a5e341 100644
>> --- a/drivers/android/binder_alloc.c
>> +++ b/drivers/android/binder_alloc.c
>> @@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>>   
>>   	alloc->buffer = vma->vm_start;
>>   
>> -	alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
>> +	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
>>   			       sizeof(alloc->pages[0]),
>>   			       GFP_KERNEL);
> I believe Greg had asked for these to be aligned to the parenthesis.
> You can double check by running checkpatch with the -strict flag.
Okay, I'll double-check the formatting of the patch again.
>>   	if (alloc->pages == NULL) {
>> @@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>>   	return 0;
>>   
>>   err_alloc_buf_struct_failed:
>> -	kfree(alloc->pages);
>> +	kvfree(alloc->pages);
>>   	alloc->pages = NULL;
>>   err_alloc_pages_failed:
>>   	alloc->buffer = 0;
>> @@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
>>   			__free_page(alloc->pages[i].page_ptr);
>>   			page_count++;
>>   		}
>> -		kfree(alloc->pages);
>> +		kvfree(alloc->pages);
>>   	}
>>   	spin_unlock(&alloc->lock);
>>   	if (alloc->mm)
>> -- 
>> 2.34.1
>>
> I'm not so sure about the results and performance improvements that are
> claimed here. However, the switch to kvcalloc() itself seems reasonable
> to me.
>
> I'll run these tests myself as the results might have some noise. I'll
> get back with the results.
>
> Thanks,
> Carlos Llamas

Okay, thank you for the suggestion. I look forward to receiving your 
test results and continuing our discussion.

My testing tool is the binder throughput testing tool provided by 
Google. You can give it a try here:

https://source.android.com/docs/core/tests/vts/performance


Thanks,

Lei liu

>


* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-17  4:01   ` Lei Liu
@ 2024-06-17 18:43     ` Carlos Llamas
  2024-06-18  2:50       ` Lei Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Llamas @ 2024-06-17 18:43 UTC (permalink / raw)
  To: Lei Liu
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel

On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
> On 6/15/2024 at 2:38, Carlos Llamas wrote:
> > My understanding is that kvcalloc() == kcalloc() if there is enough
> > contiguous memory no?
> > 
> > I would expect the performance to be the same at best.
> 
> 1.The main reason is memory fragmentation, where we are unable to
> allocate contiguous order3 memory. Additionally, using the GFP_KERNEL
> allocation flag in the kernel's __alloc_pages_slowpath function results
> in multiple retry attempts, and if direct_reclaim and memory_compact
> are unsuccessful, OOM occurs.
> 
> 2.When fragmentation is severe, we observed that kvmalloc is faster
> than kmalloc, as it eliminates the need for multiple retry attempts to
> allocate order3. In such cases, falling back to order0 may result in
> higher allocation efficiency.
> 
> 3.Another crucial point is that in the kernel, allocations greater than
> order3 are considered PAGE_ALLOC_COSTLY_ORDER. This leads to a reduced
> number of retry attempts in __alloc_pages_slowpath, which explains the
> increased time for order3 allocation in fragmented scenarios.
> 
> In summary, under high memory pressure scenarios, the system is prone
> to fragmentation. Instead of waiting for order3 allocation, it is more
> efficient to allow kvmalloc to automatically select between order0 and
> order3, reducing wait times in high memory pressure scenarios. This is
> also the reason why kvmalloc can improve throughput.

Yes, all this makes sense. What I don't understand is the claim that
"performance of kvcalloc is better". That is not supposed to be the
case.

> > I'm not so sure about the results and performance improvements that are
> > claimed here. However, the switch to kvcalloc() itself seems reasonable
> > to me.
> > 
> > I'll run these tests myself as the results might have some noise. I'll
> > get back with the results.
> > 
> > Thanks,
> > Carlos Llamas
> 
> Okay, thank you for the suggestion. I look forward to receiving your
> test results and continuing our discussion.
> 

I ran several iterations of the benchmark test on a Pixel device and as
expected I didn't see any significant differences. This is a good thing,
but either we need to understand how you obtained a better performance
from using kvcalloc(), or it would be better to drop this claim from the
commit log.

The following are two individual samples of each form. However, if we
could average the output and get rid of the noise it seems the numbers
are pretty much the same.

Sample with kcalloc():
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_sendVec_binder/4          19983 ns         9832 ns        60255
BM_sendVec_binder/8          19766 ns         9690 ns        71699
BM_sendVec_binder/16         19785 ns         9722 ns        72086
BM_sendVec_binder/32         20067 ns         9864 ns        71535
BM_sendVec_binder/64         20077 ns         9941 ns        69141
BM_sendVec_binder/128        20147 ns         9944 ns        71016
BM_sendVec_binder/256        20424 ns        10044 ns        69451
BM_sendVec_binder/512        20518 ns        10064 ns        69179
BM_sendVec_binder/1024       21073 ns        10319 ns        67599
BM_sendVec_binder/2048       21482 ns        10502 ns        66767
BM_sendVec_binder/4096       22308 ns        10809 ns        63841
BM_sendVec_binder/8192       24022 ns        11649 ns        60795
BM_sendVec_binder/16384      27172 ns        13426 ns        51940
BM_sendVec_binder/32768      32853 ns        16345 ns        42211
BM_sendVec_binder/65536      80177 ns        39787 ns        17557

Sample with kvcalloc():
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_sendVec_binder/4          19900 ns         9711 ns        68626
BM_sendVec_binder/8          19903 ns         9756 ns        71524
BM_sendVec_binder/16         19601 ns         9541 ns        71069
BM_sendVec_binder/32         19514 ns         9530 ns        72469
BM_sendVec_binder/64         20042 ns        10006 ns        69753
BM_sendVec_binder/128        20142 ns         9965 ns        70392
BM_sendVec_binder/256        20274 ns         9958 ns        70173
BM_sendVec_binder/512        20305 ns         9966 ns        70347
BM_sendVec_binder/1024       20883 ns        10250 ns        67813
BM_sendVec_binder/2048       21364 ns        10455 ns        67366
BM_sendVec_binder/4096       22350 ns        10888 ns        65689
BM_sendVec_binder/8192       24113 ns        11707 ns        58149
BM_sendVec_binder/16384      27122 ns        13346 ns        52515
BM_sendVec_binder/32768      32158 ns        15901 ns        44139
BM_sendVec_binder/65536      87594 ns        43627 ns        16040

To reiterate, the switch to kvcalloc() sounds good to me. Let's just fix
the commit log and Greg's suggestions too.

Thanks,
Carlos Llamas


* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-17 18:43     ` Carlos Llamas
@ 2024-06-18  2:50       ` Lei Liu
  2024-06-18  4:37         ` Carlos Llamas
  0 siblings, 1 reply; 9+ messages in thread
From: Lei Liu @ 2024-06-18  2:50 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel


On 2024/6/18 2:43, Carlos Llamas wrote:
> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> My understanding is that kvcalloc() == kcalloc() if there is enough
>>> contiguous memory no?
>>>
>>> I would expect the performance to be the same at best.
>> 1.The main reason is memory fragmentation, where we are unable to
>> allocate contiguous order3 memory. Additionally, using the GFP_KERNEL
>> allocation flag in the kernel's __alloc_pages_slowpath function results
>> in multiple retry attempts, and if direct_reclaim and memory_compact
>> are unsuccessful, OOM occurs.
>>
>> 2.When fragmentation is severe, we observed that kvmalloc is faster
>> than kmalloc, as it eliminates the need for multiple retry attempts to
>> allocate order3. In such cases, falling back to order0 may result in
>> higher allocation efficiency.
>>
>> 3.Another crucial point is that in the kernel, allocations greater than
>> order3 are considered PAGE_ALLOC_COSTLY_ORDER. This leads to a reduced
>> number of retry attempts in __alloc_pages_slowpath, which explains the
>> increased time for order3 allocation in fragmented scenarios.
>>
>> In summary, under high memory pressure scenarios, the system is prone
>> to fragmentation. Instead of waiting for order3 allocation, it is more
>> efficient to allow kvmalloc to automatically select between order0 and
>> order3, reducing wait times in high memory pressure scenarios. This is
>> also the reason why kvmalloc can improve throughput.
> Yes, all this makes sense. What I don't understand is how "performance
> of kvcalloc is better". This is not supposed to be.

Based on my current understanding:
1. kvmalloc may allocate memory faster than kmalloc in cases of memory
fragmentation, which could potentially improve binder's performance.
2. Memory allocated by kvmalloc may not be physically contiguous, which
could potentially degrade binder's data read and write speed.

I'm uncertain about the relative impact of the points mentioned above. 
I'm interested in hearing your perspective on this matter.

>>> I'm not so sure about the results and performance improvements that are
>>> claimed here. However, the switch to kvcalloc() itself seems reasonable
>>> to me.
>>>
>>> I'll run these tests myself as the results might have some noise. I'll
>>> get back with the results.
>>>
>>> Thanks,
>>> Carlos Llamas
>> Okay, thank you for the suggestion. I look forward to receiving your
>> test results and continuing our discussion.
>>
> I ran several iterations of the benchmark test on a Pixel device and as
> expected I didn't see any significant differences. This is a good thing,
> but either we need to understand how you obtained a better performance
> from using kvcalloc(), or it would be better to drop this claim from the
> commit log.
>
> The following are two individual samples of each form. However, if we
> could average the output and get rid of the noise it seems the numbers
> are pretty much the same.
>
> Sample with kcalloc():
> ------------------------------------------------------------------
> Benchmark                        Time             CPU   Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4          19983 ns         9832 ns        60255
> BM_sendVec_binder/8          19766 ns         9690 ns        71699
> BM_sendVec_binder/16         19785 ns         9722 ns        72086
> BM_sendVec_binder/32         20067 ns         9864 ns        71535
> BM_sendVec_binder/64         20077 ns         9941 ns        69141
> BM_sendVec_binder/128        20147 ns         9944 ns        71016
> BM_sendVec_binder/256        20424 ns        10044 ns        69451
> BM_sendVec_binder/512        20518 ns        10064 ns        69179
> BM_sendVec_binder/1024       21073 ns        10319 ns        67599
> BM_sendVec_binder/2048       21482 ns        10502 ns        66767
> BM_sendVec_binder/4096       22308 ns        10809 ns        63841
> BM_sendVec_binder/8192       24022 ns        11649 ns        60795
> BM_sendVec_binder/16384      27172 ns        13426 ns        51940
> BM_sendVec_binder/32768      32853 ns        16345 ns        42211
> BM_sendVec_binder/65536      80177 ns        39787 ns        17557
>
> Sample with kvcalloc():
> ------------------------------------------------------------------
> Benchmark                        Time             CPU   Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4          19900 ns         9711 ns        68626
> BM_sendVec_binder/8          19903 ns         9756 ns        71524
> BM_sendVec_binder/16         19601 ns         9541 ns        71069
> BM_sendVec_binder/32         19514 ns         9530 ns        72469
> BM_sendVec_binder/64         20042 ns        10006 ns        69753
> BM_sendVec_binder/128        20142 ns         9965 ns        70392
> BM_sendVec_binder/256        20274 ns         9958 ns        70173
> BM_sendVec_binder/512        20305 ns         9966 ns        70347
> BM_sendVec_binder/1024       20883 ns        10250 ns        67813
> BM_sendVec_binder/2048       21364 ns        10455 ns        67366
> BM_sendVec_binder/4096       22350 ns        10888 ns        65689
> BM_sendVec_binder/8192       24113 ns        11707 ns        58149
> BM_sendVec_binder/16384      27122 ns        13346 ns        52515
> BM_sendVec_binder/32768      32158 ns        15901 ns        44139
> BM_sendVec_binder/65536      87594 ns        43627 ns        16040
>
> To reiterate, the switch to kvcalloc() sounds good to me. Let's just fix
> the commit log and Greg's suggestions too.
>
> Thanks,
> Carlos Llamas

Hmm, this is really good news. From the current test results, it seems 
that kvmalloc does not degrade performance for binder.

I will retest the data on our phone to see if we reach the same 
conclusion. If kvmalloc still proves to be better, we will provide you 
with the reproduction method.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-18  2:50       ` Lei Liu
@ 2024-06-18  4:37         ` Carlos Llamas
  2024-06-19  8:35           ` Lei Liu
  2024-06-19  8:44           ` Lei Liu
  0 siblings, 2 replies; 9+ messages in thread
From: Carlos Llamas @ 2024-06-18  4:37 UTC (permalink / raw)
  To: Lei Liu
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel

On Tue, Jun 18, 2024 at 10:50:17AM +0800, Lei Liu wrote:
> 
> On 2024/6/18 2:43, Carlos Llamas wrote:
> > On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
> > > On 6/15/2024 at 2:38, Carlos Llamas wrote:
> > Yes, all this makes sense. What I don't understand is how "performance
> > of kvcalloc is better". This is not supposed to be.
> 
> Based on my current understanding:
> 1.kvmalloc may allocate memory faster than kmalloc in cases of memory
> fragmentation, which could potentially improve the performance of binder.

I think there is a misunderstanding of the allocations performed in this
benchmark test. Yes, in general when there is heavy memory pressure it
can be faster to use kvmalloc() and not try too hard to reclaim
contiguous memory.

In the case of binder though, this is the mmap() allocation. This call
is part of the "initial setup". In the test, there should only be two
calls to kvmalloc(), since the benchmark is done across two processes.
That's it.

So the time it takes to allocate this memory is irrelevant to the
performance results. Does this make sense?
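For readers following along, the change under discussion amounts to roughly the following (a sketch of the v3 idea against binder_alloc.c, not the exact posted diff; the hunk context is approximate, and the kfree() -> kvfree() pairing in the release path is the usual companion to such a switch):

```diff
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ binder_alloc_mmap_handler @@
-	alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
-			       sizeof(alloc->pages[0]),
-			       GFP_KERNEL);
+	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
+				sizeof(alloc->pages[0]),
+				GFP_KERNEL);
@@ binder_alloc_deferred_release @@
-	kfree(alloc->pages);
+	kvfree(alloc->pages);
```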

> 2.Memory allocated by kvmalloc may not be contiguous, which could
> potentially degrade the data read and write speed of binder.

This _is_ what is being considered in the benchmark test instead. There
are repeated accesses to alloc->pages[n]. Your point is then the reason
why I was expecting "same performance at best".

> Hmm, this is really good news. From the current test results, it seems that
> kvmalloc does not degrade performance for binder.

Yeah, not in the "happy" case anyways. I'm not sure what the numbers
look like under some memory pressure.

> I will retest the data on our phone to see if we reach the same conclusion.
> If kvmalloc still proves to be better, we will provide you with the
> reproduction method.
> 
Ok, thanks. I would suggest you do an "adb shell stop" before running
these tests. This might help with the noise.

Thanks,
Carlos Llamas


* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-18  4:37         ` Carlos Llamas
@ 2024-06-19  8:35           ` Lei Liu
  2024-06-19  8:44           ` Lei Liu
  1 sibling, 0 replies; 9+ messages in thread
From: Lei Liu @ 2024-06-19  8:35 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel


On 2024/6/18 12:37, Carlos Llamas wrote:
> On Tue, Jun 18, 2024 at 10:50:17AM +0800, Lei Liu wrote:
>> On 2024/6/18 2:43, Carlos Llamas wrote:
>>> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>>>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> Yes, all this makes sense. What I don't understand is how "performance
>>> of kvcalloc is better". This is not supposed to be.
>> Based on my current understanding:
>> 1.kvmalloc may allocate memory faster than kmalloc in cases of memory
>> fragmentation, which could potentially improve the performance of binder.
> I think there is a misunderstanding of the allocations performed in this
> benchmark test. Yes, in general when there is heavy memory pressure it
> can be faster to use kvmalloc() and not try too hard to reclaim
> contiguous memory.
>
> In the case of binder though, this is the mmap() allocation. This call
> is part of the "initial setup". In the test, there should only be two
> calls to kvmalloc(), since the benchmark is done across two processes.
> That's it.
>
> So the time it takes to allocate this memory is irrelevant to the
> performance results. Does this make sense?
>
>> 2.Memory allocated by kvmalloc may not be contiguous, which could
>> potentially degrade the data read and write speed of binder.
> This _is_ what is being considered in the benchmark test instead. There
> are repeated accesses to alloc->pages[n]. Your point is then the reason
> why I was expecting "same performance at best".
>
>> Hmm, this is really good news. From the current test results, it seems that
>> kvmalloc does not degrade performance for binder.
> Yeah, not in the "happy" case anyways. I'm not sure what the numbers
> look like under some memory pressure.
>
>> I will retest the data on our phone to see if we reach the same conclusion.
>> If kvmalloc still proves to be better, we will provide you with the
>> reproduction method.
>>
> Ok, thanks. I would suggest you do an "adb shell stop" before running
> these tests. This might help with the noise.
>
> Thanks,
> Carlos Llamas

We used the "adb shell stop" command to retest the data. Now, the test
data for kmalloc and vmalloc are basically consistent. There are a few
instances where vmalloc may be slightly inferior, but the difference is
not significant, within 3%.

adb shell stop/ kmalloc /8+256G
----------------------------------------------------------------------
Benchmark                Time     CPU   Iterations  OUTPUT OUTPUTCPU
----------------------------------------------------------------------
BM_sendVec_binder4      39126    18550    38894    3.976282 8.38684
BM_sendVec_binder8      38924    18542    37786    7.766108 16.3028
BM_sendVec_binder16     38328    18228    36700    15.32039 32.2141
BM_sendVec_binder32     38154    18215    38240    32.07213 67.1798
BM_sendVec_binder64     39093    18809    36142    59.16885 122.977
BM_sendVec_binder128    40169    19188    36461    116.1843 243.2253
BM_sendVec_binder256    40695    19559    35951    226.1569 470.5484
BM_sendVec_binder512    41446    20211    34259    423.2159 867.8743
BM_sendVec_binder1024   44040    22939    28904    672.0639 1290.278
BM_sendVec_binder2048   47817    25821    26595    1139.063 2109.393
BM_sendVec_binder4096   54749    30905    22742    1701.423 3014.115
BM_sendVec_binder8192   68316    42017    16684    2000.634 3252.858
BM_sendVec_binder16384  95435    64081    10961    1881.752 2802.469
BM_sendVec_binder32768  148232  107504     6510    1439.093 1984.295
BM_sendVec_binder65536  326499  229874     3178    637.8991 906.0329
NORMAL TEST                                 SUM    10355.79 17188.15
stressapptest eat 2G                        SUM    10088.39 16625.97

adb shell stop/ kvmalloc /8+256G
-----------------------------------------------------------------------
Benchmark                Time     CPU   Iterations   OUTPUT OUTPUTCPU
-----------------------------------------------------------------------
BM_sendVec_binder4       39673    18832    36598    3.689965 7.773577
BM_sendVec_binder8       39869    18969    37188    7.462038 15.68369
BM_sendVec_binder16      39774    18896    36627    14.73405 31.01355
BM_sendVec_binder32      40225    19125    36995    29.43045 61.90013
BM_sendVec_binder64      40549    19529    35148    55.47544 115.1862
BM_sendVec_binder128     41580    19892    35384    108.9262 227.6871
BM_sendVec_binder256     41584    20059    34060    209.6806 434.6857
BM_sendVec_binder512     42829    20899    32493    388.4381 796.0389
BM_sendVec_binder1024    45037    23360    29251    665.0759 1282.236
BM_sendVec_binder2048    47853    25761    27091    1159.433 2153.735
BM_sendVec_binder4096    55574    31745    22405    1651.328 2890.877
BM_sendVec_binder8192    70706    43693    16400    1900.105 3074.836
BM_sendVec_binder16384   96161    64362    10793    1838.921 2747.468
BM_sendVec_binder32768  147875   107292     6296    1395.147 1922.858
BM_sendVec_binder65536  330324   232296     3053    605.7126 861.3209
NORMAL TEST                                 SUM     10033.56 16623.35
stressapptest eat 2G                        SUM      9958.43 16497.55

Can I prepare the V4 version of the patch now? Do I need to modify
anything else in the V4 version, in addition to addressing the following
two points?

1. Shorten the "backtrace" in the commit message.

2. Modify the code indentation to comply with the community's code style
requirements.

Thanks,
Lei Liu



* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-18  4:37         ` Carlos Llamas
  2024-06-19  8:35           ` Lei Liu
@ 2024-06-19  8:44           ` Lei Liu
  2024-06-19 23:41             ` Carlos Llamas
  1 sibling, 1 reply; 9+ messages in thread
From: Lei Liu @ 2024-06-19  8:44 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel


On 2024/6/18 12:37, Carlos Llamas wrote:
> On Tue, Jun 18, 2024 at 10:50:17AM +0800, Lei Liu wrote:
>> On 2024/6/18 2:43, Carlos Llamas wrote:
>>> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>>>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> Yes, all this makes sense. What I don't understand is how "performance
>>> of kvcalloc is better". This is not supposed to be.
>> Based on my current understanding:
>> 1.kvmalloc may allocate memory faster than kmalloc in cases of memory
>> fragmentation, which could potentially improve the performance of binder.
> I think there is a misunderstanding of the allocations performed in this
> benchmark test. Yes, in general when there is heavy memory pressure it
> can be faster to use kvmalloc() and not try too hard to reclaim
> contiguous memory.
>
> In the case of binder though, this is the mmap() allocation. This call
> is part of the "initial setup". In the test, there should only be two
> calls to kvmalloc(), since the benchmark is done across two processes.
> That's it.
>
> So the time it takes to allocate this memory is irrelevant to the
> performance results. Does this make sense?
>
>> 2.Memory allocated by kvmalloc may not be contiguous, which could
>> potentially degrade the data read and write speed of binder.
> This _is_ what is being considered in the benchmark test instead. There
> are repeated accesses to alloc->pages[n]. Your point is then the reason
> why I was expecting "same performance at best".
>
>> Hmm, this is really good news. From the current test results, it seems that
>> kvmalloc does not degrade performance for binder.
> Yeah, not in the "happy" case anyways. I'm not sure what the numbers
> look like under some memory pressure.
>
>> I will retest the data on our phone to see if we reach the same conclusion.
>> If kvmalloc still proves to be better, we will provide you with the
>> reproduction method.
>>
> Ok, thanks. I would suggest you do an "adb shell stop" before running
> these tests. This might help with the noise.
>
> Thanks,
> Carlos Llamas

We used the "adb shell stop" command to retest the data.

Now, the test data for kmalloc and vmalloc are basically consistent.

There are a few instances where vmalloc may be slightly inferior, but 
the difference is not significant, within 3%.

adb shell stop/ kmalloc /8+256G
----------------------------------------------------------------------
Benchmark                Time     CPU   Iterations  OUTPUT OUTPUTCPU
----------------------------------------------------------------------
BM_sendVec_binder4      39126    18550    38894    3.976282 8.38684
BM_sendVec_binder8      38924    18542    37786    7.766108 16.3028
BM_sendVec_binder16     38328    18228    36700    15.32039 32.2141
BM_sendVec_binder32     38154    18215    38240    32.07213 67.1798
BM_sendVec_binder64     39093    18809    36142    59.16885 122.977
BM_sendVec_binder128    40169    19188    36461    116.1843 243.2253
BM_sendVec_binder256    40695    19559    35951    226.1569 470.5484
BM_sendVec_binder512    41446    20211    34259    423.2159 867.8743
BM_sendVec_binder1024   44040    22939    28904    672.0639 1290.278
BM_sendVec_binder2048   47817    25821    26595    1139.063 2109.393
BM_sendVec_binder4096   54749    30905    22742    1701.423 3014.115
BM_sendVec_binder8192   68316    42017    16684    2000.634 3252.858
BM_sendVec_binder16384  95435    64081    10961    1881.752 2802.469
BM_sendVec_binder32768  148232  107504     6510    1439.093 1984.295
BM_sendVec_binder65536  326499  229874     3178    637.8991 906.0329
NORMAL TEST                                 SUM    10355.79 17188.15
stressapptest eat 2G                        SUM    10088.39 16625.97

adb shell stop/ kvmalloc /8+256G
-----------------------------------------------------------------------
Benchmark                Time     CPU   Iterations   OUTPUT OUTPUTCPU
-----------------------------------------------------------------------
BM_sendVec_binder4       39673    18832    36598    3.689965 7.773577
BM_sendVec_binder8       39869    18969    37188    7.462038 15.68369
BM_sendVec_binder16      39774    18896    36627    14.73405 31.01355
BM_sendVec_binder32      40225    19125    36995    29.43045 61.90013
BM_sendVec_binder64      40549    19529    35148    55.47544 115.1862
BM_sendVec_binder128     41580    19892    35384    108.9262 227.6871
BM_sendVec_binder256     41584    20059    34060    209.6806 434.6857
BM_sendVec_binder512     42829    20899    32493    388.4381 796.0389
BM_sendVec_binder1024    45037    23360    29251    665.0759 1282.236
BM_sendVec_binder2048    47853    25761    27091    1159.433 2153.735
BM_sendVec_binder4096    55574    31745    22405    1651.328 2890.877
BM_sendVec_binder8192    70706    43693    16400    1900.105 3074.836
BM_sendVec_binder16384   96161    64362    10793    1838.921 2747.468
BM_sendVec_binder32768  147875   107292     6296    1395.147 1922.858
BM_sendVec_binder65536  330324   232296     3053    605.7126 861.3209
NORMAL TEST                                 SUM     10033.56 16623.35
stressapptest eat 2G                        SUM      9958.43 16497.55


Can I prepare the V4 version of the patch now? Do I need to modify 
anything else in the V4 version, in addition to addressing the following 
two points?

1. Shorten the "backtrace" in the commit message.

2. Modify the code indentation to comply with the community's code style
requirements.

Thanks,

Lei Liu


* Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
  2024-06-19  8:44           ` Lei Liu
@ 2024-06-19 23:41             ` Carlos Llamas
  0 siblings, 0 replies; 9+ messages in thread
From: Carlos Llamas @ 2024-06-19 23:41 UTC (permalink / raw)
  To: Lei Liu
  Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, linux-kernel, opensource.kernel

On Wed, Jun 19, 2024 at 04:44:07PM +0800, Lei Liu wrote:
> We used the "adb shell stop" command to retest the data.
> 
> Now, the test data for kmalloc and vmalloc are basically consistent.

Ok, this matches my observations too.

> Can I prepare the V4 version of the patch now? Do I need to modify anything
> else in the V4 version, in addition to addressing the following two points?
> 
> 1.Shorten the "backtrace" in the commit message.
> 
> 2.Modify the code indentation to comply with the community's code style
> requirements.

Yeap, that would be all. Thanks.

Carlos Llamas


end of thread, other threads:[~2024-06-19 23:41 UTC | newest]

Thread overview: 9+ messages
2024-06-14  4:09 [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues Lei Liu
2024-06-14 18:38 ` Carlos Llamas
2024-06-17  4:01   ` Lei Liu
2024-06-17 18:43     ` Carlos Llamas
2024-06-18  2:50       ` Lei Liu
2024-06-18  4:37         ` Carlos Llamas
2024-06-19  8:35           ` Lei Liu
2024-06-19  8:44           ` Lei Liu
2024-06-19 23:41             ` Carlos Llamas
