* [PATCH v4 0/2] perf/core: Bail out early if the request AUX area is out of bound
@ 2023-08-04 7:29 Shuai Xue
2023-08-04 7:29 ` [PATCH v4 1/2] " Shuai Xue
2023-08-04 7:29 ` [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area Shuai Xue
0 siblings, 2 replies; 10+ messages in thread
From: Shuai Xue @ 2023-08-04 7:29 UTC (permalink / raw)
To: alexander.shishkin, peterz, james.clark, leo.yan
Cc: mingo, baolin.wang, acme, mark.rutland, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, nathan, bpf
changes since v3 (per James):
- change to return with existing value -ENOMEM
- split doc part to a separate patch since they are going to be merged through
separate trees.
changes since v2:
- remove unnecessary overflow check (per Peter)
changes since v1:
- drop out patch2 because it has been fixed on upstream (Thanks James for reminding)
- move sanity check into rb_alloc_aux (per Leo)
- add overflow check (per James)
Shuai Xue (2):
perf/core: Bail out early if the request AUX area is out of bound
perf record: Update docs regarding the maximum limitation of AUX area
kernel/events/ring_buffer.c | 3 +++
tools/perf/Documentation/perf-record.txt | 3 +++
2 files changed, 6 insertions(+)
--
2.39.3
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-08-04 7:29 [PATCH v4 0/2] perf/core: Bail out early if the request AUX area is out of bound Shuai Xue
@ 2023-08-04 7:29 ` Shuai Xue
2023-08-04 8:59 ` Leo Yan
2023-08-04 7:29 ` [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area Shuai Xue
1 sibling, 1 reply; 10+ messages in thread
From: Shuai Xue @ 2023-08-04 7:29 UTC (permalink / raw)
To: alexander.shishkin, peterz, james.clark, leo.yan
Cc: mingo, baolin.wang, acme, mark.rutland, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, nathan, bpf
When perf-record with a large AUX area, e.g 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and it reveals a WARNING with __alloc_pages():
[ 66.595604] ------------[ cut here ]------------
[ 66.600206] WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
[ 66.608375] Modules linked in: ip6table_filter(E) ip6_tables(E) iptable_filter(E) ebtable_nat(E) ebtables(E) aes_ce_blk(E) vfat(E) fat(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) sm4_ce_cipher(E) sm4(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) acpi_ipmi(E) sbsa_gwdt(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) ip_tables(E) sd_mod(E) ast(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) nvme(E) sysimgblt(E) i2c_algo_bit(E) nvme_core(E) drm_shmem_helper(E) ahci(E) t10_pi(E) libahci(E) drm(E) crc64_rocksoft(E) i40e(E) crc64(E) libata(E) i2c_core(E)
[ 66.657719] CPU: 44 PID: 17573 Comm: perf Kdump: loaded Tainted: G E 6.3.0-rc4+ #58
[ 66.666749] Hardware name: Default Default/Default, BIOS 1.2.M1.AL.P.139.00 03/22/2023
[ 66.674650] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 66.681597] pc : __alloc_pages+0x1ec/0x248
[ 66.685680] lr : __kmalloc_large_node+0xc0/0x1f8
[ 66.690285] sp : ffff800020523980
[ 66.693585] pmr_save: 000000e0
[ 66.696624] x29: ffff800020523980 x28: ffff000832975800 x27: 0000000000000000
[ 66.703746] x26: 0000000000100000 x25: 0000000000100000 x24: ffff8000083615d0
[ 66.710866] x23: 0000000000040dc0 x22: ffff000823d6d140 x21: 000000000000000b
[ 66.717987] x20: 000000000000000b x19: 0000000000000000 x18: 0000000000000030
[ 66.725108] x17: 0000000000000000 x16: ffff800008f05be8 x15: ffff000823d6d6d0
[ 66.732229] x14: 0000000000000000 x13: 343373656761705f x12: 726e202c30206574
[ 66.739350] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff8000083af570
[ 66.746471] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4
[ 66.753592] x5 : 0000000000000000 x4 : ffff000823d6d8d8 x3 : 0000000000000000
[ 66.760713] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000040dc0
[ 66.767834] Call trace:
[ 66.770267] __alloc_pages+0x1ec/0x248
[ 66.774003] __kmalloc_large_node+0xc0/0x1f8
[ 66.778259] __kmalloc_node+0x134/0x1e8
[ 66.782081] rb_alloc_aux+0xe0/0x298
[ 66.785643] perf_mmap+0x440/0x660
[ 66.789031] mmap_region+0x308/0x8a8
[ 66.792593] do_mmap+0x3c0/0x528
[ 66.795807] vm_mmap_pgoff+0xf4/0x1b8
[ 66.799456] ksys_mmap_pgoff+0x18c/0x218
[ 66.803365] __arm64_sys_mmap+0x38/0x58
[ 66.807187] invoke_syscall+0x50/0x128
[ 66.810922] el0_svc_common.constprop.0+0x58/0x188
[ 66.815698] do_el0_svc+0x34/0x50
[ 66.818999] el0_svc+0x34/0x108
[ 66.822127] el0t_64_sync_handler+0xb8/0xc0
[ 66.826296] el0t_64_sync+0x1a4/0x1a8
[ 66.829946] ---[ end trace 0000000000000000 ]---
'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to
maintains AUX trace pages. The allocated page for this array is physically
contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
WARNING.
So bail out early with -ENOMEM if the request AUX area is out of bound,
e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
kernel/events/ring_buffer.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index a0433f37b024..c445e927368d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -699,6 +699,9 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
watermark = 0;
}
+ /* Can't allocate more than MAX_ORDER */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.39.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area
2023-08-04 7:29 [PATCH v4 0/2] perf/core: Bail out early if the request AUX area is out of bound Shuai Xue
2023-08-04 7:29 ` [PATCH v4 1/2] " Shuai Xue
@ 2023-08-04 7:29 ` Shuai Xue
2023-08-04 8:46 ` Leo Yan
1 sibling, 1 reply; 10+ messages in thread
From: Shuai Xue @ 2023-08-04 7:29 UTC (permalink / raw)
To: alexander.shishkin, peterz, james.clark, leo.yan
Cc: mingo, baolin.wang, acme, mark.rutland, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, nathan, bpf
The maximum AUX area is limited by the page size of the system. Update
the documentation to reflect this.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
tools/perf/Documentation/perf-record.txt | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 680396c56bd1..b0ee7b63da0e 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -292,6 +292,9 @@ OPTIONS
Also, by adding a comma, the number of mmap pages for AUX
area tracing can be specified.
+ The maximum AUX area is limited by the page size of the system. For
+ example with 4K pages configured, the maximum is 2GiB.
+
-g::
Enables call-graph (stack chain/backtrace) recording for both
kernel space and user space.
--
2.39.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area
2023-08-04 7:29 ` [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area Shuai Xue
@ 2023-08-04 8:46 ` Leo Yan
2023-08-04 11:47 ` Shuai Xue
0 siblings, 1 reply; 10+ messages in thread
From: Leo Yan @ 2023-08-04 8:46 UTC (permalink / raw)
To: Shuai Xue
Cc: alexander.shishkin, peterz, james.clark, mingo, baolin.wang, acme,
mark.rutland, jolsa, namhyung, irogers, adrian.hunter,
linux-perf-users, linux-kernel, nathan, bpf
On Fri, Aug 04, 2023 at 03:29:45PM +0800, Shuai Xue wrote:
> The maximum AUX area is limited by the page size of the system. Update
> the documentation to reflect this.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> tools/perf/Documentation/perf-record.txt | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 680396c56bd1..b0ee7b63da0e 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -292,6 +292,9 @@ OPTIONS
> Also, by adding a comma, the number of mmap pages for AUX
> area tracing can be specified.
>
> + The maximum AUX area is limited by the page size of the system. For
> + example with 4K pages configured, the maximum is 2GiB.
> +
This statement is incorrect as it fails to give out prerequisites.
E.g., on Arm64, for 4KiB, 16KiB or 64KiB base page size, different page
size has different default values for MAX_ORDER. Furthermore, MAX_ORDER
can be set by config ARCH_FORCE_MAX_ORDER, thus we cannot arbitrarily
say the maximum allocation size is 2GiB for 4KiB page size.
Maybe we could consider to use a formula to present the avaliable
maximum buffer size:
PAGE_SIZE << MAX_ORDER
( ---------------------- ) * PAGE_SIZE
sizeof(page_pointer)
PAGE_SIZE << MAX_ORDER : the size of maximal physically
contiguous allocations, which is the maximum size can be
allocated by slab/slub.
Thanks,
Leo
> -g::
> Enables call-graph (stack chain/backtrace) recording for both
> kernel space and user space.
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-08-04 7:29 ` [PATCH v4 1/2] " Shuai Xue
@ 2023-08-04 8:59 ` Leo Yan
2023-08-04 11:24 ` Shuai Xue
0 siblings, 1 reply; 10+ messages in thread
From: Leo Yan @ 2023-08-04 8:59 UTC (permalink / raw)
To: Shuai Xue
Cc: alexander.shishkin, peterz, james.clark, mingo, baolin.wang, acme,
mark.rutland, jolsa, namhyung, irogers, adrian.hunter,
linux-perf-users, linux-kernel, nathan, bpf
On Fri, Aug 04, 2023 at 03:29:44PM +0800, Shuai Xue wrote:
> When perf-record with a large AUX area, e.g 4GB, it fails with:
>
> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
> failed to mmap with 12 (Cannot allocate memory)
>
> and it reveals a WARNING with __alloc_pages():
>
> [ 66.595604] ------------[ cut here ]------------
> [ 66.600206] WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
> [ 66.608375] Modules linked in: ip6table_filter(E) ip6_tables(E) iptable_filter(E) ebtable_nat(E) ebtables(E) aes_ce_blk(E) vfat(E) fat(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) sm4_ce_cipher(E) sm4(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) acpi_ipmi(E) sbsa_gwdt(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) ip_tables(E) sd_mod(E) ast(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) nvme(E) sysimgblt(E) i2c_algo_bit(E) nvme_core(E) drm_shmem_helper(E) ahci(E) t10_pi(E) libahci(E) drm(E) crc64_rocksoft(E) i40e(E) crc64(E) libata(E) i2c_core(E)
> [ 66.657719] CPU: 44 PID: 17573 Comm: perf Kdump: loaded Tainted: G E 6.3.0-rc4+ #58
> [ 66.666749] Hardware name: Default Default/Default, BIOS 1.2.M1.AL.P.139.00 03/22/2023
> [ 66.674650] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> [ 66.681597] pc : __alloc_pages+0x1ec/0x248
> [ 66.685680] lr : __kmalloc_large_node+0xc0/0x1f8
> [ 66.690285] sp : ffff800020523980
> [ 66.693585] pmr_save: 000000e0
> [ 66.696624] x29: ffff800020523980 x28: ffff000832975800 x27: 0000000000000000
> [ 66.703746] x26: 0000000000100000 x25: 0000000000100000 x24: ffff8000083615d0
> [ 66.710866] x23: 0000000000040dc0 x22: ffff000823d6d140 x21: 000000000000000b
> [ 66.717987] x20: 000000000000000b x19: 0000000000000000 x18: 0000000000000030
> [ 66.725108] x17: 0000000000000000 x16: ffff800008f05be8 x15: ffff000823d6d6d0
> [ 66.732229] x14: 0000000000000000 x13: 343373656761705f x12: 726e202c30206574
> [ 66.739350] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff8000083af570
> [ 66.746471] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4
> [ 66.753592] x5 : 0000000000000000 x4 : ffff000823d6d8d8 x3 : 0000000000000000
> [ 66.760713] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000040dc0
> [ 66.767834] Call trace:
> [ 66.770267] __alloc_pages+0x1ec/0x248
> [ 66.774003] __kmalloc_large_node+0xc0/0x1f8
> [ 66.778259] __kmalloc_node+0x134/0x1e8
> [ 66.782081] rb_alloc_aux+0xe0/0x298
> [ 66.785643] perf_mmap+0x440/0x660
> [ 66.789031] mmap_region+0x308/0x8a8
> [ 66.792593] do_mmap+0x3c0/0x528
> [ 66.795807] vm_mmap_pgoff+0xf4/0x1b8
> [ 66.799456] ksys_mmap_pgoff+0x18c/0x218
> [ 66.803365] __arm64_sys_mmap+0x38/0x58
> [ 66.807187] invoke_syscall+0x50/0x128
> [ 66.810922] el0_svc_common.constprop.0+0x58/0x188
> [ 66.815698] do_el0_svc+0x34/0x50
> [ 66.818999] el0_svc+0x34/0x108
> [ 66.822127] el0t_64_sync_handler+0xb8/0xc0
> [ 66.826296] el0t_64_sync+0x1a4/0x1a8
> [ 66.829946] ---[ end trace 0000000000000000 ]---
>
> 'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to
> maintains AUX trace pages. The allocated page for this array is physically
> contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
> size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
> WARNING.
>
> So bail out early with -ENOMEM if the request AUX area is out of bound,
> e.g.:
>
> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
> failed to mmap with 12 (Cannot allocate memory)
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> kernel/events/ring_buffer.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index a0433f37b024..c445e927368d 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -699,6 +699,9 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> watermark = 0;
> }
>
> + /* Can't allocate more than MAX_ORDER */
The comment is confused. I'd like to refine it as:
/*
* kcalloc_node() is unable to allocate buffer if the size is larger
* than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
*/
To be honest, I am not sure if perf core maintainers like this kind
thing or not. Please seek their opinion before you move forward.
Thanks,
Leo
> + if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
> + return -ENOMEM;
> rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
> node);
> if (!rb->aux_pages)
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-08-04 8:59 ` Leo Yan
@ 2023-08-04 11:24 ` Shuai Xue
2023-09-06 3:27 ` Shuai Xue
0 siblings, 1 reply; 10+ messages in thread
From: Shuai Xue @ 2023-08-04 11:24 UTC (permalink / raw)
To: Leo Yan, James Clark, Peter Zijlstra
Cc: alexander.shishkin, mingo, baolin.wang, acme, mark.rutland, jolsa,
namhyung, irogers, adrian.hunter, linux-perf-users, linux-kernel,
nathan, bpf
On 2023/8/4 16:59, Leo Yan wrote:
> On Fri, Aug 04, 2023 at 03:29:44PM +0800, Shuai Xue wrote:
>> When perf-record with a large AUX area, e.g 4GB, it fails with:
>>
>> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
>> failed to mmap with 12 (Cannot allocate memory)
>>
>> and it reveals a WARNING with __alloc_pages():
>>
>> [ 66.595604] ------------[ cut here ]------------
>> [ 66.600206] WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
>> [ 66.608375] Modules linked in: ip6table_filter(E) ip6_tables(E) iptable_filter(E) ebtable_nat(E) ebtables(E) aes_ce_blk(E) vfat(E) fat(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) sm4_ce_cipher(E) sm4(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) acpi_ipmi(E) sbsa_gwdt(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) ip_tables(E) sd_mod(E) ast(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) nvme(E) sysimgblt(E) i2c_algo_bit(E) nvme_core(E) drm_shmem_helper(E) ahci(E) t10_pi(E) libahci(E) drm(E) crc64_rocksoft(E) i40e(E) crc64(E) libata(E) i2c_core(E)
>> [ 66.657719] CPU: 44 PID: 17573 Comm: perf Kdump: loaded Tainted: G E 6.3.0-rc4+ #58
>> [ 66.666749] Hardware name: Default Default/Default, BIOS 1.2.M1.AL.P.139.00 03/22/2023
>> [ 66.674650] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>> [ 66.681597] pc : __alloc_pages+0x1ec/0x248
>> [ 66.685680] lr : __kmalloc_large_node+0xc0/0x1f8
>> [ 66.690285] sp : ffff800020523980
>> [ 66.693585] pmr_save: 000000e0
>> [ 66.696624] x29: ffff800020523980 x28: ffff000832975800 x27: 0000000000000000
>> [ 66.703746] x26: 0000000000100000 x25: 0000000000100000 x24: ffff8000083615d0
>> [ 66.710866] x23: 0000000000040dc0 x22: ffff000823d6d140 x21: 000000000000000b
>> [ 66.717987] x20: 000000000000000b x19: 0000000000000000 x18: 0000000000000030
>> [ 66.725108] x17: 0000000000000000 x16: ffff800008f05be8 x15: ffff000823d6d6d0
>> [ 66.732229] x14: 0000000000000000 x13: 343373656761705f x12: 726e202c30206574
>> [ 66.739350] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff8000083af570
>> [ 66.746471] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4
>> [ 66.753592] x5 : 0000000000000000 x4 : ffff000823d6d8d8 x3 : 0000000000000000
>> [ 66.760713] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000040dc0
>> [ 66.767834] Call trace:
>> [ 66.770267] __alloc_pages+0x1ec/0x248
>> [ 66.774003] __kmalloc_large_node+0xc0/0x1f8
>> [ 66.778259] __kmalloc_node+0x134/0x1e8
>> [ 66.782081] rb_alloc_aux+0xe0/0x298
>> [ 66.785643] perf_mmap+0x440/0x660
>> [ 66.789031] mmap_region+0x308/0x8a8
>> [ 66.792593] do_mmap+0x3c0/0x528
>> [ 66.795807] vm_mmap_pgoff+0xf4/0x1b8
>> [ 66.799456] ksys_mmap_pgoff+0x18c/0x218
>> [ 66.803365] __arm64_sys_mmap+0x38/0x58
>> [ 66.807187] invoke_syscall+0x50/0x128
>> [ 66.810922] el0_svc_common.constprop.0+0x58/0x188
>> [ 66.815698] do_el0_svc+0x34/0x50
>> [ 66.818999] el0_svc+0x34/0x108
>> [ 66.822127] el0t_64_sync_handler+0xb8/0xc0
>> [ 66.826296] el0t_64_sync+0x1a4/0x1a8
>> [ 66.829946] ---[ end trace 0000000000000000 ]---
>>
>> 'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to
>> maintains AUX trace pages. The allocated page for this array is physically
>> contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
>> size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
>> WARNING.
>>
>> So bail out early with -ENOMEM if the request AUX area is out of bound,
>> e.g.:
>>
>> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
>> failed to mmap with 12 (Cannot allocate memory)
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>> kernel/events/ring_buffer.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>> index a0433f37b024..c445e927368d 100644
>> --- a/kernel/events/ring_buffer.c
>> +++ b/kernel/events/ring_buffer.c
>> @@ -699,6 +699,9 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>> watermark = 0;
>> }
>>
>> + /* Can't allocate more than MAX_ORDER */
>
> The comment is confused. I'd like to refine it as:
>
> /*
> * kcalloc_node() is unable to allocate buffer if the size is larger
> * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
> */
Hi, Leo,
Thank you for your quick feedback. The comment is simplified from Peter's reply in v2
version. Your refined comment is more detailed and it makes sense to me, I would like
to adopt it if @Peter has no other opinions.
> To be honest, I am not sure if perf core maintainers like this kind
> thing or not. Please seek their opinion before you move forward.
>
and hi, all perf core maintainers,
I have not received explicit objection from perf core maintainers @Peter or @James so
I moved forward to address their comments. It's fine to me to wait for more opinions from
perf core maintainers.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area
2023-08-04 8:46 ` Leo Yan
@ 2023-08-04 11:47 ` Shuai Xue
0 siblings, 0 replies; 10+ messages in thread
From: Shuai Xue @ 2023-08-04 11:47 UTC (permalink / raw)
To: Leo Yan, James Clark
Cc: alexander.shishkin, peterz, mingo, baolin.wang, acme,
mark.rutland, jolsa, namhyung, irogers, adrian.hunter,
linux-perf-users, linux-kernel, nathan, bpf
On 2023/8/4 16:46, Leo Yan wrote:
> On Fri, Aug 04, 2023 at 03:29:45PM +0800, Shuai Xue wrote:
>> The maximum AUX area is limited by the page size of the system. Update
>> the documentation to reflect this.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>> tools/perf/Documentation/perf-record.txt | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index 680396c56bd1..b0ee7b63da0e 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -292,6 +292,9 @@ OPTIONS
>> Also, by adding a comma, the number of mmap pages for AUX
>> area tracing can be specified.
>>
>> + The maximum AUX area is limited by the page size of the system. For
>> + example with 4K pages configured, the maximum is 2GiB.
>> +
>
> This statement is incorrect as it fails to give out prerequisites.
>
> E.g., on Arm64, for 4KiB, 16KiB or 64KiB base page size, different page
> size has different default values for MAX_ORDER. Furthermore, MAX_ORDER
> can be set by config ARCH_FORCE_MAX_ORDER, thus we cannot arbitrarily
> say the maximum allocation size is 2GiB for 4KiB page size.
>
Hi, Leo,
You are right, thank you for point this out.
Maybe we could consider to use a formula to present the avaliable
> maximum buffer size:
>
> PAGE_SIZE << MAX_ORDER
> ( ---------------------- ) * PAGE_SIZE
> sizeof(page_pointer)
>
> PAGE_SIZE << MAX_ORDER : the size of maximal physically
> contiguous allocations, which is the maximum size can be
> allocated by slab/slub.
I agree that a formula to present that limitation is more accurate. But as
@James commented in last v3, "Minor nit: I wouldn't expect a Perf tool user
to know what "MAX_ORDER" is", how about to keep both:
The maximum AUX area is limited by the maximum physically contiguous memory
allocated from slab/slub. It can be calculated with following formula:
PAGE_SIZE << MAX_ORDER
( ---------------------- ) * PAGE_SIZE
sizeof(page_pointer)
For example with 4K pages and MAX_ORDER=10 configured, the maximum AUX area
is 2GiB.
Thank you.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-08-04 11:24 ` Shuai Xue
@ 2023-09-06 3:27 ` Shuai Xue
2023-09-06 7:02 ` Leo Yan
0 siblings, 1 reply; 10+ messages in thread
From: Shuai Xue @ 2023-09-06 3:27 UTC (permalink / raw)
To: Leo Yan, James Clark, Peter Zijlstra
Cc: alexander.shishkin, mingo, baolin.wang, acme, mark.rutland, jolsa,
namhyung, irogers, adrian.hunter, linux-perf-users, linux-kernel,
nathan, bpf
On 2023/8/4 19:24, Shuai Xue wrote:
>
>
> On 2023/8/4 16:59, Leo Yan wrote:
>> On Fri, Aug 04, 2023 at 03:29:44PM +0800, Shuai Xue wrote:
>>> When perf-record with a large AUX area, e.g 4GB, it fails with:
>>>
>>> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
>>> failed to mmap with 12 (Cannot allocate memory)
>>>
>>> and it reveals a WARNING with __alloc_pages():
>>>
>>> [ 66.595604] ------------[ cut here ]------------
>>> [ 66.600206] WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
>>> [ 66.608375] Modules linked in: ip6table_filter(E) ip6_tables(E) iptable_filter(E) ebtable_nat(E) ebtables(E) aes_ce_blk(E) vfat(E) fat(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) sm4_ce_cipher(E) sm4(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) acpi_ipmi(E) sbsa_gwdt(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) ip_tables(E) sd_mod(E) ast(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) nvme(E) sysimgblt(E) i2c_algo_bit(E) nvme_core(E) drm_shmem_helper(E) ahci(E) t10_pi(E) libahci(E) drm(E) crc64_rocksoft(E) i40e(E) crc64(E) libata(E) i2c_core(E)
>>> [ 66.657719] CPU: 44 PID: 17573 Comm: perf Kdump: loaded Tainted: G E 6.3.0-rc4+ #58
>>> [ 66.666749] Hardware name: Default Default/Default, BIOS 1.2.M1.AL.P.139.00 03/22/2023
>>> [ 66.674650] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>>> [ 66.681597] pc : __alloc_pages+0x1ec/0x248
>>> [ 66.685680] lr : __kmalloc_large_node+0xc0/0x1f8
>>> [ 66.690285] sp : ffff800020523980
>>> [ 66.693585] pmr_save: 000000e0
>>> [ 66.696624] x29: ffff800020523980 x28: ffff000832975800 x27: 0000000000000000
>>> [ 66.703746] x26: 0000000000100000 x25: 0000000000100000 x24: ffff8000083615d0
>>> [ 66.710866] x23: 0000000000040dc0 x22: ffff000823d6d140 x21: 000000000000000b
>>> [ 66.717987] x20: 000000000000000b x19: 0000000000000000 x18: 0000000000000030
>>> [ 66.725108] x17: 0000000000000000 x16: ffff800008f05be8 x15: ffff000823d6d6d0
>>> [ 66.732229] x14: 0000000000000000 x13: 343373656761705f x12: 726e202c30206574
>>> [ 66.739350] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff8000083af570
>>> [ 66.746471] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4
>>> [ 66.753592] x5 : 0000000000000000 x4 : ffff000823d6d8d8 x3 : 0000000000000000
>>> [ 66.760713] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000040dc0
>>> [ 66.767834] Call trace:
>>> [ 66.770267] __alloc_pages+0x1ec/0x248
>>> [ 66.774003] __kmalloc_large_node+0xc0/0x1f8
>>> [ 66.778259] __kmalloc_node+0x134/0x1e8
>>> [ 66.782081] rb_alloc_aux+0xe0/0x298
>>> [ 66.785643] perf_mmap+0x440/0x660
>>> [ 66.789031] mmap_region+0x308/0x8a8
>>> [ 66.792593] do_mmap+0x3c0/0x528
>>> [ 66.795807] vm_mmap_pgoff+0xf4/0x1b8
>>> [ 66.799456] ksys_mmap_pgoff+0x18c/0x218
>>> [ 66.803365] __arm64_sys_mmap+0x38/0x58
>>> [ 66.807187] invoke_syscall+0x50/0x128
>>> [ 66.810922] el0_svc_common.constprop.0+0x58/0x188
>>> [ 66.815698] do_el0_svc+0x34/0x50
>>> [ 66.818999] el0_svc+0x34/0x108
>>> [ 66.822127] el0t_64_sync_handler+0xb8/0xc0
>>> [ 66.826296] el0t_64_sync+0x1a4/0x1a8
>>> [ 66.829946] ---[ end trace 0000000000000000 ]---
>>>
>>> 'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to
>>> maintains AUX trace pages. The allocated page for this array is physically
>>> contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
>>> size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
>>> WARNING.
>>>
>>> So bail out early with -ENOMEM if the request AUX area is out of bound,
>>> e.g.:
>>>
>>> #perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
>>> failed to mmap with 12 (Cannot allocate memory)
>>>
>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>> ---
>>> kernel/events/ring_buffer.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index a0433f37b024..c445e927368d 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -699,6 +699,9 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>> watermark = 0;
>>> }
>>>
>>> + /* Can't allocate more than MAX_ORDER */
>>
>> The comment is confused. I'd like to refine it as:
>>
>> /*
>> * kcalloc_node() is unable to allocate buffer if the size is larger
>> * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
>> */
>
> Hi, Leo,
>
> Thank you for your quick feedback. The comment is simplified from Peter's reply in v2
> version. Your refined comment is more detailed and it makes sense to me, I would like
> to adopt it if @Peter has no other opinions.
>
>> To be honest, I am not sure if perf core maintainers like this kind
>> thing or not. Please seek their opinion before you move forward.
>>
>
> and hi, all perf core maintainers,
>
> I have not received explicit objection from perf core maintainers @Peter or @James so
> I moved forward to address their comments. It's fine to me to wait for more opinions from
> perf core maintainers.
>
> Best Regards,
> Shuai
>
Hi, Leo, and all folks,
Any more comments? Should I move forward to send a new patch?
Thank you.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-09-06 3:27 ` Shuai Xue
@ 2023-09-06 7:02 ` Leo Yan
2023-09-06 7:07 ` Shuai Xue
0 siblings, 1 reply; 10+ messages in thread
From: Leo Yan @ 2023-09-06 7:02 UTC (permalink / raw)
To: Shuai Xue
Cc: James Clark, Peter Zijlstra, alexander.shishkin, mingo,
baolin.wang, acme, mark.rutland, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, nathan, bpf
Hi Shuai,
On Wed, Sep 06, 2023 at 11:27:38AM +0800, Shuai Xue wrote:
[...]
> >>> + /* Can't allocate more than MAX_ORDER */
> >>
> >> The comment is confused. I'd like to refine it as:
> >>
> >> /*
> >> * kcalloc_node() is unable to allocate buffer if the size is larger
> >> * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
> >> */
> >
> > Hi, Leo,
> >
> > Thank you for your quick feedback. The comment is simplified from Peter's reply in v2
> > version. Your refined comment is more detailed and it makes sense to me, I would like
> > to adopt it if @Peter has no other opinions.
> >
> >> To be honest, I am not sure if perf core maintainers like this kind
> >> thing or not. Please seek their opinion before you move forward.
> >>
> >
> > and hi, all perf core maintainers,
> >
> > I have not received explicit objection from perf core maintainers @Peter or @James so
> > I moved forward to address their comments. It's fine to me to wait for more opinions from
> > perf core maintainers.
> >
> > Best Regards,
> > Shuai
> >
>
> Hi, Leo, and all folks,
>
> Any more comments? Should I move forward to send a new patch?
I am afraid I cannot give a reliable suggestion.
Anyway, I personally think the returned error value in this patch is
better than the kernel oops since the kernel oops is a bit scary for
tool's users ;) Another reason is the perf core layer should report
error earlier rather than relying on the page buddy allocation layer
to detect the memory allocation failure, which is easier for both
developers and users to understand the failure.
IMHO, a good practice is to respin a new patch set and send out for
review. Good luck!
Thanks,
Leo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] perf/core: Bail out early if the request AUX area is out of bound
2023-09-06 7:02 ` Leo Yan
@ 2023-09-06 7:07 ` Shuai Xue
0 siblings, 0 replies; 10+ messages in thread
From: Shuai Xue @ 2023-09-06 7:07 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Peter Zijlstra, alexander.shishkin, mingo,
baolin.wang, acme, mark.rutland, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, nathan, bpf
On 2023/9/6 15:02, Leo Yan wrote:
> Hi Shuai,
>
> On Wed, Sep 06, 2023 at 11:27:38AM +0800, Shuai Xue wrote:
>
> [...]
>
>>>>> + /* Can't allocate more than MAX_ORDER */
>>>>
>>>> The comment is confused. I'd like to refine it as:
>>>>
>>>> /*
>>>> * kcalloc_node() is unable to allocate buffer if the size is larger
>>>> * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
>>>> */
>>>
>>> Hi, Leo,
>>>
>>> Thank you for your quick feedback. The comment is simplified from Peter's reply in v2
>>> version. Your refined comment is more detailed and it makes sense to me, I would like
>>> to adopt it if @Peter has no other opinions.
>>>
>>>> To be honest, I am not sure if perf core maintainers like this kind
>>>> thing or not. Please seek their opinion before you move forward.
>>>>
>>>
>>> and hi, all perf core maintainers,
>>>
>>> I have not received explicit objection from perf core maintainers @Peter or @James so
>>> I moved forward to address their comments. It's fine to me to wait for more opinions from
>>> perf core maintainers.
>>>
>>> Best Regards,
>>> Shuai
>>>
>>
>> Hi, Leo, and all folks,
>>
>> Any more comments? Should I move forward to send a new patch?
>
> I am afraid I cannot give a reliable suggestion.
>
> Anyway, I personally think the returned error value in this patch is
> better than the kernel oops since the kernel oops is a bit scary for
> tool's users ;) Another reason is the perf core layer should report
> error earlier rather than relying on the page buddy allocation layer
> to detect the memory allocation failure, which is easier for both
> developers and users to understand the failure.
>
> IMHO, a good practice is to respin a new patch set and send out for
> review. Good luck!
>
Hi, Leo,
Thanks for valuable comments.
I will send a new patch set.
Thank you.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-09-06 7:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-04 7:29 [PATCH v4 0/2] perf/core: Bail out early if the request AUX area is out of bound Shuai Xue
2023-08-04 7:29 ` [PATCH v4 1/2] " Shuai Xue
2023-08-04 8:59 ` Leo Yan
2023-08-04 11:24 ` Shuai Xue
2023-09-06 3:27 ` Shuai Xue
2023-09-06 7:02 ` Leo Yan
2023-09-06 7:07 ` Shuai Xue
2023-08-04 7:29 ` [PATCH v4 2/2] perf record: Update docs regarding the maximum limitation of AUX area Shuai Xue
2023-08-04 8:46 ` Leo Yan
2023-08-04 11:47 ` Shuai Xue
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).