public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* NULL pointer deref when running BPF monitor program (6.11.0-rc1)
@ 2024-08-05  9:20 Juri Lelli
  2024-08-05 16:49 ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Juri Lelli @ 2024-08-05  9:20 UTC (permalink / raw)
  To: bpf; +Cc: LKML, asavkov

Hello,

I am experimenting with using BPF to monitor and verify SCHED_DEADLINE
behavior when dealing with priority inheritance and it looks like I'm
reliably hitting the following NULL pointer dereference. I am under the
assumption this shouldn't be possible with BPF thanks to the verifier,
so I'm reaching out to understand if the problem is in the operator
(myself, doing something obviously wrong) or if I might be hitting a
genuine issue.

The BPF program is available at

https://gitlab.com/jlelli/dlmon/-/blob/main/src/dlmon.bpf.c?ref_type=heads

I am pretty sure it has issues in itself, but again I didn't expect it
could crash my system.

First occurrence comes from starting this simple kernel module that creates
a bunch of SCHED_DEADLINE tasks locking an rt_mutex:

https://gitlab.com/jlelli/dlmon/-/blob/main/module/pit.c?ref_type=heads

After loading the pit module, start the dlmon BPF program and it should
crash (maybe restart it a few times in case it doesn't right away).

--->8---
[  330.621728] Initializing PIT
[  330.624625] pit module --- lock=rtmutex
[  338.233883] BUG: kernel NULL pointer dereference, address: 000000000000040c
[  338.240852] #PF: supervisor read access in kernel mode
[  338.245990] #PF: error_code(0x0000) - not-present page
[  338.251129] PGD 0 P4D 0
[  338.253670] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[  338.258550] CPU: 123 UID: 0 PID: 4213 Comm: low Tainted: G           OE      6.11.0-rc1 #1
[  338.266810] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  338.272121] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.14.1 12/17/2023
[  338.279773] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x2a/0xe3
[  338.287339] Code: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
[  338.306084] RSP: 0018:ffffb7fb4ed37d00 EFLAGS: 00010086
[  338.311312] RAX: ffffffffc012ec8c RBX: ffffb7fb4ed37d68 RCX: ffff88e5df9b5900
[  338.318446] RDX: ffff88dea5cb99c0 RSI: ffffb7fb4a5f5048 RDI: ffffb7fb4ed37d68
[  338.325579] RBP: ffffb7fb4ed37d48 R08: 0000004ec048f000 R09: 00000000000003d7
[  338.332711] R10: 000000000000c8f4 R11: 0000007b00001075 R12: 0000000000000000
[  338.339843] R13: ffff88dea5cb99c0 R14: 0000000000000000 R15: 0000000000000001
[  338.346976] FS:  0000000000000000(0000) GS:ffff88e5df980000(0000) knlGS:0000000000000000
[  338.355062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  338.360807] CR2: 000000000000040c CR3: 0000000657022000 CR4: 0000000000350ef0
[  338.367940] Call Trace:
[  338.370396]  <TASK>
[  338.372498]  ? srso_return_thunk+0x5/0x5f
[  338.376514]  ? show_trace_log_lvl+0x255/0x2f0
[  338.380875]  ? show_trace_log_lvl+0x255/0x2f0
[  338.385234]  ? bpf_trace_run2+0x89/0x110
[  338.389158]  ? __die_body.cold+0x8/0x12
[  338.392998]  ? page_fault_oops+0x146/0x160
[  338.397097]  ? exc_page_fault+0x73/0x160
[  338.401022]  ? asm_exc_page_fault+0x26/0x30
[  338.405211]  ? 0xffffffffc012ec8c
[  338.408536]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x2a/0xe3
[  338.415497]  ? srso_return_thunk+0x5/0x5f
[  338.419509]  ? srso_return_thunk+0x5/0x5f
[  338.423521]  ? sysvec_irq_work+0xe/0x90
[  338.427359]  ? srso_return_thunk+0x5/0x5f
[  338.431372]  ? srso_return_thunk+0x5/0x5f
[  338.435387]  ? srso_return_thunk+0x5/0x5f
[  338.439399]  bpf_trace_run2+0x89/0x110
[  338.443152]  rt_mutex_setprio+0x1d3/0x3d0
[  338.447170]  mark_wakeup_next_waiter+0x80/0xd0
[  338.451622]  rt_mutex_unlock+0xea/0x130
[  338.455456]  ? __trace_bprintk+0x90/0xa0
[  338.459387]  ? periodic_run+0x122/0x360 [pit]
[  338.463745]  ? periodic_run+0x75/0x360 [pit]
[  338.468017]  periodic_run+0x29b/0x360 [pit]
[  338.472202]  ? periodic_run+0x75/0x360 [pit]
[  338.476477]  ? __pfx_periodic_run+0x10/0x10 [pit]
[  338.481182]  kthread+0xd2/0x100
[  338.484328]  ? __pfx_kthread+0x10/0x10
[  338.488079]  ret_from_fork+0x34/0x50
[  338.491659]  ? __pfx_kthread+0x10/0x10
[  338.495412]  ret_from_fork_asm+0x1a/0x30
[  338.499342]  </TASK>
[  338.501530] Modules linked in: pit(OE) vfat fat ipmi_ssif intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd kvm_amd ice kvm dell_pc platform_profile dell_wmi bnxt_en sparse_keymap rfkill tg3 video mgag200 acpi_power_meter ipmi_si gnss acpi_ipmi dcdbas dell_smbios i2c_piix4 libie ipmi_devintf i2c_algo_bit dell_wmi_descriptor ipmi_msghandler rapl pcspkr ptdma acpi_cpufreq k10temp wmi_bmof i2c_smbus fuse loop nfnetlink xfs sd_mod ahci crct10dif_pclmul libahci crc32_pclmul crc32c_intel ghash_clmulni_intel libata ccp megaraid_sas sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[  338.501607] Unloaded tainted modules: pit(OE):3 [last unloaded: pit(OE)]
[  338.560682] CR2: 000000000000040c
[  338.564003] ---[ end trace 0000000000000000 ]---
--->8---

Wondering if maybe my module was the culprit, I also then verified that
I could hit the crash simply by starting a similar type of workload
completely from userspace by the use of rt-app with the following
taskset:

https://gitlab.com/jlelli/dlmon/-/blob/main/samples/rt-app/multi_lock.json?ref_type=heads

This script should help with installing rt-app:

https://gitlab.com/jlelli/dlmon/-/blob/main/samples/rt-app/install_fedora.sh?ref_type=heads

Starting rt-app with the above taskset and then starting dlmon (again
maybe several times) should eventually trigger the following crash.

--->8---
[  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
[  154.573844] #PF: supervisor read access in kernel mode
[  154.578982] #PF: error_code(0x0000) - not-present page
[  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
[  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
[  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
[  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
[  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
[  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
[  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
[  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
[  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
[  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
[  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
[  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
[  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
[  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  154.713168] PKRU: 55555554
[  154.715879] Call Trace:
[  154.718332]  <TASK>
[  154.720439]  ? __die+0x20/0x70
[  154.723498]  ? page_fault_oops+0x75/0x170
[  154.727508]  ? sysvec_irq_work+0xb/0x90
[  154.731348]  ? exc_page_fault+0x64/0x140
[  154.735275]  ? asm_exc_page_fault+0x22/0x30
[  154.739461]  ? 0xffffffffc03fba5c
[  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
[  154.749737]  bpf_trace_run2+0x71/0xf0
[  154.753405]  ? raw_spin_rq_lock_nested+0x19/0x80
[  154.758023]  rt_mutex_setprio+0x1bf/0x3d0
[  154.762035]  ? hrtimer_nanosleep+0xb1/0x190
[  154.766221]  ? rseq_get_rseq_cs+0x1d/0x220
[  154.770320]  mark_wakeup_next_waiter+0x85/0xd0
[  154.774765]  __rt_mutex_futex_unlock+0x1c/0x40
[  154.779211]  futex_unlock_pi+0x240/0x310
[  154.783137]  do_futex+0x149/0x1d0
[  154.786457]  __x64_sys_futex+0x73/0x1d0
[  154.790294]  do_syscall_64+0x79/0x150
[  154.793962]  ? update_process_times+0x8c/0xa0
[  154.798319]  ? timerqueue_add+0x9b/0xc0
[  154.802158]  ? enqueue_hrtimer+0x35/0x90
[  154.806085]  ? __hrtimer_run_queues+0x141/0x2a0
[  154.810616]  ? ktime_get+0x34/0xc0
[  154.814021]  ? clockevents_program_event+0x92/0x100
[  154.818901]  ? hrtimer_interrupt+0x129/0x240
[  154.823174]  ? sched_clock+0xc/0x30
[  154.826666]  ? sched_clock_cpu+0xb/0x190
[  154.830591]  ? irqtime_account_irq+0x41/0xc0
[  154.834865]  ? clear_bhb_loop+0x45/0xa0
[  154.838702]  ? clear_bhb_loop+0x45/0xa0
[  154.842542]  ? clear_bhb_loop+0x45/0xa0
[  154.846381]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  154.851434] RIP: 0033:0x7ffa75a8e956
[  154.855011] Code: 0f 86 26 fe ff ff 83 c0 16 83 e0 f7 0f 85 2d ff ff ff e9 15 fe ff ff 40 80 f6 87 45 31 d2 31 d2 4c 89 c7 b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 86 5b fd ff ff 83 f8 92 0f 84 52 fd ff ff 83
[  154.873757] RSP: 002b:00007ffa6d5ffb98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[  154.881321] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffa75a8e956
[  154.888454] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 000000003357e518
[  154.895587] RBP: 0000000000000002 R08: 000000003357e518 R09: 0000000000000000
[  154.902718] R10: 0000000000000000 R11: 0000000000000246 R12: 000000003357e518
[  154.909852] R13: 000000003357e510 R14: 0000000000000000 R15: 000000003357e8e8
[  154.916983]  </TASK>
[  154.919176] Modules linked in: qrtr rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal coretemp ipmi_ssif rapl iTCO_wdt iTCO_vendor_support dell_pc intel_cstate dell_smbios platform_profile mei_me i2c_i801 acpi_power_meter dcdbas intel_uncore dell_wmi_descriptor wmi_bmof pcspkr ipmi_si mei i2c_smbus lpc_ich acpi_ipmi intel_pch_thermal ipmi_devintf ipmi_msghandler xfs libcrc32c sr_mod sd_mod cdrom sg uas usb_storage mgag200 drm_shmem_helper drm_kms_helper ahci crct10dif_pclmul libahci crc32_pclmul i40e drm igb crc32c_intel libata dca megaraid_sas ghash_clmulni_intel i2c_algo_bit libie wmi dm_mirror dm_region_hash dm_log dm_mod fuse
[  154.984673] CR2: 000000000000040c
--->8---

Apologies for the rather long report, but I tried to provide hopefully
enough information already for whoever might have time to take a look at
this. Please let me know if I'm either wrong in what I'm trying to do or
how to proceed (if you need more info, etc.).

Thank you in advance!

Best,
Juri


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-05  9:20 NULL pointer deref when running BPF monitor program (6.11.0-rc1) Juri Lelli
@ 2024-08-05 16:49 ` Jiri Olsa
  2024-08-05 17:00   ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-05 16:49 UTC (permalink / raw)
  To: Juri Lelli; +Cc: bpf, LKML, asavkov

On Mon, Aug 05, 2024 at 11:20:11AM +0200, Juri Lelli wrote:

SNIP

> [  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
> [  154.573844] #PF: supervisor read access in kernel mode
> [  154.578982] #PF: error_code(0x0000) - not-present page
> [  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
> [  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
> [  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
> [  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> [  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
> [  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
> [  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
> [  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
> [  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
> [  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
> [  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
> [  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
> [  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
> [  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  154.713168] PKRU: 55555554
> [  154.715879] Call Trace:
> [  154.718332]  <TASK>
> [  154.720439]  ? __die+0x20/0x70
> [  154.723498]  ? page_fault_oops+0x75/0x170
> [  154.727508]  ? sysvec_irq_work+0xb/0x90
> [  154.731348]  ? exc_page_fault+0x64/0x140
> [  154.735275]  ? asm_exc_page_fault+0x22/0x30
> [  154.739461]  ? 0xffffffffc03fba5c
> [  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7

hi,
reproduced.. AFAICS looks like the bpf program somehow lost the booster != NULL
check and just load the policy field without it and crash when booster is rubbish

int handle__sched_pi_setprio(u64 * ctx):
; int handle__sched_pi_setprio(u64 *ctx)
   0: (bf) r6 = r1
; struct task_struct *boosted = (void *) ctx[0];
   1: (79) r7 = *(u64 *)(r6 +0)
; struct task_struct *booster = (void *) ctx[1];
   2: (79) r8 = *(u64 *)(r6 +8)
; if (booster->policy != SCHED_DEADLINE)

curious why the check disappeared, because object file has it, so I guess verifier
took it out for some reason, will check

jirka


> [  154.749737]  bpf_trace_run2+0x71/0xf0
> [  154.753405]  ? raw_spin_rq_lock_nested+0x19/0x80
> [  154.758023]  rt_mutex_setprio+0x1bf/0x3d0
> [  154.762035]  ? hrtimer_nanosleep+0xb1/0x190
> [  154.766221]  ? rseq_get_rseq_cs+0x1d/0x220
> [  154.770320]  mark_wakeup_next_waiter+0x85/0xd0
> [  154.774765]  __rt_mutex_futex_unlock+0x1c/0x40
> [  154.779211]  futex_unlock_pi+0x240/0x310
> [  154.783137]  do_futex+0x149/0x1d0
> [  154.786457]  __x64_sys_futex+0x73/0x1d0
> [  154.790294]  do_syscall_64+0x79/0x150
> [  154.793962]  ? update_process_times+0x8c/0xa0
> [  154.798319]  ? timerqueue_add+0x9b/0xc0
> [  154.802158]  ? enqueue_hrtimer+0x35/0x90
> [  154.806085]  ? __hrtimer_run_queues+0x141/0x2a0
> [  154.810616]  ? ktime_get+0x34/0xc0
> [  154.814021]  ? clockevents_program_event+0x92/0x100
> [  154.818901]  ? hrtimer_interrupt+0x129/0x240
> [  154.823174]  ? sched_clock+0xc/0x30
> [  154.826666]  ? sched_clock_cpu+0xb/0x190
> [  154.830591]  ? irqtime_account_irq+0x41/0xc0
> [  154.834865]  ? clear_bhb_loop+0x45/0xa0
> [  154.838702]  ? clear_bhb_loop+0x45/0xa0
> [  154.842542]  ? clear_bhb_loop+0x45/0xa0
> [  154.846381]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  154.851434] RIP: 0033:0x7ffa75a8e956
> [  154.855011] Code: 0f 86 26 fe ff ff 83 c0 16 83 e0 f7 0f 85 2d ff ff ff e9 15 fe ff ff 40 80 f6 87 45 31 d2 31 d2 4c 89 c7 b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 86 5b fd ff ff 83 f8 92 0f 84 52 fd ff ff 83
> [  154.873757] RSP: 002b:00007ffa6d5ffb98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> [  154.881321] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffa75a8e956
> [  154.888454] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 000000003357e518
> [  154.895587] RBP: 0000000000000002 R08: 000000003357e518 R09: 0000000000000000
> [  154.902718] R10: 0000000000000000 R11: 0000000000000246 R12: 000000003357e518
> [  154.909852] R13: 000000003357e510 R14: 0000000000000000 R15: 000000003357e8e8
> [  154.916983]  </TASK>
> [  154.919176] Modules linked in: qrtr rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal coretemp ipmi_ssif rapl iTCO_wdt iTCO_vendor_support dell_pc intel_cstate dell_smbios platform_profile mei_me i2c_i801 acpi_power_meter dcdbas intel_uncore dell_wmi_descriptor wmi_bmof pcspkr ipmi_si mei i2c_smbus lpc_ich acpi_ipmi intel_pch_thermal ipmi_devintf ipmi_msghandler xfs libcrc32c sr_mod sd_mod cdrom sg uas usb_storage mgag200 drm_shmem_helper drm_kms_helper ahci crct10dif_pclmul libahci crc32_pclmul i40e drm igb crc32c_intel libata dca megaraid_sas ghash_clmulni_intel i2c_algo_bit libie wmi dm_mirror dm_region_hash dm_log dm_mod fuse
> [  154.984673] CR2: 000000000000040c
> --->8---
> 
> Apologies for the rather long report, but I tried to provide hopefully
> enough information already for whoever might have time to take a look at
> this. Please let me know if I'm either wrong in what I'm trying to do or
> how to proceed (if you need more info, etc.).
> 
> Thank you in advance!
> 
> Best,
> Juri
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-05 16:49 ` Jiri Olsa
@ 2024-08-05 17:00   ` Alexei Starovoitov
  2024-08-06  7:08     ` Juri Lelli
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2024-08-05 17:00 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Juri Lelli, bpf, LKML, Artem Savkov

On Mon, Aug 5, 2024 at 9:50 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Aug 05, 2024 at 11:20:11AM +0200, Juri Lelli wrote:
>
> SNIP
>
> > [  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
> > [  154.573844] #PF: supervisor read access in kernel mode
> > [  154.578982] #PF: error_code(0x0000) - not-present page
> > [  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
> > [  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
> > [  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
> > [  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> > [  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
> > [  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
> > [  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
> > [  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
> > [  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
> > [  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
> > [  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
> > [  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
> > [  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
> > [  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  154.713168] PKRU: 55555554
> > [  154.715879] Call Trace:
> > [  154.718332]  <TASK>
> > [  154.720439]  ? __die+0x20/0x70
> > [  154.723498]  ? page_fault_oops+0x75/0x170
> > [  154.727508]  ? sysvec_irq_work+0xb/0x90
> > [  154.731348]  ? exc_page_fault+0x64/0x140
> > [  154.735275]  ? asm_exc_page_fault+0x22/0x30
> > [  154.739461]  ? 0xffffffffc03fba5c
> > [  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
>
> hi,
> reproduced.. AFAICS looks like the bpf program somehow lost the booster != NULL
> check and just load the policy field without it and crash when booster is rubbish
>
> int handle__sched_pi_setprio(u64 * ctx):
> ; int handle__sched_pi_setprio(u64 *ctx)
>    0: (bf) r6 = r1
> ; struct task_struct *boosted = (void *) ctx[0];
>    1: (79) r7 = *(u64 *)(r6 +0)
> ; struct task_struct *booster = (void *) ctx[1];
>    2: (79) r8 = *(u64 *)(r6 +8)
> ; if (booster->policy != SCHED_DEADLINE)
>
> curious why the check disappeared, because object file has it, so I guess verifier
> took it out for some reason, will check

Juri,

Thanks for flagging!

Jiri,

the verifier removes the check because it assumes that pointers
passed by the kernel into tracepoint are valid and trusted.
In this case:
        trace_sched_pi_setprio(p, pi_task);

pi_task can be NULL.

We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
by default, since it will break a bunch of progs.
Instead we can annotate this tracepoint arg as __nullable and
teach the verifier to recognize such special arguments of tracepoints.

Let's think how to workaround such verifier eagerness to remove != null check.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-05 17:00   ` Alexei Starovoitov
@ 2024-08-06  7:08     ` Juri Lelli
  2024-08-06 13:17     ` Jiri Olsa
  2024-08-06 13:24     ` Jiri Olsa
  2 siblings, 0 replies; 27+ messages in thread
From: Juri Lelli @ 2024-08-06  7:08 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiri Olsa, bpf, LKML, Artem Savkov

Hi Jiri and Alexei,

On 05/08/24 10:00, Alexei Starovoitov wrote:
> On Mon, Aug 5, 2024 at 9:50 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:20:11AM +0200, Juri Lelli wrote:
> >
> > SNIP
> >
> > > [  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
> > > [  154.573844] #PF: supervisor read access in kernel mode
> > > [  154.578982] #PF: error_code(0x0000) - not-present page
> > > [  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
> > > [  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
> > > [  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
> > > [  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> > > [  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
> > > [  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
> > > [  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
> > > [  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
> > > [  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
> > > [  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
> > > [  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
> > > [  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
> > > [  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
> > > [  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  154.713168] PKRU: 55555554
> > > [  154.715879] Call Trace:
> > > [  154.718332]  <TASK>
> > > [  154.720439]  ? __die+0x20/0x70
> > > [  154.723498]  ? page_fault_oops+0x75/0x170
> > > [  154.727508]  ? sysvec_irq_work+0xb/0x90
> > > [  154.731348]  ? exc_page_fault+0x64/0x140
> > > [  154.735275]  ? asm_exc_page_fault+0x22/0x30
> > > [  154.739461]  ? 0xffffffffc03fba5c
> > > [  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> >
> > hi,
> > reproduced.. AFAICS looks like the bpf program somehow lost the booster != NULL
> > check and just load the policy field without it and crash when booster is rubbish
> >
> > int handle__sched_pi_setprio(u64 * ctx):
> > ; int handle__sched_pi_setprio(u64 *ctx)
> >    0: (bf) r6 = r1
> > ; struct task_struct *boosted = (void *) ctx[0];
> >    1: (79) r7 = *(u64 *)(r6 +0)
> > ; struct task_struct *booster = (void *) ctx[1];
> >    2: (79) r8 = *(u64 *)(r6 +8)
> > ; if (booster->policy != SCHED_DEADLINE)
> >
> > curious why the check disappeared, because object file has it, so I guess verifier
> > took it out for some reason, will check
> 
> Juri,
> 
> Thanks for flagging!

Thanks for the super quick reply from both of you!

> Jiri,
> 
> the verifier removes the check because it assumes that pointers
> passed by the kernel into tracepoint are valid and trusted.
> In this case:
>         trace_sched_pi_setprio(p, pi_task);
> 
> pi_task can be NULL.
> 
> We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> by default, since it will break a bunch of progs.
> Instead we can annotate this tracepoint arg as __nullable and
> teach the verifier to recognize such special arguments of tracepoints.
> 
> Let's think how to workaround such verifier eagerness to remove != null check.

Of course more than willing to test anything out, but no rush.

Best,
Juri


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-05 17:00   ` Alexei Starovoitov
  2024-08-06  7:08     ` Juri Lelli
@ 2024-08-06 13:17     ` Jiri Olsa
  2024-08-06 13:24     ` Jiri Olsa
  2 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-08-06 13:17 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiri Olsa, Juri Lelli, bpf, LKML, Artem Savkov

On Mon, Aug 05, 2024 at 10:00:40AM -0700, Alexei Starovoitov wrote:
> On Mon, Aug 5, 2024 at 9:50 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:20:11AM +0200, Juri Lelli wrote:
> >
> > SNIP
> >
> > > [  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
> > > [  154.573844] #PF: supervisor read access in kernel mode
> > > [  154.578982] #PF: error_code(0x0000) - not-present page
> > > [  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
> > > [  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
> > > [  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
> > > [  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> > > [  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
> > > [  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
> > > [  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
> > > [  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
> > > [  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
> > > [  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
> > > [  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
> > > [  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
> > > [  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
> > > [  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  154.713168] PKRU: 55555554
> > > [  154.715879] Call Trace:
> > > [  154.718332]  <TASK>
> > > [  154.720439]  ? __die+0x20/0x70
> > > [  154.723498]  ? page_fault_oops+0x75/0x170
> > > [  154.727508]  ? sysvec_irq_work+0xb/0x90
> > > [  154.731348]  ? exc_page_fault+0x64/0x140
> > > [  154.735275]  ? asm_exc_page_fault+0x22/0x30
> > > [  154.739461]  ? 0xffffffffc03fba5c
> > > [  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> >
> > hi,
> > reproduced.. AFAICS looks like the bpf program somehow lost the booster != NULL
> > check and just load the policy field without it and crash when booster is rubbish
> >
> > int handle__sched_pi_setprio(u64 * ctx):
> > ; int handle__sched_pi_setprio(u64 *ctx)
> >    0: (bf) r6 = r1
> > ; struct task_struct *boosted = (void *) ctx[0];
> >    1: (79) r7 = *(u64 *)(r6 +0)
> > ; struct task_struct *booster = (void *) ctx[1];
> >    2: (79) r8 = *(u64 *)(r6 +8)
> > ; if (booster->policy != SCHED_DEADLINE)
> >
> > curious why the check disappeared, because object file has it, so I guess verifier
> > took it out for some reason, will check
> 
> Juri,
> 
> Thanks for flagging!
> 
> Jiri,
> 
> the verifier removes the check because it assumes that pointers
> passed by the kernel into tracepoint are valid and trusted.

ok I was wondering that's the case, but couldn't find that in the code quickly ;-)

> In this case:
>         trace_sched_pi_setprio(p, pi_task);
> 
> pi_task can be NULL.
> 
> We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> by default, since it will break a bunch of progs.
> Instead we can annotate this tracepoint arg as __nullable and
> teach the verifier to recognize such special arguments of tracepoints.
> 
> Let's think how to workaround such verifier eagerness to remove != null check.

there's probably better way, but following seems to workaround the issue

moving the logic into func__sched_pi_setprio function with tasks arguments
and call it with NULL from place that's never executed

jirka


---
diff --git a/src/dlmon.bpf.c b/src/dlmon.bpf.c
index 73c22d56a75f..5b99ff9e0a46 100644
--- a/src/dlmon.bpf.c
+++ b/src/dlmon.bpf.c
@@ -4,6 +4,8 @@
 #include <bpf/bpf_helpers.h>
 #include "dlmon.h"
 
+int unset;
+
 struct dl_parameters_t {
 	u64 runtime;
 	u64 period;
@@ -160,11 +162,10 @@ int handle__contention_end(u64 *ctx)
 	return 0;
 }
 
-SEC("tp_btf/sched_pi_setprio")
-int handle__sched_pi_setprio(u64 *ctx)
+static __attribute__((noinline))
+int func__sched_pi_setprio(void *ctx, struct task_struct *boosted, struct task_struct *booster)
+
 {
-	struct task_struct *boosted = (void *) ctx[0];
-	struct task_struct *booster = (void *) ctx[1];
 	struct dl_parameters_t *lookup;
 	struct task_pi_event pi_event;
 	u32 pid;
@@ -210,6 +211,18 @@ int handle__sched_pi_setprio(u64 *ctx)
 	return 0;
 }
 
+SEC("tp_btf/sched_pi_setprio")
+int handle__sched_pi_setprio(u64 *ctx)
+{
+	struct task_struct *boosted = (void *) ctx[0];
+	struct task_struct *booster = (void *) ctx[1];
+
+	if (unset)
+		return func__sched_pi_setprio(ctx, boosted, NULL);
+
+	return func__sched_pi_setprio(ctx, boosted, booster);
+}
+
 SEC("tp_btf/sched_wakeup")
 int handle__sched_wakeup(u64 *ctx)
 {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-05 17:00   ` Alexei Starovoitov
  2024-08-06  7:08     ` Juri Lelli
  2024-08-06 13:17     ` Jiri Olsa
@ 2024-08-06 13:24     ` Jiri Olsa
  2024-08-06 18:44       ` Alexei Starovoitov
  2 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-06 13:24 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiri Olsa, Juri Lelli, bpf, LKML, Artem Savkov

On Mon, Aug 05, 2024 at 10:00:40AM -0700, Alexei Starovoitov wrote:
> On Mon, Aug 5, 2024 at 9:50 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:20:11AM +0200, Juri Lelli wrote:
> >
> > SNIP
> >
> > > [  154.566882] BUG: kernel NULL pointer dereference, address: 000000000000040c
> > > [  154.573844] #PF: supervisor read access in kernel mode
> > > [  154.578982] #PF: error_code(0x0000) - not-present page
> > > [  154.584122] PGD 146fff067 P4D 146fff067 PUD 10fc00067 PMD 0
> > > [  154.589780] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [  154.594659] CPU: 28 UID: 0 PID: 2234 Comm: thread0-13 Kdump: loaded Not tainted 6.11.0-rc1 #8
> > > [  154.603179] Hardware name: Dell Inc. PowerEdge R740/04FC42, BIOS 2.10.2 02/24/2021
> > > [  154.610744] RIP: 0010:bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> > > [  154.618310] Code: cc cc cc cc cc cc cc cc 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 30 00 00 00 53 41 55 41 56 48 89 fb 4c 8b 6b 00 4c 8b 73 08 <41> 8b be 0c 04 00 00 48 83 ff 06 0f 85 9b 00 00 00 41 8b be c0 09
> > > [  154.637052] RSP: 0018:ffffabac60aebbc0 EFLAGS: 00010086
> > > [  154.642278] RAX: ffffffffc03fba5c RBX: ffffabac60aebc28 RCX: 000000000000001f
> > > [  154.649411] RDX: ffff95a90b4e4180 RSI: ffffabac4e639048 RDI: ffffabac60aebc28
> > > [  154.656544] RBP: ffffabac60aebc08 R08: 00000023fce7674a R09: ffff95a91d85af38
> > > [  154.663674] R10: ffff95a91d85a0c0 R11: 000000003357e518 R12: 0000000000000000
> > > [  154.670807] R13: ffff95a90b4e4180 R14: 0000000000000000 R15: 0000000000000001
> > > [  154.677939] FS:  00007ffa6d600640(0000) GS:ffff95c01bf00000(0000) knlGS:0000000000000000
> > > [  154.686026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  154.691769] CR2: 000000000000040c CR3: 000000014b9f2005 CR4: 00000000007706f0
> > > [  154.698903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  154.706035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  154.713168] PKRU: 55555554
> > > [  154.715879] Call Trace:
> > > [  154.718332]  <TASK>
> > > [  154.720439]  ? __die+0x20/0x70
> > > [  154.723498]  ? page_fault_oops+0x75/0x170
> > > [  154.727508]  ? sysvec_irq_work+0xb/0x90
> > > [  154.731348]  ? exc_page_fault+0x64/0x140
> > > [  154.735275]  ? asm_exc_page_fault+0x22/0x30
> > > [  154.739461]  ? 0xffffffffc03fba5c
> > > [  154.742780]  ? bpf_prog_ec8173ca2868eb50_handle__sched_pi_setprio+0x22/0xd7
> >
> > hi,
> > reproduced.. AFAICS looks like the bpf program somehow lost the booster != NULL
> > check and just load the policy field without it and crash when booster is rubbish
> >
> > int handle__sched_pi_setprio(u64 * ctx):
> > ; int handle__sched_pi_setprio(u64 *ctx)
> >    0: (bf) r6 = r1
> > ; struct task_struct *boosted = (void *) ctx[0];
> >    1: (79) r7 = *(u64 *)(r6 +0)
> > ; struct task_struct *booster = (void *) ctx[1];
> >    2: (79) r8 = *(u64 *)(r6 +8)
> > ; if (booster->policy != SCHED_DEADLINE)
> >
> > curious why the check disappeared, because object file has it, so I guess verifier
> > took it out for some reason, will check
> 
> Juri,
> 
> Thanks for flagging!
> 
> Jiri,
> 
> the verifier removes the check because it assumes that pointers
> passed by the kernel into tracepoint are valid and trusted.
> In this case:
>         trace_sched_pi_setprio(p, pi_task);
> 
> pi_task can be NULL.
> 
> We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> by default, since it will break a bunch of progs.
> Instead we can annotate this tracepoint arg as __nullable and
> teach the verifier to recognize such special arguments of tracepoints.

ok, so you mean to be able to mark it in event header like:

  TRACE_EVENT(sched_pi_setprio,
        TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),

I guess we could make pahole to emit DECL_TAG for that argument,
but I'm not sure how to propagate that __nullable info to pahole

while wondering about that, I tried the direct fix below ;-)

jirka


---
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 95426d5b634e..1a20bbdead64 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6377,6 +6377,25 @@ int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
 	return off;
 }
 
+static bool is_tracing_prog_raw_tp(const struct bpf_prog *prog, const char *name)
+{
+	struct btf *btf = prog->aux->attach_btf;
+	const struct btf_type *t;
+	const char *tname;
+
+	if (prog->expected_attach_type != BPF_TRACE_RAW_TP)
+		return false;
+
+	t = btf_type_by_id(btf, prog->aux->attach_btf_id);
+	if (!t)
+		return false;
+
+	tname = btf_name_by_offset(btf, t->name_off);
+	if (!tname)
+		return false;
+	return !strcmp(tname, name);
+}
+
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info)
@@ -6544,6 +6563,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		}
 	}
 
+	/* Second argument of sched_pi_setprio tracepoint can be null */
+	if (is_tracing_prog_raw_tp(prog, "btf_trace_sched_pi_setprio") && arg == 1)
+		info->reg_type |= PTR_MAYBE_NULL;
+
 	info->btf = btf;
 	info->btf_id = t->type;
 	t = btf_type_by_id(btf, t->type);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-06 13:24     ` Jiri Olsa
@ 2024-08-06 18:44       ` Alexei Starovoitov
  2024-08-08 10:46         ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2024-08-06 18:44 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Juri Lelli, bpf, LKML, Artem Savkov

On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> > Jiri,
> >
> > the verifier removes the check because it assumes that pointers
> > passed by the kernel into tracepoint are valid and trusted.
> > In this case:
> >         trace_sched_pi_setprio(p, pi_task);
> >
> > pi_task can be NULL.
> >
> > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > by default, since it will break a bunch of progs.
> > Instead we can annotate this tracepoint arg as __nullable and
> > teach the verifier to recognize such special arguments of tracepoints.
>
> ok, so you mean to be able to mark it in event header like:
>
>   TRACE_EVENT(sched_pi_setprio,
>         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
>
> I guess we could make pahole to emit DECL_TAG for that argument,
> but I'm not sure how to propagate that __nullable info to pahole
>
> while wondering about that, I tried the direct fix below ;-)

We don't need to rush such a hack below.
No need to add decl_tag and change pahole either.
The arg name is already vmlinux BTF:
[51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
        '__data' type_id=61
        'tsk' type_id=77
        'pi_task' type_id=77
[51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static

just need to rename "pi_task" to "pi_task__nullable"
and teach the verifier.


> jirka
>
>
> ---
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 95426d5b634e..1a20bbdead64 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6377,6 +6377,25 @@ int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
>         return off;
>  }
>
> +static bool is_tracing_prog_raw_tp(const struct bpf_prog *prog, const char *name)
> +{
> +       struct btf *btf = prog->aux->attach_btf;
> +       const struct btf_type *t;
> +       const char *tname;
> +
> +       if (prog->expected_attach_type != BPF_TRACE_RAW_TP)
> +               return false;
> +
> +       t = btf_type_by_id(btf, prog->aux->attach_btf_id);
> +       if (!t)
> +               return false;
> +
> +       tname = btf_name_by_offset(btf, t->name_off);
> +       if (!tname)
> +               return false;
> +       return !strcmp(tname, name);
> +}
> +
>  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>                     const struct bpf_prog *prog,
>                     struct bpf_insn_access_aux *info)
> @@ -6544,6 +6563,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>                 }
>         }
>
> +       /* Second argument of sched_pi_setprio tracepoint can be null */
> +       if (is_tracing_prog_raw_tp(prog, "btf_trace_sched_pi_setprio") && arg == 1)
> +               info->reg_type |= PTR_MAYBE_NULL;
> +
>         info->btf = btf;
>         info->btf_id = t->type;
>         t = btf_type_by_id(btf, t->type);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-06 18:44       ` Alexei Starovoitov
@ 2024-08-08 10:46         ` Jiri Olsa
  2024-08-08 15:43           ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-08 10:46 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiri Olsa, Juri Lelli, bpf, LKML, Artem Savkov

On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > > Jiri,
> > >
> > > the verifier removes the check because it assumes that pointers
> > > passed by the kernel into tracepoint are valid and trusted.
> > > In this case:
> > >         trace_sched_pi_setprio(p, pi_task);
> > >
> > > pi_task can be NULL.
> > >
> > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > by default, since it will break a bunch of progs.
> > > Instead we can annotate this tracepoint arg as __nullable and
> > > teach the verifier to recognize such special arguments of tracepoints.
> >
> > ok, so you mean to be able to mark it in event header like:
> >
> >   TRACE_EVENT(sched_pi_setprio,
> >         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> >
> > I guess we could make pahole to emit DECL_TAG for that argument,
> > but I'm not sure how to propagate that __nullable info to pahole
> >
> > while wondering about that, I tried the direct fix below ;-)
> 
> We don't need to rush such a hack below.
> No need to add decl_tag and change pahole either.
> The arg name is already vmlinux BTF:
> [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
>         '__data' type_id=61
>         'tsk' type_id=77
>         'pi_task' type_id=77
> [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> 
> just need to rename "pi_task" to "pi_task__nullable"
> and teach the verifier.

the problem is that btf_trace_<xxx> is typedef 

  typedef void (*btf_trace_##call)(void *__data, proto);

and dwarf does not store argument names for subroutine type entry,
so it's not in BTF's TYPEDEF either

it's the btf_trace_##call typedef ID that verifier has to work with,
I wonder we could somehow associate that ID with __bpf_trace_##call
subroutine entry which has the argument names

we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
but we'd need to do the lookup based on the tracepoint name when loading
the program .. ATM we do the lookup __bpf_raw_tp_map record only when
doing attach, so we would need to move it to program load time

or we could 'fix' the argument names in pahole, but that'd probably
mean extra setup and hash lookup, so also not great

jirka


> 
> 
> > jirka
> >
> >
> > ---
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 95426d5b634e..1a20bbdead64 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -6377,6 +6377,25 @@ int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto,
> >         return off;
> >  }
> >
> > +static bool is_tracing_prog_raw_tp(const struct bpf_prog *prog, const char *name)
> > +{
> > +       struct btf *btf = prog->aux->attach_btf;
> > +       const struct btf_type *t;
> > +       const char *tname;
> > +
> > +       if (prog->expected_attach_type != BPF_TRACE_RAW_TP)
> > +               return false;
> > +
> > +       t = btf_type_by_id(btf, prog->aux->attach_btf_id);
> > +       if (!t)
> > +               return false;
> > +
> > +       tname = btf_name_by_offset(btf, t->name_off);
> > +       if (!tname)
> > +               return false;
> > +       return !strcmp(tname, name);
> > +}
> > +
> >  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >                     const struct bpf_prog *prog,
> >                     struct bpf_insn_access_aux *info)
> > @@ -6544,6 +6563,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >                 }
> >         }
> >
> > +       /* Second argument of sched_pi_setprio tracepoint can be null */
> > +       if (is_tracing_prog_raw_tp(prog, "btf_trace_sched_pi_setprio") && arg == 1)
> > +               info->reg_type |= PTR_MAYBE_NULL;
> > +
> >         info->btf = btf;
> >         info->btf_id = t->type;
> >         t = btf_type_by_id(btf, t->type);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-08 10:46         ` Jiri Olsa
@ 2024-08-08 15:43           ` Alexei Starovoitov
  2024-08-15 11:48             ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2024-08-08 15:43 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Juri Lelli, bpf, LKML, Artem Savkov

On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > > Jiri,
> > > >
> > > > the verifier removes the check because it assumes that pointers
> > > > passed by the kernel into tracepoint are valid and trusted.
> > > > In this case:
> > > >         trace_sched_pi_setprio(p, pi_task);
> > > >
> > > > pi_task can be NULL.
> > > >
> > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > by default, since it will break a bunch of progs.
> > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > teach the verifier to recognize such special arguments of tracepoints.
> > >
> > > ok, so you mean to be able to mark it in event header like:
> > >
> > >   TRACE_EVENT(sched_pi_setprio,
> > >         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> > >
> > > I guess we could make pahole to emit DECL_TAG for that argument,
> > > but I'm not sure how to propagate that __nullable info to pahole
> > >
> > > while wondering about that, I tried the direct fix below ;-)
> >
> > We don't need to rush such a hack below.
> > No need to add decl_tag and change pahole either.
> > The arg name is already vmlinux BTF:
> > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> >         '__data' type_id=61
> >         'tsk' type_id=77
> >         'pi_task' type_id=77
> > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> >
> > just need to rename "pi_task" to "pi_task__nullable"
> > and teach the verifier.
>
> the problem is that btf_trace_<xxx> is typedef
>
>   typedef void (*btf_trace_##call)(void *__data, proto);
>
> and dwarf does not store argument names for subroutine type entry,
> so it's not in BTF's TYPEDEF either
>
> it's the btf_trace_##call typedef ID that verifier has to work with,
> I wonder we could somehow associate that ID with __bpf_trace_##call
> subroutine entry which has the argument names
>
> we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
> but we'd need to do the lookup based on the tracepoint name when loading
> the program .. ATM we do the lookup __bpf_raw_tp_map record only when
> doing attach, so we would need to move it to program load time
>
> or we could 'fix' the argument names in pahole, but that'd probably
> mean extra setup and hash lookup, so also not great

I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name.
No need to add btf_id-s and waste memory to speed up the slow path.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-08 15:43           ` Alexei Starovoitov
@ 2024-08-15 11:48             ` Jiri Olsa
  2024-08-15 12:37               ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-15 11:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Juri Lelli, bpf, LKML, Artem Savkov, Steven Rostedt,
	Jose E. Marchesi

On Thu, Aug 08, 2024 at 08:43:05AM -0700, Alexei Starovoitov wrote:
> On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> > > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > > Jiri,
> > > > >
> > > > > the verifier removes the check because it assumes that pointers
> > > > > passed by the kernel into tracepoint are valid and trusted.
> > > > > In this case:
> > > > >         trace_sched_pi_setprio(p, pi_task);
> > > > >
> > > > > pi_task can be NULL.
> > > > >
> > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > by default, since it will break a bunch of progs.
> > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > teach the verifier to recognize such special arguments of tracepoints.
> > > >
> > > > ok, so you mean to be able to mark it in event header like:
> > > >
> > > >   TRACE_EVENT(sched_pi_setprio,
> > > >         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> > > >
> > > > I guess we could make pahole to emit DECL_TAG for that argument,
> > > > but I'm not sure how to propagate that __nullable info to pahole
> > > >
> > > > while wondering about that, I tried the direct fix below ;-)
> > >
> > > We don't need to rush such a hack below.
> > > No need to add decl_tag and change pahole either.
> > > The arg name is already vmlinux BTF:
> > > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> > >         '__data' type_id=61
> > >         'tsk' type_id=77
> > >         'pi_task' type_id=77
> > > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> > >
> > > just need to rename "pi_task" to "pi_task__nullable"
> > > and teach the verifier.
> >
> > the problem is that btf_trace_<xxx> is typedef
> >
> >   typedef void (*btf_trace_##call)(void *__data, proto);
> >
> > and dwarf does not store argument names for subroutine type entry,
> > so it's not in BTF's TYPEDEF either
> >
> > it's the btf_trace_##call typedef ID that verifier has to work with,
> > I wonder we could somehow associate that ID with __bpf_trace_##call
> > subroutine entry which has the argument names
> >
> > we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
> > but we'd need to do the lookup based on the tracepoint name when loading
> > the program .. ATM we do the lookup __bpf_raw_tp_map record only when
> > doing attach, so we would need to move it to program load time
> >
> > or we could 'fix' the argument names in pahole, but that'd probably
> > mean extra setup and hash lookup, so also not great
> 
> I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name.
> No need to add btf_id-s and waste memory to speed up the slow path.

I checked bit more and there are more tracepoints with the same issue,
the first diff stat looks like:

	 include/trace/events/afs.h                            | 44 ++++++++++++++++++++++----------------------
	 include/trace/events/cachefiles.h                     | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
	 include/trace/events/ext4.h                           |  6 +++---
	 include/trace/events/fib.h                            | 16 ++++++++--------
	 include/trace/events/filelock.h                       | 38 +++++++++++++++++++-------------------
	 include/trace/events/host1x.h                         | 10 +++++-----
	 include/trace/events/huge_memory.h                    | 24 ++++++++++++------------
	 include/trace/events/kmem.h                           | 18 +++++++++---------
	 include/trace/events/netfs.h                          | 16 ++++++++--------
	 include/trace/events/power.h                          |  6 +++---
	 include/trace/events/qdisc.h                          |  8 ++++----
	 include/trace/events/rxrpc.h                          | 12 ++++++------
	 include/trace/events/sched.h                          | 12 ++++++------
	 include/trace/events/sunrpc.h                         |  8 ++++----
	 include/trace/events/tcp.h                            | 14 +++++++-------
	 include/trace/events/tegra_apb_dma.h                  |  6 +++---
	 include/trace/events/timer_migration.h                | 10 +++++-----
	 include/trace/events/writeback.h                      | 16 ++++++++--------

plus there's one case where pointer needs to be checked with IS_ERR in
include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc

I'm not excited about the '_nullable' argument suffix, because it's lot
of extra changes/renames in TP_fast_assign and it does not solve the
IS_ERR case above

I checked on the type tag and with llvm build we get the TYPE_TAG info
nicely in BTF:

	[119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
	[119149] PTR '(anon)' type_id=119150
	[119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
		'(anon)' type_id=27
		'(anon)' type_id=678
		'(anon)' type_id=119152
	[119151] TYPE_TAG 'nullable' type_id=679
	[119152] PTR '(anon)' type_id=119151

	[679] STRUCT 'task_struct' size=15424 vlen=277

which we can easily check in verifier.. the tracepoint definition would look like:

	-       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
	+       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),

and no other change in TP_fast_assign is needed

I think using the type tag for this is nicer, but I'm not sure where's
gcc at with btf_type_tag implementation, need to check on that

jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-15 11:48             ` Jiri Olsa
@ 2024-08-15 12:37               ` Alexei Starovoitov
  2024-08-16 14:10                 ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2024-08-15 12:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Juri Lelli, bpf, LKML, Artem Savkov, Steven Rostedt,
	Jose E. Marchesi

On Thu, Aug 15, 2024 at 1:48 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Aug 08, 2024 at 08:43:05AM -0700, Alexei Starovoitov wrote:
> > On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> > > > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > > Jiri,
> > > > > >
> > > > > > the verifier removes the check because it assumes that pointers
> > > > > > passed by the kernel into tracepoint are valid and trusted.
> > > > > > In this case:
> > > > > >         trace_sched_pi_setprio(p, pi_task);
> > > > > >
> > > > > > pi_task can be NULL.
> > > > > >
> > > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > > by default, since it will break a bunch of progs.
> > > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > > teach the verifier to recognize such special arguments of tracepoints.
> > > > >
> > > > > ok, so you mean to be able to mark it in event header like:
> > > > >
> > > > >   TRACE_EVENT(sched_pi_setprio,
> > > > >         TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> > > > >
> > > > > I guess we could make pahole to emit DECL_TAG for that argument,
> > > > > but I'm not sure how to propagate that __nullable info to pahole
> > > > >
> > > > > while wondering about that, I tried the direct fix below ;-)
> > > >
> > > > We don't need to rush such a hack below.
> > > > No need to add decl_tag and change pahole either.
> > > > The arg name is already vmlinux BTF:
> > > > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> > > >         '__data' type_id=61
> > > >         'tsk' type_id=77
> > > >         'pi_task' type_id=77
> > > > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> > > >
> > > > just need to rename "pi_task" to "pi_task__nullable"
> > > > and teach the verifier.
> > >
> > > the problem is that btf_trace_<xxx> is typedef
> > >
> > >   typedef void (*btf_trace_##call)(void *__data, proto);
> > >
> > > and dwarf does not store argument names for subroutine type entry,
> > > so it's not in BTF's TYPEDEF either
> > >
> > > it's the btf_trace_##call typedef ID that verifier has to work with,
> > > I wonder we could somehow associate that ID with __bpf_trace_##call
> > > subroutine entry which has the argument names
> > >
> > > we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
> > > but we'd need to do the lookup based on the tracepoint name when loading
> > > the program .. ATM we do the lookup __bpf_raw_tp_map record only when
> > > doing attach, so we would need to move it to program load time
> > >
> > > or we could 'fix' the argument names in pahole, but that'd probably
> > > mean extra setup and hash lookup, so also not great
> >
> > I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name.
> > No need to add btf_id-s and waste memory to speed up the slow path.
>
> I checked bit more and there are more tracepoints with the same issue,
> the first diff stat looks like:
>
>          include/trace/events/afs.h                            | 44 ++++++++++++++++++++++----------------------
>          include/trace/events/cachefiles.h                     | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
>          include/trace/events/ext4.h                           |  6 +++---
>          include/trace/events/fib.h                            | 16 ++++++++--------
>          include/trace/events/filelock.h                       | 38 +++++++++++++++++++-------------------
>          include/trace/events/host1x.h                         | 10 +++++-----
>          include/trace/events/huge_memory.h                    | 24 ++++++++++++------------
>          include/trace/events/kmem.h                           | 18 +++++++++---------
>          include/trace/events/netfs.h                          | 16 ++++++++--------
>          include/trace/events/power.h                          |  6 +++---
>          include/trace/events/qdisc.h                          |  8 ++++----
>          include/trace/events/rxrpc.h                          | 12 ++++++------
>          include/trace/events/sched.h                          | 12 ++++++------
>          include/trace/events/sunrpc.h                         |  8 ++++----
>          include/trace/events/tcp.h                            | 14 +++++++-------
>          include/trace/events/tegra_apb_dma.h                  |  6 +++---
>          include/trace/events/timer_migration.h                | 10 +++++-----
>          include/trace/events/writeback.h                      | 16 ++++++++--------
>
> plus there's one case where pointer needs to be checked with IS_ERR in
> include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc
>
> I'm not excited about the '_nullable' argument suffix, because it's lot
> of extra changes/renames in TP_fast_assign and it does not solve the
> IS_ERR case above
>
> I checked on the type tag and with llvm build we get the TYPE_TAG info
> nicely in BTF:
>
>         [119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
>         [119149] PTR '(anon)' type_id=119150
>         [119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
>                 '(anon)' type_id=27
>                 '(anon)' type_id=678
>                 '(anon)' type_id=119152
>         [119151] TYPE_TAG 'nullable' type_id=679
>         [119152] PTR '(anon)' type_id=119151
>
>         [679] STRUCT 'task_struct' size=15424 vlen=277
>
> which we can easily check in verifier.. the tracepoint definition would look like:
>
>         -       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
>         +       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),
>
> and no other change in TP_fast_assign is needed
>
> I think using the type tag for this is nicer, but I'm not sure where's
> gcc at with btf_type_tag implementation, need to check on that

Unfortunately last time I heard gcc was still far.
So we cannot rely on decl_tag or type_tag yet.
Aside from __nullable we would need another suffix to indicate is_err.

Maybe we can do something with the TP* macro?
So the suffix only seen one place instead of search-and-replace
through the body?

but imo above diff stat doesn't look too bad.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-15 12:37               ` Alexei Starovoitov
@ 2024-08-16 14:10                 ` Steven Rostedt
  2024-08-16 18:59                   ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-08-16 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Juri Lelli, bpf, LKML, Artem Savkov, Jose E. Marchesi

On Thu, 15 Aug 2024 14:37:18 +0200
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > I checked bit more and there are more tracepoints with the same issue,
> > the first diff stat looks like:
> >
> >          include/trace/events/afs.h                            | 44 ++++++++++++++++++++++----------------------
> >          include/trace/events/cachefiles.h                     | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
> >          include/trace/events/ext4.h                           |  6 +++---
> >          include/trace/events/fib.h                            | 16 ++++++++--------
> >          include/trace/events/filelock.h                       | 38 +++++++++++++++++++-------------------
> >          include/trace/events/host1x.h                         | 10 +++++-----
> >          include/trace/events/huge_memory.h                    | 24 ++++++++++++------------
> >          include/trace/events/kmem.h                           | 18 +++++++++---------
> >          include/trace/events/netfs.h                          | 16 ++++++++--------
> >          include/trace/events/power.h                          |  6 +++---
> >          include/trace/events/qdisc.h                          |  8 ++++----
> >          include/trace/events/rxrpc.h                          | 12 ++++++------
> >          include/trace/events/sched.h                          | 12 ++++++------
> >          include/trace/events/sunrpc.h                         |  8 ++++----
> >          include/trace/events/tcp.h                            | 14 +++++++-------
> >          include/trace/events/tegra_apb_dma.h                  |  6 +++---
> >          include/trace/events/timer_migration.h                | 10 +++++-----
> >          include/trace/events/writeback.h                      | 16 ++++++++--------
> >
> > plus there's one case where pointer needs to be checked with IS_ERR in
> > include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc
> >
> > I'm not excited about the '_nullable' argument suffix, because it's lot
> > of extra changes/renames in TP_fast_assign and it does not solve the
> > IS_ERR case above
> >
> > I checked on the type tag and with llvm build we get the TYPE_TAG info
> > nicely in BTF:
> >
> >         [119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
> >         [119149] PTR '(anon)' type_id=119150
> >         [119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> >                 '(anon)' type_id=27
> >                 '(anon)' type_id=678
> >                 '(anon)' type_id=119152
> >         [119151] TYPE_TAG 'nullable' type_id=679
> >         [119152] PTR '(anon)' type_id=119151
> >
> >         [679] STRUCT 'task_struct' size=15424 vlen=277
> >
> > which we can easily check in verifier.. the tracepoint definition would look like:
> >
> >         -       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> >         +       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),
> >
> > and no other change in TP_fast_assign is needed
> >
> > I think using the type tag for this is nicer, but I'm not sure where's
> > gcc at with btf_type_tag implementation, need to check on that  
> 
> Unfortunately last time I heard gcc was still far.
> So we cannot rely on decl_tag or type_tag yet.
> Aside from __nullable we would need another suffix to indicate is_err.
> 
> Maybe we can do something with the TP* macro?
> So the suffix only seen one place instead of search-and-replace
> through the body?
> 
> but imo above diff stat doesn't look too bad.

I'm fine with annotation of parameters, but I really don't want this
being part of the TP_fast_assign() content. What would that look like
anyway?

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-16 14:10                 ` Steven Rostedt
@ 2024-08-16 18:59                   ` Jiri Olsa
  2024-08-16 19:30                     ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-16 18:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Jiri Olsa, Juri Lelli, bpf, LKML,
	Artem Savkov, Jose E. Marchesi

On Fri, Aug 16, 2024 at 10:10:31AM -0400, Steven Rostedt wrote:
> On Thu, 15 Aug 2024 14:37:18 +0200
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > > I checked bit more and there are more tracepoints with the same issue,
> > > the first diff stat looks like:
> > >
> > >          include/trace/events/afs.h                            | 44 ++++++++++++++++++++++----------------------
> > >          include/trace/events/cachefiles.h                     | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
> > >          include/trace/events/ext4.h                           |  6 +++---
> > >          include/trace/events/fib.h                            | 16 ++++++++--------
> > >          include/trace/events/filelock.h                       | 38 +++++++++++++++++++-------------------
> > >          include/trace/events/host1x.h                         | 10 +++++-----
> > >          include/trace/events/huge_memory.h                    | 24 ++++++++++++------------
> > >          include/trace/events/kmem.h                           | 18 +++++++++---------
> > >          include/trace/events/netfs.h                          | 16 ++++++++--------
> > >          include/trace/events/power.h                          |  6 +++---
> > >          include/trace/events/qdisc.h                          |  8 ++++----
> > >          include/trace/events/rxrpc.h                          | 12 ++++++------
> > >          include/trace/events/sched.h                          | 12 ++++++------
> > >          include/trace/events/sunrpc.h                         |  8 ++++----
> > >          include/trace/events/tcp.h                            | 14 +++++++-------
> > >          include/trace/events/tegra_apb_dma.h                  |  6 +++---
> > >          include/trace/events/timer_migration.h                | 10 +++++-----
> > >          include/trace/events/writeback.h                      | 16 ++++++++--------
> > >
> > > plus there's one case where pointer needs to be checked with IS_ERR in
> > > include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc
> > >
> > > I'm not excited about the '_nullable' argument suffix, because it's lot
> > > of extra changes/renames in TP_fast_assign and it does not solve the
> > > IS_ERR case above
> > >
> > > I checked on the type tag and with llvm build we get the TYPE_TAG info
> > > nicely in BTF:
> > >
> > >         [119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
> > >         [119149] PTR '(anon)' type_id=119150
> > >         [119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> > >                 '(anon)' type_id=27
> > >                 '(anon)' type_id=678
> > >                 '(anon)' type_id=119152
> > >         [119151] TYPE_TAG 'nullable' type_id=679
> > >         [119152] PTR '(anon)' type_id=119151
> > >
> > >         [679] STRUCT 'task_struct' size=15424 vlen=277
> > >
> > > which we can easily check in verifier.. the tracepoint definition would look like:
> > >
> > >         -       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> > >         +       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),
> > >
> > > and no other change in TP_fast_assign is needed
> > >
> > > I think using the type tag for this is nicer, but I'm not sure where's
> > > gcc at with btf_type_tag implementation, need to check on that  
> > 
> > Unfortunately last time I heard gcc was still far.
> > So we cannot rely on decl_tag or type_tag yet.
> > Aside from __nullable we would need another suffix to indicate is_err.
> > 
> > Maybe we can do something with the TP* macro?
> > So the suffix only seen one place instead of search-and-replace
> > through the body?
> > 
> > but imo above diff stat doesn't look too bad.
> 
> I'm fine with annotation of parameters, but I really don't want this
> being part of the TP_fast_assign() content. What would that look like
> anyway?

best option would be using btf_type_tag annotation, which would look like:

	-       TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
	+       TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),

but gcc does not support that yet, just clang

so far the only working solution I have is adding '__nullable' suffix
to argument name:

	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
	index 9ea4c404bd4e..fc46f0b42741 100644
	--- a/include/trace/events/sched.h
	+++ b/include/trace/events/sched.h
	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
	  */
	 TRACE_EVENT(sched_pi_setprio,
	 
	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
	+	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task__nullable),
	 
	-	TP_ARGS(tsk, pi_task),
	+	TP_ARGS(tsk, pi_task__nullable),
	 
		TP_STRUCT__entry(
			__array( char,	comm,	TASK_COMM_LEN	)
	@@ -574,8 +574,8 @@ TRACE_EVENT(sched_pi_setprio,
			memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
			__entry->pid		= tsk->pid;
			__entry->oldprio	= tsk->prio;
	-		__entry->newprio	= pi_task ?
	-				min(tsk->normal_prio, pi_task->prio) :
	+		__entry->newprio	= pi_task__nullable ?
	+				min(tsk->normal_prio, pi_task__nullable->prio) :
					tsk->normal_prio;
			/* XXX SCHED_DEADLINE bits missing */
		),


now I'm trying to make work something like:

	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
	index 9ea4c404bd4e..4e4aae2d5700 100644
	--- a/include/trace/events/sched.h
	+++ b/include/trace/events/sched.h
	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
	  */
	 TRACE_EVENT(sched_pi_setprio,
	 
	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
	+	TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
	 
	-	TP_ARGS(tsk, pi_task),
	+	TP_ARGS(tsk, __nullable(pi_task)),
	 
		TP_STRUCT__entry(
			__array( char,	comm,	TASK_COMM_LEN	)


jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-16 18:59                   ` Jiri Olsa
@ 2024-08-16 19:30                     ` Steven Rostedt
  2024-08-19 11:47                       ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-08-16 19:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Juri Lelli, bpf, LKML, Artem Savkov,
	Jose E. Marchesi

On Fri, 16 Aug 2024 20:59:47 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> so far the only working solution I have is adding '__nullable' suffix
> to argument name:
> 
> 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> 	index 9ea4c404bd4e..fc46f0b42741 100644
> 	--- a/include/trace/events/sched.h
> 	+++ b/include/trace/events/sched.h
> 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> 	  */
> 	 TRACE_EVENT(sched_pi_setprio,
> 	 
> 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task__nullable),
> 	 
> 	-	TP_ARGS(tsk, pi_task),
> 	+	TP_ARGS(tsk, pi_task__nullable),
> 	 
> 		TP_STRUCT__entry(
> 			__array( char,	comm,	TASK_COMM_LEN	)
> 	@@ -574,8 +574,8 @@ TRACE_EVENT(sched_pi_setprio,
> 			memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> 			__entry->pid		= tsk->pid;
> 			__entry->oldprio	= tsk->prio;
> 	-		__entry->newprio	= pi_task ?
> 	-				min(tsk->normal_prio, pi_task->prio) :
> 	+		__entry->newprio	= pi_task__nullable ?
> 	+				min(tsk->normal_prio, pi_task__nullable->prio) :
> 					tsk->normal_prio;
> 			/* XXX SCHED_DEADLINE bits missing */
> 		),
> 
> 
> now I'm trying to make work something like:
> 
> 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> 	index 9ea4c404bd4e..4e4aae2d5700 100644
> 	--- a/include/trace/events/sched.h
> 	+++ b/include/trace/events/sched.h
> 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> 	  */
> 	 TRACE_EVENT(sched_pi_setprio,
> 	 
> 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> 	 
> 	-	TP_ARGS(tsk, pi_task),
> 	+	TP_ARGS(tsk, __nullable(pi_task)),
> 	 
> 		TP_STRUCT__entry(
> 			__array( char,	comm,	TASK_COMM_LEN	)

Hmm, that's really ugly though. Both versions.

Now when Alexei said:

> > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > by default, since it will break a bunch of progs.
> > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > teach the verifier to recognize such special arguments of tracepoints. 

I'm not familiar with the verifier, so I don't know how the above is
implemented, and why it would break a bunch of progs.

If you had a macro around the parameter:

		TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),

Could having that go through another macro pass in trace_events.h work?
That is, could we associate the trace event with "nullable" parameters
that could be stored someplace else for you?

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-16 19:30                     ` Steven Rostedt
@ 2024-08-19 11:47                       ` Jiri Olsa
  2024-08-19 14:05                         ` Jiri Olsa
  2024-08-19 15:37                         ` Steven Rostedt
  0 siblings, 2 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-08-19 11:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Artem Savkov, Jose E. Marchesi

On Fri, Aug 16, 2024 at 03:30:40PM -0400, Steven Rostedt wrote:
> On Fri, 16 Aug 2024 20:59:47 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > so far the only working solution I have is adding '__nullable' suffix
> > to argument name:
> > 
> > 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > 	index 9ea4c404bd4e..fc46f0b42741 100644
> > 	--- a/include/trace/events/sched.h
> > 	+++ b/include/trace/events/sched.h
> > 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> > 	  */
> > 	 TRACE_EVENT(sched_pi_setprio,
> > 	 
> > 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> > 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task__nullable),
> > 	 
> > 	-	TP_ARGS(tsk, pi_task),
> > 	+	TP_ARGS(tsk, pi_task__nullable),
> > 	 
> > 		TP_STRUCT__entry(
> > 			__array( char,	comm,	TASK_COMM_LEN	)
> > 	@@ -574,8 +574,8 @@ TRACE_EVENT(sched_pi_setprio,
> > 			memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> > 			__entry->pid		= tsk->pid;
> > 			__entry->oldprio	= tsk->prio;
> > 	-		__entry->newprio	= pi_task ?
> > 	-				min(tsk->normal_prio, pi_task->prio) :
> > 	+		__entry->newprio	= pi_task__nullable ?
> > 	+				min(tsk->normal_prio, pi_task__nullable->prio) :
> > 					tsk->normal_prio;
> > 			/* XXX SCHED_DEADLINE bits missing */
> > 		),
> > 
> > 
> > now I'm trying to make work something like:
> > 
> > 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > 	index 9ea4c404bd4e..4e4aae2d5700 100644
> > 	--- a/include/trace/events/sched.h
> > 	+++ b/include/trace/events/sched.h
> > 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> > 	  */
> > 	 TRACE_EVENT(sched_pi_setprio,
> > 	 
> > 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> > 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> > 	 
> > 	-	TP_ARGS(tsk, pi_task),
> > 	+	TP_ARGS(tsk, __nullable(pi_task)),
> > 	 
> > 		TP_STRUCT__entry(
> > 			__array( char,	comm,	TASK_COMM_LEN	)
> 
> Hmm, that's really ugly though. Both versions.
> 
> Now when Alexei said:
> 
> > > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > > by default, since it will break a bunch of progs.
> > > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > > teach the verifier to recognize such special arguments of tracepoints. 
> 
> I'm not familiar with the verifier, so I don't know how the above is
> implemented, and why it would break a bunch of progs.

verifier assumes that programs attached to the tracepoint can access
pointer arguments without checking them for null and some of those
programs most likely access such arguments directly

changing that globally and require bpf program to do null check for all
pointer arguments will make verifier fail to load existing programs

> 
> If you had a macro around the parameter:
> 
> 		TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> 
> Could having that go through another macro pass in trace_events.h work?
> That is, could we associate the trace event with "nullable" parameters
> that could be stored someplace else for you?

IIUC you mean to store extra data for each tracepoint that would
annotate the argument? as Alexei pointed out earlier it might be
too much, because we'd be fine with just adding suffix to annotated
arguments in __bpf_trace_##call:

	__bpf_trace_##call(void *__data, proto)                                 \
	{                                                                       \
		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));        \
	}

with that verifier could easily get suffix information from BTF and
once gcc implements btf_type_tag we can easily switch to that

jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-19 11:47                       ` Jiri Olsa
@ 2024-08-19 14:05                         ` Jiri Olsa
  2024-08-19 15:37                         ` Steven Rostedt
  1 sibling, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-08-19 14:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Artem Savkov, Jose E. Marchesi

On Mon, Aug 19, 2024 at 01:47:20PM +0200, Jiri Olsa wrote:
> On Fri, Aug 16, 2024 at 03:30:40PM -0400, Steven Rostedt wrote:
> > On Fri, 16 Aug 2024 20:59:47 +0200
> > Jiri Olsa <olsajiri@gmail.com> wrote:
> > 
> > > so far the only working solution I have is adding '__nullable' suffix
> > > to argument name:
> > > 
> > > 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > > 	index 9ea4c404bd4e..fc46f0b42741 100644
> > > 	--- a/include/trace/events/sched.h
> > > 	+++ b/include/trace/events/sched.h
> > > 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> > > 	  */
> > > 	 TRACE_EVENT(sched_pi_setprio,
> > > 	 
> > > 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> > > 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task__nullable),
> > > 	 
> > > 	-	TP_ARGS(tsk, pi_task),
> > > 	+	TP_ARGS(tsk, pi_task__nullable),
> > > 	 
> > > 		TP_STRUCT__entry(
> > > 			__array( char,	comm,	TASK_COMM_LEN	)
> > > 	@@ -574,8 +574,8 @@ TRACE_EVENT(sched_pi_setprio,
> > > 			memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> > > 			__entry->pid		= tsk->pid;
> > > 			__entry->oldprio	= tsk->prio;
> > > 	-		__entry->newprio	= pi_task ?
> > > 	-				min(tsk->normal_prio, pi_task->prio) :
> > > 	+		__entry->newprio	= pi_task__nullable ?
> > > 	+				min(tsk->normal_prio, pi_task__nullable->prio) :
> > > 					tsk->normal_prio;
> > > 			/* XXX SCHED_DEADLINE bits missing */
> > > 		),
> > > 
> > > 
> > > now I'm trying to make work something like:
> > > 
> > > 	diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > > 	index 9ea4c404bd4e..4e4aae2d5700 100644
> > > 	--- a/include/trace/events/sched.h
> > > 	+++ b/include/trace/events/sched.h
> > > 	@@ -559,9 +559,9 @@ DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
> > > 	  */
> > > 	 TRACE_EVENT(sched_pi_setprio,
> > > 	 
> > > 	-	TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
> > > 	+	TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> > > 	 
> > > 	-	TP_ARGS(tsk, pi_task),
> > > 	+	TP_ARGS(tsk, __nullable(pi_task)),
> > > 	 
> > > 		TP_STRUCT__entry(
> > > 			__array( char,	comm,	TASK_COMM_LEN	)
> > 
> > Hmm, that's really ugly though. Both versions.
> > 
> > Now when Alexei said:
> > 
> > > > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > > > by default, since it will break a bunch of progs.
> > > > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > > > teach the verifier to recognize such special arguments of tracepoints. 
> > 
> > I'm not familiar with the verifier, so I don't know how the above is
> > implemented, and why it would break a bunch of progs.
> 
> verifier assumes that programs attached to the tracepoint can access
> pointer arguments without checking them for null and some of those
> programs most likely access such arguments directly
> 
> changing that globally and require bpf program to do null check for all
> pointer arguments will make verifier fail to load existing programs
> 
> > 
> > If you had a macro around the parameter:
> > 
> > 		TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> > 
> > Could having that go through another macro pass in trace_events.h work?
> > That is, could we associate the trace event with "nullable" parameters
> > that could be stored someplace else for you?
> 
> IIUC you mean to store extra data for each tracepoint that would
> annotate the argument? as Alexei pointed out earlier it might be
> too much, because we'd be fine with just adding suffix to annotated
> arguments in __bpf_trace_##call:
> 
> 	__bpf_trace_##call(void *__data, proto)                                 \

nah.. it's defined for class template, so tracepoints like cgroup_mkdir
won't have its own __bpf_trace_cgroup_mkdir function that we could use

we need to do something else

jirka


> 	{                                                                       \
> 		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));        \
> 	}
> 
> with that verifier could easily get suffix information from BTF and
> once gcc implements btf_type_tag we can easily switch to that
> 
> jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-19 11:47                       ` Jiri Olsa
  2024-08-19 14:05                         ` Jiri Olsa
@ 2024-08-19 15:37                         ` Steven Rostedt
  2024-08-20 10:17                           ` Jiri Olsa
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-08-19 15:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Juri Lelli, bpf, LKML, Artem Savkov,
	Jose E. Marchesi

On Mon, 19 Aug 2024 13:47:20 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> verifier assumes that programs attached to the tracepoint can access
> pointer arguments without checking them for null and some of those
> programs most likely access such arguments directly

Hmm, so the verifier made a wrong assumption :-/  That's because that was
never a requirement for tracepoint arguments and several can easily be
NULL. That's why the macros have NULL checks for all arguments. For
example, see include/trace/stages/stage5_get_offsets.h:

  static inline const char *__string_src(const char *str)
  {
       if (!str)
               return EVENT_NULL_STR;
       return str;
  }


How does the verifier handle accessing function arguments? Because a
tracepoint call is no different.

> 
> changing that globally and require bpf program to do null check for all
> pointer arguments will make verifier fail to load existing programs
> 
> > 
> > If you had a macro around the parameter:
> > 
> > 		TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> > 
> > Could having that go through another macro pass in trace_events.h work?
> > That is, could we associate the trace event with "nullable" parameters
> > that could be stored someplace else for you?  
> 
> IIUC you mean to store extra data for each tracepoint that would
> annotate the argument? as Alexei pointed out earlier it might be
> too much, because we'd be fine with just adding suffix to annotated
> arguments in __bpf_trace_##call:
> 
> 	__bpf_trace_##call(void *__data, proto)                                 \
> 	{                                                                       \
> 		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));        \
> 	}
> 
> with that verifier could easily get suffix information from BTF and
> once gcc implements btf_type_tag we can easily switch to that

Could it be possible that the verifier could add to the exception table for
all accesses to tracepoint arguments? Then if there's a NULL pointer
dereference, the kernel will not crash but the exception can be sent to the
user space process instead? That is, it sends SIGSEV to the task accessing
NULL when it shouldn't.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-19 15:37                         ` Steven Rostedt
@ 2024-08-20 10:17                           ` Jiri Olsa
  2024-08-20 15:05                             ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-08-20 10:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Artem Savkov, Jose E. Marchesi

On Mon, Aug 19, 2024 at 11:37:47AM -0400, Steven Rostedt wrote:
> On Mon, 19 Aug 2024 13:47:20 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > verifier assumes that programs attached to the tracepoint can access
> > pointer arguments without checking them for null and some of those
> > programs most likely access such arguments directly
> 
> Hmm, so the verifier made a wrong assumption :-/  That's because that was
> never a requirement for tracepoint arguments and several can easily be
> NULL. That's why the macros have NULL checks for all arguments. For
> example, see include/trace/stages/stage5_get_offsets.h:
> 
>   static inline const char *__string_src(const char *str)
>   {
>        if (!str)
>                return EVENT_NULL_STR;
>        return str;
>   }
> 
> 
> How does the verifier handle accessing function arguments? Because a
> tracepoint call is no different.

verifier is checking program's access to function arguments which in
case of tracepoint is access to context (btf_ctx_access function)

> 
> > 
> > changing that globally and require bpf program to do null check for all
> > pointer arguments will make verifier fail to load existing programs
> > 
> > > 
> > > If you had a macro around the parameter:
> > > 
> > > 		TP_PROTO(struct task_struct *tsk, struct task_struct *__nullable(pi_task)),
> > > 
> > > Could having that go through another macro pass in trace_events.h work?
> > > That is, could we associate the trace event with "nullable" parameters
> > > that could be stored someplace else for you?  
> > 
> > IIUC you mean to store extra data for each tracepoint that would
> > annotate the argument? as Alexei pointed out earlier it might be
> > too much, because we'd be fine with just adding suffix to annotated
> > arguments in __bpf_trace_##call:
> > 
> > 	__bpf_trace_##call(void *__data, proto)                                 \
> > 	{                                                                       \
> > 		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));        \
> > 	}
> > 
> > with that verifier could easily get suffix information from BTF and
> > once gcc implements btf_type_tag we can easily switch to that
> 
> Could it be possible that the verifier could add to the exception table for
> all accesses to tracepoint arguments? Then if there's a NULL pointer
> dereference, the kernel will not crash but the exception can be sent to the
> user space process instead? That is, it sends SIGSEV to the task accessing
> NULL when it shouldn't.

hm, but that would mean random process that would happened to trigger
the tracepoint would segfault, right? I don't think we can do that

it seems better to teach verifier which tracepoint arguments can be NULL
and deny load of the bpf program that would not check such argument properly

jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-20 10:17                           ` Jiri Olsa
@ 2024-08-20 15:05                             ` Steven Rostedt
  2024-10-02 16:30                               ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-08-20 15:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Juri Lelli, bpf, LKML, Artem Savkov,
	Jose E. Marchesi

On Tue, 20 Aug 2024 12:17:31 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> > Could it be possible that the verifier could add to the exception table for
> > all accesses to tracepoint arguments? Then if there's a NULL pointer
> > dereference, the kernel will not crash but the exception can be sent to the
> > user space process instead? That is, it sends SIGSEV to the task accessing
> > NULL when it shouldn't.  
> 
> hm, but that would mean random process that would happened to trigger
> the tracepoint would segfault, right? I don't think we can do that

Better than a kernel crash, isn't it?  I thought the guarantee of BPF was
not to ever crash the kernel. Crashing user space may be bad, but not
always fatal, and something that can be fixed by fixng the BPF program that
was loaded.

> 
> it seems better to teach verifier which tracepoint arguments can be NULL
> and deny load of the bpf program that would not check such argument properly

These are not mutually exclusive. I think you want both. Adding annotation
is going to be a whack-a-mole game as new tracepoints will always be
created with new possibly NULL parameters and even old tracepoints can add
that too. There's nothing to stop that.

The exception table logic will prevent any missed checks from causing a
kernel crash, and your annotations will keep user space from crashing.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-08-20 15:05                             ` Steven Rostedt
@ 2024-10-02 16:30                               ` Jiri Olsa
  2024-10-09 20:41                                 ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-10-02 16:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Artem Savkov, Jose E. Marchesi

On Tue, Aug 20, 2024 at 11:05:07AM -0400, Steven Rostedt wrote:
> On Tue, 20 Aug 2024 12:17:31 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > > Could it be possible that the verifier could add to the exception table for
> > > all accesses to tracepoint arguments? Then if there's a NULL pointer
> > > dereference, the kernel will not crash but the exception can be sent to the
> > > user space process instead? That is, it sends SIGSEV to the task accessing
> > > NULL when it shouldn't.  
> > 
> > hm, but that would mean random process that would happened to trigger
> > the tracepoint would segfault, right? I don't think we can do that
> 
> Better than a kernel crash, isn't it?  I thought the guarantee of BPF was
> not to ever crash the kernel. Crashing user space may be bad, but not
> always fatal, and something that can be fixed by fixng the BPF program that
> was loaded.
> 
> > 
> > it seems better to teach verifier which tracepoint arguments can be NULL
> > and deny load of the bpf program that would not check such argument properly
> 
> These are not mutually exclusive. I think you want both. Adding annotation
> is going to be a whack-a-mole game as new tracepoints will always be
> created with new possibly NULL parameters and even old tracepoints can add
> that too. There's nothing to stop that.
> 
> The exception table logic will prevent any missed checks from causing a
> kernel crash, and your annotations will keep user space from crashing.
> 
> -- Steve

sorry for delay.. reviving this after plumbers and other stuff that got in a way

Steven,
we were discussing this in plumbers and you had an idea on doing this
automatically through objtool.. IIRC you meant tracking instructions
that carry argument pointers for NULL checks

AFAICS we'd need to do roughly:
  - for each tracepoint we'd need to interpret one of the functions
    where TP_fast_assign macro gets unwinded:
      perf_trace_##call
      trace_custom_event_raw_event_##call
      trace_event_raw_event_##call
  - we can't tell at this point which argument is kernel object,
    so we'd need to check all arguments (assuming we can get their count)
  - store argument info (if it has null check) into some elf tables and
    use those later in bpf verifier
  - it's all arch specific 

on first look it seems hard and fragile (given it's arch specific)
but I might be easily wrong with above.. do you have an idea on how
this could work?

thanks,
jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-02 16:30                               ` Jiri Olsa
@ 2024-10-09 20:41                                 ` Jiri Olsa
  2024-10-10  0:33                                   ` Josh Poimboeuf
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-10-09 20:41 UTC (permalink / raw)
  To: Jiri Olsa, Josh Poimboeuf
  Cc: Steven Rostedt, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi

On Wed, Oct 02, 2024 at 06:30:30PM +0200, Jiri Olsa wrote:
> On Tue, Aug 20, 2024 at 11:05:07AM -0400, Steven Rostedt wrote:
> > On Tue, 20 Aug 2024 12:17:31 +0200
> > Jiri Olsa <olsajiri@gmail.com> wrote:
> > 
> > > > Could it be possible that the verifier could add to the exception table for
> > > > all accesses to tracepoint arguments? Then if there's a NULL pointer
> > > > dereference, the kernel will not crash but the exception can be sent to the
> > > > user space process instead? That is, it sends SIGSEV to the task accessing
> > > > NULL when it shouldn't.  
> > > 
> > > hm, but that would mean random process that would happened to trigger
> > > the tracepoint would segfault, right? I don't think we can do that
> > 
> > Better than a kernel crash, isn't it?  I thought the guarantee of BPF was
> > not to ever crash the kernel. Crashing user space may be bad, but not
> > always fatal, and something that can be fixed by fixng the BPF program that
> > was loaded.
> > 
> > > 
> > > it seems better to teach verifier which tracepoint arguments can be NULL
> > > and deny load of the bpf program that would not check such argument properly
> > 
> > These are not mutually exclusive. I think you want both. Adding annotation
> > is going to be a whack-a-mole game as new tracepoints will always be
> > created with new possibly NULL parameters and even old tracepoints can add
> > that too. There's nothing to stop that.
> > 
> > The exception table logic will prevent any missed checks from causing a
> > kernel crash, and your annotations will keep user space from crashing.
> > 
> > -- Steve
> 
> sorry for delay.. reviving this after plumbers and other stuff that got in a way
> 
> Steven,
> we were discussing this in plumbers and you had an idea on doing this
> automatically through objtool.. IIRC you meant tracking instructions
> that carry argument pointers for NULL checks
> 
> AFAICS we'd need to do roughly:
>   - for each tracepoint we'd need to interpret one of the functions
>     where TP_fast_assign macro gets unwinded:
>       perf_trace_##call
>       trace_custom_event_raw_event_##call
>       trace_event_raw_event_##call
>   - we can't tell at this point which argument is kernel object,
>     so we'd need to check all arguments (assuming we can get their count)
>   - store argument info (if it has null check) into some elf tables and
>     use those later in bpf verifier
>   - it's all arch specific 
> 
> on first look it seems hard and fragile (given it's arch specific)
> but I might be easily wrong with above.. do you have an idea on how
> this could work?

Hi Josh,
we'd like to have information on which of tracepoint's arguments can be NULL

Steven had an idea that objtool could help with that by doing something like
what's described above.. would you have any thoughts on that?

thanks,
jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-09 20:41                                 ` Jiri Olsa
@ 2024-10-10  0:33                                   ` Josh Poimboeuf
  2024-10-10  0:56                                     ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Poimboeuf @ 2024-10-10  0:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi

On Wed, Oct 09, 2024 at 10:41:42PM +0200, Jiri Olsa wrote:
> > AFAICS we'd need to do roughly:
> >   - for each tracepoint we'd need to interpret one of the functions
> >     where TP_fast_assign macro gets unwinded:
> >       perf_trace_##call
> >       trace_custom_event_raw_event_##call
> >       trace_event_raw_event_##call
> >   - we can't tell at this point which argument is kernel object,
> >     so we'd need to check all arguments (assuming we can get their count)
> >   - store argument info (if it has null check) into some elf tables and
> >     use those later in bpf verifier
> >   - it's all arch specific 
> > 
> > on first look it seems hard and fragile (given it's arch specific)
> > but I might be easily wrong with above.. do you have an idea on how
> > this could work?
> 
> Hi Josh,
> we'd like to have information on which of tracepoint's arguments can be NULL
> 
> Steven had an idea that objtool could help with that by doing something like
> what's described above.. would you have any thoughts on that?

Objtool doesn't know anything about function arguments, I'm not sure how
this could be done unless I'm missing something.

-- 
Josh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-10  0:33                                   ` Josh Poimboeuf
@ 2024-10-10  0:56                                     ` Steven Rostedt
  2024-10-10  0:57                                       ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-10-10  0:56 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi

On Wed, 9 Oct 2024 17:33:31 -0700
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> > Hi Josh,
> > we'd like to have information on which of tracepoint's arguments can be NULL
> > 
> > Steven had an idea that objtool could help with that by doing something like
> > what's described above.. would you have any thoughts on that?  
> 
> Objtool doesn't know anything about function arguments, I'm not sure how
> this could be done unless I'm missing something.

I was thinking if something like objtool (could be something else that can
read the executable code) and know of where functions are. It could just
see if anything tests rdi, rsi, rdx, rcx, r8 or r9 (or their 32 bit
alternatives) for NULL before using or setting it.

If it does, then we know that one of the arguments could possibly be NULL.

-- Steve


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-10  0:56                                     ` Steven Rostedt
@ 2024-10-10  0:57                                       ` Steven Rostedt
  2024-10-10  3:17                                         ` Josh Poimboeuf
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2024-10-10  0:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi

On Wed, 9 Oct 2024 20:56:47 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I was thinking if something like objtool (could be something else that can
> read the executable code) and know of where functions are. It could just
> see if anything tests rdi, rsi, rdx, rcx, r8 or r9 (or their 32 bit
> alternatives) for NULL before using or setting it.
> 
> If it does, then we know that one of the arguments could possibly be NULL.

Oh, and it only needs to look at functions that are named:

  trace_event_raw_event_*()

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-10  0:57                                       ` Steven Rostedt
@ 2024-10-10  3:17                                         ` Josh Poimboeuf
  2024-10-10  9:00                                           ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Poimboeuf @ 2024-10-10  3:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi, Peter Zijlstra

On Wed, Oct 09, 2024 at 08:57:50PM -0400, Steven Rostedt wrote:
> On Wed, 9 Oct 2024 20:56:47 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I was thinking if something like objtool (could be something else that can
> > read the executable code) and know of where functions are. It could just
> > see if anything tests rdi, rsi, rdx, rcx, r8 or r9 (or their 32 bit
> > alternatives) for NULL before using or setting it.
> > 
> > If it does, then we know that one of the arguments could possibly be NULL.
> 
> Oh, and it only needs to look at functions that are named:
> 
>   trace_event_raw_event_*()

Unfortunately it's not that simple, the args could be moved around to
other registers.  And objtool doesn't have an emulator.

Also it's not clear how that would deal with >6 args, or IS_ERR() as
Jirka pointed out upthread.

-- 
Josh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-10  3:17                                         ` Josh Poimboeuf
@ 2024-10-10  9:00                                           ` Jiri Olsa
  2024-10-10 13:49                                             ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-10-10  9:00 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Steven Rostedt, Jiri Olsa, Alexei Starovoitov, Juri Lelli, bpf,
	LKML, Jose E. Marchesi, Peter Zijlstra

On Wed, Oct 09, 2024 at 08:17:27PM -0700, Josh Poimboeuf wrote:
> On Wed, Oct 09, 2024 at 08:57:50PM -0400, Steven Rostedt wrote:
> > On Wed, 9 Oct 2024 20:56:47 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > I was thinking if something like objtool (could be something else that can
> > > read the executable code) and know of where functions are. It could just
> > > see if anything tests rdi, rsi, rdx, rcx, r8 or r9 (or their 32 bit
> > > alternatives) for NULL before using or setting it.
> > > 
> > > If it does, then we know that one of the arguments could possibly be NULL.
> > 
> > Oh, and it only needs to look at functions that are named:
> > 
> >   trace_event_raw_event_*()
> 
> Unfortunately it's not that simple, the args could be moved around to
> other registers.  And objtool doesn't have an emulator.
> 
> Also it's not clear how that would deal with >6 args, or IS_ERR() as
> Jirka pointed out upthread.

another complication might be that the code in tracepoint's fast assign
can potentially call global function (?), that could do the argument NULL
check and we won't have its code at objtool invocation time

jirka

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
  2024-10-10  9:00                                           ` Jiri Olsa
@ 2024-10-10 13:49                                             ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2024-10-10 13:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Josh Poimboeuf, Alexei Starovoitov, Juri Lelli, bpf, LKML,
	Jose E. Marchesi, Peter Zijlstra

On Thu, 10 Oct 2024 11:00:30 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> > Unfortunately it's not that simple, the args could be moved around to
> > other registers.  And objtool doesn't have an emulator.
> > 
> > Also it's not clear how that would deal with >6 args, or IS_ERR() as
> > Jirka pointed out upthread.  

For the >6 args, I would say that the verifier just says any arg greater
than 6 can be NULL. There's not many trace events that have that (if any).

> 
> another complication might be that the code in tracepoint's fast assign
> can potentially call global function (?), that could do the argument NULL
> check and we won't have its code at objtool invocation time

I'm starting to think that the best thing to do is to have the verifier add
exception code in the bpf program that just kills the task if it faults on
reading a tracepoint parameter.

This all started because it was assumed (incorrectly, and I was never
asked) that trace point args can't be NULL. It was always the case that
they could be. This was not a regression.

Now that there's existing BPF programs that assume that tracepoint
arguments are not NULL, is a bug in user space. Not the kernel.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-10-10 13:49 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-05  9:20 NULL pointer deref when running BPF monitor program (6.11.0-rc1) Juri Lelli
2024-08-05 16:49 ` Jiri Olsa
2024-08-05 17:00   ` Alexei Starovoitov
2024-08-06  7:08     ` Juri Lelli
2024-08-06 13:17     ` Jiri Olsa
2024-08-06 13:24     ` Jiri Olsa
2024-08-06 18:44       ` Alexei Starovoitov
2024-08-08 10:46         ` Jiri Olsa
2024-08-08 15:43           ` Alexei Starovoitov
2024-08-15 11:48             ` Jiri Olsa
2024-08-15 12:37               ` Alexei Starovoitov
2024-08-16 14:10                 ` Steven Rostedt
2024-08-16 18:59                   ` Jiri Olsa
2024-08-16 19:30                     ` Steven Rostedt
2024-08-19 11:47                       ` Jiri Olsa
2024-08-19 14:05                         ` Jiri Olsa
2024-08-19 15:37                         ` Steven Rostedt
2024-08-20 10:17                           ` Jiri Olsa
2024-08-20 15:05                             ` Steven Rostedt
2024-10-02 16:30                               ` Jiri Olsa
2024-10-09 20:41                                 ` Jiri Olsa
2024-10-10  0:33                                   ` Josh Poimboeuf
2024-10-10  0:56                                     ` Steven Rostedt
2024-10-10  0:57                                       ` Steven Rostedt
2024-10-10  3:17                                         ` Josh Poimboeuf
2024-10-10  9:00                                           ` Jiri Olsa
2024-10-10 13:49                                             ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox