* [PATCH bpf v2 0/3] Fix the read of vsyscall page through bpf
@ 2024-01-26 11:54 Hou Tao
2024-01-26 11:54 ` [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h Hou Tao
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Hou Tao @ 2024-01-26 11:54 UTC (permalink / raw)
To: x86, bpf
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
xingwei lee, Jann Horn, Sohil Mehta, Yonghong Song, houtao1
From: Hou Tao <houtao1@huawei.com>
Hi,
As reported by syzbot [1] and [2], an oops may happen when reading the
vsyscall page through bpf_probe_read_kernel() or bpf_probe_read().
Thomas Gleixner proposed a test patch [3], but no formal patch has been
posted after about one month [4], so I am posting it instead and have
added an Originally-by tag to patch #2.
Patch #1 makes is_vsyscall_vaddr() a common helper. Patch #2 fixes the
problem by disallowing vsyscall page reads for
copy_from_kernel_nofault(). Patch #3 adds a test case to ensure that the
read of the vsyscall page through bpf is rejected. Please see the
individual patches for more details.
Comments are always welcome.
[1]: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com/
[3]: https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/
[4]: https://lore.kernel.org/bpf/e24b125c-8ff4-9031-6c53-67ff2e01f316@huaweicloud.com/
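
The trigger reduces to a single helper call on the vsyscall page address
from bpf program context; a minimal sketch for illustration only (the
complete selftest is added in patch #3):

	char buf[8];
	/* 0xffffffffff600000UL is VSYSCALL_ADDR, the faulting address in
	 * the oops shown in patch #2. Before the fix this oopses on CPUs
	 * with SMAP instead of failing; afterwards it returns -ERANGE.
	 */
	int err = bpf_probe_read_kernel(buf, sizeof(buf),
					(void *)0xffffffffff600000UL);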
Change Log:
v2:
* move is_vsyscall_vaddr to asm/vsyscall.h instead (Sohil)
* elaborate on the reason for disallowing vsyscall page reads in
copy_from_kernel_nofault_allowed() (Sohil)
* update the commit message of patch #2 to more clearly explain how
the oops occurs. (Sohil)
* update the commit message of patch #3 to explain the expected return
values of various bpf helpers (Yonghong)
v1: https://lore.kernel.org/bpf/20240119073019.1528573-1-houtao@huaweicloud.com/
Hou Tao (3):
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
selftest/bpf: Test the read of vsyscall page under x86-64
arch/x86/include/asm/vsyscall.h | 10 ++++
arch/x86/mm/fault.c | 9 ---
arch/x86/mm/maccess.c | 9 +++
.../selftests/bpf/prog_tests/read_vsyscall.c | 57 +++++++++++++++++++
.../selftests/bpf/progs/read_vsyscall.c | 45 +++++++++++++++
5 files changed, 121 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
create mode 100644 tools/testing/selftests/bpf/progs/read_vsyscall.c
--
2.29.2
* [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  2024-01-26 11:54 [PATCH bpf v2 0/3] Fix the read of vsyscall page through bpf Hou Tao
@ 2024-01-26 11:54 ` Hou Tao
  2024-01-29 23:56   ` Sohil Mehta
  2024-01-26 11:54 ` [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() Hou Tao
  2024-01-26 11:54 ` [PATCH bpf v2 3/3] selftest/bpf: Test the read of vsyscall page under x86-64 Hou Tao
  2 siblings, 1 reply; 9+ messages in thread
From: Hou Tao @ 2024-01-26 11:54 UTC (permalink / raw)
  To: x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Sohil Mehta, Yonghong Song, houtao1

From: Hou Tao <houtao1@huawei.com>

Moving is_vsyscall_vaddr() into asm/vsyscall.h to make it available for
copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 arch/x86/include/asm/vsyscall.h | 10 ++++++++++
 arch/x86/mm/fault.c             |  9 ---------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index ab60a71a8dcb9..472f0263dbc61 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -4,6 +4,7 @@
 
 #include <linux/seqlock.h>
 #include <uapi/asm/vsyscall.h>
+#include <asm/page_types.h>
 
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
 extern void map_vsyscall(void);
@@ -24,4 +25,13 @@ static inline bool emulate_vsyscall(unsigned long error_code,
 }
 #endif
 
+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static inline bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
+}
+
 #endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241c..d6375b3c633bc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -798,15 +798,6 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 	show_opcodes(regs, loglvl);
 }
 
-/*
- * The (legacy) vsyscall page is the long page in the kernel portion
- * of the address space that has user-accessible permissions.
- */
-static bool is_vsyscall_vaddr(unsigned long vaddr)
-{
-	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
-}
-
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		       unsigned long address, u32 pkey, int si_code)
-- 
2.29.2
* Re: [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  2024-01-26 11:54 ` [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h Hou Tao
@ 2024-01-29 23:56   ` Sohil Mehta
  2024-01-30  4:20     ` Hou Tao
  0 siblings, 1 reply; 9+ messages in thread
From: Sohil Mehta @ 2024-01-29 23:56 UTC (permalink / raw)
  To: Hou Tao, x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Yonghong Song, houtao1

On 1/26/2024 3:54 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Moving is_vsyscall_vaddr() into asm/vsyscall.h to make it available for

s/Moving/Move

> copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  arch/x86/include/asm/vsyscall.h | 10 ++++++++++
>  arch/x86/mm/fault.c             |  9 ---------
>  2 files changed, 10 insertions(+), 9 deletions(-)
>

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
* Re: [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  2024-01-29 23:56   ` Sohil Mehta
@ 2024-01-30  4:20     ` Hou Tao
  0 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2024-01-30  4:20 UTC (permalink / raw)
  To: Sohil Mehta, x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Yonghong Song, houtao1

On 1/30/2024 7:56 AM, Sohil Mehta wrote:
> On 1/26/2024 3:54 AM, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Moving is_vsyscall_vaddr() into asm/vsyscall.h to make it available for
> s/Moving/Move

Will update in v3.

>
>> copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  arch/x86/include/asm/vsyscall.h | 10 ++++++++++
>>  arch/x86/mm/fault.c             |  9 ---------
>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>
>
> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

Thank you for the review and all of your suggestions.
* [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  2024-01-26 11:54 [PATCH bpf v2 0/3] Fix the read of vsyscall page through bpf Hou Tao
  2024-01-26 11:54 ` [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h Hou Tao
@ 2024-01-26 11:54 ` Hou Tao
  2024-01-29 23:50   ` Sohil Mehta
  2024-01-26 11:54 ` [PATCH bpf v2 3/3] selftest/bpf: Test the read of vsyscall page under x86-64 Hou Tao
  2 siblings, 1 reply; 9+ messages in thread
From: Hou Tao @ 2024-01-26 11:54 UTC (permalink / raw)
  To: x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Sohil Mehta, Yonghong Song, houtao1

From: Hou Tao <houtao1@huawei.com>

When trying to use copy_from_kernel_nofault() to read vsyscall page
through a bpf program, the following oops was reported:

  BUG: unable to handle page fault for address: ffffffffff600000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
  ......
  Call Trace:
   <TASK>
   ? copy_from_kernel_nofault+0x6f/0x110
   bpf_probe_read_kernel+0x1d/0x50
   bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
   trace_call_bpf+0xc5/0x1c0
   perf_call_bpf_enter.isra.0+0x69/0xb0
   perf_syscall_enter+0x13e/0x200
   syscall_trace_enter+0x188/0x1c0
   do_syscall_64+0xb5/0xe0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>
  ......
  ---[ end trace 0000000000000000 ]---

It seems the occurrence of oops depends on SMAP feature of CPU. It
happens as follows: a bpf program uses bpf_probe_read_kernel() to read
from vsyscall page, bpf_probe_read_kernel() invokes
copy_from_kernel_nofault() in turn and then invokes __get_user_asm().
Because the vsyscall page address is not readable for kernel space,
a page fault exception is triggered accordingly, handle_page_fault()
considers the vsyscall page address as a userspace address instead of a
kernel space address, so the fix-up set-up by bpf isn't applied. Because
the CPU has SMAP feature and the access happens in kernel mode, so
page_fault_oops() is invoked and an oops happens. If there is no SMAP
feature, the fix-up set-up by bpf will be applied and
copy_from_kernel_nofault() will return -EFAULT instead.

Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 arch/x86/mm/maccess.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 6993f026adec9..d9272e1db5224 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -3,6 +3,8 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 
+#include <asm/vsyscall.h>
+
 #ifdef CONFIG_X86_64
 bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 {
@@ -15,6 +17,13 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
 		return false;
 
+	/* Also consider the vsyscall page as userspace address. Otherwise,
+	 * reading the vsyscall page in copy_from_kernel_nofault() may
+	 * trigger an oops due to an unhandled page fault.
+	 */
+	if (is_vsyscall_vaddr(vaddr))
+		return false;
+
 	/*
 	 * Allow everything during early boot before 'x86_virt_bits'
 	 * is initialized. Needed for instruction decoding in early
-- 
2.29.2
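
For context on why the fault is classified as a user-space fault: the
split is made by fault_in_kernel_space() in arch/x86/mm/fault.c, which
at the time of this series reads roughly as follows (shown for reference
only, not part of this patch):

	static int fault_in_kernel_space(unsigned long address)
	{
		/*
		 * On 64-bit systems, the vsyscall page is at an address
		 * above TASK_SIZE_MAX, but is not considered part of
		 * the kernel address space.
		 */
		if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
			return false;

		return address >= TASK_SIZE_MAX;
	}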
* Re: [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  2024-01-26 11:54 ` [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() Hou Tao
@ 2024-01-29 23:50   ` Sohil Mehta
  2024-01-30  4:18     ` Hou Tao
  0 siblings, 1 reply; 9+ messages in thread
From: Sohil Mehta @ 2024-01-29 23:50 UTC (permalink / raw)
  To: Hou Tao, x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Yonghong Song, houtao1

Hi Hou Tao,

I agree with your approach in this patch. Please see some comments below.

On 1/26/2024 3:54 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> When trying to use copy_from_kernel_nofault() to read vsyscall page
> through a bpf program, the following oops was reported:
>
>   BUG: unable to handle page fault for address: ffffffffff600000
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
>   Oops: 0000 [#1] PREEMPT SMP PTI
>   CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
>   RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
>   ......
>   Call Trace:
>    <TASK>
>    ? copy_from_kernel_nofault+0x6f/0x110
>    bpf_probe_read_kernel+0x1d/0x50
>    bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
>    trace_call_bpf+0xc5/0x1c0
>    perf_call_bpf_enter.isra.0+0x69/0xb0
>    perf_syscall_enter+0x13e/0x200
>    syscall_trace_enter+0x188/0x1c0
>    do_syscall_64+0xb5/0xe0
>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
>    </TASK>
>   ......
>   ---[ end trace 0000000000000000 ]---
>
> It seems the occurrence of oops depends on SMAP feature of CPU. It
> happens as follows: a bpf program uses bpf_probe_read_kernel() to read
> from vsyscall page, bpf_probe_read_kernel() invokes
> copy_from_kernel_nofault() in turn and then invokes __get_user_asm().
> Because the vsyscall page address is not readable for kernel space,
> a page fault exception is triggered accordingly, handle_page_fault()
> considers the vsyscall page address as a userspace address instead of a
> kernel space address, so the fix-up set-up by bpf isn't applied. Because
> the CPU has SMAP feature and the access happens in kernel mode, so
> page_fault_oops() is invoked and an oops happens. If there is no SMAP
> feature, the fix-up set-up by bpf will be applied and
> copy_from_kernel_nofault() will return -EFAULT instead.
>

I find this paragraph to be a bit hard to follow. I think we can
minimize the reference to SMAP here since it is only helping detect
cross address space accesses. How about something like the following:

The oops is triggered when:

1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().

2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.

3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up setup by bpf not being applied and a page_fault_oops() is invoked
due to SMAP.

> Considering handle_page_fault() has already considered the vsyscall page
> address as a userspace address, fix the problem by disallowing vsyscall
> page read for copy_from_kernel_nofault().
>

I agree, following the same approach as handle_page_fault() seems
reasonable.

> Originally-by: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  arch/x86/mm/maccess.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
> index 6993f026adec9..d9272e1db5224 100644
> --- a/arch/x86/mm/maccess.c
> +++ b/arch/x86/mm/maccess.c
> @@ -3,6 +3,8 @@
>  #include <linux/uaccess.h>
>  #include <linux/kernel.h>
>  
> +#include <asm/vsyscall.h>
> +
>  #ifdef CONFIG_X86_64
>  bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>  {
> @@ -15,6 +17,13 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>  	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
>  		return false;
>  
> +	/* Also consider the vsyscall page as userspace address. Otherwise,
> +	 * reading the vsyscall page in copy_from_kernel_nofault() may
> +	 * trigger an oops due to an unhandled page fault.
> +	 */

x86 prefers a slightly different style for multi-line comments. Please
refer to https://docs.kernel.org/process/maintainer-tip.html#comment-style.

How about rewording the above as:

/*
 * Reading from the vsyscall page may cause an unhandled fault in
 * certain cases. Though it is at an address above TASK_SIZE_MAX, it is
 * usually considered as a user space address.
 */

> +	if (is_vsyscall_vaddr(vaddr))
> +		return false;
> +

It would have been convenient if we had a common check for whether a
particular address is a kernel address or not. fault_in_kernel_space()
serves that purpose to an extent in other places.

I thought we could rename fault_in_kernel_space() to
vaddr_in_kernel_space() and use it here. But the check in
copy_from_kernel_nofault_allowed() includes the user guard page as well.
So the checks wouldn't exactly be the same.

I am unsure of the implications if we get rid of that difference. Maybe
we can leave it as-is for now unless someone else chimes in.

Sohil
* Re: [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  2024-01-29 23:50   ` Sohil Mehta
@ 2024-01-30  4:18     ` Hou Tao
  0 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2024-01-30  4:18 UTC (permalink / raw)
  To: Sohil Mehta, x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Yonghong Song, houtao1

Hi,

On 1/30/2024 7:50 AM, Sohil Mehta wrote:
> Hi Hou Tao,
>
> I agree with your approach in this patch. Please see some comments below.
>
> On 1/26/2024 3:54 AM, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> When trying to use copy_from_kernel_nofault() to read vsyscall page
>> through a bpf program, the following oops was reported:

[SNIP]

>> It seems the occurrence of oops depends on SMAP feature of CPU. It
>> happens as follows: a bpf program uses bpf_probe_read_kernel() to read
>> from vsyscall page, bpf_probe_read_kernel() invokes
>> copy_from_kernel_nofault() in turn and then invokes __get_user_asm().
>> Because the vsyscall page address is not readable for kernel space,
>> a page fault exception is triggered accordingly, handle_page_fault()
>> considers the vsyscall page address as a userspace address instead of a
>> kernel space address, so the fix-up set-up by bpf isn't applied. Because
>> the CPU has SMAP feature and the access happens in kernel mode, so
>> page_fault_oops() is invoked and an oops happens. If there is no SMAP
>> feature, the fix-up set-up by bpf will be applied and
>> copy_from_kernel_nofault() will return -EFAULT instead.
>>
> I find this paragraph to be a bit hard to follow. I think we can
> minimize the reference to SMAP here since it is only helping detect
> cross address space accesses. How about something like the following:
>
> The oops is triggered when:
>
> 1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
> page and invokes copy_from_kernel_nofault() which in turn calls
> __get_user_asm().
>
> 2) Because the vsyscall page address is not readable from kernel space,
> a page fault exception is triggered accordingly.
>
> 3) handle_page_fault() considers the vsyscall page address as a user
> space address instead of a kernel space address. This results in the
> fix-up setup by bpf not being applied and a page_fault_oops() is invoked
> due to SMAP.

Thanks for the rephrasing. It is much better now.

>> Considering handle_page_fault() has already considered the vsyscall page
>> address as a userspace address, fix the problem by disallowing vsyscall
>> page read for copy_from_kernel_nofault().
>>
> I agree, following the same approach as handle_page_fault() seems
> reasonable.
>
>> Originally-by: Thomas Gleixner <tglx@linutronix.de>
>> Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
>> Reported-by: xingwei lee <xrivendell7@gmail.com>
>> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  arch/x86/mm/maccess.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
>> index 6993f026adec9..d9272e1db5224 100644
>> --- a/arch/x86/mm/maccess.c
>> +++ b/arch/x86/mm/maccess.c
>> @@ -3,6 +3,8 @@
>>  #include <linux/uaccess.h>
>>  #include <linux/kernel.h>
>>  
>> +#include <asm/vsyscall.h>
>> +
>>  #ifdef CONFIG_X86_64
>>  bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>>  {
>> @@ -15,6 +17,13 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>>  	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
>>  		return false;
>>  
>> +	/* Also consider the vsyscall page as userspace address. Otherwise,
>> +	 * reading the vsyscall page in copy_from_kernel_nofault() may
>> +	 * trigger an oops due to an unhandled page fault.
>> +	 */
> x86 prefers a slightly different style for multi-line comments. Please
> refer to https://docs.kernel.org/process/maintainer-tip.html#comment-style.

I see. Will update.

> How about rewording the above as:
>
> /*
>  * Reading from the vsyscall page may cause an unhandled fault in
>  * certain cases. Though it is at an address above TASK_SIZE_MAX, it is
>  * usually considered as a user space address.
>  */

Thanks for the rewording. Will do in v3.

>> +	if (is_vsyscall_vaddr(vaddr))
>> +		return false;
>> +
> It would have been convenient if we had a common check for whether a
> particular address is a kernel address or not. fault_in_kernel_space()
> serves that purpose to an extent in other places.
>
> I thought we could rename fault_in_kernel_space() to
> vaddr_in_kernel_space() and use it here. But the check in
> copy_from_kernel_nofault_allowed() includes the user guard page as well.
> So the checks wouldn't exactly be the same.
>
> I am unsure of the implications if we get rid of that difference. Maybe
> we can leave it as-is for now unless someone else chimes in.

There is another difference between fault_in_kernel_space() and
copy_from_kernel_nofault_allowed(): fault_in_kernel_space() uses
address >= TASK_SIZE_MAX to check for a kernel space address, but
copy_from_kernel_nofault_allowed() uses vaddr >= TASK_SIZE_MAX +
PAGE_SIZE instead, so I prefer to keep it as-is.

> Sohil
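
To make the boundary difference concrete, the two checks under
discussion are roughly the following (paraphrased for illustration, not
a patch):

	/* fault_in_kernel_space(), arch/x86/mm/fault.c: the user guard
	 * page at TASK_SIZE_MAX still counts as kernel space here.
	 */
	return address >= TASK_SIZE_MAX;

	/* copy_from_kernel_nofault_allowed(), arch/x86/mm/maccess.c:
	 * the guard page is rejected together with user addresses.
	 */
	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
		return false;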
* [PATCH bpf v2 3/3] selftest/bpf: Test the read of vsyscall page under x86-64
  2024-01-26 11:54 [PATCH bpf v2 0/3] Fix the read of vsyscall page through bpf Hou Tao
  2024-01-26 11:54 ` [PATCH bpf v2 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h Hou Tao
  2024-01-26 11:54 ` [PATCH bpf v2 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() Hou Tao
@ 2024-01-26 11:54 ` Hou Tao
  2024-01-26 19:36   ` Yonghong Song
  2 siblings, 1 reply; 9+ messages in thread
From: Hou Tao @ 2024-01-26 11:54 UTC (permalink / raw)
  To: x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Sohil Mehta, Yonghong Song, houtao1

From: Hou Tao <houtao1@huawei.com>

Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
so add one test case to ensure that the problem is fixed. Besides those
four bpf helpers mentioned above, testing the read of vsyscall page by
using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.

The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:

1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:

bpf_probe_read_kernel_common
  copy_from_kernel_nofault
    // false, return -ERANGE
    copy_from_kernel_nofault_allowed

2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
as shown below:

bpf_probe_read_user_common
  copy_from_user_nofault
    // false, return -EFAULT
    __access_ok

3) For bpf_copy_from_user(), the expected return value is -EFAULT:

// return -EFAULT
bpf_copy_from_user
  copy_from_user
    _copy_from_user
      // return false
      access_ok

4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:

// return -EFAULT
bpf_copy_from_user_task
  access_process_vm
    // return 0
    vma_lookup()
    // return 0
    expand_stack()

The occurrence of oops depends on the availability of CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
boot cmd-line: vsyscall={xonly|none|emulate}, so there are totally six
possible combinations. Under all these combinations, the running of the
test case succeeds.

[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/prog_tests/read_vsyscall.c  | 57 +++++++++++++++++++
 .../selftests/bpf/progs/read_vsyscall.c       | 45 +++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/read_vsyscall.c

diff --git a/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
new file mode 100644
index 0000000000000..3405923fe4e65
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include "test_progs.h"
+#include "read_vsyscall.skel.h"
+
+#if defined(__x86_64__)
+/* For VSYSCALL_ADDR */
+#include <asm/vsyscall.h>
+#else
+/* To prevent build failure on non-x86 arch */
+#define VSYSCALL_ADDR 0UL
+#endif
+
+struct read_ret_desc {
+	const char *name;
+	int ret;
+} all_read[] = {
+	{ .name = "probe_read_kernel", .ret = -ERANGE },
+	{ .name = "probe_read_kernel_str", .ret = -ERANGE },
+	{ .name = "probe_read", .ret = -ERANGE },
+	{ .name = "probe_read_str", .ret = -ERANGE },
+	{ .name = "probe_read_user", .ret = -EFAULT },
+	{ .name = "probe_read_user_str", .ret = -EFAULT },
+	{ .name = "copy_from_user", .ret = -EFAULT },
+	{ .name = "copy_from_user_task", .ret = -EFAULT },
+};
+
+void test_read_vsyscall(void)
+{
+	struct read_vsyscall *skel;
+	unsigned int i;
+	int err;
+
+#if !defined(__x86_64__)
+	test__skip();
+	return;
+#endif
+	skel = read_vsyscall__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "read_vsyscall open_load"))
+		return;
+
+	skel->bss->target_pid = getpid();
+	err = read_vsyscall__attach(skel);
+	if (!ASSERT_EQ(err, 0, "read_vsyscall attach"))
+		goto out;
+
+	/* userspace may don't have vsyscall page due to LEGACY_VSYSCALL_NONE,
+	 * but it doesn't affect the returned error codes.
+	 */
+	skel->bss->user_ptr = (void *)VSYSCALL_ADDR;
+	usleep(1);
+
+	for (i = 0; i < ARRAY_SIZE(all_read); i++)
+		ASSERT_EQ(skel->bss->read_ret[i], all_read[i].ret, all_read[i].name);
+out:
+	read_vsyscall__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/read_vsyscall.c b/tools/testing/selftests/bpf/progs/read_vsyscall.c
new file mode 100644
index 0000000000000..986f96687ae15
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/read_vsyscall.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+int target_pid = 0;
+void *user_ptr = 0;
+int read_ret[8];
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int do_probe_read(void *ctx)
+{
+	char buf[8];
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	read_ret[0] = bpf_probe_read_kernel(buf, sizeof(buf), user_ptr);
+	read_ret[1] = bpf_probe_read_kernel_str(buf, sizeof(buf), user_ptr);
+	read_ret[2] = bpf_probe_read(buf, sizeof(buf), user_ptr);
+	read_ret[3] = bpf_probe_read_str(buf, sizeof(buf), user_ptr);
+	read_ret[4] = bpf_probe_read_user(buf, sizeof(buf), user_ptr);
+	read_ret[5] = bpf_probe_read_user_str(buf, sizeof(buf), user_ptr);
+
+	return 0;
+}
+
+SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
+int do_copy_from_user(void *ctx)
+{
+	char buf[8];
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	read_ret[6] = bpf_copy_from_user(buf, sizeof(buf), user_ptr);
+	read_ret[7] = bpf_copy_from_user_task(buf, sizeof(buf), user_ptr,
+					      bpf_get_current_task_btf(), 0);
+
+	return 0;
+}
-- 
2.29.2
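
For anyone trying the new test locally, the usual bpf selftest flow
should work (assuming a kernel built with the first two patches
applied):

	cd tools/testing/selftests/bpf
	make
	sudo ./test_progs -t read_vsyscall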
* Re: [PATCH bpf v2 3/3] selftest/bpf: Test the read of vsyscall page under x86-64
  2024-01-26 11:54 ` [PATCH bpf v2 3/3] selftest/bpf: Test the read of vsyscall page under x86-64 Hou Tao
@ 2024-01-26 19:36   ` Yonghong Song
  0 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2024-01-26 19:36 UTC (permalink / raw)
  To: Hou Tao, x86, bpf
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, linux-kernel,
	xingwei lee, Jann Horn, Sohil Mehta, houtao1

On 1/26/24 3:54 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Under x86-64, when using bpf_probe_read_kernel{_str}() or
> bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
> so add one test case to ensure that the problem is fixed. Besides those
> four bpf helpers mentioned above, testing the read of vsyscall page by
> using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.
>
> The test case passes the address of vsyscall page to these six helpers
> and checks whether the returned values are expected:
>
> 1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
> expected return value is -ERANGE as shown below:
>
> bpf_probe_read_kernel_common
>   copy_from_kernel_nofault
>     // false, return -ERANGE
>     copy_from_kernel_nofault_allowed
>
> 2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
> as shown below:
>
> bpf_probe_read_user_common
>   copy_from_user_nofault
>     // false, return -EFAULT
>     __access_ok
>
> 3) For bpf_copy_from_user(), the expected return value is -EFAULT:
>
> // return -EFAULT
> bpf_copy_from_user
>   copy_from_user
>     _copy_from_user
>       // return false
>       access_ok
>
> 4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:
>
> // return -EFAULT
> bpf_copy_from_user_task
>   access_process_vm
>     // return 0
>     vma_lookup()
>     // return 0
>     expand_stack()
>
> The occurrence of oops depends on the availability of CPU SMAP [1]
> feature and there are three possible configurations of vsyscall page in
> boot cmd-line: vsyscall={xonly|none|emulate}, so there are totally six
> possible combinations. Under all these combinations, the running of the
> test case succeeds.
>
> [1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>

The first two patches look good to me but I think it would be better
if x86 folks can ack them. The selftest patch LGTM.

Acked-by: Yonghong Song <yonghong.song@linux.dev>