* Kernel 4.15 lost set_robust_list support on POWER 9
@ 2018-02-05 12:48 Florian Weimer
2018-02-05 12:59 ` Florian Weimer
2018-02-05 21:14 ` Mauricio Faria de Oliveira
0 siblings, 2 replies; 8+ messages in thread
From: Florian Weimer @ 2018-02-05 12:48 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Jiri Slaby, Michael Ellerman
I get this:
7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not
implemented)
The system call works on 4.14. Looks like the probing for
futex_cmpxchg_enabled goes wrong.
Sorry, I have no idea where to start digging. I don't see anything
obvious in dmesg.
I'm trying to revert
commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3
Author: Jiri Slaby <jslaby@suse.cz>
Date: Thu Aug 24 09:31:05 2017 +0200
futex: Remove duplicated code and fix undefined behaviour
to see if it makes a difference.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-05 12:48 Kernel 4.15 lost set_robust_list support on POWER 9 Florian Weimer @ 2018-02-05 12:59 ` Florian Weimer 2018-02-05 21:14 ` Mauricio Faria de Oliveira 1 sibling, 0 replies; 8+ messages in thread From: Florian Weimer @ 2018-02-05 12:59 UTC (permalink / raw) To: linuxppc-dev; +Cc: Jiri Slaby, Michael Ellerman On 02/05/2018 01:48 PM, Florian Weimer wrote: > I get this: > > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not > implemented) > > The system call works on 4.14. Looks like the probing for > futex_cmpxchg_enabled goes wrong. > > Sorry, I have no idea where to start digging. I don't see anything > obvious in dmesg. > > I'm trying to revert > > commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 > Author: Jiri Slaby <jslaby@suse.cz> > Date: Thu Aug 24 09:31:05 2017 +0200 > > futex: Remove duplicated code and fix undefined behaviour > > to see if it makes a difference Never mind, it must be something else because that commit is in 4.14, but set_robust_list is still working there. (Let's hope that the probing doesn't fail randomly …) Thanks, Florian ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-05 12:48 Kernel 4.15 lost set_robust_list support on POWER 9 Florian Weimer 2018-02-05 12:59 ` Florian Weimer @ 2018-02-05 21:14 ` Mauricio Faria de Oliveira 2018-02-05 21:55 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 8+ messages in thread From: Mauricio Faria de Oliveira @ 2018-02-05 21:14 UTC (permalink / raw) To: linuxppc-dev, npiggin, mpe; +Cc: Florian Weimer, Jiri Slaby Nick, Michael, On 02/05/2018 10:48 AM, Florian Weimer wrote: > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not > implemented) The regression was introduced by commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table"). The problem is Radix MMU specific (does not occur with 'disable_radix'), and does not occur with that code reverted (ie do not set PIDR to zero). Do you see any reasons why? (wondering if at all related to access_ok() in include/asm/uaccess.h) with: # strace -e set_robust_list -f ./test set_robust_list(0x7fffa4b03910, 24) = -1 ENOSYS (Function not implemented) +++ exited with 1 +++ # uname -r 4.15.0 without: # strace -e set_robust_list -f ./test set_robust_list(0x7fff889c3910, 24) = 0 +++ exited with 0 +++ # uname -r 4.15.0.nopidr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-05 21:14 ` Mauricio Faria de Oliveira @ 2018-02-05 21:55 ` Benjamin Herrenschmidt 2018-02-06 1:06 ` Nicholas Piggin 0 siblings, 1 reply; 8+ messages in thread From: Benjamin Herrenschmidt @ 2018-02-05 21:55 UTC (permalink / raw) To: Mauricio Faria de Oliveira, linuxppc-dev, npiggin, mpe Cc: Florian Weimer, Jiri Slaby, Aneesh Kumar K.V On Mon, 2018-02-05 at 19:14 -0200, Mauricio Faria de Oliveira wrote: > Nick, Michael, +Aneesh. > On 02/05/2018 10:48 AM, Florian Weimer wrote: > > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not > > implemented) > > The regression was introduced by commit 371b8044 ("powerpc/64s: > Initialize ISAv3 MMU registers before setting partition table"). > > The problem is Radix MMU specific (does not occur with 'disable_radix'), > and does not occur with that code reverted (ie do not set PIDR to zero). > > Do you see any reasons why? > (wondering if at all related to access_ok() in include/asm/uaccess.h) > > with: > > # strace -e set_robust_list -f ./test > set_robust_list(0x7fffa4b03910, 24) = -1 ENOSYS (Function not > implemented) > +++ exited with 1 +++ > > # uname -r > 4.15.0 > > without: > > # strace -e set_robust_list -f ./test > set_robust_list(0x7fff889c3910, 24) = 0 > +++ exited with 0 +++ > > # uname -r > 4.15.0.nopidr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-05 21:55 ` Benjamin Herrenschmidt @ 2018-02-06 1:06 ` Nicholas Piggin 2018-02-06 3:17 ` Aneesh Kumar K.V 2018-02-06 13:17 ` Mauricio Faria de Oliveira 0 siblings, 2 replies; 8+ messages in thread From: Nicholas Piggin @ 2018-02-06 1:06 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Mauricio Faria de Oliveira, linuxppc-dev, mpe, Florian Weimer, Jiri Slaby, Aneesh Kumar K.V On Tue, 06 Feb 2018 08:55:31 +1100 Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > On Mon, 2018-02-05 at 19:14 -0200, Mauricio Faria de Oliveira wrote: > > Nick, Michael, > > +Aneesh. > > > On 02/05/2018 10:48 AM, Florian Weimer wrote: > > > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not > > > implemented) > > > > The regression was introduced by commit 371b8044 ("powerpc/64s: > > Initialize ISAv3 MMU registers before setting partition table"). > > > > The problem is Radix MMU specific (does not occur with 'disable_radix'), > > and does not occur with that code reverted (ie do not set PIDR to zero). > > > > Do you see any reasons why? > > (wondering if at all related to access_ok() in include/asm/uaccess.h) Does this help? powerpc/64s/radix: allocate guard-PID for kernel contexts at boot 64s/radix uses PID 0 for its kernel mapping at the 0xCxxx (quadrant 3) address. This mapping is also accessible at 0x0xxx when PIDR=0 -- the top 2 bits just selects the addressing mode, which is effectively the same when PIDR=0 -- so address 0 translates to physical address 0 by the kernel's linear map. Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table"), which zeroes PIDR at boot, caused this situation, and that stops kernel access to NULL from faulting in boot. Before this, we inherited what firmware or kexec gave, which is almost always non-zero. futex_atomic_cmpxchg detection is done in boot, by testing if it returns -EFAULT on a NULL address. This breaks when kernel access to NULL during boot does not fault. This patch allocates a non-zero guard PID for init_mm, and switches kernel context to the guard PID at boot. This disallows access to the kernel mapping from quadrant 0 at boot. The effectiveness of this protection will be diminished a little after boot when kernel threads inherit the last context, but those should have NULL guard areas, and it's possible we will actually prefer to do a non-lazy switch back to the guard PID in a future change. For now, this gives a minimal fix, and gives NULL pointer protection for boot. --- arch/powerpc/mm/pgtable-radix.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c index 573a9a2ee455..6389a8527e4a 100644 --- a/arch/powerpc/mm/pgtable-radix.c +++ b/arch/powerpc/mm/pgtable-radix.c @@ -20,6 +20,7 @@ #include <asm/pgtable.h> #include <asm/pgalloc.h> +#include <asm/mmu_context.h> #include <asm/dma.h> #include <asm/machdep.h> #include <asm/mmu.h> @@ -333,6 +334,9 @@ static void __init radix_init_pgtable(void) "r" (TLBIEL_INVAL_SET_LPID), "r" (0)); asm volatile("eieio; tlbsync; ptesync" : : : "memory"); trace_tlbie(0, 0, TLBIEL_INVAL_SET_LPID, 0, 2, 1, 1); + + init_mm.context.id = mmu_base_pid; + mmu_base_pid++; } static void __init radix_init_partition_table(void) @@ -579,7 +583,7 @@ void __init radix__early_init_mmu(void) radix_init_iamr(); radix_init_pgtable(); - + radix__switch_mmu_context(NULL, &init_mm); if (cpu_has_feature(CPU_FTR_HVMODE)) tlbiel_all(); } @@ -604,6 +608,7 @@ void radix__early_init_mmu_secondary(void) } radix_init_iamr(); + radix__switch_mmu_context(NULL, &init_mm); if (cpu_has_feature(CPU_FTR_HVMODE)) tlbiel_all(); } -- 2.15.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-06 1:06 ` Nicholas Piggin @ 2018-02-06 3:17 ` Aneesh Kumar K.V 2018-02-06 4:29 ` Nicholas Piggin 2018-02-06 13:17 ` Mauricio Faria de Oliveira 1 sibling, 1 reply; 8+ messages in thread From: Aneesh Kumar K.V @ 2018-02-06 3:17 UTC (permalink / raw) To: Nicholas Piggin, Benjamin Herrenschmidt Cc: Mauricio Faria de Oliveira, linuxppc-dev, mpe, Florian Weimer, Jiri Slaby Nicholas Piggin <npiggin@gmail.com> writes: > On Tue, 06 Feb 2018 08:55:31 +1100 > Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > >> On Mon, 2018-02-05 at 19:14 -0200, Mauricio Faria de Oliveira wrote: >> > Nick, Michael, >> >> +Aneesh. >> >> > On 02/05/2018 10:48 AM, Florian Weimer wrote: >> > > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not >> > > implemented) >> > >> > The regression was introduced by commit 371b8044 ("powerpc/64s: >> > Initialize ISAv3 MMU registers before setting partition table"). >> > >> > The problem is Radix MMU specific (does not occur with 'disable_radix'), >> > and does not occur with that code reverted (ie do not set PIDR to zero). >> > >> > Do you see any reasons why? >> > (wondering if at all related to access_ok() in include/asm/uaccess.h) > > Does this help? > > powerpc/64s/radix: allocate guard-PID for kernel contexts at boot > > 64s/radix uses PID 0 for its kernel mapping at the 0xCxxx (quadrant 3) > address. This mapping is also accessible at 0x0xxx when PIDR=0 -- the > top 2 bits just selects the addressing mode, which is effectively the > same when PIDR=0 -- so address 0 translates to physical address 0 by > the kernel's linear map. > > Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before > setting partition table"), which zeroes PIDR at boot, caused this > situation, and that stops kernel access to NULL from faulting in boot. > Before this, we inherited what firmware or kexec gave, which is almost > always non-zero. > > futex_atomic_cmpxchg detection is done in boot, by testing if it > returns -EFAULT on a NULL address. This breaks when kernel access to > NULL during boot does not fault. > > This patch allocates a non-zero guard PID for init_mm, and switches > kernel context to the guard PID at boot. This disallows access to the > kernel mapping from quadrant 0 at boot. > > The effectiveness of this protection will be diminished a little after > boot when kernel threads inherit the last context, but those should > have NULL guard areas, and it's possible we will actually prefer to do > a non-lazy switch back to the guard PID in a future change. For now, > this gives a minimal fix, and gives NULL pointer protection for boot. I also have this as a part of another patch series. Since we already support cmpxchg(), i would suggest we avoid the runtime check. I needed this w.r.t hash so that we don't detect a NULL access as bad slb address because we don't have PACA slb_addr_limit initialized correctly that early. commit c42b0fb10027af0c44fc9e2f6f9586203c38f99b Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 24 13:54:22 2018 +0530 Don't do futext cmp test. It access NULL address early in the boot and we want to avoid that to simplify the fault handling. futex_detect_cmpxchg() does a cmpxchg_futex_value_locked on a NULL user addr to runtime detect whether architecture implements atomic cmpxchg for futex. diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index a429d859f15d..31bc2bd5dfd1 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -75,6 +75,7 @@ config PPC_BOOK3S_64 select ARCH_SUPPORTS_NUMA_BALANCING select IRQ_WORK select HAVE_KERNEL_XZ + select HAVE_FUTEX_CMPXCHG if FUTEX config PPC_BOOK3E_64 bool "Embedded processors" ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-06 3:17 ` Aneesh Kumar K.V @ 2018-02-06 4:29 ` Nicholas Piggin 0 siblings, 0 replies; 8+ messages in thread From: Nicholas Piggin @ 2018-02-06 4:29 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt, Mauricio Faria de Oliveira, linuxppc-dev, mpe, Florian Weimer, Jiri Slaby On Tue, 06 Feb 2018 08:47:03 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote: > Nicholas Piggin <npiggin@gmail.com> writes: > > > On Tue, 06 Feb 2018 08:55:31 +1100 > > Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > > > >> On Mon, 2018-02-05 at 19:14 -0200, Mauricio Faria de Oliveira wrote: > >> > Nick, Michael, > >> > >> +Aneesh. > >> > >> > On 02/05/2018 10:48 AM, Florian Weimer wrote: > >> > > 7041 set_robust_list(0x7fff93dc3980, 24) = -1 ENOSYS (Function not > >> > > implemented) > >> > > >> > The regression was introduced by commit 371b8044 ("powerpc/64s: > >> > Initialize ISAv3 MMU registers before setting partition table"). > >> > > >> > The problem is Radix MMU specific (does not occur with 'disable_radix'), > >> > and does not occur with that code reverted (ie do not set PIDR to zero). > >> > > >> > Do you see any reasons why? > >> > (wondering if at all related to access_ok() in include/asm/uaccess.h) > > > > Does this help? > > > > powerpc/64s/radix: allocate guard-PID for kernel contexts at boot > > > > 64s/radix uses PID 0 for its kernel mapping at the 0xCxxx (quadrant 3) > > address. This mapping is also accessible at 0x0xxx when PIDR=0 -- the > > top 2 bits just selects the addressing mode, which is effectively the > > same when PIDR=0 -- so address 0 translates to physical address 0 by > > the kernel's linear map. > > > > Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before > > setting partition table"), which zeroes PIDR at boot, caused this > > situation, and that stops kernel access to NULL from faulting in boot. > > Before this, we inherited what firmware or kexec gave, which is almost > > always non-zero. > > > > futex_atomic_cmpxchg detection is done in boot, by testing if it > > returns -EFAULT on a NULL address. This breaks when kernel access to > > NULL during boot does not fault. > > > > This patch allocates a non-zero guard PID for init_mm, and switches > > kernel context to the guard PID at boot. This disallows access to the > > kernel mapping from quadrant 0 at boot. > > > > The effectiveness of this protection will be diminished a little after > > boot when kernel threads inherit the last context, but those should > > have NULL guard areas, and it's possible we will actually prefer to do > > a non-lazy switch back to the guard PID in a future change. For now, > > this gives a minimal fix, and gives NULL pointer protection for boot. > > I also have this as a part of another patch series. Since we already > support cmpxchg(), i would suggest we avoid the runtime check. > > I needed this w.r.t hash so that we don't detect a NULL access as bad > slb address because we don't have PACA slb_addr_limit initialized > correctly that early. > > commit c42b0fb10027af0c44fc9e2f6f9586203c38f99b > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > Date: Wed Jan 24 13:54:22 2018 +0530 > > Don't do futext cmp test. > > It access NULL address early in the boot and we want to avoid that to simplify > the fault handling. > futex_detect_cmpxchg() does a cmpxchg_futex_value_locked on a NULL user addr > to runtime detect whether architecture implements atomic cmpxchg for futex. > > diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype > index a429d859f15d..31bc2bd5dfd1 100644 > --- a/arch/powerpc/platforms/Kconfig.cputype > +++ b/arch/powerpc/platforms/Kconfig.cputype > @@ -75,6 +75,7 @@ config PPC_BOOK3S_64 > select ARCH_SUPPORTS_NUMA_BALANCING > select IRQ_WORK > select HAVE_KERNEL_XZ > + select HAVE_FUTEX_CMPXCHG if FUTEX > > config PPC_BOOK3E_64 > bool "Embedded processors" > > I think that's okay, but what I'd prefer is to set up the hash context sufficiently that it will cope with a userspace access (and preferably fault) before we switch on the MMU at boot. We can do this patch as well, as a "don't bother testing because we always support it" cleanup. Thanks, Nick ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Kernel 4.15 lost set_robust_list support on POWER 9 2018-02-06 1:06 ` Nicholas Piggin 2018-02-06 3:17 ` Aneesh Kumar K.V @ 2018-02-06 13:17 ` Mauricio Faria de Oliveira 1 sibling, 0 replies; 8+ messages in thread From: Mauricio Faria de Oliveira @ 2018-02-06 13:17 UTC (permalink / raw) To: Nicholas Piggin, Benjamin Herrenschmidt Cc: Florian Weimer, linuxppc-dev, Aneesh Kumar K.V, Jiri Slaby On 02/05/2018 11:06 PM, Nicholas Piggin wrote: > Does this help? > > powerpc/64s/radix: allocate guard-PID for kernel contexts at boot Yes, the test-case passes: # strace -e set_robust_list -f ./test set_robust_list(0x7fff8d453910, 24) = 0 +++ exited with 0 +++ # uname -r 4.15.0.guardpid Thanks! cheers, Mauricio ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-02-06 13:17 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-02-05 12:48 Kernel 4.15 lost set_robust_list support on POWER 9 Florian Weimer 2018-02-05 12:59 ` Florian Weimer 2018-02-05 21:14 ` Mauricio Faria de Oliveira 2018-02-05 21:55 ` Benjamin Herrenschmidt 2018-02-06 1:06 ` Nicholas Piggin 2018-02-06 3:17 ` Aneesh Kumar K.V 2018-02-06 4:29 ` Nicholas Piggin 2018-02-06 13:17 ` Mauricio Faria de Oliveira
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).