* [PATCH] ARC: Improve cmpxchng syscall implementation @ 2018-03-19 11:00 Alexey Brodkin 2018-03-19 18:29 ` Vineet Gupta 2018-06-19 9:26 ` Peter Zijlstra 0 siblings, 2 replies; 7+ messages in thread From: Alexey Brodkin @ 2018-03-19 11:00 UTC (permalink / raw) To: linux-snps-arc Cc: linux-kernel, Vineet Gupta, Alexey Brodkin, Peter Zijlstra, Max Filippov, linux-arch arc_usr_cmpxchg syscall is supposed to be used on platforms that lack support of Load-Locked/Store-Conditional instructions in hardware. And in that case we mimic missing hardware features with help of kernel's sycall that "atomically" checks current value in memory and then if it matches caller expectation new value is written to that same location. What's important in the description above: - Check-and-exchange must be "atomical" which means preemption must be disabled during entire "transaction" - Data accessed is from user-space, i.e. we're dealing with virtual addresses And in current implementation we have a couple of problems: 1. We do disable preemprion around __get_user() & __put_user() but that in its turn disables page fault handler. That means if a pointer to user's data has no mapping in the TLB we won't be able to access required data. Instead software "exception handling" code from __get_user_fn() will return -EFAULT. 2. What's worse if we're dealing with data from not yet allocated page (think of pre-copy-on-write state) we'll successfully read data but on write we'll silently return to user-space with correct result (which we really read just before). That leads to very strange problems in user-space app further down the line because new value was never written to the destination. 3. Regardless of what went wrong we'll return from syscall and user-space application will continue to execute. Even if user's pointer was completely bogus. In case of hardware LL/SC that app would have been killed by the kernel. With that change we attempt to imrove on all 3 items above: 1. We still disable preemption around read-and-write of user's data but if we happen to fail with either of them we're enabling preemption and try to force page fault so that we have a correct mapping in the TLB. Then re-try again in "atomic" context. 2. If real page fault fails or even access_ok() returns false we send SIGSEGV to the user-space process so if something goes seriously wrong we'll know about it much earlier. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: linux-arch@vger.kernel.org --- arch/arc/kernel/process.c | 47 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 10 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 5ac3b547453f..d7d3e16133d6 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -47,7 +47,9 @@ SYSCALL_DEFINE0(arc_gettls) SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); - int uval = -EFAULT; + struct page *page; + u32 val; + int ret; /* * This is only for old cores lacking LLOCK/SCOND, which by defintion @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) /* Z indicates to userspace if operation succeded */ regs->status32 &= ~STATUS_Z_MASK; - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); + if (!ret) + goto fail; +again: preempt_disable(); - if (__get_user(uval, uaddr)) - goto done; - - if (uval == expected) { - if (!__put_user(new, uaddr)) + ret = __get_user(val, uaddr); + if (ret == -EFAULT) { + preempt_enable(); + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); + if (ret < 0) + goto fail; + + put_page(page); + goto again; + } else if (ret) + goto fail; + + if (val == expected) { + ret = __put_user(new, uaddr); + if (!ret) regs->status32 |= STATUS_Z_MASK; } -done: preempt_enable(); - return uval; + if (ret == -EFAULT) { + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); + if (ret < 0) + goto fail; + + put_page(page); + goto again; + } else if (ret) + goto fail; + + return val; + +fail: + force_sig(SIGSEGV, current); + return ret; } #ifdef CONFIG_ISA_ARCV2 -- 2.11.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-03-19 11:00 [PATCH] ARC: Improve cmpxchng syscall implementation Alexey Brodkin @ 2018-03-19 18:29 ` Vineet Gupta 2018-03-21 11:54 ` Alexey Brodkin 2018-06-19 9:26 ` Peter Zijlstra 1 sibling, 1 reply; 7+ messages in thread From: Vineet Gupta @ 2018-03-19 18:29 UTC (permalink / raw) To: Alexey Brodkin, linux-snps-arc Cc: linux-kernel, Peter Zijlstra, Max Filippov, linux-arch On 03/19/2018 04:00 AM, Alexey Brodkin wrote: > arc_usr_cmpxchg syscall is supposed to be used on platforms > that lack support of Load-Locked/Store-Conditional instructions > in hardware. And in that case we mimic missing hardware features > with help of kernel's sycall that "atomically" checks current > value in memory and then if it matches caller expectation new > value is written to that same location. > ... ... > > 2. What's worse if we're dealing with data from not yet allocated > page (think of pre-copy-on-write state) we'll successfully > read data but on write we'll silently return to user-space > with correct result This is technically incorrect, even for reading, you need a page, which could be common zero page in certain cases. (which we really read just before). That leads > to very strange problems in user-space app further down the line > because new value was never written to the destination. > > 3. Regardless of what went wrong we'll return from syscall > and user-space application will continue to execute. > Even if user's pointer was completely bogus. Again we are exaggerating (from technical correctness POV) - if user pointer was bogs, the read would not have worked in first place etc. So lets tone down the rhetoric. > In case of hardware LL/SC that app would have been killed > by the kernel. > > With that change we attempt to imrove on all 3 items above: > > 1. We still disable preemption around read-and-write of > user's data but if we happen to fail with either of them > we're enabling preemption and try to force page fault so > that we have a correct mapping in the TLB. Then re-try > again in "atomic" context. > > 2. If real page fault fails or even access_ok() returns false > we send SIGSEGV to the user-space process so if something goes > seriously wrong we'll know about it much earlier. > > > /* > * This is only for old cores lacking LLOCK/SCOND, which by defintion > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > /* Z indicates to userspace if operation succeded */ > regs->status32 &= ~STATUS_Z_MASK; > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > - return -EFAULT; > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > + if (!ret) > + goto fail; > > +again: > preempt_disable(); > > - if (__get_user(uval, uaddr)) > - goto done; > - > - if (uval == expected) { > - if (!__put_user(new, uaddr)) > + ret = __get_user(val, uaddr); > + if (ret == -EFAULT) { Lets see if this warrants adding complexity ! This implies that TLB entry with Read permissions didn't exist for reading the var and page fault handler could not wire up even a zero page due to preempt_disable, meaning it was something not touched by userspace already - sort of uninitialized variable in user code. Otherwise it is extremely unlikely to start with a TLB entry with Read permissions, followed by syscall Trap only to find the entry missing, unless a global TLB flush came from other cores, right in the middle. But this syscall is not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. Now in case it was *an* uninitialized var, do we have to guarantee any well defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on user side, given the typical cmpxchg usage pattern. > + preempt_enable(); > + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); > + if (ret < 0) > + goto fail; > + > + put_page(page); > + goto again; > + } else if (ret) > + goto fail; > + > + if (val == expected) { > + ret = __put_user(new, uaddr); > + if (!ret) > regs->status32 |= STATUS_Z_MASK; > } > > -done: > preempt_enable(); > > - return uval; > + if (ret == -EFAULT) { > + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); > + if (ret < 0) > + goto fail; > + > + put_page(page); > + goto again; > + } else if (ret) > + goto fail; > + > + return val; > + > +fail: > + force_sig(SIGSEGV, current); > + return ret; > } > > #ifdef CONFIG_ISA_ARCV2 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-03-19 18:29 ` Vineet Gupta @ 2018-03-21 11:54 ` Alexey Brodkin 2018-04-04 8:56 ` Alexey Brodkin 2018-04-18 18:16 ` Vineet Gupta 0 siblings, 2 replies; 7+ messages in thread From: Alexey Brodkin @ 2018-03-21 11:54 UTC (permalink / raw) To: Vineet Gupta Cc: wbx@uclibc-ng.org, linux-kernel@vger.kernel.org, peterz@infradead.org, jcmvbkbc@gmail.com, linux-arch@vger.kernel.org, linux-snps-arc@lists.infradead.org Hi Vineet, On Mon, 2018-03-19 at 11:29 -0700, Vineet Gupta wrote: > On 03/19/2018 04:00 AM, Alexey Brodkin wrote: > > arc_usr_cmpxchg syscall is supposed to be used on platforms > > that lack support of Load-Locked/Store-Conditional instructions > > in hardware. And in that case we mimic missing hardware features > > with help of kernel's sycall that "atomically" checks current > > value in memory and then if it matches caller expectation new > > value is written to that same location. > > > > ... > ... > > > > > 2. What's worse if we're dealing with data from not yet allocated > > page (think of pre-copy-on-write state) we'll successfully > > read data but on write we'll silently return to user-space > > with correct result > > This is technically incorrect, even for reading, you need a page, which could be > common zero page in certain cases. Ok I'll reword it like. > > (which we really read just before). That leads > > to very strange problems in user-space app further down the line > > because new value was never written to the destination. > > > > 3. Regardless of what went wrong we'll return from syscall > > and user-space application will continue to execute. > > Even if user's pointer was completely bogus. > > Again we are exaggerating (from technical correctness POV) - if user pointer was > bogs, the read would not have worked in first place etc. So lets tone down the > rhetoric. Ok here I may rephrase it like that: ------------------------------->8----------------------------- 3. Regardless of what went wrong we'll return from syscall and user-space application will continue to execute. ------------------------------->8----------------------------- > > > In case of hardware LL/SC that app would have been killed > > by the kernel. > > > > With that change we attempt to imrove on all 3 items above: > > > > 1. We still disable preemption around read-and-write of > > user's data but if we happen to fail with either of them > > we're enabling preemption and try to force page fault so > > that we have a correct mapping in the TLB. Then re-try > > again in "atomic" context. > > > > 2. If real page fault fails or even access_ok() returns false > > we send SIGSEGV to the user-space process so if something goes > > seriously wrong we'll know about it much earlier. > > > > > > > > /* > > * This is only for old cores lacking LLOCK/SCOND, which by defintion > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > > /* Z indicates to userspace if operation succeded */ > > regs->status32 &= ~STATUS_Z_MASK; > > > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > > - return -EFAULT; > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > > + if (!ret) > > + goto fail; > > > > +again: > > preempt_disable(); > > > > - if (__get_user(uval, uaddr)) > > - goto done; > > - > > - if (uval == expected) { > > - if (!__put_user(new, uaddr)) > > + ret = __get_user(val, uaddr); > > + if (ret == -EFAULT) { > > > Lets see if this warrants adding complexity ! This implies that TLB entry with > Read permissions didn't exist for reading the var and page fault handler could not > wire up even a zero page due to preempt_disable, meaning it was something not > touched by userspace already - sort of uninitialized variable in user code. Ok I completely missed the fact that fast path TLB miss handler is being executed even if we have preemption disabled. So given the mapping exist we do not need to retry with enabled preemption. Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case when the pointer is completely bogus and there's no mapping for him. I understand that today we only expect this syscall to be used from libc's internals but as long as syscall exists nobody stops anybody from using it directly without libc. So maybe instead of doing get_user_pages_fast() just send a SIGSEGV to the process? At least user will realize there's some problem at earlier stage. > Otherwise it is extremely unlikely to start with a TLB entry with Read > permissions, followed by syscall Trap only to find the entry missing, unless a > global TLB flush came from other cores, right in the middle. But this syscall is > not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. Well but that's exactly the situation I was debugging: we start from data from read-only page and on attempt to write back modified value COW machinery gets involved. That was on UP platform. > Now in case it was *an* uninitialized var, do we have to guarantee any well > defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to > return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on > user side, given the typical cmpxchg usage pattern. The problem is libc only expects to get a value read from memory. And in theory expected value might be -14 which is basically -EFAULT. I'm not talking about 0 at all because in some cases that's exactly what user-space expects. So if we read unexpected value then we'll just return it without even attempting to write. If we read expected data but fail to write then we'll send a SIGSEGV and return whatever... let it be -EFAULT - anyways the app will be killed on exit from this syscall. -Alexey ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-03-21 11:54 ` Alexey Brodkin @ 2018-04-04 8:56 ` Alexey Brodkin 2018-04-18 18:16 ` Vineet Gupta 1 sibling, 0 replies; 7+ messages in thread From: Alexey Brodkin @ 2018-04-04 8:56 UTC (permalink / raw) To: peterz@infradead.org, Vineet Gupta Cc: wbx@uclibc-ng.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, jcmvbkbc@gmail.com, linux-snps-arc@lists.infradead.org Hi Vineet, Peter, On Wed, 2018-03-21 at 14:54 +0300, Alexey Brodkin wrote: > Hi Vineet, > > On Mon, 2018-03-19 at 11:29 -0700, Vineet Gupta wrote: > > On 03/19/2018 04:00 AM, Alexey Brodkin wrote: > > > arc_usr_cmpxchg syscall is supposed to be used on platforms > > > that lack support of Load-Locked/Store-Conditional instructions > > > in hardware. And in that case we mimic missing hardware features > > > with help of kernel's sycall that "atomically" checks current > > > value in memory and then if it matches caller expectation new > > > value is written to that same location. > > > > > > > ... > > ... > > > > > > > > 2. What's worse if we're dealing with data from not yet allocated > > > page (think of pre-copy-on-write state) we'll successfully > > > read data but on write we'll silently return to user-space > > > with correct result > > > > This is technically incorrect, even for reading, you need a page, which could be > > common zero page in certain cases. > > Ok I'll reword it like. > > > > > (which we really read just before). That leads > > > to very strange problems in user-space app further down the line > > > because new value was never written to the destination. > > > > > > 3. Regardless of what went wrong we'll return from syscall > > > and user-space application will continue to execute. > > > Even if user's pointer was completely bogus. > > > > Again we are exaggerating (from technical correctness POV) - if user pointer was > > bogs, the read would not have worked in first place etc. So lets tone down the > > rhetoric. > > Ok here I may rephrase it like that: > ------------------------------->8----------------------------- > 3. Regardless of what went wrong we'll return from syscall > and user-space application will continue to execute. > ------------------------------->8----------------------------- > > > > > > In case of hardware LL/SC that app would have been killed > > > by the kernel. > > > > > > With that change we attempt to imrove on all 3 items above: > > > > > > 1. We still disable preemption around read-and-write of > > > user's data but if we happen to fail with either of them > > > we're enabling preemption and try to force page fault so > > > that we have a correct mapping in the TLB. Then re-try > > > again in "atomic" context. > > > > > > 2. If real page fault fails or even access_ok() returns false > > > we send SIGSEGV to the user-space process so if something goes > > > seriously wrong we'll know about it much earlier. > > > > > > > > > > > > > /* > > > * This is only for old cores lacking LLOCK/SCOND, which by defintion > > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > > > /* Z indicates to userspace if operation succeded */ > > > regs->status32 &= ~STATUS_Z_MASK; > > > > > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > > > - return -EFAULT; > > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > > > + if (!ret) > > > + goto fail; > > > > > > +again: > > > preempt_disable(); > > > > > > - if (__get_user(uval, uaddr)) > > > - goto done; > > > - > > > - if (uval == expected) { > > > - if (!__put_user(new, uaddr)) > > > + ret = __get_user(val, uaddr); > > > + if (ret == -EFAULT) { > > > > > > Lets see if this warrants adding complexity ! This implies that TLB entry with > > Read permissions didn't exist for reading the var and page fault handler could not > > wire up even a zero page due to preempt_disable, meaning it was something not > > touched by userspace already - sort of uninitialized variable in user code. > > Ok I completely missed the fact that fast path TLB miss handler is being > executed even if we have preemption disabled. So given the mapping exist > we do not need to retry with enabled preemption. > > Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case > when the pointer is completely bogus and there's no mapping for him. > I understand that today we only expect this syscall to be used from libc's > internals but as long as syscall exists nobody stops anybody from using it > directly without libc. So maybe instead of doing get_user_pages_fast() just > send a SIGSEGV to the process? At least user will realize there's some problem > at earlier stage. > > > Otherwise it is extremely unlikely to start with a TLB entry with Read > > permissions, followed by syscall Trap only to find the entry missing, unless a > > global TLB flush came from other cores, right in the middle. But this syscall is > > not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. > > Well but that's exactly the situation I was debugging: we start from data from read-only > page and on attempt to write back modified value COW machinery gets involved. > > That was on UP platform. > > > Now in case it was *an* uninitialized var, do we have to guarantee any well > > defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to > > return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on > > user side, given the typical cmpxchg usage pattern. > > The problem is libc only expects to get a value read from memory. > And in theory expected value might be -14 which is basically -EFAULT. > I'm not talking about 0 at all because in some cases that's exactly what > user-space expects. > > So if we read unexpected value then we'll just return it without even attempting > to write. > > If we read expected data but fail to write then we'll send a SIGSEGV and > return whatever... let it be -EFAULT - anyways the app will be killed on exit from > this syscall. Any comments on my comments above? -Alexey ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-03-21 11:54 ` Alexey Brodkin 2018-04-04 8:56 ` Alexey Brodkin @ 2018-04-18 18:16 ` Vineet Gupta 2018-06-19 7:58 ` Alexey Brodkin 1 sibling, 1 reply; 7+ messages in thread From: Vineet Gupta @ 2018-04-18 18:16 UTC (permalink / raw) To: Alexey Brodkin Cc: wbx@uclibc-ng.org, linux-kernel@vger.kernel.org, peterz@infradead.org, jcmvbkbc@gmail.com, linux-arch@vger.kernel.org, linux-snps-arc@lists.infradead.org On 03/21/2018 04:54 AM, Alexey Brodkin wrote: > /* >>> * This is only for old cores lacking LLOCK/SCOND, which by defintion >>> @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) >>> /* Z indicates to userspace if operation succeded */ >>> regs->status32 &= ~STATUS_Z_MASK; >>> >>> - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) >>> - return -EFAULT; >>> + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); >>> + if (!ret) >>> + goto fail; >>> >>> +again: >>> preempt_disable(); >>> >>> - if (__get_user(uval, uaddr)) >>> - goto done; >>> - >>> - if (uval == expected) { >>> - if (!__put_user(new, uaddr)) >>> + ret = __get_user(val, uaddr); >>> + if (ret == -EFAULT) { >> >> Lets see if this warrants adding complexity ! This implies that TLB entry with >> Read permissions didn't exist for reading the var and page fault handler could not >> wire up even a zero page due to preempt_disable, meaning it was something not >> touched by userspace already - sort of uninitialized variable in user code. > Ok I completely missed the fact that fast path TLB miss handler is being > executed even if we have preemption disabled. So given the mapping exist > we do not need to retry with enabled preemption. > > Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case > when the pointer is completely bogus and there's no mapping for him. > I understand that today we only expect this syscall to be used from libc's > internals but as long as syscall exists nobody stops anybody from using it > directly without libc. So maybe instead of doing get_user_pages_fast() just > send a SIGSEGV to the process? At least user will realize there's some problem > at earlier stage. if the pointer is bogus, we currently return -EFAULT, is that not enough ! I'm fine if u want to change that to segv. >> Otherwise it is extremely unlikely to start with a TLB entry with Read >> permissions, followed by syscall Trap only to find the entry missing, unless a >> global TLB flush came from other cores, right in the middle. But this syscall is >> not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. > Well but that's exactly the situation I was debugging: we start from data from read-only > page and on attempt to write back modified value COW machinery gets involved. No exactly your situation. In your case the TLB entry *did* exist with Read permission. What I was pointing to is that case where it woudl vanish between user reading the backing page and making a syscall ! > >> Now in case it was *an* uninitialized var, do we have to guarantee any well >> defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to >> return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on >> user side, given the typical cmpxchg usage pattern. > The problem is libc only expects to get a value read from memory. > And in theory expected value might be -14 which is basically -EFAULT. > I'm not talking about 0 at all because in some cases that's exactly what > user-space expects. > > So if we read unexpected value then we'll just return it without even attempting > to write. > > If we read expected data but fail to write then we'll send a SIGSEGV and > return whatever... let it be -EFAULT - anyways the app will be killed on exit from > this syscall. I'm not sure what you mean here. I'm fine with adding segv kill semantics, but don't think complexity for get_user is worth it ! -Vineet ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-04-18 18:16 ` Vineet Gupta @ 2018-06-19 7:58 ` Alexey Brodkin 0 siblings, 0 replies; 7+ messages in thread From: Alexey Brodkin @ 2018-06-19 7:58 UTC (permalink / raw) To: Vineet Gupta Cc: wbx@uclibc-ng.org, linux-kernel@vger.kernel.org, peterz@infradead.org, jcmvbkbc@gmail.com, linux-arch@vger.kernel.org, linux-snps-arc@lists.infradead.org Hi Vineet, On Wed, 2018-04-18 at 11:16 -0700, Vineet Gupta wrote: > On 03/21/2018 04:54 AM, Alexey Brodkin wrote: > > /* > > > > * This is only for old cores lacking LLOCK/SCOND, which by defintion > > > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > > > > /* Z indicates to userspace if operation succeded */ > > > > regs->status32 &= ~STATUS_Z_MASK; > > > > > > > > - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > > > > - return -EFAULT; > > > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > > > > + if (!ret) > > > > + goto fail; > > > > > > > > +again: > > > > preempt_disable(); > > > > > > > > - if (__get_user(uval, uaddr)) > > > > - goto done; > > > > - > > > > - if (uval == expected) { > > > > - if (!__put_user(new, uaddr)) > > > > + ret = __get_user(val, uaddr); > > > > + if (ret == -EFAULT) { > > > > > > Lets see if this warrants adding complexity ! This implies that TLB entry with > > > Read permissions didn't exist for reading the var and page fault handler could not > > > wire up even a zero page due to preempt_disable, meaning it was something not > > > touched by userspace already - sort of uninitialized variable in user code. > > > > Ok I completely missed the fact that fast path TLB miss handler is being > > executed even if we have preemption disabled. So given the mapping exist > > we do not need to retry with enabled preemption. > > > > Still maybe I'm a bit paranoid here but IMHO it's good to be ready for a corner-case > > when the pointer is completely bogus and there's no mapping for him. > > I understand that today we only expect this syscall to be used from libc's > > internals but as long as syscall exists nobody stops anybody from using it > > directly without libc. So maybe instead of doing get_user_pages_fast() just > > send a SIGSEGV to the process? At least user will realize there's some problem > > at earlier stage. > > if the pointer is bogus, we currently return -EFAULT, is that not enough ! I'm > fine if u want to change that to segv. Ok good. > > > Otherwise it is extremely unlikely to start with a TLB entry with Read > > > permissions, followed by syscall Trap only to find the entry missing, unless a > > > global TLB flush came from other cores, right in the middle. But this syscall is > > > not guaranteed to work with SMP anyways, so lets ignore any SMP misdoings here. > > > > Well but that's exactly the situation I was debugging: we start from data from read-only > > page and on attempt to write back modified value COW machinery gets involved. > > No exactly your situation. In your case the TLB entry *did* exist with Read > permission. What I was pointing to is that case where it woudl vanish between user > reading the backing page and making a syscall ! Probably I'm missing something here. Indeed there's already TLB entry with READ permission and we need ProtV exception to happen to update this entry such that it becomes READ-WRITE enabled. And in its turn for ProtV exception to happen we need to enable preemption and execute __put_user(). > > > Now in case it was *an* uninitialized var, do we have to guarantee any well > > > defined semantics for the kernel emulation of cmpxchg ? IMO it should be fine to > > > return 0 or -EFAULT etc. Infact -EFAULT is better as it will force a retry loop on > > > user side, given the typical cmpxchg usage pattern. > > > > The problem is libc only expects to get a value read from memory. > > And in theory expected value might be -14 which is basically -EFAULT. > > I'm not talking about 0 at all because in some cases that's exactly what > > user-space expects. > > > > So if we read unexpected value then we'll just return it without even attempting > > to write. > > > > If we read expected data but fail to write then we'll send a SIGSEGV and > > return whatever... let it be -EFAULT - anyways the app will be killed on exit from > > this syscall. > > I'm not sure what you mean here. I'm fine with adding segv kill semantics, but > don't think complexity for get_user is worth it ! This complexity adds predictability - if we cannot read or write data we kill the app as if our normal LD/ST fails fro whatever reason. -Alexey ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARC: Improve cmpxchng syscall implementation 2018-03-19 11:00 [PATCH] ARC: Improve cmpxchng syscall implementation Alexey Brodkin 2018-03-19 18:29 ` Vineet Gupta @ 2018-06-19 9:26 ` Peter Zijlstra 1 sibling, 0 replies; 7+ messages in thread From: Peter Zijlstra @ 2018-06-19 9:26 UTC (permalink / raw) To: Alexey Brodkin Cc: linux-snps-arc, linux-kernel, Vineet Gupta, Max Filippov, linux-arch On Mon, Mar 19, 2018 at 02:00:02PM +0300, Alexey Brodkin wrote: > diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c > index 5ac3b547453f..d7d3e16133d6 100644 > --- a/arch/arc/kernel/process.c > +++ b/arch/arc/kernel/process.c > @@ -47,7 +47,9 @@ SYSCALL_DEFINE0(arc_gettls) > SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > { > struct pt_regs *regs = current_pt_regs(); > + struct page *page; > + u32 val; > + int ret; > > /* > * This is only for old cores lacking LLOCK/SCOND, which by defintion > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > /* Z indicates to userspace if operation succeded */ > regs->status32 &= ~STATUS_Z_MASK; > > + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); > + if (!ret) > + goto fail; > > +again: > preempt_disable(); > > + ret = __get_user(val, uaddr); > + if (ret == -EFAULT) { > + preempt_enable(); > + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); > + if (ret < 0) > + goto fail; > + > + put_page(page); > + goto again; > + } else if (ret) > + goto fail; goto fail with preempt disabled. > + > + if (val == expected) { > + ret = __put_user(new, uaddr); > + if (!ret) > regs->status32 |= STATUS_Z_MASK; > } > > -done: > preempt_enable(); > > - return uval; > + if (ret == -EFAULT) { > + ret = get_user_pages_fast((unsigned long)uaddr, 1, 1, &page); > + if (ret < 0) > + goto fail; > + > + put_page(page); > + goto again; > + } else if (ret) > + goto fail; > + > + return val; > + > +fail: > + force_sig(SIGSEGV, current); > + return ret; > } Sorry for the delay, but I would write it like: --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -47,7 +47,9 @@ SYSCALL_DEFINE0(arc_gettls) SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); - int uval = -EFAULT; + struct page *page; + u32 val; + int ret; /* * This is only for old cores lacking LLOCK/SCOND, which by defintion @@ -60,23 +62,46 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, /* Z indicates to userspace if operation succeded */ regs->status32 &= ~STATUS_Z_MASK; - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); + if (!ret) + goto fail; +again: preempt_disable(); - if (__get_user(uval, uaddr)) - goto done; + ret = __get_user(val, uaddr); + if (ret) + goto fault; - if (uval == expected) { - if (!__put_user(new, uaddr)) - regs->status32 |= STATUS_Z_MASK; - } + if (val != expected) + goto out; -done: + ret = __put_user(new, uaddr); + if (ret) + goto fault; + + regs->status32 |= STATUS_Z_MASK; + +out: preempt_enable(); + return val; + +fault: + preempt_enable(); + + if (unlikely(ret != -EFAULT)) + goto fail; - return uval; + down_read(¤t->mm->mmap_sem); + ret = fixup_user_fault(current, current->mm, uaddr, FAULT_FLAG_WRITE, NULL); + up_read(¤t->mm->mmap_sem); + + if (likely(!ret)) + goto again; + +fail: + force_sig(SIGSEGV, current); + return ret; } #ifdef CONFIG_ISA_ARCV2 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-06-19 9:26 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-03-19 11:00 [PATCH] ARC: Improve cmpxchng syscall implementation Alexey Brodkin 2018-03-19 18:29 ` Vineet Gupta 2018-03-21 11:54 ` Alexey Brodkin 2018-04-04 8:56 ` Alexey Brodkin 2018-04-18 18:16 ` Vineet Gupta 2018-06-19 7:58 ` Alexey Brodkin 2018-06-19 9:26 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).