From: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
To: "peterz@infradead.org" <peterz@infradead.org>,
	Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: "wbx@uclibc-ng.org" <wbx@uclibc-ng.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"jcmvbkbc@gmail.com" <jcmvbkbc@gmail.com>,
	"linux-snps-arc@lists.infradead.org"
	<linux-snps-arc@lists.infradead.org>
Subject: Re: [PATCH] ARC: Improve cmpxchng syscall implementation
Date: Wed, 4 Apr 2018 08:56:13 +0000
Message-ID: <1522832172.4851.7.camel@synopsys.com>
In-Reply-To: <1521633274.9805.30.camel@synopsys.com>

Hi Vineet, Peter,

On Wed, 2018-03-21 at 14:54 +0300, Alexey Brodkin wrote:
> Hi Vineet,
> 
> On Mon, 2018-03-19 at 11:29 -0700, Vineet Gupta wrote:
> > On 03/19/2018 04:00 AM, Alexey Brodkin wrote:
> > > > The arc_usr_cmpxchg syscall is supposed to be used on platforms
> > > > that lack hardware support for Load-Locked/Store-Conditional
> > > > instructions. In that case we mimic the missing hardware feature
> > > > with the help of a kernel syscall that "atomically" checks the current
> > > > value in memory and, if it matches the caller's expectation, writes
> > > > the new value to that same location.
> > > 
> > 
> > ...
> > ...
> > 
> > > 
> > > > 2. What's worse, if we're dealing with data from a not-yet-allocated
> > > >     page (think of the pre-copy-on-write state), we'll successfully
> > > >     read the data, but on write we'll silently return to user-space
> > > >     with the correct result
> > 
> > > This is technically incorrect: even for reading you need a page, which could be
> > > the common zero page in certain cases.
> 
> OK, I'll reword it.
> 
> > 
> > > >     (which we really read just before). That leads
> > > >     to very strange problems in the user-space app further down the line
> > > >     because the new value was never written to the destination.
> > > 
> > > 3. Regardless of what went wrong we'll return from syscall
> > >     and user-space application will continue to execute.
> > >     Even if user's pointer was completely bogus.
> > 
> > > Again we are exaggerating (from a technical correctness POV): if the user pointer was
> > > bogus, the read would not have worked in the first place, etc. So let's tone down the
> > > rhetoric.
> 
> OK, here I may rephrase it like this:
> ------------------------------->8-----------------------------
> 3. Regardless of what went wrong we'll return from syscall
>    and user-space application will continue to execute.
> ------------------------------->8-----------------------------
> 
> > 
> > >     In case of hardware LL/SC that app would have been killed
> > >     by the kernel.
> > > 
> > > > With that change we attempt to improve on all 3 items above:
> > > > 
> > > > 1. We still disable preemption around the read-and-write of
> > > >     user's data, but if either of them fails
> > > >     we enable preemption and try to force a page fault so
> > > >     that we have a correct mapping in the TLB. Then we retry
> > > >     in "atomic" context.
> > > 
> > > > 2. If the real page fault fails, or even if access_ok() returns false,
> > > >     we send SIGSEGV to the user-space process, so if something goes
> > > >     seriously wrong we'll know about it much earlier.
> > > 
> > 
> > 
> > >   
> > >   	/*
> > >   	 * This is only for old cores lacking LLOCK/SCOND, which by defintion
> > > @@ -60,23 +62,48 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
> > >   	/* Z indicates to userspace if operation succeded */
> > >   	regs->status32 &= ~STATUS_Z_MASK;
> > >   
> > > -	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
> > > -		return -EFAULT;
> > > +	ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr));
> > > +	if (!ret)
> > > +		goto fail;
> > >   
> > > +again:
> > >   	preempt_disable();
> > >   
> > > -	if (__get_user(uval, uaddr))
> > > -		goto done;
> > > -
> > > -	if (uval == expected) {
> > > -		if (!__put_user(new, uaddr))
> > > +	ret = __get_user(val, uaddr);
> > > +	if (ret == -EFAULT) {
> > 
> > 
> > > Let's see if this warrants adding complexity! This implies that a TLB entry with
> > > Read permissions didn't exist for reading the var and the page fault handler could not
> > > wire up even a zero page due to preempt_disable(), meaning it was something not
> > > yet touched by userspace: a sort of uninitialized variable in user code.
> 
> OK, I completely missed the fact that the fast-path TLB miss handler is
> executed even if we have preemption disabled. So given that the mapping exists,
> we do not need to retry with preemption enabled.
> 
> Still, maybe I'm a bit paranoid here, but IMHO it's good to be ready for a corner case
> where the pointer is completely bogus and there's no mapping for it.
> I understand that today we only expect this syscall to be used from libc's
> internals, but as long as the syscall exists nobody stops anybody from using it
> directly without libc. So maybe instead of doing get_user_pages_fast() we just
> send a SIGSEGV to the process? At least the user will realize there's a problem
> at an earlier stage.
> 
> > Otherwise it is extremely unlikely to start with a TLB entry with Read
> > permissions, followed by a syscall Trap, only to find the entry missing, unless a
> > global TLB flush came from other cores right in the middle. But this syscall is
> > not guaranteed to work with SMP anyway, so let's ignore any SMP misdoings here.
> 
> Well, but that's exactly the situation I was debugging: we start with data from a read-only
> page, and on an attempt to write back the modified value the COW machinery gets involved.
> 
> That was on a UP platform.
> 
> > Now in case it was *an* uninitialized var, do we have to guarantee any
> > well-defined semantics for the kernel emulation of cmpxchg? IMO it should be
> > fine to return 0 or -EFAULT etc. In fact, -EFAULT is better as it will force a
> > retry loop on the user side, given the typical cmpxchg usage pattern.
> 
> The problem is that libc only expects to get a value read from memory.
> And in theory the expected value might be -14, which is basically -EFAULT.
> I'm not talking about 0 at all because in some cases that's exactly what
> user-space expects.
> 
> So if we read an unexpected value then we'll just return it without even attempting
> to write.
> 
> If we read the expected data but fail to write then we'll send a SIGSEGV and
> return whatever... let it be -EFAULT. Anyway, the app will be killed on exit from
> this syscall.

Any comments on my comments above?
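
For reference, the "typical cmpxchg usage pattern" mentioned in the quoted
review is a retry loop. Below is a minimal user-space sketch of such a loop.
It assumes C11 atomics as a stand-in for whatever primitive libc actually
builds on top of the syscall, and atomic_add_retry() is a hypothetical name,
not a libc function:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: add "delta" to "*counter" with a compare-and-swap
 * retry loop. C11 atomics are used here only to illustrate the pattern.
 */
static int atomic_add_retry(atomic_int *counter, int delta)
{
	int old;

	assert(counter != NULL);

	old = atomic_load(counter);

	/*
	 * If the value in memory no longer equals "old", the exchange fails,
	 * "old" is refreshed with the current value and we simply try again.
	 */
	while (!atomic_compare_exchange_weak(counter, &old, old + delta))
		;

	return old + delta;
}
```

The loop repeats whenever the value found in memory differs from the one the
caller expected, which is why an unexpected return value normally costs just
one more iteration.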

-Alexey
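
To illustrate the return-value concern from the reply above, here is a small
user-space model. It is a sketch under stated assumptions, not the kernel
code: model_cmpxchg(), struct cmpxchg_result, the "fault" parameter and
caller_thinks_swapped() are all hypothetical names, and z_flag merely stands
in for the Z bit in STATUS32 that the syscall uses to signal success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what one emulated cmpxchg hands back to the caller. */
struct cmpxchg_result {
	int  value;	/* value returned to user space */
	bool z_flag;	/* models STATUS32.Z: set only if the store happened */
};

/*
 * Model of the semantics under discussion. "fault" simulates an access
 * that could not be completed (an assumption made for illustration only).
 */
static struct cmpxchg_result model_cmpxchg(int *uaddr, int expected,
					   int new_val, bool fault)
{
	struct cmpxchg_result res = { .value = 0, .z_flag = false };

	assert(uaddr != NULL);

	if (fault) {
		/* Variant suggested in review: report failure as -EFAULT. */
		res.value = -EFAULT;
		return res;
	}

	res.value = *uaddr;		/* value read from memory */
	if (res.value == expected) {
		*uaddr = new_val;	/* the "atomic" store */
		res.z_flag = true;
	}
	return res;
}

/* A caller that inspects only the returned value, ignoring the flag. */
static bool caller_thinks_swapped(struct cmpxchg_result res, int expected)
{
	return res.value == expected;
}
```

With expected == -EFAULT, a faulting call returns -EFAULT with z_flag clear
and memory untouched, yet a caller that compares only the returned value
concludes that the swap happened.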

Thread overview: 7+ messages
2018-03-19 11:00 [PATCH] ARC: Improve cmpxchng syscall implementation Alexey Brodkin
2018-03-19 18:29 ` Vineet Gupta
2018-03-21 11:54   ` Alexey Brodkin
2018-04-04  8:56     ` Alexey Brodkin [this message]
2018-04-18 18:16     ` Vineet Gupta
2018-06-19  7:58       ` Alexey Brodkin
2018-06-19  9:26 ` Peter Zijlstra
