Re: [PATCH 2/5] nptl: Add rseq registration


* Re: [PATCH 2/5] nptl: Add rseq registration
From: Mathieu Desnoyers @ 2021-12-06 18:52 UTC
  To: Florian Weimer, paulmck, Boqun Feng, Peter Zijlstra
  Cc: libc-alpha, linux-kernel

[ Adding other kernel rseq maintainers in CC. ]

----- On Dec 6, 2021, at 12:14 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Dec 6, 2021, at 8:46 AM, Florian Weimer fweimer@redhat.com wrote:
>> [...]
>>> @@ -406,6 +407,9 @@ struct pthread
>>>   /* Used on strsignal.  */
>>>   struct tls_internal_t tls_state;
>>> 
>>> +  /* rseq area registered with the kernel.  */
>>> +  struct rseq rseq_area;
>>
>> The rseq UAPI requires that the fields within the rseq_area
>> are read and written with single-copy atomicity semantics.
>>
>> So either we define a "volatile struct rseq" here, or we'll need
>> to wrap all accesses with the proper volatile casts, or use the
>> relaxed_mo atomic accesses.
> 
> Under the C memory model, neither volatile nor relaxed MO result in
> single-copy atomicity semantics.  So I'm not sure what to make of this.
> Surely switching to inline assembly on all targets is over the top.
> 
> I think we can rely on a plain read doing the right thing for us.

AFAIU, the plain read does not prevent the compiler from re-loading the
value in case of high register pressure.

Accesses to rseq fields such as cpu_id need to be done as if those were
concurrently modified by a signal handler nesting on top of the user-space
code, with the particular twist that blocking signals has no effect on
concurrent updates.

I do not think we need to do the load in assembly. I was under the impression
that both volatile loads and relaxed-MO atomic loads provide single-copy
atomicity semantics when accessing through an aligned pointer. Perhaps Paul,
Peter, or Boqun have something to add here?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 2/5] nptl: Add rseq registration
From: Florian Weimer @ 2021-12-06 19:03 UTC
  To: Mathieu Desnoyers
  Cc: paulmck, Boqun Feng, Peter Zijlstra, libc-alpha, linux-kernel

* Mathieu Desnoyers:

> [ Adding other kernel rseq maintainers in CC. ]
>
> ----- On Dec 6, 2021, at 12:14 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>> 
>>> ----- On Dec 6, 2021, at 8:46 AM, Florian Weimer fweimer@redhat.com wrote:
>>> [...]
>>>> @@ -406,6 +407,9 @@ struct pthread
>>>>   /* Used on strsignal.  */
>>>>   struct tls_internal_t tls_state;
>>>> 
>>>> +  /* rseq area registered with the kernel.  */
>>>> +  struct rseq rseq_area;
>>>
>>> The rseq UAPI requires that the fields within the rseq_area
>>> are read-written with single-copy atomicity semantics.
>>>
>>> So either we define a "volatile struct rseq" here, or we'll need
>>> to wrap all accesses with the proper volatile casts, or use the
>>> relaxed_mo atomic accesses.
>> 
>> Under the C memory model, neither volatile nor relaxed MO result in
>> single-copy atomicity semantics.  So I'm not sure what to make of this.
>> Surely switching to inline assembly on all targets is over the top.
>> 
>> I think we can rely on a plain read doing the right thing for us.
>
> AFAIU, the plain read does not prevent the compiler from re-loading the
> value in case of high register pressure.
>
> Accesses to rseq fields such as cpu_id need to be done as if those were
> concurrently modified by a signal handler nesting on top of the user-space
> code, with the particular twist that blocking signals has no effect on
> concurrent updates.
>
> I do not think we need to do the load in assembly. I was under the impression
> that both volatile load and relaxed MO result in single-copy atomicity
> semantics for an aligned pointer. Perhaps Paul, Peter, Boqun have something
> to add here ?

The C memory model is broken and does not prevent out-of-thin-air
values.  As far as I know, this breaks single-copy atomicity.  In
practice, compilers will not exercise the latitude offered by the memory
model.  volatile does not ensure absence of data races.

Using atomics or volatile would require us to materialize the thread
pointer, given the current internal interfaces we have, and I don't want
to do this because this is supposed to be performance-critical code.
The compiler barrier inherent to the function call will have to be
enough.  I can add a comment to this effect:

  /* This load has single-copy atomicity semantics (as required for
     rseq) because the function call implies a compiler barrier.  */

Thanks,
Florian



* Re: [PATCH 2/5] nptl: Add rseq registration
From: Paul E. McKenney @ 2021-12-06 20:11 UTC
  To: Florian Weimer
  Cc: Mathieu Desnoyers, Boqun Feng, Peter Zijlstra, libc-alpha,
	linux-kernel

On Mon, Dec 06, 2021 at 08:03:26PM +0100, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
> > [ Adding other kernel rseq maintainers in CC. ]
> >
> > ----- On Dec 6, 2021, at 12:14 PM, Florian Weimer fweimer@redhat.com wrote:
> >
> >> * Mathieu Desnoyers:
> >> 
> >>> ----- On Dec 6, 2021, at 8:46 AM, Florian Weimer fweimer@redhat.com wrote:
> >>> [...]
> >>>> @@ -406,6 +407,9 @@ struct pthread
> >>>>   /* Used on strsignal.  */
> >>>>   struct tls_internal_t tls_state;
> >>>> 
> >>>> +  /* rseq area registered with the kernel.  */
> >>>> +  struct rseq rseq_area;
> >>>
> >>> The rseq UAPI requires that the fields within the rseq_area
> >>> are read-written with single-copy atomicity semantics.
> >>>
> >>> So either we define a "volatile struct rseq" here, or we'll need
> >>> to wrap all accesses with the proper volatile casts, or use the
> >>> relaxed_mo atomic accesses.
> >> 
> >> Under the C memory model, neither volatile nor relaxed MO result in
> >> single-copy atomicity semantics.  So I'm not sure what to make of this.
> >> Surely switching to inline assembly on all targets is over the top.
> >> 
> >> I think we can rely on a plain read doing the right thing for us.
> >
> > AFAIU, the plain read does not prevent the compiler from re-loading the
> > value in case of high register pressure.
> >
> > Accesses to rseq fields such as cpu_id need to be done as if those were
> > concurrently modified by a signal handler nesting on top of the user-space
> > code, with the particular twist that blocking signals has no effect on
> > concurrent updates.
> >
> > I do not think we need to do the load in assembly. I was under the impression
> > that both volatile load and relaxed MO result in single-copy atomicity
> > semantics for an aligned pointer. Perhaps Paul, Peter, Boqun have something
> > to add here ?
> 
> The C memory model is broken and does not prevent out-of-thin-air
> values.  As far as I know, this breaks single-copy atomicity.  In
> practice, compilers will not exercise the latitude offered by the memory
> model.  volatile does not ensure absence of data races.

Within the confines of the standard, agreed, use of the volatile keyword
does not explicitly prevent data races.

However, volatile accesses are (informally) defined to suffice for
device-driver memory accesses that communicate with devices, whether via
MMIO or DMA-style shared memory.  The device-driver firmware is often
written in C or C++.  So doesn't this informal device-driver guarantee
need to also do what is needed for userspace code that is communicating
with kernel code?  If not, why not?

> Using atomics or volatile would require us to materialize the thread
> pointer, given the current internal interfaces we have, and I don't want
> to do this because this is supposed to be performance-critical code.
> The compiler barrier inherent to the function call will have to be
> enough.  I can add a comment to this effect:
> 
>   /* This load has single-copy atomicity semantics (as required for
>      rseq) because the function call implies a compiler barrier.  */

Agreed on the need to be very careful to avoid degrading performance on
fast paths!

							Thanx, Paul


* Re: [PATCH 2/5] nptl: Add rseq registration
From: Florian Weimer @ 2021-12-06 20:26 UTC
  To: Paul E. McKenney via Libc-alpha
  Cc: paulmck, Peter Zijlstra, Boqun Feng, Mathieu Desnoyers,
	linux-kernel

* Paul E. McKenney via Libc-alpha:

>> The C memory model is broken and does not prevent out-of-thin-air
>> values.  As far as I know, this breaks single-copy atomicity.  In
>> practice, compilers will not exercise the latitude offered by the memory
>> model.  volatile does not ensure absence of data races.
>
> Within the confines of the standard, agreed, use of the volatile keyword
> does not explicitly prevent data races.
>
> However, volatile accesses are (informally) defined to suffice for
> device-driver memory accesses that communicate with devices, whether via
> MMIO or DMA-style shared memory.  The device-driver firmware is often
> written in C or C++.  So doesn't this informal device-driver guarantee
> need to also do what is needed for userspace code that is communicating
> with kernel code?  If not, why not?

The informal guarantee is probably good enough here, too.  However, the
actual accesses are behind macros, and those macros use either
non-volatile plain reads or inline assembly (which uses
single-instruction, naturally aligned reads).

Thanks,
Florian



* Re: [PATCH 2/5] nptl: Add rseq registration
From: Paul E. McKenney @ 2021-12-06 21:08 UTC
  To: Florian Weimer
  Cc: Paul E. McKenney via Libc-alpha, Peter Zijlstra, Boqun Feng,
	Mathieu Desnoyers, linux-kernel

On Mon, Dec 06, 2021 at 09:26:51PM +0100, Florian Weimer wrote:
> * Paul E. McKenney via Libc-alpha:
> 
> >> The C memory model is broken and does not prevent out-of-thin-air
> >> values.  As far as I know, this breaks single-copy atomicity.  In
> >> practice, compilers will not exercise the latitude offered by the memory
> >> model.  volatile does not ensure absence of data races.
> >
> > Within the confines of the standard, agreed, use of the volatile keyword
> > does not explicitly prevent data races.
> >
> > However, volatile accesses are (informally) defined to suffice for
> > device-driver memory accesses that communicate with devices, whether via
> > MMIO or DMA-style shared memory.  The device-driver firmware is often
> > written in C or C++.  So doesn't this informal device-driver guarantee
> > need to also do what is needed for userspace code that is communicating
> > with kernel code?  If not, why not?
> 
> The informal guarantee is probably good enough here, too.  However, the
> actual accesses are behind macros, and those macros use either
> non-volatile plain reads or inline assembler (which use
> single-instruction naturally aligned reads).

Agreed, a non-volatile plain read is quite dangerous in this context.

							Thanx, Paul

