From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Darren Hart <dvhart@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: mtk.manpages@gmail.com, Carlos O'Donell <carlos@redhat.com>,
	Ingo Molnar <mingo@elte.hu>, Jakub Jelinek <jakub@redhat.com>,
	"linux-man@vger.kernel.org" <linux-man@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linux API <linux-api@vger.kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: futex(2) man page update help request
Date: Sat, 17 Jan 2015 10:16:34 +0100
Message-ID: <54BA2872.5040003@gmail.com>
In-Reply-To: <D0DEF1AE.B7EDE%dvhart@linux.intel.com>

Hello Darren,

On 01/17/2015 02:33 AM, Darren Hart wrote:
> Corrected Davidlohr's email address.

Thanks!

> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
> <mtk.manpages@gmail.com> wrote:
> 
>> Hello Darren,
>>
>> I give you the same apology as to Thomas for the
>> long-delayed response to your mail.
>>
>> And I repeat my note to Thomas:
>> In the next day or two, I hope to send out the new version
>> of the futex(2) page for review. The new draft is a bit
>> bigger (okay -- 4 x bigger) than the current page. And there
>> are quite a number of FIXMEs that I've placed in the page
>> for various points--some minor, but a few major--that need
>> to be checked or fixed. Would you have some time to review
>> that page?
> 
> I'll make the time for that. I've wanted to see this for a while, so thank
> you for working on it!

Great!

>> In the meantime, I have a couple of questions, which, if
>> you could answer them, I would work some changes into the
>> page before sending.
>>
>> 1. In various places, distinction is made between non-PI
>>   futexes and PI futexes. But what determines that distinction?
>>   From the kernel's perspective, what makes a futex one type
>>   or another? I presume it is to do with the types of blocking
>>   waiters on the futex, but it would be good to have a formal
>>   definition.
> 
> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
> thing as a futex", it doesn't exist as any kind of identifiable object, so
> these discussions can get rather confusing :-)

So, I want to make sure that I am clear on what you mean when you say this.
You say "there is no such thing as a futex" because from the kernel's
perspective there is no visible entity in the uncontended case
(where everything can be dealt with in user space). And from user-space,
in the uncontended case all we're doing is memory operations. Right?

On the other hand, from a kernel perspective, we could say that a 
futex "exists" in the contended phases, since the kernel has allocated
state associated with the uaddr. Right?

> A "futex" becomes a PI futex when it is "created" via a PI futex op code.

Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?

> At that point, the syscall will ensure a pi_state is populated for the
> futex_q entry. See futex_lock_pi() for example. Before the locks are
> taken, there is a call to refill_pi_state_cache() which preps a pi_state
> for assignment later in futex_lock_pi_atomic(). This pi_state provides the
> necessary linkage to perform the priority boosting in the event of a
> priority inversion. This is handled externally from the futexes via the
> rt_mutex construct.
> 
> Clear as mud?

Not quite that bad, but... The thing is, still, the man page has text
such as the following (based on your wording):

       FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
              This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
              It requeues waiters that are blocked via
              FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
              (uaddr) to a PI target futex (uaddr2).

And elsewhere you said

    EINVAL is returned if the non-pi to pi or 
    op pairing semantics are violated.

When someone in user-land (e.g., me) reads pieces like that, they then 
want to find somewhere in the man page a description of what makes a 
futex a *PI futex* and probably some statements of the distinction 
between PI and non-PI futexes. And those statements should be from a 
perspective that is somewhat comprehensible to user-space. I'm not
yet confident that I can do that. Do you care to take a shot at it?

>> 2. Can you say something about the pairing requirements of
>>   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
>>   What is the requirement and why do we need it?
> 
> Briefly, these op codes exist to support a fairly specific use case:
> support for PI aware pthread condvars (glibc patch acceptance STILL
> PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?! 

Yes, Jan Kiszka recently alerted me to the existence of 
https://sourceware.org/bugzilla/show_bug.cgi?id=11588
and I still have some text that you proposed (mail titled
"Pthread Condition Variables and Priority Inversion")
quite a long time ago for the pthread_cond_timedwait() page.
One day, when that page exists, I'll try to remember to add it.

> But is shipped with various
> PREEMPT_RT enabled Linux systems. Because these calls are paired, and more
> of the logic can happen on the kernel side (to preserve ownership of an
> rt_mutex with waiters), so in order to ensure userspace and kernelspace
> remain in sync, we pre-specify the target of the requeue in
> futex_wait_requeue_pi. This also limits the attack surface by only
> supporting exactly what it was meant to do. The corner cases get insane
> otherwise.

Thanks. I've added some text on pairing, based on your text above.
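
To make the pairing concrete, here is a minimal sketch of how the two
operations line up at the system-call level. The wrapper names
wait_requeue_pi() and cmp_requeue_pi() are hypothetical (they are mine,
not glibc's or the kernel's), and the argument layout reflects my
understanding of the futex(2) conventions, so treat it as an
illustration rather than a reference:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Waiter side (hypothetical wrapper). Blocks on the non-PI futex `cond`
 * if it still holds `expected`, and names the PI futex `mutex` as the
 * requeue target in advance. `abs_timeout` may be NULL (no timeout).
 * Returns 0 on success, -1 with errno set on failure.
 */
static long wait_requeue_pi(uint32_t *cond, uint32_t expected,
                            const struct timespec *abs_timeout,
                            uint32_t *mutex)
{
    assert(cond != mutex);  /* source and target must be distinct words */
    return syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, expected,
                   abs_timeout, mutex, 0);
}

/*
 * Waker side (hypothetical wrapper). If `cond` still holds `expected`,
 * wakes at most one waiter and requeues up to `nr_requeue` others onto
 * `mutex`. The val argument (number to wake) must be 1, and the slot
 * normally used for the timeout carries nr_requeue. `mutex` must be the
 * same target that the waiters named in wait_requeue_pi().
 */
static long cmp_requeue_pi(uint32_t *cond, uint32_t expected,
                           uint32_t nr_requeue, uint32_t *mutex)
{
    assert(cond != mutex);
    return syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                   (unsigned long)nr_requeue, mutex, expected);
}
```

The point of the sketch is only that both sides pass the same uaddr2:
the waiter commits to its requeue target before it blocks, and the
waker has to name that same target. A real condvar implementation
needs considerably more (sequence counters, error handling, and
retaking the mutex on the failure paths).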

> We could walk through the various ways in which it would break if these
> pairing restrictions were not in place, but I'll have to take some serious
> time to page all those into working memory. Let me know if we need more
> detail here and I will.

I don't think we need that level of detail.

>> Most of the rest of this mail is just a checklist noting
>> what I did with your comments. No response is needed
>> in most cases, but there is one that I have marked with
>> "???". If you could reply to that, I'd be grateful.
> 
> ...
> 
>>> For all the PI opcodes, we should probably mention something about the
>>> futex value scheme (TID), whereas the other opcodes do not require any
>>> specific value scheme.
>>>
>>> No Owner:	0
>>> Owner:		TID
>>> Waiters:	TID | FUTEX_WAITERS
>>>
>>> This is the relevant section from the referenced paper:
>>> 				
>>> The PI futex operations diverge from the others in that they
>>> impose a policy describing how the futex value is to be used.
>>> If the lock is unowned, the futex value shall be 0. If owned,
>>> it shall be the thread id (tid) of the owning thread. If there
>>> are threads contending for the lock, then the FUTEX_WAITERS
>>> flag is set. With this policy in place, userspace can
>>> atomically acquire an unowned lock or release an uncontended
>>> lock using an atomic instruction and their own tid. A non-zero
>>> futex value will force waiters into the kernel to lock. The
>>> FUTEX_WAITERS flag forces the owner into the kernel to unlock.
>>> If the callers are forced into the kernel, they then deal
>>> directly with an underlying rt_mutex which implements the
>>> priority inheritance semantics. After the rt_mutex is
>>> acquired, the futex value is updated accordingly, before the
>>> calling thread returns to userspace.
>>>
>>> It is important to note that the kernel will update the futex value
>>> prior to returning to userspace. Unlike other futex op codes,
>>> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are
>>> designed for the implementation of very specific IPC mechanisms.
>>
>> ??? Great text. May I presume that I can take this text
>> and freely adapt it for the man page? (Actually, this is a
>> request for forgiveness, rather than permission :-).)
> 
> Thanks, and no objection from me.

Thanks.
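
For reference, the value policy quoted above (0 when unowned, the
owner's TID when owned, TID | FUTEX_WAITERS when contended) can be
sketched from the user-space side roughly as follows. The helper names
self_tid(), pi_lock(), and pi_unlock() are hypothetical, and the sketch
ignores robust-futex handling (FUTEX_OWNER_DIED) and EINTR retries, so
it shows the scheme rather than a production lock:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: the calling thread's kernel thread ID. */
static uint32_t self_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Acquire the PI lock stored in *word. Returns 0 on success, -1 on error. */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;

    assert(word != NULL);
    /* Fast path: unowned (0) -> owned (our TID), no system call. */
    if (atomic_compare_exchange_strong(word, &expected, self_tid()))
        return 0;
    /*
     * Slow path: the value was non-zero, so enter the kernel. The kernel
     * sets FUTEX_WAITERS, blocks us on the underlying rt_mutex, and
     * stores our TID in *word before returning.
     */
    return (int)syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Release the PI lock stored in *word. Returns 0 on success, -1 on error. */
static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = self_tid();

    assert(word != NULL);
    /* Fast path: owned by us with no waiters (TID) -> unowned (0). */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;
    /* FUTEX_WAITERS is set: the kernel must hand the lock to a waiter. */
    return (int)syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case, neither function enters the kernel for the
lock itself: locking and unlocking are each a single compare-and-swap
on the futex word.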

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
