From: Darren Hart <dvhart@linux.intel.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>
Cc: "Carlos O'Donell" <carlos@redhat.com>,
Ingo Molnar <mingo@elte.hu>, Jakub Jelinek <jakub@redhat.com>,
"linux-man@vger.kernel.org" <linux-man@vger.kernel.org>,
lkml <linux-kernel@vger.kernel.org>,
Arnd Bergmann <arnd@arndb.de>,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>,
Linux API <linux-api@vger.kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: futex(2) man page update help request
Date: Sat, 17 Jan 2015 11:26:54 -0800
Message-ID: <D0DFF430.B7F94%dvhart@linux.intel.com>
In-Reply-To: <54BA2872.5040003@gmail.com>
On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
<mtk.manpages@gmail.com> wrote:
>Hello Darren,
>
>On 01/17/2015 02:33 AM, Darren Hart wrote:
>> Corrected Davidlohr's email address.
>
>Thanks!
>
>> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
>> <mtk.manpages@gmail.com> wrote:
>>
>>> Hello Darren,
>>>
>>> I give you the same apology as to Thomas for the
>>> long-delayed response to your mail.
>>>
>>> And I repeat my note to Thomas:
>>> In the next day or two, I hope to send out the new version
>>> of the futex(2) page for review. The new draft is a bit
>>> bigger (okay -- 4 x bigger) than the current page. And there
>>> are quite a number of FIXMEs that I've placed in the page
>>> for various points--some minor, but a few major--that need
>>> to be checked or fixed. Would you have some time to review
>>> that page?
>>
>> I'll make the time for that. I've wanted to see this for a while, so
>> thank you for working on it!
>
>Great!
>
>>> In the meantime, I have a couple of questions, which, if
>>> you could answer them, I would work some changes into the
>>> page before sending.
>>>
>>> 1. In various places, distinction is made between non-PI
>>> futexes and PI futexes. But what determines that distinction?
>>> From the kernel's perspective, what makes a futex one type
>>> or another? I presume it is to do with the types of blocking
>>> waiters on the futex, but it would be good to have a formal
>>> definition.
>>
>> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no
>> such thing as a futex", it doesn't exist as any kind of identifiable
>> object, so these discussions can get rather confusing :-)
>
>So, I want to make sure that I am clear on what you mean when you say this.
>You say "there is no such thing as a futex" because from the kernel's
>perspective there is no visible entity in the uncontended case
>(where everything can be dealt with in user space). And from user-space,
>in the uncontended case all we're doing is memory operations. Right?
>
>On the other hand, from a kernel perspective, we could say that a
>futex "exists" in the contended phases, since the kernel has allocated
>state associated with the uaddr. Right?
Sorry, this was more anecdotal, and probably more of a distraction than
constructive. I just meant that unlike other things which you can point to
a specific struct for (task, rt_mutex, etc.), a "futex" has its state
distributed across the backing store (uaddr), the queue (futex_q), the
pi_state, the rt_mutex, etc., and these span kernel space and userspace.
Your description above is correct.
>
>> A "futex" becomes a PI futex when it is "created" via a PI futex op
>> code.
>
>Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
Based on your wording below about taking a user POV on this, I'm going to
say "yes" here. These opcodes, paired with the PI futex value policy
(described below), define a "futex" as PI aware. These were created very
specifically in support of PI pthread_mutexes, so it makes a lot more
sense to talk about a PI aware pthread_mutex than a PI aware futex, since
there is a lot of policy and scaffolding that has to be built up around it
to use it properly (this is what a PI pthread_mutex is).
>> At that point, the syscall will ensure a pi_state is populated for the
>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>> taken, there is a call to refill_pi_state_cache() which preps a pi_state
>> for assignment later in futex_lock_pi_atomic(). This pi_state provides
>> the necessary linkage to perform the priority boosting in the event of a
>> priority inversion. This is handled externally from the futexes via the
>> rt_mutex construct.
>>
>> Clear as mud?
>
>Not quite that bad, but... The thing is, still, the man page has text
>such as the following (based on your wording):
>
> FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
> This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
> It requeues waiters that are blocked via
> FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
> (uaddr) to a PI target futex (uaddr2).
>
>And elsewhere you said
>
> EINVAL is returned if the non-pi to pi or
> op pairing semantics are violated.
>
>When someone in user-land (e.g., me) reads pieces like that, they then
>want to find somewhere in the man page a description of what makes a
>futex a *PI futex* and probably some statements of the distinction
>between PI and non-PI futexes. And those statements should be from a
>perspective that is somewhat comprehensible to user-space. I'm not
>yet confident that I can do that. Do you care to take a shot at it?
Hrm, tricky indeed. From userspace, what makes a "futex" PI is the policy
agreement between kernel and userspace (the value of the futex is
0, TID, or TID|WAITERS, and never just WAITERS) and the use of PI aware
futex op codes when making the futex syscalls.
For a longer discussion of this policy, see Documentation/pi-futex.txt.
Also note that this policy can be combined with that for robust futexes,
adding the OWNERDIED component.
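
To make that policy a bit more concrete, here is a minimal user-space
sketch. It is an illustration only, not what glibc or the kernel does. The
pi_lock_t type and the pi_lock(), pi_unlock(), and self_tid() helpers are
hypothetical names. It assumes a non-robust lock, so OWNERDIED handling is
left out, and callers are assumed to handle errno themselves.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_LOCK_PI, FUTEX_UNLOCK_PI, FUTEX_WAITERS */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: the futex word is the only user-space state. */
typedef struct { _Atomic uint32_t word; } pi_lock_t;

/* Kernel thread ID of the caller; this is the "TID" in the value policy. */
static uint32_t self_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Returns 0 on success, -1 with errno set on failure. */
static int pi_lock(pi_lock_t *l)
{
    uint32_t unlocked = 0;

    /* Fast path: 0 -> TID, done entirely in user space. */
    if (atomic_compare_exchange_strong(&l->word, &unlocked, self_tid()))
        return 0;

    /* Contended: the kernel sets FUTEX_WAITERS in the word, blocks the
     * caller, and boosts the owner's priority if needed. */
    return (int)syscall(SYS_futex, &l->word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Returns 0 on success, -1 with errno set on failure. */
static int pi_unlock(pi_lock_t *l)
{
    uint32_t mine = self_tid();

    /* Fast path: TID -> 0. The compare fails if FUTEX_WAITERS is set,
     * so the word never ends up as "just WAITERS". */
    if (atomic_compare_exchange_strong(&l->word, &mine, 0))
        return 0;

    /* Waiters present: let the kernel pick the next owner. */
    return (int)syscall(SYS_futex, &l->word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case neither function enters the kernel, and the word
only moves between 0 and TID. The syscalls are reached only once the word
is something other than what the fast path expects.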
--
Darren Hart
Intel Open Source Technology Center