Message-ID: <54BB886F.2080908@gmail.com>
Date: Sun, 18 Jan 2015 11:18:23 +0100
From: "Michael Kerrisk (man-pages)"
To: Darren Hart, Thomas Gleixner
CC: mtk.manpages@gmail.com, "Carlos O'Donell", Ingo Molnar, Jakub Jelinek,
    "linux-man@vger.kernel.org", lkml, Arnd Bergmann, Steven Rostedt,
    Peter Zijlstra, Linux API, Davidlohr Bueso, Jan Kiszka
Subject: Re: futex(2) man page update help request
References: <537346E5.4050407@gmail.com> <5373D0CA.2050204@redhat.com>
    <54B7D8D4.2070203@gmail.com> <54BA2872.5040003@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Darren,

On 01/17/2015 08:26 PM, Darren Hart wrote:
>
> On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
> wrote:

[...]

>>>> In the meantime, I have a couple of questions, which, if
>>>> you could answer them, I would work some changes into the
>>>> page before sending.
>>>>
>>>> 1. In various places, a distinction is made between non-PI
>>>> futexes and PI futexes. But what determines that distinction?
>>>> From the kernel's perspective, what makes a futex one type
>>>> or another? I presume it is to do with the types of blocking
>>>> waiters on the futex, but it would be good to have a formal
>>>> definition.
>>>
>>> You're right in that a uaddr is a uaddr is a uaddr.
>>> Also "there is no such thing as a futex"; it doesn't exist as any
>>> kind of identifiable object, so these discussions can get rather
>>> confusing :-)
>>
>> So, I want to make sure that I am clear on what you mean when you say this.
>> You say "there is no such thing as a futex" because from the kernel's
>> perspective there is no visible entity in the uncontended case
>> (where everything can be dealt with in user space). And from user space,
>> in the uncontended case all we're doing is memory operations. Right?
>>
>> On the other hand, from a kernel perspective, we could say that a
>> futex "exists" in the contended phases, since the kernel has allocated
>> state associated with the uaddr. Right?
>
>
> Sorry, this was more anecdotal, and probably more of a distraction than
> constructive. I just meant that unlike other things which you can point to
> a specific struct for (task, rt_mutex, etc.), a "futex" has its state
> distributed across the backing store (uaddr), the queue (futex_q), the
> pi_state, the rt_mutex, etc., and these span kernel space and user space.
> Your description above is correct.

Okay. Thanks. I've added a few more words to the page noting that
the kernel maintains no state for a futex in the uncontended state.

>>> A "futex" becomes a PI futex when it is "created" via a PI futex op
>>> code.
>>
>> Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>> FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
>
> Based on your wording below about taking a user POV on this, I'm going to
> say "yes" here. These opcodes, paired with the PI futex value policy
> (described below), define a "futex" as PI aware.
> These were created very specifically in support of PI pthread_mutexes, so
> it makes a lot more sense to talk about a PI-aware pthread_mutex than a
> PI-aware futex, since there is a lot of policy and scaffolding that has
> to be built up around it to use it properly (this is what a PI
> pthread_mutex is).

See below.

>>> At that point, the syscall will ensure a pi_state is populated for the
>>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>>> taken, there is a call to refill_pi_state_cache(), which preps a pi_state
>>> for assignment later in futex_lock_pi_atomic(). This pi_state provides
>>> the necessary linkage to perform the priority boosting in the event of a
>>> priority inversion. This is handled externally from the futexes via the
>>> rt_mutex construct.
>>>
>>> Clear as mud?
>>
>> Not quite that bad, but... The thing is, still, the man page has text
>> such as the following (based on your wording):
>>
>>     FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
>>         This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
>>         It requeues waiters that are blocked via
>>         FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
>>         (uaddr) to a PI target futex (uaddr2).
>>
>> And elsewhere you said
>>
>>     EINVAL is returned if the non-pi to pi or
>>     op pairing semantics are violated.
>>
>> When someone in user-land (e.g., me) reads pieces like that, they then
>> want to find somewhere in the man page a description of what makes a
>> futex a *PI futex* and probably some statements of the distinction
>> between PI and non-PI futexes. And those statements should be from a
>> perspective that is somewhat comprehensible to user space. I'm not
>> yet confident that I can do that. Do you care to take a shot at it?
>
> Hrm, tricky indeed.
> From userspace, what makes a "futex" PI is the policy agreement between
> kernel and userspace (which is the value of the futex: 0, TID,
> TID|WAITERS, and never just WAITERS) and the use of PI-aware futex
> op codes when making the futex syscalls.

Okay -- I've attempted to capture this in some text that I added to
the page.

> For a longer discussion of this policy, see Documentation/pi-futex.txt.

Sad to say, that document doesn't supply that much more detail, in my
reading of it, at least.

> Also note that this policy can be combined with that for robust futexes,
> adding the OWNERDIED component.

Now there are two other stories that have yet to be dealt with ;-).
I have a FIXME already in the page regarding OWNERDIED, and
get_robust_list(2) is another page that seems like it could do
with a fair bit of work, but that's a story for another day.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/