From: Davi Arnaut <davi@haxent.com.br>
To: Ulrich Drepper <drepper@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Davide Libenzi <davidel@xmailserver.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [patch 14/22] pollfs: pollable futex
Date: Wed, 02 May 2007 09:20:11 -0300
Message-ID: <463881FB.3010300@haxent.com.br>
In-Reply-To: <a36005b50705020040l8874d64m1bcf25da5cae885d@mail.gmail.com>

Ulrich Drepper wrote:
> On 5/1/07, Davi Arnaut <davi@haxent.com.br> wrote:
>> The pollable futex approach is far superior (send and receive events from
>> userspace or kernel) to eventfd and fixes (supercedes) FUTEX_FD at the same time.
>> [...]
> 
> You have to explain in detail how these interfaces are supposed to
> work.  From first sight and without understanding (all) the it seems
> it's far from useful.

It's basically an asynchronous FUTEX_WAIT with notification delivery
through a file descriptor.
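
For reference, here is a minimal sketch of the ordinary synchronous
FUTEX_WAIT/FUTEX_WAKE pair that the pollable futex is meant to make
asynchronous. It uses only the existing futex system call; the names
futex_op, futex_worker and futex_wait_demo are made up for this example
and are not part of the patch.

```c
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic uint32_t futex_word;     /* 0 = job pending, 1 = job done */

/* Thin wrapper (hypothetical name): glibc exports no futex() function. */
static long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Worker: publish the result, then wake one waiter. */
static void *futex_worker(void *arg)
{
    (void)arg;
    atomic_store(&futex_word, 1);
    futex_op(&futex_word, FUTEX_WAKE, 1);
    return NULL;
}

/* Waiter: sleep until the word changes. Returns the final word value. */
static int futex_wait_demo(void)
{
    pthread_t tid;

    assert(sizeof(futex_word) == sizeof(uint32_t)); /* futex words are 32 bits */
    atomic_store(&futex_word, 0);
    if (pthread_create(&tid, NULL, futex_worker, NULL) != 0)
        return -1;

    /*
     * FUTEX_WAIT sleeps only while the word still equals 0. It returns on
     * a wake-up, on a signal, or immediately if the word already changed,
     * so the value is rechecked in a loop.
     */
    while (atomic_load(&futex_word) == 0)
        futex_op(&futex_word, FUTEX_WAIT, 0);

    pthread_join(tid, NULL);
    return (int)atomic_load(&futex_word);
}
```

The waiter blocks inside the system call here, which is exactly what a
poll()-based main loop cannot afford to do.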

> Pollable futexes are useful, but any solution which gets implemented
> must be sufficiently useful for all the uses we might have.

It's very useful for asynchronous event notification libraries
(libevent, liboop, libivykis, etc) because it integrates nicely with
their (e)poll main loops.

Usage scenario: you have 10 worker threads (and 10 futexes) for disk
i/o (or whatever) and one manager thread which is a state machine
serving many clients (epoll loop).

In this scenario the worker threads have only two possible ways of
notifying the manager thread once a job is done: signals and pipe tricks.

For libraries, signals suck. They don't integrate well with poll() loops,
may have overflow issues (RT), and signal numbers may clash with other
libraries/code. The self-pipe trick wastes resources (a mostly unused pipe
buffer).
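
To make the comparison concrete, this is roughly what the self-pipe
trick looks like today with one worker. It is only a sketch: the names
notify_pipe, worker and self_pipe_demo are invented for the example, and
a real manager would use epoll and many workers instead of a single
poll() call.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

static int notify_pipe[2];  /* [0] = read end (manager), [1] = write end (worker) */

/* Worker: do the job, then write one byte to signal completion. */
static void *worker(void *arg)
{
    char done = 'x';

    (void)arg;
    /* ... disk i/o or other blocking work would happen here ... */
    if (write(notify_pipe[1], &done, 1) != 1)
        return (void *)-1;
    return NULL;
}

/* Manager: returns 1 if the notification arrived, 0 if not, -1 on error. */
static int self_pipe_demo(void)
{
    pthread_t tid;
    struct pollfd pfd;
    char byte;
    int got = 0;

    if (pipe(notify_pipe) != 0)
        return -1;
    assert(notify_pipe[0] >= 0 && notify_pipe[1] >= 0);

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        close(notify_pipe[0]);
        close(notify_pipe[1]);
        return -1;
    }

    /* The pipe's read end is just another fd in the poll set. */
    pfd.fd = notify_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN))
        got = (read(notify_pipe[0], &byte, 1) == 1);

    pthread_join(tid, NULL);
    close(notify_pipe[0]);
    close(notify_pipe[1]);
    return got;
}
```

Each notification channel costs two file descriptors plus a kernel pipe
buffer, even though only a single byte ever travels through it.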

By using pollable futexes, all the manager thread has to do is
associate each of these futexes with a file descriptor (plfutex) and
epoll() for their completion. Once the futex is signaled, epoll()
returns POLLIN for the file descriptor and the manager thread may
dequeue the notification status from anywhere.


> - the trivial is that you have a futex and you are just interest in
> seeing it change.   The same as FUTEX_WAIT. 

I'm just interested in seeing a FUTEX_WAKE. Yes, same as FUTEX_WAIT.

> I cannot figure out how all this works in your code.

Every futex has a wait queue (q->waiters) which is used to track
processes waiting on the futex. When the futex receives a FUTEX_WAKE it
wakes up all waiters on the wait queue. Also, a futex is considered
woken when its wait queue is empty (or lock_ptr == NULL).

When you register a file descriptor with select(), poll() or epoll() a
callback is queued into the futex wait queue. When the futex receives a
FUTEX_WAKE every callback is called and the event is registered within
each select(), poll() or epoll() table. This initiates a chain reaction
waking up all processes sleeping on poll()/whatever.
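
The callback mechanism described above can be pictured with a small
userspace toy model. This is not the kernel code: toy_futex, poll_entry,
toy_register and toy_wake are all invented names, and locking is left
out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Stands in for one slot in a select()/poll()/epoll() table. */
struct poll_entry {
    int ready;
};

typedef void (*wake_fn)(struct poll_entry *);

struct waiter {
    wake_fn fn;
    struct poll_entry *entry;
};

/* Toy futex: nothing but its wait queue. */
struct toy_futex {
    struct waiter waiters[MAX_WAITERS];
    size_t nr_waiters;
};

/* Callback queued on the wait queue: record the event in the poll table. */
static void mark_ready(struct poll_entry *e)
{
    e->ready = 1;
}

/* Registering an fd with poll() queues a callback on the futex. */
static int toy_register(struct toy_futex *f, struct poll_entry *e)
{
    assert(f != NULL && e != NULL);
    if (f->nr_waiters == MAX_WAITERS)
        return -1;
    f->waiters[f->nr_waiters].fn = mark_ready;
    f->waiters[f->nr_waiters].entry = e;
    f->nr_waiters++;
    return 0;
}

/* Wake: run every queued callback and empty the queue. */
static size_t toy_wake(struct toy_futex *f)
{
    size_t i, n = f->nr_waiters;

    for (i = 0; i < n; i++)
        f->waiters[i].fn(f->waiters[i].entry);
    f->nr_waiters = 0;
    return n;
}

/* In this model a futex counts as woken once its wait queue is empty. */
static int toy_is_woken(const struct toy_futex *f)
{
    return f->nr_waiters == 0;
}

/* Returns 1 if both registered entries become ready after one wake. */
static int toy_waitqueue_demo(void)
{
    struct toy_futex f = { .nr_waiters = 0 };
    struct poll_entry a = { 0 }, b = { 0 };

    if (toy_register(&f, &a) != 0 || toy_register(&f, &b) != 0)
        return -1;
    if (toy_is_woken(&f))       /* two waiters queued: not woken yet */
        return 0;
    if (toy_wake(&f) != 2)
        return 0;
    return a.ready && b.ready && toy_is_woken(&f);
}
```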

> Does your read() call (that's the one to wait, yes?) work with O_NONBLOCK
> or how else do you get that behavior?

If the fd is marked O_NONBLOCK and the futex is not woken yet, it simply
returns -EAGAIN (pfs_read_nonblock). If O_NONBLOCK is not set, it waits
synchronously (pfs_read_block/wait_event_interruptible) on the futex
wait queue.
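
From the caller's side this is the usual non-blocking read pattern. The
sketch below uses an ordinary pipe as a stand-in for the plfutex fd,
since the plfutex system call exists only in this patch series;
try_dequeue and nonblock_demo are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a byte was read, 0 if the read would block, -1 on error. */
static int try_dequeue(int fd, char *out)
{
    ssize_t n;

    assert(out != NULL);
    n = read(fd, out, 1);
    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;       /* nothing to dequeue yet */
    return -1;
}

/* Returns 1 if the fd reports "not ready" first and "ready" after a write. */
static int nonblock_demo(void)
{
    int fds[2];
    char c = 0;
    int before, after;

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    before = try_dequeue(fds[0], &c);   /* empty: read fails with EAGAIN */
    if (write(fds[1], "w", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    after = try_dequeue(fds[0], &c);    /* data queued: read succeeds */

    close(fds[0]);
    close(fds[1]);
    return before == 0 && after == 1 && c == 'w';
}
```

Without O_NONBLOCK the first read() would simply sleep until the write
happened, which corresponds to the blocking path described above.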

> - more complicated case: I have to wait for multiple futexes and lock
> them all at the same time or don't return at all.  This is possible with
> SysV semaphores and generally useful and needed. How can this be
> implemented with your scheme?

Remember, it's only about FUTEX_WAIT.

> - how does it work with PI futexes?

It doesn't work. AFAICS PI futexes don't use FUTEX_WAKE.

> - can I use a futex at the same time through this mechanism and using the normal
>   FUTEX_WAIT operation?  This is a killer if it's not the case.

Yes.

> - if you have multiple threads polling a futex and the waker wakes up
> one, what happens? It is simply not acceptable to have more than one 
> thread return from the poll() call, this would waste too many cycles,
> just to put all threads but one back to sleep.

Only one is woken up (whichever matches first in the futex hash bucket).

--
Davi Arnaut
