From: Eric Dumazet <dada1@cosmosbay.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: linux kernel <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
David Miller <davem@davemloft.net>,
cl@linux.com, jesse.brandeburg@intel.com, netdev@vger.kernel.org,
haoki@redhat.com, mchan@broadcom.com, davidel@xmailserver.org
Subject: Re: [PATCH] poll: Avoid extra wakeups in select/poll
Date: Wed, 29 Apr 2009 14:29:39 +0200 [thread overview]
Message-ID: <49F84833.5000908@cosmosbay.com> (raw)
In-Reply-To: <20090429102734.GC2373@elte.hu>
Ingo Molnar a écrit :
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> Ingo Molnar a écrit :
>>> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>>>
>>>> @@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>>>> if (file) {
>>>> f_op = file->f_op;
>>>> mask = DEFAULT_POLLMASK;
>>>> - if (f_op && f_op->poll)
>>>> + if (f_op && f_op->poll) {
>>>> + if (wait) {
>>>> + wait->key = POLLEX_SET;
>>>> + if (in & bit)
>>>> + wait->key |= POLLIN_SET;
>>>> + if (out & bit)
>>>> + wait->key |= POLLOUT_SET;
>>>> + }
>>>> mask = (*f_op->poll)(file, retval ? NULL : wait);
>>>> + }
>>>> fput_light(file, fput_needed);
>>>> if ((mask & POLLIN_SET) && (in & bit)) {
>>>> res_in |= bit;
>>> Please factor this whole 'if (file)' branch out into a helper.
>>> Typical indentation levels go from 1 to 3 tabs - 4 should be avoided
>>> if possible and 5 is pretty excessive already. This goes to eight.
>>>
>> Thanks Ingo,
>>
>> Here is v3 of patch, with your Acked-by included :)
>>
>> This is IMHO clearer since helper immediatly follows POLLIN_SET / POLLOUT_SET /
>> POLLEX_SET defines.
>>
>> [PATCH] poll: Avoid extra wakeups in select/poll
>>
>> After introduction of keyed wakeups Davide Libenzi did on epoll, we
>> are able to avoid spurious wakeups in poll()/select() code too.
>>
>> For example, typical use of poll()/select() is to wait for incoming
>> network frames on many sockets. But TX completion for UDP/TCP
>> frames call sock_wfree() which in turn schedules thread.
>>
>> When scheduled, thread does a full scan of all polled fds and
>> can sleep again, because nothing is really available. If number
>> of fds is large, this cause significant load.
>>
>> This patch makes select()/poll() aware of keyed wakeups and
>> useless wakeups are avoided. This reduces number of context
>> switches by about 50% on some setups, and work performed
>> by sofirq handlers.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>> Acked-by: David S. Miller <davem@davemloft.net>
>> Acked-by: Andi Kleen <ak@linux.intel.com>
>> Acked-by: Ingo Molnar <mingo@elte.hu>
>
>> ---
>> fs/select.c | 40 ++++++++++++++++++++++++++++++++++++----
>> include/linux/poll.h | 3 +++
>> 2 files changed, 39 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/select.c b/fs/select.c
>> index 0fe0e14..ba068ad 100644
>> --- a/fs/select.c
>> +++ b/fs/select.c
>> @@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
>> return table->entry++;
>> }
>>
>> -static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> {
>> struct poll_wqueues *pwq = wait->private;
>> DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
>> @@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> return default_wake_function(&dummy_wait, mode, sync, key);
>> }
>>
>> +static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> + struct poll_table_entry *entry;
>> +
>> + entry = container_of(wait, struct poll_table_entry, wait);
>> + if (key && !((unsigned long)key & entry->key))
>> + return 0;
>> + return __pollwake(wait, mode, sync, key);
>> +}
>> +
>> /* Add a new entry */
>> static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>> poll_table *p)
>> @@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>> get_file(filp);
>> entry->filp = filp;
>> entry->wait_address = wait_address;
>> + entry->key = p->key;
>> init_waitqueue_func_entry(&entry->wait, pollwake);
>> entry->wait.private = pwq;
>> add_wait_queue(wait_address, &entry->wait);
>> @@ -362,6 +373,18 @@ get_max:
>> #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
>> #define POLLEX_SET (POLLPRI)
>>
>> +static void wait_key_set(poll_table *wait, unsigned long in,
>> + unsigned long out, unsigned long bit)
>> +{
>> + if (wait) {
>> + wait->key = POLLEX_SET;
>> + if (in & bit)
>> + wait->key |= POLLIN_SET;
>> + if (out & bit)
>> + wait->key |= POLLOUT_SET;
>> + }
>> +}
>
> should be inline perhaps?
Well, I thought current practice was not using inline for such trivial functions,
as gcc already inlines them anyway.
Quoting Documentation/CodingStyle :
Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.
Anyway :
[PATCH] poll: Avoid extra wakeups in select/poll
After introduction of keyed wakeups Davide Libenzi did on epoll, we
are able to avoid spurious wakeups in poll()/select() code too.
For example, typical use of poll()/select() is to wait for incoming
network frames on many sockets. But TX completion for UDP/TCP
frames call sock_wfree() which in turn schedules thread.
When scheduled, thread does a full scan of all polled fds and
can sleep again, because nothing is really available. If number
of fds is large, this cause significant load.
This patch makes select()/poll() aware of keyed wakeups and
useless wakeups are avoided. This reduces number of context
switches by about 50% on some setups, and work performed
by sofirq handlers.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
---
fs/select.c | 40 ++++++++++++++++++++++++++++++++++++----
include/linux/poll.h | 3 +++
2 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 0fe0e14..ba068ad 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
return table->entry++;
}
-static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
struct poll_wqueues *pwq = wait->private;
DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
@@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
return default_wake_function(&dummy_wait, mode, sync, key);
}
+static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+ struct poll_table_entry *entry;
+
+ entry = container_of(wait, struct poll_table_entry, wait);
+ if (key && !((unsigned long)key & entry->key))
+ return 0;
+ return __pollwake(wait, mode, sync, key);
+}
+
/* Add a new entry */
static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
poll_table *p)
@@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
get_file(filp);
entry->filp = filp;
entry->wait_address = wait_address;
+ entry->key = p->key;
init_waitqueue_func_entry(&entry->wait, pollwake);
entry->wait.private = pwq;
add_wait_queue(wait_address, &entry->wait);
@@ -362,6 +373,18 @@ get_max:
#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
#define POLLEX_SET (POLLPRI)
+static inline void wait_key_set(poll_table *wait, unsigned long in,
+ unsigned long out, unsigned long bit)
+{
+ if (wait) {
+ wait->key = POLLEX_SET;
+ if (in & bit)
+ wait->key |= POLLIN_SET;
+ if (out & bit)
+ wait->key |= POLLOUT_SET;
+ }
+}
+
int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
{
ktime_t expire, *to = NULL;
@@ -418,20 +441,25 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
if (file) {
f_op = file->f_op;
mask = DEFAULT_POLLMASK;
- if (f_op && f_op->poll)
- mask = (*f_op->poll)(file, retval ? NULL : wait);
+ if (f_op && f_op->poll) {
+ wait_key_set(wait, in, out, bit);
+ mask = (*f_op->poll)(file, wait);
+ }
fput_light(file, fput_needed);
if ((mask & POLLIN_SET) && (in & bit)) {
res_in |= bit;
retval++;
+ wait = NULL;
}
if ((mask & POLLOUT_SET) && (out & bit)) {
res_out |= bit;
retval++;
+ wait = NULL;
}
if ((mask & POLLEX_SET) && (ex & bit)) {
res_ex |= bit;
retval++;
+ wait = NULL;
}
}
}
@@ -685,8 +713,12 @@ static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
mask = POLLNVAL;
if (file != NULL) {
mask = DEFAULT_POLLMASK;
- if (file->f_op && file->f_op->poll)
+ if (file->f_op && file->f_op->poll) {
+ if (pwait)
+ pwait->key = pollfd->events |
+ POLLERR | POLLHUP;
mask = file->f_op->poll(file, pwait);
+ }
/* Mask out unneeded events. */
mask &= pollfd->events | POLLERR | POLLHUP;
fput_light(file, fput_needed);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 8c24ef8..3327389 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -32,6 +32,7 @@ typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_
typedef struct poll_table_struct {
poll_queue_proc qproc;
+ unsigned long key;
} poll_table;
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
@@ -43,10 +44,12 @@ static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_addres
static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
{
pt->qproc = qproc;
+ pt->key = ~0UL; /* all events enabled */
}
struct poll_table_entry {
struct file *filp;
+ unsigned long key;
wait_queue_t wait;
wait_queue_head_t *wait_address;
};
next prev parent reply other threads:[~2009-04-29 12:31 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-24 20:10 udp ping pong with various process bindings (and correct cpu mappings) Christoph Lameter
2009-04-24 21:18 ` Eric Dumazet
2009-04-25 15:47 ` [PATCH] net: Avoid extra wakeups of threads blocked in wait_for_packet() Eric Dumazet
2009-04-26 9:04 ` David Miller
2009-04-26 10:46 ` [PATCH] poll: Avoid extra wakeups Eric Dumazet
2009-04-26 13:33 ` Jarek Poplawski
2009-04-26 14:27 ` Eric Dumazet
2009-04-28 9:15 ` David Miller
2009-04-28 9:24 ` Eric Dumazet
2009-04-28 14:21 ` Andi Kleen
2009-04-28 14:58 ` Eric Dumazet
2009-04-28 15:06 ` [PATCH] poll: Avoid extra wakeups in select/poll Eric Dumazet
2009-04-28 19:05 ` Christoph Lameter
2009-04-28 20:05 ` Eric Dumazet
2009-04-28 20:14 ` Christoph Lameter
2009-04-28 20:33 ` Eric Dumazet
2009-04-28 20:49 ` Christoph Lameter
2009-04-28 21:04 ` Eric Dumazet
2009-04-28 21:00 ` Christoph Lameter
2009-04-28 21:05 ` Eric Dumazet
2009-04-28 21:04 ` Christoph Lameter
2009-04-28 21:11 ` Eric Dumazet
2009-04-29 9:11 ` Ingo Molnar
2009-04-30 10:49 ` Eric Dumazet
2009-04-30 11:57 ` Ingo Molnar
2009-04-30 14:08 ` Eric Dumazet
2009-04-30 16:07 ` [BUG] perf_counter: change cpu frequencies Eric Dumazet
2009-05-03 6:06 ` Eric Dumazet
2009-05-03 7:25 ` Ingo Molnar
2009-05-04 10:39 ` Eric Dumazet
2009-04-30 21:24 ` [PATCH] poll: Avoid extra wakeups in select/poll Paul E. McKenney
2009-04-29 7:20 ` Andrew Morton
2009-04-29 7:35 ` Andi Kleen
2009-04-29 7:37 ` Eric Dumazet
2009-04-29 9:22 ` Ingo Molnar
2009-04-29 7:39 ` Eric Dumazet
2009-04-29 8:26 ` Eric Dumazet
2009-04-29 9:16 ` Ingo Molnar
2009-04-29 9:36 ` Eric Dumazet
2009-04-29 10:27 ` Ingo Molnar
2009-04-29 12:29 ` Eric Dumazet [this message]
2009-04-29 13:07 ` Ingo Molnar
2009-04-29 15:53 ` Davide Libenzi
2009-04-28 9:26 ` [PATCH] net: Avoid extra wakeups of threads blocked in wait_for_packet() David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49F84833.5000908@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=andi@firstfloor.org \
--cc=cl@linux.com \
--cc=davem@davemloft.net \
--cc=davidel@xmailserver.org \
--cc=haoki@redhat.com \
--cc=jesse.brandeburg@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mchan@broadcom.com \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.