All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: linux kernel <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	David Miller <davem@davemloft.net>,
	cl@linux.com, jesse.brandeburg@intel.com, netdev@vger.kernel.org,
	haoki@redhat.com, mchan@broadcom.com, davidel@xmailserver.org
Subject: Re: [PATCH] poll: Avoid extra wakeups in select/poll
Date: Wed, 29 Apr 2009 12:27:34 +0200	[thread overview]
Message-ID: <20090429102734.GC2373@elte.hu> (raw)
In-Reply-To: <49F81FB9.50504@cosmosbay.com>


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Ingo Molnar a écrit :
> > * Eric Dumazet <dada1@cosmosbay.com> wrote:
> > 
> >> @@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
> >>  				if (file) {
> >>  					f_op = file->f_op;
> >>  					mask = DEFAULT_POLLMASK;
> >> -					if (f_op && f_op->poll)
> >> +					if (f_op && f_op->poll) {
> >> +						if (wait) {
> >> +							wait->key = POLLEX_SET;
> >> +							if (in & bit)
> >> +								wait->key |= POLLIN_SET;
> >> +							if (out & bit)
> >> +								wait->key |= POLLOUT_SET;
> >> +						}
> >>  						mask = (*f_op->poll)(file, retval ? NULL : wait);
> >> +					}
> >>  					fput_light(file, fput_needed);
> >>  					if ((mask & POLLIN_SET) && (in & bit)) {
> >>  						res_in |= bit;
> > 
> > Please factor this whole 'if (file)' branch out into a helper. 
> > Typical indentation levels go from 1 to 3 tabs - 4 should be avoided 
> > if possible and 5 is pretty excessive already. This goes to eight.
> > 
> 
> Thanks Ingo,
> 
> Here is v3 of patch, with your Acked-by included :)
> 
> This is IMHO clearer since helper immediatly follows POLLIN_SET / POLLOUT_SET /
> POLLEX_SET defines.
> 
> [PATCH] poll: Avoid extra wakeups in select/poll
> 
> After introduction of keyed wakeups Davide Libenzi did on epoll, we
> are able to avoid spurious wakeups in poll()/select() code too.
> 
> For example, typical use of poll()/select() is to wait for incoming
> network frames on many sockets. But TX completion for UDP/TCP 
> frames call sock_wfree() which in turn schedules thread.
> 
> When scheduled, thread does a full scan of all polled fds and
> can sleep again, because nothing is really available. If number
> of fds is large, this cause significant load.
> 
> This patch makes select()/poll() aware of keyed wakeups and
> useless wakeups are avoided. This reduces number of context
> switches by about 50% on some setups, and work performed
> by sofirq handlers.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Acked-by: David S. Miller <davem@davemloft.net>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Ingo Molnar <mingo@elte.hu>

> ---
>  fs/select.c          |   40 ++++++++++++++++++++++++++++++++++++----
>  include/linux/poll.h |    3 +++
>  2 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 0fe0e14..ba068ad 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
>  	return table->entry++;
>  }
>  
> -static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  {
>  	struct poll_wqueues *pwq = wait->private;
>  	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
> @@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  	return default_wake_function(&dummy_wait, mode, sync, key);
>  }
>  
> +static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct poll_table_entry *entry;
> +
> +	entry = container_of(wait, struct poll_table_entry, wait);
> +	if (key && !((unsigned long)key & entry->key))
> +		return 0;
> +	return __pollwake(wait, mode, sync, key);
> +}
> +
>  /* Add a new entry */
>  static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>  				poll_table *p)
> @@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>  	get_file(filp);
>  	entry->filp = filp;
>  	entry->wait_address = wait_address;
> +	entry->key = p->key;
>  	init_waitqueue_func_entry(&entry->wait, pollwake);
>  	entry->wait.private = pwq;
>  	add_wait_queue(wait_address, &entry->wait);
> @@ -362,6 +373,18 @@ get_max:
>  #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
>  #define POLLEX_SET (POLLPRI)
>  
> +static void wait_key_set(poll_table *wait, unsigned long in,
> +			 unsigned long out, unsigned long bit)
> +{
> +	if (wait) {
> +		wait->key = POLLEX_SET;
> +		if (in & bit)
> +			wait->key |= POLLIN_SET;
> +		if (out & bit)
> +			wait->key |= POLLOUT_SET;
> +	}
> +}

should be inline perhaps?

> +
>  int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>  {
>  	ktime_t expire, *to = NULL;
> @@ -418,20 +441,25 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>  				if (file) {
>  					f_op = file->f_op;
>  					mask = DEFAULT_POLLMASK;
> -					if (f_op && f_op->poll)
> -						mask = (*f_op->poll)(file, retval ? NULL : wait);
> +					if (f_op && f_op->poll) {
> +						wait_key_set(wait, in, out, bit);
> +						mask = (*f_op->poll)(file, wait);
> +					}
>  					fput_light(file, fput_needed);
>  					if ((mask & POLLIN_SET) && (in & bit)) {
>  						res_in |= bit;
>  						retval++;
> +						wait = NULL;
>  					}
>  					if ((mask & POLLOUT_SET) && (out & bit)) {
>  						res_out |= bit;
>  						retval++;
> +						wait = NULL;
>  					}
>  					if ((mask & POLLEX_SET) && (ex & bit)) {
>  						res_ex |= bit;
>  						retval++;
> +						wait = NULL;
>  					}
>  				}
>  			}

Looks much nicer now!  [ I'd still suggest to factor out the guts of 
do_select() as its nesting is excessive that hurts its reviewability 
quite a bit - but now your patch does not make the situation any 
worse. ]

Even-More-Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

  reply	other threads:[~2009-04-29 10:28 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-24 20:10 udp ping pong with various process bindings (and correct cpu mappings) Christoph Lameter
2009-04-24 21:18 ` Eric Dumazet
2009-04-25 15:47 ` [PATCH] net: Avoid extra wakeups of threads blocked in wait_for_packet() Eric Dumazet
2009-04-26  9:04   ` David Miller
2009-04-26 10:46     ` [PATCH] poll: Avoid extra wakeups Eric Dumazet
2009-04-26 13:33       ` Jarek Poplawski
2009-04-26 14:27         ` Eric Dumazet
2009-04-28  9:15       ` David Miller
2009-04-28  9:24         ` Eric Dumazet
2009-04-28 14:21       ` Andi Kleen
2009-04-28 14:58         ` Eric Dumazet
2009-04-28 15:06         ` [PATCH] poll: Avoid extra wakeups in select/poll Eric Dumazet
2009-04-28 19:05           ` Christoph Lameter
2009-04-28 20:05             ` Eric Dumazet
2009-04-28 20:14               ` Christoph Lameter
2009-04-28 20:33                 ` Eric Dumazet
2009-04-28 20:49                   ` Christoph Lameter
2009-04-28 21:04                     ` Eric Dumazet
2009-04-28 21:00                       ` Christoph Lameter
2009-04-28 21:05                       ` Eric Dumazet
2009-04-28 21:04                         ` Christoph Lameter
2009-04-28 21:11                       ` Eric Dumazet
2009-04-29  9:11                         ` Ingo Molnar
2009-04-30 10:49                           ` Eric Dumazet
2009-04-30 11:57                             ` Ingo Molnar
2009-04-30 14:08                               ` Eric Dumazet
2009-04-30 16:07                                 ` [BUG] perf_counter: change cpu frequencies Eric Dumazet
2009-05-03  6:06                                   ` Eric Dumazet
2009-05-03  7:25                                     ` Ingo Molnar
2009-05-04 10:39                                       ` Eric Dumazet
2009-04-30 21:24                                 ` [PATCH] poll: Avoid extra wakeups in select/poll Paul E. McKenney
2009-04-29  7:20           ` Andrew Morton
2009-04-29  7:35             ` Andi Kleen
2009-04-29  7:37               ` Eric Dumazet
2009-04-29  9:22               ` Ingo Molnar
2009-04-29  7:39             ` Eric Dumazet
2009-04-29  8:26               ` Eric Dumazet
2009-04-29  9:16           ` Ingo Molnar
2009-04-29  9:36             ` Eric Dumazet
2009-04-29 10:27               ` Ingo Molnar [this message]
2009-04-29 12:29                 ` Eric Dumazet
2009-04-29 13:07                   ` Ingo Molnar
2009-04-29 15:53                   ` Davide Libenzi
2009-04-28  9:26   ` [PATCH] net: Avoid extra wakeups of threads blocked in wait_for_packet() David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090429102734.GC2373@elte.hu \
    --to=mingo@elte.hu \
    --cc=andi@firstfloor.org \
    --cc=cl@linux.com \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=davidel@xmailserver.org \
    --cc=haoki@redhat.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.