All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: David Miller <davem@davemloft.net>,
	kchang@athenacr.com, netdev@vger.kernel.org,
	cl@linux-foundation.org, bmb@athenacr.com
Subject: Re: Multicast packet loss
Date: Tue, 17 Mar 2009 11:11:56 +0100	[thread overview]
Message-ID: <1237284716.5189.284.camel@laptop> (raw)
In-Reply-To: <49BED109.3020504@cosmosbay.com>

On Mon, 2009-03-16 at 23:22 +0100, Eric Dumazet wrote:

> Here is the last incantation of the patch, that of course should be
> split in two parts and better Changelog for further discussion on lkml.

I read the entire thread up to now, and I still don't really understand
the Changelog, sorry :(

> [PATCH] softirq: Introduce mechanism to defer wakeups
> 
> Some network workloads need to call scheduler too many times. For example,
> each received multicast frame can wakeup many threads. ksoftirqd is then
> not able to drain NIC RX queues in time and we get frame losses and high
> latencies.
> 
> This patch adds an infrastructure to delay work done in
> sock_def_readable() at end of do_softirq(). This needs to
> make available current->softirq_context even if !CONFIG_TRACE_IRQFLAGS

How does that solve the wakeup issue?

> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---

> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -158,6 +158,42 @@ void local_bh_enable_ip(unsigned long ip)
>  }
>  EXPORT_SYMBOL(local_bh_enable_ip);
>  
> +
> +#define SOFTIRQ_DELAY_END (struct softirq_delay *)1L
> +static DEFINE_PER_CPU(struct softirq_delay *, softirq_delay_head) = {
> +	SOFTIRQ_DELAY_END
> +};

Why the magic termination value? Can't we NULL terminate the list

> +
> +/*
> + * Caller must disable preemption, and take care of appropriate
> + * locking and refcounting
> + */

Shouldn't we call it __softirq_delay_queue() if the caller needs to
disabled preemption?

Futhermore, don't we always require the caller to take care of lifetime
issues when we queue something?

> +int softirq_delay_queue(struct softirq_delay *sdel)
> +{
> +	if (!sdel->next) {
> +		sdel->next = __get_cpu_var(softirq_delay_head);
> +		__get_cpu_var(softirq_delay_head) = sdel;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Because locking is provided by subsystem, please note
> + * that sdel->func(sdel) is responsible for setting sdel->next to NULL
> + */
> +static void softirq_delay_exec(void)
> +{
> +	struct softirq_delay *sdel;
> +
> +	while ((sdel = __get_cpu_var(softirq_delay_head)) != SOFTIRQ_DELAY_END) {
> +		__get_cpu_var(softirq_delay_head) = sdel->next;
> +		sdel->func(sdel);	/*	sdel->next = NULL;*/
> +		}
> +}

Why can't we write:

  struct softirq_delay *sdel, *next;

  sdel = __get_cpu_var(softirq_delay_head);
  __get_cpu_var(softirq_delay_head) = NULL;

  while (sdel) {
    next = sdel->next;
    sdel->func(sdel);
    sdel = next;
  }

Why does it matter what happens to sdel->next? we've done the callback.

Aah, the crux is in the re-use policy.. that most certainly does deserve
a comment.

How about we make sdel->next point to itself in the init case?

Then we can write:

  while (sdel) {
    next = sdel->next;
    sdel->next = sdel;
    sdel->func(sdel);
    sdel = next;
  }

and have the enqueue bit look like:

int __softirq_delay_queue(struct softirq_delay *sdel)
{
  struct softirq_delay **head;

  if (sdel->next != sdel)
    return 0;

  head = &__get_cpu_var(softirq_delay_head);
  sdel->next = *head;
  *head = sdel;
  return 1;
}
     
> @@ -1691,6 +1694,43 @@ static void sock_def_readable(struct sock *sk, int len)
>  	read_unlock(&sk->sk_callback_lock);
>  }
>  
> +/*
> + * helper function called by softirq_delay_exec(),
> + * if inet_def_readable() queued us.
> + */
> +static void sock_readable_defer(struct softirq_delay *sdel)
> +{
> +	struct sock *sk = container_of(sdel, struct sock, sk_delay);
> +
> +	sdel->next = NULL;
> +	/*
> +	 * At this point, we dont own a lock on socket, only a reference.
> +	 * We must commit above write, or another cpu could miss a wakeup
> +	 */
> +	smp_wmb();

Where's the matching barrier?

> +	sock_def_readable(sk, 0);
> +	sock_put(sk);
> +}
> +
> +/*
> + * Custom version of sock_def_readable()
> + * We want to defer scheduler processing at the end of do_softirq()
> + * Called with socket locked.
> + */
> +void inet_def_readable(struct sock *sk, int len)
> +{
> +	if (running_from_softirq()) {
> +		if (softirq_delay_queue(&sk->sk_delay))
> +			/*
> +			 * If we queued this socket, take a reference on it
> +			 * Caller owns socket lock, so write to sk_delay.next
> +			 * will be committed before unlock.
> +			 */
> +			sock_hold(sk);
> +	} else
> +		sock_def_readable(sk, len);
> +}

OK, so the idea is to handle a bunch of packets and instead of waking N
threads for each packet, only wake them once at the end of the batch?

Sounds like a sensible idea.. 


  reply	other threads:[~2009-03-17 10:12 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-30 17:49 Multicast packet loss Kenny Chang
2009-01-30 19:04 ` Eric Dumazet
2009-01-30 19:17 ` Denys Fedoryschenko
2009-01-30 20:03 ` Neil Horman
2009-01-30 22:29   ` Kenny Chang
2009-01-30 22:41     ` Eric Dumazet
2009-01-31 16:03       ` Neil Horman
2009-02-02 16:13         ` Kenny Chang
2009-02-02 16:48         ` Kenny Chang
2009-02-03 11:55           ` Neil Horman
2009-02-03 15:20             ` Kenny Chang
2009-02-04  1:15               ` Neil Horman
2009-02-04 16:07                 ` Kenny Chang
2009-02-04 16:46                   ` Wesley Chow
2009-02-04 18:11                     ` Eric Dumazet
2009-02-05 13:33                       ` Neil Horman
2009-02-05 13:46                         ` Wesley Chow
2009-02-05 13:29                   ` Neil Horman
2009-02-01 12:40       ` Eric Dumazet
2009-02-02 13:45         ` Neil Horman
2009-02-02 16:57           ` Eric Dumazet
2009-02-02 18:22             ` Neil Horman
2009-02-02 19:51               ` Wes Chow
2009-02-02 20:29                 ` Eric Dumazet
2009-02-02 21:09                   ` Wes Chow
2009-02-02 21:31                     ` Eric Dumazet
2009-02-03 17:34                       ` Kenny Chang
2009-02-04  1:21                         ` Neil Horman
2009-02-26 17:15                           ` Kenny Chang
2009-02-28  8:51                             ` Eric Dumazet
2009-03-01 17:03                               ` Eric Dumazet
2009-03-04  8:16                               ` David Miller
2009-03-04  8:36                                 ` Eric Dumazet
2009-03-07  7:46                                   ` Eric Dumazet
2009-03-08 16:46                                     ` Eric Dumazet
2009-03-09  2:49                                       ` David Miller
2009-03-09  6:36                                         ` Eric Dumazet
2009-03-13 21:51                                           ` David Miller
2009-03-13 22:30                                             ` Eric Dumazet
2009-03-13 22:38                                               ` David Miller
2009-03-13 22:45                                                 ` Eric Dumazet
2009-03-14  9:03                                                   ` [PATCH] net: reorder fields of struct socket Eric Dumazet
2009-03-16  2:59                                                     ` David Miller
2009-03-16 22:22                                                 ` Multicast packet loss Eric Dumazet
2009-03-17 10:11                                                   ` Peter Zijlstra [this message]
2009-03-17 11:08                                                     ` Eric Dumazet
2009-03-17 11:57                                                       ` Peter Zijlstra
2009-03-17 15:00                                                       ` Brian Bloniarz
2009-03-17 15:16                                                         ` Eric Dumazet
2009-03-17 19:39                                                           ` David Stevens
2009-03-17 21:19                                                             ` Eric Dumazet
2009-04-03 19:28                                                   ` Brian Bloniarz
2009-04-05 13:49                                                     ` Eric Dumazet
2009-04-06 21:53                                                       ` Brian Bloniarz
2009-04-06 22:12                                                         ` Brian Bloniarz
2009-04-07 20:08                                                       ` Brian Bloniarz
2009-04-08  8:12                                                         ` Eric Dumazet
2009-03-09 22:56                                       ` Brian Bloniarz
2009-03-10  5:28                                         ` Eric Dumazet
2009-03-10 23:22                                           ` Brian Bloniarz
2009-03-11  3:00                                             ` Eric Dumazet
2009-03-12 15:47                                               ` Brian Bloniarz
2009-03-12 16:34                                                 ` Eric Dumazet
2009-02-27 18:40       ` Christoph Lameter
2009-02-27 18:56         ` Eric Dumazet
2009-02-27 19:45           ` Christoph Lameter
2009-02-27 20:12             ` Eric Dumazet
2009-02-27 21:36               ` Eric Dumazet
2009-02-02 13:53     ` Eric Dumazet
  -- strict thread matches above, loose matches on Subject: below --
2009-04-05 14:42 bmb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1237284716.5189.284.camel@laptop \
    --to=peterz@infradead.org \
    --cc=bmb@athenacr.com \
    --cc=cl@linux-foundation.org \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=kchang@athenacr.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.