public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willem de Bruijn <willemb@google.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com
Subject: Re: [PATCH] rps: selective flow shedding during softnet overflow
Date: Fri, 19 Apr 2013 10:58:54 -0700
Message-ID: <1366394334.16391.36.camel@edumazet-glaptop>
In-Reply-To: <1366393612-16885-1-git-send-email-willemb@google.com>

On Fri, 2013-04-19 at 13:46 -0400, Willem de Bruijn wrote:
> A cpu executing the network receive path sheds packets when its input
> queue grows to netdev_max_backlog. A single high rate flow (such as a
> spoofed source DoS) can exceed a single cpu processing rate and will
> degrade throughput of other flows hashed onto the same cpu.
> 
> This patch adds a more fine grained hashtable. If the netdev backlog
> is above a threshold, IRQ cpus track the ratio of total traffic of
> each flow (using 1024 buckets, configurable). The ratio is measured
> by counting the number of packets per flow over the last 256 packets
> from the source cpu. Any flow that occupies a large fraction of this
> (set at 50%) will see packet drop while above the threshold.
> 
> Tested:
> Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
> kernel receive (RPS) on cpu0 and application threads on cpus 2--7
> each handling 20k req/s. Throughput halves when hit with a 400 kpps
> antagonist storm. With this patch applied, antagonist overload is
> dropped and the server processes its complete load.
> 
> The patch is effective when kernel receive processing is the
> bottleneck. The above RPS scenario is an extreme, but the same is
> reached with RFS and sufficient kernel processing (iptables, packet
> socket tap, ..).
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

> +#ifdef CONFIG_NET_FLOW_LIMIT
> +#define FLOW_LIMIT_HISTORY	(1 << 8)	/* must be ^2 */
> +struct sd_flow_limit {
> +	u64			count;
> +	unsigned int		history_head;
> +	u16			history[FLOW_LIMIT_HISTORY];
> +	u8			buckets[];
> +};
> +
> +extern int netdev_flow_limit_table_len;
> +#endif /* CONFIG_NET_FLOW_LIMIT */
> +
>  /*
>   * Incoming packets are placed on per-cpu queues
>   */
> @@ -1808,6 +1820,10 @@ struct softnet_data {
>  	unsigned int		dropped;
>  	struct sk_buff_head	input_pkt_queue;
>  	struct napi_struct	backlog;
> +
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +	struct sd_flow_limit	*flow_limit;
> +#endif
>  };
>  
>  static inline void input_queue_head_incr(struct softnet_data *sd)
> diff --git a/net/Kconfig b/net/Kconfig
> index 2ddc904..ff66a4f 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -259,6 +259,16 @@ config BPF_JIT
>  	  packet sniffing (libpcap/tcpdump). Note : Admin should enable
>  	  this feature changing /proc/sys/net/core/bpf_jit_enable
>  
> +config NET_FLOW_LIMIT
> +	bool "Flow shedding under load"
> +	---help---
> +	  The network stack has to drop packets when a receive processing CPU's
> +	  backlog reaches netdev_max_backlog. If a few out of many active flows
> +	  generate the vast majority of load, drop their traffic earlier to
> +	  maintain capacity for the other flows. This feature provides servers
> +	  with many clients some protection against DoS by a single (spoofed)
> +	  flow that greatly exceeds average workload.
> +
>  menu "Network testing"
>  
>  config NET_PKTGEN
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3655ff9..67a4ae0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3054,6 +3054,47 @@ static int rps_ipi_queued(struct softnet_data *sd)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> +#endif
> +
> +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> +{
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +	struct sd_flow_limit *fl;
> +	struct softnet_data *sd;
> +	unsigned int old_flow, new_flow;
> +
> +	if (qlen < (netdev_max_backlog >> 1))
> +		return false;
> +
> +	sd = &per_cpu(softnet_data, smp_processor_id());
> +
> +	rcu_read_lock();
> +	fl = rcu_dereference(sd->flow_limit);
> +	if (fl) {
> +		new_flow = skb_get_rxhash(skb) &
> +			   (netdev_flow_limit_table_len - 1);

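There is a race on netdev_flow_limit_table_len here.

(The admin might change the value while a packet is being processed,
and we could then index past the end of fl->buckets.)

The mask should be a field in fl, e.g. fl->mask, so that it always
matches the table being dereferenced and the access is safe.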
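
To illustrate the suggestion, here is a minimal userspace sketch, not
kernel code. The names struct flow_limit, flow_limit_alloc() and
flow_limit_bucket() are made up for this example and are not part of
the patch. The point is only that the bucket index comes from a mask
stored in the table at allocation time, so a later change to the
global length cannot push the index out of bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LIMIT_HISTORY (1 << 8)	/* must be a power of two */

/* Hypothetical variant of sd_flow_limit: the table carries its own mask. */
struct flow_limit {
	uint64_t	count;
	unsigned int	mask;		/* num_buckets - 1, fixed at allocation */
	unsigned int	history_head;
	uint16_t	history[FLOW_LIMIT_HISTORY];
	uint8_t		buckets[];	/* num_buckets entries */
};

/*
 * Allocate a zeroed table. num_buckets must be a power of two and must
 * fit the 16-bit history entries; otherwise NULL is returned.
 */
static struct flow_limit *flow_limit_alloc(unsigned int num_buckets)
{
	struct flow_limit *fl;

	if (num_buckets == 0 || (num_buckets & (num_buckets - 1)) ||
	    num_buckets > (1u << 16))
		return NULL;

	fl = calloc(1, sizeof(*fl) + num_buckets);
	if (fl)
		fl->mask = num_buckets - 1;
	return fl;
}

/* The index depends only on the table being accessed, never on a global. */
static unsigned int flow_limit_bucket(const struct flow_limit *fl,
				      uint32_t hash)
{
	assert(fl != NULL);
	return hash & fl->mask;
}
```

With this layout, resizing would mean allocating a new table with its
own mask and publishing the new pointer; readers holding the old
pointer keep using the old mask with the old buckets.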


> +		old_flow = fl->history[fl->history_head];
> +		fl->history[fl->history_head] = new_flow;
> +
> +		fl->history_head++;
> +		fl->history_head &= FLOW_LIMIT_HISTORY - 1;
> +
> +		if (likely(fl->buckets[old_flow]))
> +			fl->buckets[old_flow]--;
> +
> +		if (++fl->buckets[new_flow] > (FLOW_LIMIT_HISTORY >> 1)) {
> +			fl->count++;
> +			rcu_read_unlock();
> +			return true;
> +		}
> +	}
> +	rcu_read_unlock();
> +#endif
> +	return false;
> +}
> +
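
For readers following along, the accounting in the quoted function can
be tried outside the kernel with a sketch like the one below. It is a
userspace approximation under stated assumptions: struct flow_window,
flow_window_init() and flow_window_should_drop() are names invented
here, the table size is fixed at compile time, and the per-bucket
counters are 16 bits wide so that a count of 256 cannot wrap (the
patch uses u8 buckets). RCU and per-cpu handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_LEN	(1 << 8)	/* recent packets tracked, power of two */
#define NUM_BUCKETS	(1 << 12)	/* flow hash buckets, power of two */

/* Hypothetical userspace stand-in for the per-cpu flow limit state. */
struct flow_window {
	unsigned int	head;			/* next history slot to overwrite */
	uint16_t	history[HISTORY_LEN];	/* bucket of each recent packet */
	uint16_t	buckets[NUM_BUCKETS];	/* packets per bucket in the window */
	uint64_t	dropped;		/* packets shed so far */
};

static void flow_window_init(struct flow_window *w)
{
	memset(w, 0, sizeof(*w));
}

/*
 * Record one packet and report whether it should be dropped.
 * Nothing is recorded while the queue is below half of max_backlog.
 */
static bool flow_window_should_drop(struct flow_window *w, uint32_t hash,
				    unsigned int qlen,
				    unsigned int max_backlog)
{
	unsigned int old_flow, new_flow;

	if (qlen < (max_backlog >> 1))
		return false;

	/* Replace the oldest history entry with this packet's bucket. */
	new_flow = hash & (NUM_BUCKETS - 1);
	old_flow = w->history[w->head];
	w->history[w->head] = (uint16_t)new_flow;
	w->head = (w->head + 1) & (HISTORY_LEN - 1);

	/* The evicted packet no longer counts toward its flow. */
	if (w->buckets[old_flow])
		w->buckets[old_flow]--;

	/* Shed the packet if its flow holds more than half the window. */
	if (++w->buckets[new_flow] > (HISTORY_LEN >> 1)) {
		w->dropped++;
		return true;
	}
	return false;
}
```

Feeding a single hash repeatedly with the queue above the threshold
starts returning true once that flow accounts for more than
HISTORY_LEN / 2 of the recorded packets, while a packet from another
flow arriving at that point is still accepted.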

Very nice work, by the way!
