From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willem de Bruijn <willemb@google.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com
Subject: Re: [PATCH] rps: selective flow shedding during softnet overflow
Date: Fri, 19 Apr 2013 10:58:54 -0700
Message-ID: <1366394334.16391.36.camel@edumazet-glaptop>
In-Reply-To: <1366393612-16885-1-git-send-email-willemb@google.com>
On Fri, 2013-04-19 at 13:46 -0400, Willem de Bruijn wrote:
> A cpu executing the network receive path sheds packets when its input
> queue grows to netdev_max_backlog. A single high rate flow (such as a
> spoofed source DoS) can exceed a single cpu processing rate and will
> degrade throughput of other flows hashed onto the same cpu.
>
> This patch adds a more fine grained hashtable. If the netdev backlog
> is above a threshold, IRQ cpus track the ratio of total traffic of
> each flow (using 1024 buckets, configurable). The ratio is measured
> by counting the number of packets per flow over the last 256 packets
> from the source cpu. Any flow that occupies a large fraction of this
> (set at 50%) will see packet drops while above the threshold.
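A minimal userspace sketch of the accounting described above, for illustration only. The names (`flow_window`, `flow_window_init`, `flow_window_account`) are hypothetical. The 256-packet window, 1024 buckets and 50% cutoff follow the description, while the backlog-threshold gate, per-cpu placement and RCU are left out. Each packet overwrites the oldest history slot, the evicted packet's bucket is decremented, and a packet is dropped once its bucket holds more than half the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HISTORY  256u   /* packets remembered (power of 2) */
#define NBUCKETS 1024u  /* flow hash buckets (power of 2) */

static_assert((HISTORY & (HISTORY - 1)) == 0, "HISTORY must be a power of 2");
static_assert((NBUCKETS & (NBUCKETS - 1)) == 0, "NBUCKETS must be a power of 2");

/* Hypothetical stand-in for the per-cpu flow limit state. */
struct flow_window {
	unsigned int head;              /* next history slot to overwrite */
	uint16_t history[HISTORY];      /* bucket of each recent packet */
	unsigned int buckets[NBUCKETS]; /* packets per bucket in the window */
	unsigned long dropped;          /* packets this sketch chose to drop */
};

static void flow_window_init(struct flow_window *w)
{
	memset(w, 0, sizeof(*w));
}

/* Account one packet with flow hash `hash`; return true to drop it. */
static bool flow_window_account(struct flow_window *w, uint32_t hash)
{
	unsigned int new_flow = hash & (NBUCKETS - 1);
	unsigned int old_flow = w->history[w->head];

	/* Replace the oldest entry in the ring with this packet's bucket. */
	w->history[w->head] = (uint16_t)new_flow;
	w->head = (w->head + 1) & (HISTORY - 1);

	/* The evicted packet no longer counts toward its bucket. */
	if (w->buckets[old_flow])
		w->buckets[old_flow]--;

	/* Drop if this bucket now holds more than half of the window. */
	if (++w->buckets[new_flow] > HISTORY / 2) {
		w->dropped++;
		return true;
	}
	return false;
}
```

If one hash carries three of every four packets and 64 other hashes share the rest, the dominant hash crosses the 128-packet mark and starts being dropped, while each minor hash holds only one or two slots of the window and is never dropped.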
>
> Tested:
> Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
> kernel receive (RPS) on cpu0 and application threads on cpus 2--7
> each handling 20k req/s. Throughput halves when hit with a 400 kpps
> antagonist storm. With this patch applied, antagonist overload is
> dropped and the server processes its complete load.
>
> The patch is effective when kernel receive processing is the
> bottleneck. The above RPS scenario is an extreme case, but the same is
> reached with RFS and sufficient kernel processing (iptables, packet
> socket tap, ..).
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +#define FLOW_LIMIT_HISTORY (1 << 8) /* must be a power of 2 */
> +struct sd_flow_limit {
> + u64 count;
> + unsigned int history_head;
> + u16 history[FLOW_LIMIT_HISTORY];
> + u8 buckets[];
> +};
> +
> +extern int netdev_flow_limit_table_len;
> +#endif /* CONFIG_NET_FLOW_LIMIT */
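The power-of-two requirement on FLOW_LIMIT_HISTORY matters because the ring index (`history_head &= FLOW_LIMIT_HISTORY - 1`) and, later in the patch, the bucket index (masking the rxhash with `netdev_flow_limit_table_len - 1`) both use a bit mask in place of a modulo. A small illustrative check, with hypothetical helper names, that masking and modulo agree only for power-of-two sizes:

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a nonzero power of two. */
static bool is_pow2(unsigned int n)
{
	return n && (n & (n - 1)) == 0;
}

/* True if masking with n - 1 gives the same index as i % n for all i < limit. */
static bool mask_matches_mod(unsigned int n, unsigned int limit)
{
	assert(n > 0);
	for (unsigned int i = 0; i < limit; i++)
		if ((i & (n - 1)) != i % n)
			return false;
	return true;
}
```

With a size such as 1000, index 8 already masks to 0 rather than 8, so masking silently skips buckets instead of wrapping like a modulo.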
> +
> /*
> * Incoming packets are placed on per-cpu queues
> */
> @@ -1808,6 +1820,10 @@ struct softnet_data {
> unsigned int dropped;
> struct sk_buff_head input_pkt_queue;
> struct napi_struct backlog;
> +
> +#ifdef CONFIG_NET_FLOW_LIMIT
> + struct sd_flow_limit *flow_limit;
> +#endif
> };
>
> static inline void input_queue_head_incr(struct softnet_data *sd)
> diff --git a/net/Kconfig b/net/Kconfig
> index 2ddc904..ff66a4f 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -259,6 +259,16 @@ config BPF_JIT
> packet sniffing (libpcap/tcpdump). Note : Admin should enable
> this feature changing /proc/sys/net/core/bpf_jit_enable
>
> +config NET_FLOW_LIMIT
> + bool "Flow shedding under load"
> + ---help---
> + The network stack has to drop packets when a receive processing CPU's
> + backlog reaches netdev_max_backlog. If a few out of many active flows
> + generate the vast majority of load, drop their traffic earlier to
> + maintain capacity for the other flows. This feature provides servers
> + with many clients some protection against DoS by a single (spoofed)
> + flow that greatly exceeds average workload.
> +
> menu "Network testing"
>
> config NET_PKTGEN
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3655ff9..67a4ae0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3054,6 +3054,47 @@ static int rps_ipi_queued(struct softnet_data *sd)
> return 0;
> }
>
> +#ifdef CONFIG_NET_FLOW_LIMIT
> +int netdev_flow_limit_table_len __read_mostly = (1 << 12);
> +#endif
> +
> +static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
> +{
> +#ifdef CONFIG_NET_FLOW_LIMIT
> + struct sd_flow_limit *fl;
> + struct softnet_data *sd;
> + unsigned int old_flow, new_flow;
> +
> + if (qlen < (netdev_max_backlog >> 1))
> + return false;
> +
> + sd = &per_cpu(softnet_data, smp_processor_id());
> +
> + rcu_read_lock();
> + fl = rcu_dereference(sd->flow_limit);
> + if (fl) {
> + new_flow = skb_get_rxhash(skb) &
> + (netdev_flow_limit_table_len - 1);
There is a race accessing netdev_flow_limit_table_len: the admin might
change the value concurrently, and we could then do an out-of-bounds
access. This should be a field in fl (e.g. fl->mask), so that it's safe.
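One way this could look, as a hedged userspace sketch: the struct and helper names are hypothetical, and `calloc` stands in for the kernel allocator. The table records the mask it was sized with, so a reader indexing through the table it dereferenced stays within that table's bounds, whatever the sysctl holds at that moment. Presumably a size change would then allocate a fresh table and publish it with rcu_assign_pointer() rather than resize in place; that part is an assumption, not something in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LIMIT_HISTORY (1u << 8)

/* Hypothetical variant of struct sd_flow_limit that carries its own mask. */
struct sd_flow_limit_sketch {
	uint64_t count;
	unsigned int mask;                    /* num_buckets - 1, fixed at allocation */
	unsigned int history_head;
	uint16_t history[FLOW_LIMIT_HISTORY];
	uint8_t buckets[];                    /* mask + 1 entries */
};

/* Allocate a zeroed table with `len` buckets; `len` must be a power of 2. */
static struct sd_flow_limit_sketch *flow_limit_alloc(unsigned int len)
{
	struct sd_flow_limit_sketch *fl;

	assert(len && (len & (len - 1)) == 0);
	fl = calloc(1, sizeof(*fl) + len * sizeof(fl->buckets[0]));
	if (fl)
		fl->mask = len - 1;
	return fl;
}

/* Bucket index for a flow hash, bounded by this table's own size. */
static unsigned int flow_limit_bucket(const struct sd_flow_limit_sketch *fl,
				      uint32_t hash)
{
	return hash & fl->mask;
}
```

The point of the design is that the bound used for indexing and the allocation it indexes into always come from the same object.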
> + old_flow = fl->history[fl->history_head];
> + fl->history[fl->history_head] = new_flow;
> +
> + fl->history_head++;
> + fl->history_head &= FLOW_LIMIT_HISTORY - 1;
> +
> + if (likely(fl->buckets[old_flow]))
> + fl->buckets[old_flow]--;
> +
> + if (++fl->buckets[new_flow] > (FLOW_LIMIT_HISTORY >> 1)) {
> + fl->count++;
> + rcu_read_unlock();
> + return true;
> + }
> + }
> + rcu_read_unlock();
> +#endif
> + return false;
> +}
> +
Very nice work, by the way!