Re: [PATCH net-next] fq_codel: add batch ability to fq_codel_drop()

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: brouer@redhat.com, David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Dave Taht <dave.taht@gmail.com>,
	Jonathan Morton <chromatix99@gmail.com>
Subject: Re: [PATCH net-next] fq_codel: add batch ability to fq_codel_drop()
Date: Mon, 2 May 2016 09:49:54 +0200
Message-ID: <20160502094954.24cc9549@redhat.com>
In-Reply-To: <1462146446.5535.236.camel@edumazet-glaptop3.roam.corp.google.com>

On Sun, 01 May 2016 16:47:26 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> In presence of inelastic flows and stress, we can call
> fq_codel_drop() for every packet entering fq_codel qdisc.
> 
> fq_codel_drop() is quite expensive, as it does a linear scan
> of 4 KB of memory to find a fat flow.
> Once found, it drops the oldest packet of this flow.
> 
> Instead of dropping a single packet, try to drop 50% of the backlog
> of this fat flow, with a configurable limit of 64 packets per round.
> 
> TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
> limit configurable.
> 
> With this strategy the 4 KB search is amortized to a single cache line
> per drop [1], so fq_codel_drop() no longer appears at the top of kernel
> profile in presence of few inelastic flows.
> 
> [1] Assuming a 64byte cache line, and 1024 buckets
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dave Taht <dave.taht@gmail.com>
> Cc: Jonathan Morton <chromatix99@gmail.com>
> ---
>  include/uapi/linux/pkt_sched.h |    1 
>  net/sched/sch_fq_codel.c       |   64 +++++++++++++++++++++----------
>  2 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 1c78c7454c7c..a11afecd4482 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -718,6 +718,7 @@ enum {
>  	TCA_FQ_CODEL_FLOWS,
>  	TCA_FQ_CODEL_QUANTUM,
>  	TCA_FQ_CODEL_CE_THRESHOLD,
> +	TCA_FQ_CODEL_DROP_BATCH_SIZE,
>  	__TCA_FQ_CODEL_MAX
>  };
>  
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index a5e420b3d4ab..e7b42b0d5145 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -59,6 +59,7 @@ struct fq_codel_sched_data {
>  	u32		flows_cnt;	/* number of flows */
>  	u32		perturbation;	/* hash perturbation */
>  	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
> +	u32		drop_batch_size;
>  	struct codel_params cparams;
>  	struct codel_stats cstats;
>  	u32		drop_overlimit;
> @@ -135,17 +136,20 @@ static inline void flow_queue_add(struct fq_codel_flow *flow,
>  	skb->next = NULL;
>  }
>  
> -static unsigned int fq_codel_drop(struct Qdisc *sch)
> +static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
>  {
>  	struct fq_codel_sched_data *q = qdisc_priv(sch);
>  	struct sk_buff *skb;
>  	unsigned int maxbacklog = 0, idx = 0, i, len;
>  	struct fq_codel_flow *flow;
> +	unsigned int threshold;
>  
> -	/* Queue is full! Find the fat flow and drop packet from it.
> +	/* Queue is full! Find the fat flow and drop packet(s) from it.
>  	 * This might sound expensive, but with 1024 flows, we scan
>  	 * 4KB of memory, and we dont need to handle a complex tree
>  	 * in fast path (packet queue/enqueue) with many cache misses.
> +	 * In stress mode, we'll try to drop 64 packets from the flow,
> +	 * amortizing this linear lookup to one cache line per drop.
>  	 */
>  	for (i = 0; i < q->flows_cnt; i++) {
>  		if (q->backlogs[i] > maxbacklog) {
> @@ -153,15 +157,24 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
>  			idx = i;
>  		}
>  	}
> +
> +	/* Our goal is to drop half of this fat flow backlog */
> +	threshold = maxbacklog >> 1;
> +
>  	flow = &q->flows[idx];
> -	skb = dequeue_head(flow);
> -	len = qdisc_pkt_len(skb);
> +	len = 0;
> +	i = 0;
> +	do {
> +		skb = dequeue_head(flow);
> +		len += qdisc_pkt_len(skb);
> +		kfree_skb(skb);
> +	} while (++i < max_packets && len < threshold);
> +
> +	flow->dropped += i;

What about using bulk free of SKBs here?

There is a very high probability that we are hitting the SLUB slowpath
here, which involves an expensive locked cmpxchg_double per packet.
Instead, we can amortize this cost over the whole batch via
kmem_cache_free_bulk().

Maybe extend kfree_skb_list() to hide the slab/kmem_cache call?
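
To make the idea concrete, here is a rough userspace model, not kernel
code. It only shows the shape of the change: collect the dropped packets
in a small array inside the loop, then release them with one call after
the loop. Everything named here (struct pkt, struct flow, bulk_free(),
drop_batch(), MAX_BATCH) is made up for illustration. bulk_free() merely
stands in for kmem_cache_free_bulk(), and the model ignores everything
else that freeing a real SKB involves (data, frags, destructors).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BATCH 64	/* hypothetical cap, mirrors the 64-packet default */

struct pkt {
	struct pkt *next;
	unsigned int len;
};

struct flow {
	struct pkt *head, *tail;
	unsigned int dropped;
};

/* Counts calls, so the amortization is visible from the outside. */
static unsigned long bulk_free_calls;

/* Hypothetical stand-in for kmem_cache_free_bulk(): one call, n objects. */
static void bulk_free(size_t n, void **objs)
{
	bulk_free_calls++;
	for (size_t i = 0; i < n; i++)
		free(objs[i]);
}

/* Append a packet of the given length; returns 0 on success, -1 on OOM. */
static int flow_enqueue(struct flow *f, unsigned int len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->next = NULL;
	p->len = len;
	if (f->head)
		f->tail->next = p;
	else
		f->head = p;
	f->tail = p;
	return 0;
}

/* Unlink and return the oldest packet; caller ensures the flow is not empty. */
static struct pkt *dequeue_head(struct flow *f)
{
	struct pkt *p = f->head;

	f->head = p->next;
	if (!f->head)
		f->tail = NULL;
	p->next = NULL;
	return p;
}

/*
 * Drop packets from the head of the flow until max_packets have been
 * dropped or at least half of 'backlog' bytes are gone. Packets are
 * only collected in the loop; the free happens once, after it.
 * Returns the number of bytes dropped and stores the packet count.
 */
static unsigned int drop_batch(struct flow *f, unsigned int backlog,
			       unsigned int max_packets, unsigned int *npkts)
{
	void *batch[MAX_BATCH];
	unsigned int threshold = backlog >> 1;
	unsigned int len = 0, i = 0;

	if (max_packets > MAX_BATCH)
		max_packets = MAX_BATCH;

	while (f->head && i < max_packets && (i == 0 || len < threshold)) {
		struct pkt *p = dequeue_head(f);

		len += p->len;
		batch[i++] = p;
	}

	if (i)
		bulk_free(i, batch);

	f->dropped += i;
	*npkts = i;
	return len;
}
```

With ten 100-byte packets queued and a backlog of 1000 bytes, a call
such as drop_batch(&f, 1000, 64, &n) drops five packets and reaches the
allocator once instead of five times. Whether the same holds for real
SKBs depends on how much of the per-SKB teardown can be batched.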


>  	q->backlogs[idx] -= len;
> -	sch->q.qlen--;
> -	qdisc_qstats_drop(sch);
> -	qdisc_qstats_backlog_dec(sch, skb);
> -	kfree_skb(skb);
> -	flow->dropped++;
> +	sch->qstats.drops += i;
> +	sch->qstats.backlog -= len;
> +	sch->q.qlen -= i;
>  	return idx;
>  }
>  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Thread overview: 8+ messages
2016-05-01 23:47 [PATCH net-next] fq_codel: add batch ability to fq_codel_drop() Eric Dumazet
2016-05-02  7:49 ` Jesper Dangaard Brouer [this message]
2016-05-02 14:34   ` Eric Dumazet
2016-05-02 16:00     ` Jesper Dangaard Brouer
2016-05-02 16:12       ` Eric Dumazet
2016-05-02 17:15         ` Jesper Dangaard Brouer
2016-05-03  1:07           ` Dave Taht
2016-05-03 16:47 ` David Miller
