All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: brouer@redhat.com, David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Dave Taht <dave.taht@gmail.com>,
	Jonathan Morton <chromatix99@gmail.com>
Subject: Re: [PATCH net-next] fq_codel: add batch ability to fq_codel_drop()
Date: Mon, 2 May 2016 09:49:54 +0200	[thread overview]
Message-ID: <20160502094954.24cc9549@redhat.com> (raw)
In-Reply-To: <1462146446.5535.236.camel@edumazet-glaptop3.roam.corp.google.com>

On Sun, 01 May 2016 16:47:26 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> In presence of inelastic flows and stress, we can call
> fq_codel_drop() for every packet entering fq_codel qdisc.
> 
> fq_codel_drop() is quite expensive, as it does a linear scan
> of 4 KB of memory to find a fat flow.
> Once found, it drops the oldest packet of this flow.
> 
> Instead of dropping a single packet, try to drop 50% of the backlog
> of this fat flow, with a configurable limit of 64 packets per round.
> 
> TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
> limit configurable.
> 
> With this strategy the 4 KB search is amortized to a single cache line
> per drop [1], so fq_codel_drop() no longer appears at the top of kernel
> profile in presence of few inelastic flows.
> 
> [1] Assuming a 64byte cache line, and 1024 buckets
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dave Taht <dave.taht@gmail.com>
> Cc: Jonathan Morton <chromatix99@gmail.com>
> ---
>  include/uapi/linux/pkt_sched.h |    1 
>  net/sched/sch_fq_codel.c       |   64 +++++++++++++++++++++----------
>  2 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 1c78c7454c7c..a11afecd4482 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -718,6 +718,7 @@ enum {
>  	TCA_FQ_CODEL_FLOWS,
>  	TCA_FQ_CODEL_QUANTUM,
>  	TCA_FQ_CODEL_CE_THRESHOLD,
> +	TCA_FQ_CODEL_DROP_BATCH_SIZE,
>  	__TCA_FQ_CODEL_MAX
>  };
>  
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index a5e420b3d4ab..e7b42b0d5145 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -59,6 +59,7 @@ struct fq_codel_sched_data {
>  	u32		flows_cnt;	/* number of flows */
>  	u32		perturbation;	/* hash perturbation */
>  	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
> +	u32		drop_batch_size;
>  	struct codel_params cparams;
>  	struct codel_stats cstats;
>  	u32		drop_overlimit;
> @@ -135,17 +136,20 @@ static inline void flow_queue_add(struct fq_codel_flow *flow,
>  	skb->next = NULL;
>  }
>  
> -static unsigned int fq_codel_drop(struct Qdisc *sch)
> +static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
>  {
>  	struct fq_codel_sched_data *q = qdisc_priv(sch);
>  	struct sk_buff *skb;
>  	unsigned int maxbacklog = 0, idx = 0, i, len;
>  	struct fq_codel_flow *flow;
> +	unsigned int threshold;
>  
> -	/* Queue is full! Find the fat flow and drop packet from it.
> +	/* Queue is full! Find the fat flow and drop packet(s) from it.
>  	 * This might sound expensive, but with 1024 flows, we scan
>  	 * 4KB of memory, and we dont need to handle a complex tree
>  	 * in fast path (packet queue/enqueue) with many cache misses.
> +	 * In stress mode, we'll try to drop 64 packets from the flow,
> +	 * amortizing this linear lookup to one cache line per drop.
>  	 */
>  	for (i = 0; i < q->flows_cnt; i++) {
>  		if (q->backlogs[i] > maxbacklog) {
> @@ -153,15 +157,24 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
>  			idx = i;
>  		}
>  	}
> +
> +	/* Our goal is to drop half of this fat flow backlog */
> +	threshold = maxbacklog >> 1;
> +
>  	flow = &q->flows[idx];
> -	skb = dequeue_head(flow);
> -	len = qdisc_pkt_len(skb);
> +	len = 0;
> +	i = 0;
> +	do {
> +		skb = dequeue_head(flow);
> +		len += qdisc_pkt_len(skb);
> +		kfree_skb(skb);
> +	} while (++i < max_packets && len < threshold);
> +
> +	flow->dropped += i;

What about using bulk free of SKBs here?

There is a very high probability that we are hitting SLUB slowpath,
which involves an expensive locked cmpxchg_double per packet.  Instead
we can amortize this cost via kmem_cache_free_bulk().

Maybe extend kfree_skb_list() to hide the slab/kmem_cache call?


>  	q->backlogs[idx] -= len;
> -	sch->q.qlen--;
> -	qdisc_qstats_drop(sch);
> -	qdisc_qstats_backlog_dec(sch, skb);
> -	kfree_skb(skb);
> -	flow->dropped++;
> +	sch->qstats.drops += i;
> +	sch->qstats.backlog -= len;
> +	sch->q.qlen -= i;
>  	return idx;
>  }
>  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

  reply	other threads:[~2016-05-02  7:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-01 23:47 [PATCH net-next] fq_codel: add batch ability to fq_codel_drop() Eric Dumazet
2016-05-02  7:49 ` Jesper Dangaard Brouer [this message]
2016-05-02 14:34   ` Eric Dumazet
2016-05-02 16:00     ` Jesper Dangaard Brouer
2016-05-02 16:12       ` Eric Dumazet
2016-05-02 17:15         ` Jesper Dangaard Brouer
2016-05-03  1:07           ` Dave Taht
2016-05-03 16:47 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160502094954.24cc9549@redhat.com \
    --to=brouer@redhat.com \
    --cc=chromatix99@gmail.com \
    --cc=dave.taht@gmail.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.