From: Jesper Dangaard Brouer
Subject: Re: [PATCH net-next 1/4] net_sched: drop packets after root qdisc lock is released
Date: Wed, 22 Jun 2016 17:14:37 +0200
Message-ID: <20160622171437.3f52d3a4@redhat.com>
References: <1466576212-15012-1-git-send-email-edumazet@google.com> <1466576212-15012-2-git-send-email-edumazet@google.com>
In-Reply-To: <1466576212-15012-2-git-send-email-edumazet@google.com>
To: Eric Dumazet
Cc: "David S. Miller", netdev, John Fastabend, Eric Dumazet, brouer@redhat.com

On Tue, 21 Jun 2016 23:16:49 -0700
Eric Dumazet wrote:

> Qdisc performance suffers when packets are dropped at enqueue()
> time because drops (kfree_skb()) are done while qdisc lock is held,
> delaying a dequeue() draining the queue.
>
> Nominal throughput can be reduced by 50 % when this happens,
> at a time we would like the dequeue() to proceed as fast as possible.
>
> Even FQ is vulnerable to this problem, while one of FQ goals was
> to provide some flow isolation.
>
> This patch adds a 'struct sk_buff **to_free' parameter to all
> qdisc->enqueue(), and in qdisc_drop() helper.
>
> I measured a performance increase of up to 12 %, but this patch
> is a prereq so that future batches in enqueue() can fly.
>
> Signed-off-by: Eric Dumazet
> ---
[...]
> +/* Instead of calling kfree_skb() while root qdisc lock is held,
> + * queue the skb for future freeing at end of __dev_xmit_skb()
> + */
> +static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
> +{
> +	skb->next = *to_free;
> +	*to_free = skb;
> +}
> +
[...]
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d40593b3b9fb..aba10d2a8bc3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3070,6 +3070,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  				 struct netdev_queue *txq)
>  {
>  	spinlock_t *root_lock = qdisc_lock(q);
> +	struct sk_buff *to_free = NULL;
>  	bool contended;
>  	int rc;
>
> @@ -3086,7 +3087,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>
>  	spin_lock(root_lock);
>  	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
> -		kfree_skb(skb);
> +		__qdisc_drop(skb, &to_free);
>  		rc = NET_XMIT_DROP;
>  	} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
>  		   qdisc_run_begin(q)) {
> @@ -3109,7 +3110,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>
>  		rc = NET_XMIT_SUCCESS;
>  	} else {
> -		rc = q->enqueue(skb, q) & NET_XMIT_MASK;
> +		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
>  		if (qdisc_run_begin(q)) {
>  			if (unlikely(contended)) {
>  				spin_unlock(&q->busylock);
> @@ -3119,6 +3120,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  		}
>  	}
>  	spin_unlock(root_lock);
> +	if (unlikely(to_free))
> +		kfree_skb_list(to_free);

Great, now there is a good argument for implementing kmem_cache bulk
freeing inside kfree_skb_list(). I did an ugly PoC implementation once,
but there was no use-case that really needed the performance boost.
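To illustrate the idea, here is a rough, untested sketch (not my old PoC)
of how kfree_skb_list() could hand the skb heads to the slab bulk-free
API. It assumes the code lives in net/core/skbuff.c, so skb_release_all()
and skbuff_head_cache are in scope, and for brevity it skips the
skb->users refcount check that kfree_skb() does and pretends no skb on
the list came from skbuff_fclone_cache:

/* Untested sketch: bulk-free an skb list via kmem_cache_free_bulk().
 * Simplifications: no skb->users refcount handling, and every skb is
 * assumed to come from skbuff_head_cache (no fclones).
 */
#define KFREE_SKB_BULK_SZ	16

static void kfree_skb_list_bulk(struct sk_buff *segs)
{
	void *bulk[KFREE_SKB_BULK_SZ];
	size_t cnt = 0;

	while (segs) {
		struct sk_buff *next = segs->next;

		/* Drop data, frags, dst and run the destructor as usual;
		 * only the final freeing of the skb head is batched.
		 */
		skb_release_all(segs);
		bulk[cnt++] = segs;

		if (cnt == KFREE_SKB_BULK_SZ) {
			kmem_cache_free_bulk(skbuff_head_cache, cnt, bulk);
			cnt = 0;
		}
		segs = next;
	}
	if (cnt)
		kmem_cache_free_bulk(skbuff_head_cache, cnt, bulk);
}

The bulk array size is a tuning knob: big enough to amortize the
per-object slab overhead, small enough to keep the on-stack array cheap.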
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer