From: John Fastabend <john.fastabend@gmail.com>
To: daniel@iogearbox.net, eric.dumazet@gmail.com, jhs@mojatatu.com,
aduyck@mirantis.com, brouer@redhat.com, davem@davemloft.net
Cc: john.r.fastabend@intel.com, netdev@vger.kernel.org,
john.fastabend@gmail.com
Subject: [RFC PATCH 06/12] net: sched: support qdisc_reset on NOLOCK qdisc
Date: Wed, 30 Dec 2015 09:53:13 -0800 [thread overview]
Message-ID: <20151230175313.26257.46445.stgit@john-Precision-Tower-5810> (raw)
In-Reply-To: <20151230175000.26257.41532.stgit@john-Precision-Tower-5810>
The qdisc_reset operation currently depends on the qdisc lock to halt
any additions to gso_skb and the statistics while the list is freed
and the stats zeroed.

Without the qdisc lock we cannot guarantee that another cpu is not in
the process of adding an skb to one of these "cells". There are two
cases we have to handle.
case 1: qdisc_graft operation. In this case a "new" qdisc is attached
        and the qdisc_destroy operation is called on the old qdisc.
        The destroy operation waits an rcu grace period and then calls
        qdisc_rcu_free(), at which point gso_cpu_skb is freed along
        with all stats, so there is no need to zero the stats and
        gso_cpu_skb from the reset operation itself.

        Because we cannot call qdisc_reset before an rcu grace period
        has elapsed and the qdisc has been detached from all cpus,
        simply do not call qdisc_reset() at all and let the
        qdisc_destroy operation clean up the qdisc. Note, a refcnt
        greater than 1 would cause the destroy operation to be
        aborted; however, if that ever happened the reference to the
        qdisc would be lost and we would have a memory leak.
case 2: dev_deactivate sequence. This can come from a user bringing
        the interface down, which causes the gso_skb list to be
        flushed and the qlen zeroed. At the moment this is protected
        by the qdisc lock, so while we clear the qlen/gso_skb fields
        we are guaranteed no new skbs are added. For the lockless
        case, though, this is not true. To resolve this, move the
        qdisc_reset call to after the new qdisc is assigned and a
        grace period has passed, ensuring no new skbs can be
        enqueued. Further, the RTNL lock is held so we cannot get
        another call to activate the qdisc while the skb lists are
        being freed.

Finally, fix qdisc_reset to handle the per cpu stats and skb lists.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
net/sched/sch_generic.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9aeb51f..134fb95 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -666,6 +666,18 @@ void qdisc_reset(struct Qdisc *qdisc)
if (ops->reset)
ops->reset(qdisc);
+ if (qdisc->gso_cpu_skb) {
+ int i;
+
+ for_each_possible_cpu(i) {
+ struct gso_cell *cell;
+
+ cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+ if (cell)
+ kfree_skb_list(cell->skb);
+ }
+ }
+
if (qdisc->gso_skb) {
kfree_skb_list(qdisc->gso_skb);
qdisc->gso_skb = NULL;
@@ -740,10 +752,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
root_lock = qdisc_lock(oqdisc);
spin_lock_bh(root_lock);
- /* Prune old scheduler */
- if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
- qdisc_reset(oqdisc);
-
/* ... and graft new one */
if (qdisc == NULL)
qdisc = &noop_qdisc;
@@ -857,7 +865,6 @@ static void dev_deactivate_queue(struct net_device *dev,
set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
- qdisc_reset(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
}
@@ -890,6 +897,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
return false;
}
+static void dev_qdisc_reset(struct net_device *dev,
+ struct netdev_queue *dev_queue,
+ void *none)
+{
+ struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+ WARN_ON(!qdisc);
+
+ if (qdisc)
+ qdisc_reset(qdisc);
+}
+
/**
* dev_deactivate_many - deactivate transmissions on several devices
* @head: list of devices to deactivate
@@ -910,7 +929,7 @@ void dev_deactivate_many(struct list_head *head)
&noop_qdisc);
dev_watchdog_down(dev);
- sync_needed |= !dev->dismantle;
+ sync_needed = true;
}
/* Wait for outstanding qdisc-less dev_queue_xmit calls.
@@ -921,9 +940,18 @@ void dev_deactivate_many(struct list_head *head)
synchronize_net();
/* Wait for outstanding qdisc_run calls. */
- list_for_each_entry(dev, head, close_list)
+ list_for_each_entry(dev, head, close_list) {
while (some_qdisc_is_busy(dev))
yield();
+
+ /* The new qdisc is assigned at this point so we can safely
+ * unwind stale skb lists and qdisc statistics
+ */
+ netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+ if (dev_ingress_queue(dev))
+ dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+ }
+
}
void dev_deactivate(struct net_device *dev)