From: John Fastabend <john.fastabend@gmail.com>
To: eric.dumazet@gmail.com, jhs@mojatatu.com, davem@davemloft.net,
	brouer@redhat.com, xiyou.wangcong@gmail.com,
	alexei.starovoitov@gmail.com
Cc: john.r.fastabend@intel.com, netdev@vger.kernel.org,
	john.fastabend@gmail.com
Subject: [net-next PATCH 08/15] net: sched: support qdisc_reset on NOLOCK qdisc
Date: Tue, 23 Aug 2016 13:26:09 -0700
Message-ID: <20160823202609.14368.14098.stgit@john-Precision-Tower-5810>
In-Reply-To: <20160823202135.14368.62466.stgit@john-Precision-Tower-5810>

The qdisc_reset operation currently depends on the qdisc lock to halt
any additions to gso_skb and the statistics while the list is freed
and the stats are zeroed.

Without the qdisc lock we cannot guarantee that another cpu is not in
the process of adding an skb to one of the per-cpu gso "cells".
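
For reference, a per-cpu gso "cell" (introduced earlier in this series)
is just a per-cpu slot holding a requeued skb list, roughly along the
lines of the sketch below; the field names follow the usage in the diff
and are illustrative rather than the authoritative definition:

	struct gso_cell {
		struct sk_buff *skb;	/* requeued skb list for this cpu */
	};

	/* requeue path possibly running concurrently on another cpu */
	struct gso_cell *cell = this_cpu_ptr(qdisc->gso_cpu_skb);

	cell->skb = skb;	/* races with an unsynchronized qdisc_reset() */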

Consider the dev_deactivate sequence, which can be triggered by a user
bringing the interface down: it flushes the gso_skb list and zeroes
the qlen. At the moment this is protected by the qdisc lock, so while
we clear the qlen/gso_skb fields we are guaranteed that no new skbs
are added. For the lockless case, however, this is no longer true. To
resolve this, move the qdisc_reset call to after the new qdisc has
been assigned and a grace period has been observed, so that no new
skbs can be enqueued. Further, the RTNL lock is held, so we cannot get
another call to activate the qdisc while the skb lists are being
freed.
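
The resulting teardown ordering is roughly the following; this is a
simplified sketch of what dev_deactivate_many() does after this patch,
not verbatim kernel code:

	/* per tx queue, under the qdisc lock: swap in the default qdisc */
	rcu_assign_pointer(dev_queue->qdisc, &noop_qdisc);

	/* wait for outstanding qdisc-less dev_queue_xmit() calls */
	synchronize_net();

	/* wait for any in-flight qdisc_run() calls to complete */
	while (some_qdisc_is_busy(dev))
		yield();

	/* only now is it safe to flush skb lists and zero statistics */
	qdisc_reset(dev_queue->qdisc_sleeping);

With RTNL held across the whole sequence the qdisc cannot be
re-activated while the lists are being freed.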

Finally, fix qdisc_reset to handle the per cpu stats and skb lists.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |   41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 112d029..fd4a2b9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -738,6 +738,20 @@ void qdisc_reset(struct Qdisc *qdisc)
 	kfree_skb(qdisc->skb_bad_txq);
 	qdisc->skb_bad_txq = NULL;
 
+	if (qdisc->gso_cpu_skb) {
+		int i;
+
+		for_each_possible_cpu(i) {
+			struct gso_cell *cell;
+
+			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+			if (cell) {
+				kfree_skb_list(cell->skb);
+				cell->skb = NULL;
+			}
+		}
+	}
+
 	if (qdisc->gso_skb) {
 		kfree_skb_list(qdisc->gso_skb);
 		qdisc->gso_skb = NULL;
@@ -926,7 +940,6 @@ static void dev_deactivate_queue(struct net_device *dev,
 			set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
 
 		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
-		qdisc_reset(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
@@ -963,6 +976,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 	return false;
 }
 
+static void dev_qdisc_reset(struct net_device *dev,
+			    struct netdev_queue *dev_queue,
+			    void *none)
+{
+	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+	if (qdisc)
+		qdisc_reset(qdisc);
+}
+
 /**
  * 	dev_deactivate_many - deactivate transmissions on several devices
  * 	@head: list of devices to deactivate
@@ -973,7 +996,6 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 void dev_deactivate_many(struct list_head *head)
 {
 	struct net_device *dev;
-	bool sync_needed = false;
 
 	list_for_each_entry(dev, head, close_list) {
 		netdev_for_each_tx_queue(dev, dev_deactivate_queue,
@@ -983,20 +1005,27 @@ void dev_deactivate_many(struct list_head *head)
 					     &noop_qdisc);
 
 		dev_watchdog_down(dev);
-		sync_needed |= !dev->dismantle;
 	}
 
 	/* Wait for outstanding qdisc-less dev_queue_xmit calls.
 	 * This is avoided if all devices are in dismantle phase :
 	 * Caller will call synchronize_net() for us
 	 */
-	if (sync_needed)
-		synchronize_net();
+	synchronize_net();
 
 	/* Wait for outstanding qdisc_run calls. */
-	list_for_each_entry(dev, head, close_list)
+	list_for_each_entry(dev, head, close_list) {
 		while (some_qdisc_is_busy(dev))
 			yield();
+
+		/* The new qdisc is assigned at this point so we can safely
+		 * unwind stale skb lists and qdisc statistics
+		 */
+		netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+		if (dev_ingress_queue(dev))
+			dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+	}
+
 }
 
 void dev_deactivate(struct net_device *dev)
