netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kubakici@wp.pl>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
Date: Thu, 21 Dec 2017 09:39:08 +0100	[thread overview]
Message-ID: <20171221083908.GA1930@nanopsycho> (raw)
In-Reply-To: <3c7d356b-8293-9e60-ede0-a92188296867@gmail.com>

Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastabend@gmail.com wrote:
>On 12/20/2017 03:23 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>>> <john.fastabend@gmail.com> wrote:
>>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>>
>>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>>> adding skbs during removal.
>>>>
>>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>>> It doesn't work with your "lockless" patches?
>>>>
>>>
>>> Well this is only in the 'parent == NULL' case otherwise we call
>>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>>> sch_tree_lock().
>>>
>>> The only converted qdisc mq and mqprio at this point don't care
>>> though and do their own dev_deactivate/activate. So its not fixing
>>> anything in the above mentioned commit.
>> 
>> Sure, removing a class does not impact the whole device,
>> but removing the root qdisc does.
>> 
>> After your "lockless", skb_array_consume_bh() is called in
>> pfifo_fast_reset() and ptr_ring_cleanup() is called in
>> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
>> do we have here with datapath?
>> 
>
>None at the moment.
>
>> 
>>>
>>> I still think it will need to be done eventually. If it resolves
>>> the miniq case it seems like a good idea. Although per Jakub's comment
>>> perhaps I pulled too much into the RCU handler.
>> 
>> The case Jakub reported is a RCU callback missing a rcu
>> barrier. I don't understand why you keep believing it is RCU
>> readers on datapath.> 
>> Not even to mention ingress is not affected by your "lockless"
>> thing.
>> 
>
>I was thinking about the case where we want a lockless qdisc
>with classes. Doing the qdisc destroy after a grace period would
>solve this. Also we could start to cleanup a lot of the locking
>and extra bits around 'running' qdisc and such by doing a clean
>xchg on the qdisc layer. It seems that a dev_activate/deactivate
>just to install a new qdisc is not needed.
>
>Anyways future work. However if it resolves the miniq issue, as
>Jiri indicated, seems like a clean fix. Although Jakub's issue
>with the patch would need to be addressed. Seems he gets a WARN_ON
>if the offload is not disabled but the device is unitialized.

Why just moving qdisc_free to rcu is not enough? It would resolve this
issue and also avoid using synchronize net. Something like:

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83a3e47..487288e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -100,6 +100,7 @@ struct Qdisc {
 	refcount_t		refcnt;
 
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
+	struct rcu_head		rcu;
 };
 
 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd1b200..9beffd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_reset);
 
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_free_rcu(struct rcu_head *rcu)
 {
+	struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu);
+
 	if (qdisc_is_percpu_stats(qdisc)) {
 		free_percpu(qdisc->cpu_bstats);
 		free_percpu(qdisc->cpu_qstats);
@@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	kfree_skb_list(qdisc->gso_skb);
 	kfree_skb(qdisc->skb_bad_txq);
-	qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_rcu);
 }
 EXPORT_SYMBOL(qdisc_destroy);
 
-- 
2.9.5

  reply	other threads:[~2017-12-21  8:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-20 20:09 [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback" John Fastabend
2017-12-20 21:59 ` Jakub Kicinski
2017-12-20 23:40   ` John Fastabend
2017-12-20 22:41 ` Cong Wang
2017-12-20 23:05   ` John Fastabend
2017-12-20 23:23     ` Cong Wang
2017-12-20 23:34       ` John Fastabend
2017-12-21  8:39         ` Jiri Pirko [this message]
2017-12-21 20:59           ` Cong Wang
2017-12-22  9:35             ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171221083908.GA1930@nanopsycho \
    --to=jiri@resnulli.us \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=kubakici@wp.pl \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).