From: Jiri Pirko <jiri@resnulli.us>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kubakici@wp.pl>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
Date: Thu, 21 Dec 2017 09:39:08 +0100 [thread overview]
Message-ID: <20171221083908.GA1930@nanopsycho> (raw)
In-Reply-To: <3c7d356b-8293-9e60-ede0-a92188296867@gmail.com>
Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastabend@gmail.com wrote:
>On 12/20/2017 03:23 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>>> <john.fastabend@gmail.com> wrote:
>>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>>
>>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>>> adding skbs during removal.
>>>>
>>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>>> It doesn't work with your "lockless" patches?
>>>>
>>>
>>> Well this is only in the 'parent == NULL' case otherwise we call
>>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>>> sch_tree_lock().
>>>
>>> The only converted qdisc mq and mqprio at this point don't care
>>> though and do their own dev_deactivate/activate. So its not fixing
>>> anything in the above mentioned commit.
>>
>> Sure, removing a class does not impact the whole device,
>> but removing the root qdisc does.
>>
>> After your "lockless", skb_array_consume_bh() is called in
>> pfifo_fast_reset() and ptr_ring_cleanup() is called in
>> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
>> do we have here with datapath?
>>
>
>None at the moment.
>
>>
>>>
>>> I still think it will need to be done eventually. If it resolves
>>> the miniq case it seems like a good idea. Although per Jakub's comment
>>> perhaps I pulled too much into the RCU handler.
>>
>> The case Jakub reported is a RCU callback missing a rcu
>> barrier. I don't understand why you keep believing it is RCU
>> readers on datapath.>
>> Not even to mention ingress is not affected by your "lockless"
>> thing.
>>
>
>I was thinking about the case where we want a lockless qdisc
>with classes. Doing the qdisc destroy after a grace period would
>solve this. Also we could start to cleanup a lot of the locking
>and extra bits around 'running' qdisc and such by doing a clean
>xchg on the qdisc layer. It seems that a dev_activate/deactivate
>just to install a new qdisc is not needed.
>
>Anyways future work. However if it resolves the miniq issue, as
>Jiri indicated, seems like a clean fix. Although Jakub's issue
>with the patch would need to be addressed. Seems he gets a WARN_ON
>if the offload is not disabled but the device is unitialized.
Why just moving qdisc_free to rcu is not enough? It would resolve this
issue and also avoid using synchronize net. Something like:
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83a3e47..487288e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -100,6 +100,7 @@ struct Qdisc {
refcount_t refcnt;
spinlock_t busylock ____cacheline_aligned_in_smp;
+ struct rcu_head rcu;
};
static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd1b200..9beffd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc)
}
EXPORT_SYMBOL(qdisc_reset);
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_free_rcu(struct rcu_head *rcu)
{
+ struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu);
+
if (qdisc_is_percpu_stats(qdisc)) {
free_percpu(qdisc->cpu_bstats);
free_percpu(qdisc->cpu_qstats);
@@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(qdisc->gso_skb);
kfree_skb(qdisc->skb_bad_txq);
- qdisc_free(qdisc);
+ call_rcu(&qdisc->rcu, qdisc_free_rcu);
}
EXPORT_SYMBOL(qdisc_destroy);
--
2.9.5
next prev parent reply other threads:[~2017-12-21 8:39 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-20 20:09 [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback" John Fastabend
2017-12-20 21:59 ` Jakub Kicinski
2017-12-20 23:40 ` John Fastabend
2017-12-20 22:41 ` Cong Wang
2017-12-20 23:05 ` John Fastabend
2017-12-20 23:23 ` Cong Wang
2017-12-20 23:34 ` John Fastabend
2017-12-21 8:39 ` Jiri Pirko [this message]
2017-12-21 20:59 ` Cong Wang
2017-12-22 9:35 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171221083908.GA1930@nanopsycho \
--to=jiri@resnulli.us \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=kubakici@wp.pl \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.