From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexei Starovoitov Subject: Re: [RFC PATCH 06/12] net: sched: support qdisc_reset on NOLOCK qdisc Date: Thu, 31 Dec 2015 18:30:04 -0800 Message-ID: <20160101023003.GA17287@ast-mbp.thefacebook.com> References: <20151230175000.26257.41532.stgit@john-Precision-Tower-5810> <20151230175313.26257.46445.stgit@john-Precision-Tower-5810> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: daniel@iogearbox.net, eric.dumazet@gmail.com, jhs@mojatatu.com, aduyck@mirantis.com, brouer@redhat.com, davem@davemloft.net, john.r.fastabend@intel.com, netdev@vger.kernel.org To: John Fastabend Return-path: Received: from mail-pa0-f42.google.com ([209.85.220.42]:36548 "EHLO mail-pa0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751008AbcAACaH (ORCPT ); Thu, 31 Dec 2015 21:30:07 -0500 Received: by mail-pa0-f42.google.com with SMTP id yy13so66328330pab.3 for ; Thu, 31 Dec 2015 18:30:06 -0800 (PST) Content-Disposition: inline In-Reply-To: <20151230175313.26257.46445.stgit@john-Precision-Tower-5810> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, Dec 30, 2015 at 09:53:13AM -0800, John Fastabend wrote: > The qdisc_reset operation depends on the qdisc lock at the moment > to halt any additions to gso_skb and statistics while the list is > free'd and the stats zeroed. > > Without the qdisc lock we can not guarantee another cpu is not in > the process of adding a skb to one of the "cells". Here are the > two cases we have to handle. > > case 1: qdisc_graft operation. In this case a "new" qdisc is attached > and the 'qdisc_destroy' operation is called on the old qdisc. > The destroy operation will wait a rcu grace period and call > qdisc_rcu_free(). At which point gso_cpu_skb is free'd along > with all stats so no need to zero stats and gso_cpu_skb from > the reset operation itself. > > Because we can not continue to call qdisc_reset before waiting > an rcu grace period so that the qdisc is detached from all > cpus simply do not call qdisc_reset() at all and let the > qdisc_destroy operation clean up the qdisc. Note, a refcnt > greater than 1 would cause the destroy operation to be > aborted however if this ever happened the reference to the > qdisc would be lost and we would have a memory leak. > > case 2: dev_deactivate sequence. This can come from a user bringing > the interface down which causes the gso_skb list to be flushed > and the qlen zero'd. At the moment this is protected by the > qdisc lock so while we clear the qlen/gso_skb fields we are > guaranteed no new skbs are added. For the lockless case > though this is not true. To resolve this move the qdisc_reset > call after the new qdisc is assigned and a grace period is > exercised to ensure no new skbs can be enqueued. Further > the RTNL lock is held so we can not get another call to > activate the qdisc while the skb lists are being free'd. > > Finally, fix qdisc_reset to handle the per cpu stats and > skb lists. > > Signed-off-by: John Fastabend ... > - /* Prune old scheduler */ > - if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1) > - qdisc_reset(oqdisc); > - ... > - sync_needed |= !dev->dismantle; > + sync_needed = true; I think killing above <=1 check and forcing synchronize_net() will make qdisc destruction more reliable than it's right now. Your commit log sounds too pessimistic :) btw, sync_needed variable can be removed as well. All other patches look good. Great stuff overall!