From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Miller Subject: Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). Date: Mon, 11 Aug 2008 22:40:47 -0700 (PDT) Message-ID: <20080811.224047.154563272.davem@davemloft.net> References: <20080811205357.GA15293@ami.dom.local> <20080811.181235.141399855.davem@davemloft.net> <20080812052048.GA4291@ff.dom.local> Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org To: jarkao2@gmail.com Return-path: Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:52842 "EHLO sunset.davemloft.net" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1750878AbYHLFkq (ORCPT ); Tue, 12 Aug 2008 01:40:46 -0400 In-Reply-To: <20080812052048.GA4291@ff.dom.local> Sender: netdev-owner@vger.kernel.org List-ID: From: Jarek Poplawski Date: Tue, 12 Aug 2008 05:20:48 +0000 > On Mon, Aug 11, 2008 at 06:12:35PM -0700, David Miller wrote: > > From: Jarek Poplawski > > Date: Mon, 11 Aug 2008 22:53:57 +0200 > > > > > pkt_sched: Destroy gen estimators under rtnl_lock(). > > > > > > gen_kill_estimator() requires rtnl_lock() protection, and since it is > > > called in qdisc ->destroy() too, this has to go back from RCU callback > > > to qdisc_destroy(). > > > > > > Signed-off-by: Jarek Poplawski > > > > We can't do this. And at a minimum, the final ->reset() must > > occur in the RCU callback, otherwise asynchronous threads of > > execution could queue packets into this dying qdisc and > > such packets would leak forever. > > Could you explain this more? I've thought this synchronize_rcu() is > just to prevent this (and what these comments talk about?): Those comments are out of date and I need to update them. In fact this whole loop is now largely pointless. The rcu_dereference() on dev_queue->qdisc happens before the QDISC_RUNNING bit is set. We no longer resample the qdisc under any kind of lock. Because we no longer have a top-level lock that synchronizes the setting of dev_queue->qdisc Rather, the lock we use for calling ->enqueue() and ->dequeue() is inside of the root qdisc itself. That's why all of the real destruction has to occur in the RCU handler. Anyways, this is part of the problem I think is causing the crash the Intel folks are triggering. We sample the qdisc in dev_queue_xmit() or wherever, then we attach that to the per-cpu ->output_queue to process it via qdisc_run() in the software interrupt handler. The RCU quiesce period extends to the next scheduling point and this is enough if we do normal direct softirq processing of this qdisc. But if it gets postponed into ksoftirqd... the RCU will pass too early. I'm still thinking about how to fix this without avoiding RCU and without adding new synchronization primitives.