From: "Paul E. McKenney"
Subject: Re: [BUG] NULL pointer dereference in skb_dequeue
Date: Tue, 12 Aug 2008 13:18:58 -0700
Message-ID: <20080812201858.GD6819@linux.vnet.ibm.com>
References: <20080810190458.GA7279@ami.dom.local> <20080811100126.GA6401@ff.dom.local> <20080811232657.GQ6762@linux.vnet.ibm.com> <20080812063622.GA5066@ff.dom.local> <20080812134224.GC6909@linux.vnet.ibm.com> <20080812180927.GA3180@ami.dom.local>
In-Reply-To: <20080812180927.GA3180@ami.dom.local>
Reply-To: paulmck@linux.vnet.ibm.com
To: Jarek Poplawski
Cc: David Miller, emil.s.tantilov@intel.com, jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org

On Tue, Aug 12, 2008 at 08:09:27PM +0200, Jarek Poplawski wrote:
> On Tue, Aug 12, 2008 at 06:42:24AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2008 at 06:36:22AM +0000, Jarek Poplawski wrote:
> ...
> > > From net/sched/sch_generic.c:
> > >
> > > void __qdisc_run(struct Qdisc *q)
> > > {
> > > 	unsigned long start_time = jiffies;
> > >
> > > 	while (qdisc_restart(q)) {
> > > 		/*
> > > 		 * Postpone processing if
> > > 		 * 1. another process needs the CPU;
> > > 		 * 2. we've been doing it for too long.
> > > 		 */
> > > 		if (need_resched() || jiffies != start_time) {
> > > 			__netif_schedule(q);
> > >
> > > This function is run from dev_queue_xmit() (net/core/dev.c) under
> > > rcu_read_lock_bh(), and this "q" pointer is passed here for later use
> > > (reading) by softirq run net_tx_action(). Alas in net/ RCU primitives
> > > are probably omitted in a few places...
> >
> > If I understand this code, one way to handle it would be to increment
> > q->refcnt before passing to netif_schedule(), then decrementing it
> > (within an RCU read-side critical section) in the softirq handler.
> >
> > There are probably other ways to handle this as well.
>
> I understand this similarly (but I'm still trying to find out what's
> wrong with reading this again in a separate read-side section).

The usual problem with re-reading in a separate read-side critical
section is that someone might have removed and freed the data structure
in the meantime.  Consider the following example:

	Task 0:

		rcu_read_lock();
		p = rcu_dereference(global_pointer);
		if (p == NULL) {
			rcu_read_unlock();
			goto somewhere_else;
		}
		do_something_with(p);
		rcu_read_unlock();

		do_some_unrelated_stuff();

		rcu_read_lock();
		do_something_else_with(p);  /* BUG!!! */
		rcu_read_unlock();
	somewhere_else:

	Task 1:

		spin_lock(&mylock);
		p = global_pointer;
		global_pointer = NULL;
		spin_unlock(&mylock);
		synchronize_rcu();
		kfree(p);

Suppose task 0 picks up global_pointer just before task 1 NULLs it.
Then task 1's synchronize_rcu() is within its rights to return as soon
as task 0 executes its first rcu_read_unlock(), because a grace period
need only wait for read-side critical sections that were already in
progress when it started, not for task 0's second one.  This means that
task 1's kfree(p) might happen before task 0's do_something_else_with(p),
which could cause general death and destruction.
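To make the hazard and the q->refcnt suggestion quoted above concrete,
here is a minimal userspace sketch in C11 (not kernel code).  It replays
the task 0 / task 1 interleaving sequentially in a single thread; struct
obj, obj_get(), obj_put() and run_interleaving() are hypothetical names,
rcu_read_lock()/rcu_read_unlock()/synchronize_rcu() appear only as
comments, and task 1 is assumed to drop its reference only after its
grace period ends.  Task 0 takes a reference inside its first critical
section, so task 1's put no longer frees the object out from under it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for, e.g., a struct Qdisc. */
struct obj {
	atomic_int refcnt;
	int data;
};

static struct obj *global_pointer;	/* the RCU-published pointer */
static int objs_freed;			/* lets a caller check when free() ran */

static struct obj *obj_alloc(int data)
{
	struct obj *p = malloc(sizeof(*p));

	if (p == NULL)
		abort();
	atomic_init(&p->refcnt, 1);	/* reference held via global_pointer */
	p->data = data;
	return p;
}

static void obj_get(struct obj *p)
{
	/* Taking a reference on an already-dead object would be a bug. */
	assert(atomic_load(&p->refcnt) > 0);
	atomic_fetch_add(&p->refcnt, 1);
}

static void obj_put(struct obj *p)
{
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&p->refcnt, 1) == 1) {
		free(p);
		objs_freed++;
	}
}

/* Replays the task 0 / task 1 interleaving sequentially, in one thread. */
int run_interleaving(void)
{
	struct obj *p, *old;
	int seen;

	global_pointer = obj_alloc(42);

	/* Task 0, first critical section: rcu_read_lock() would go here. */
	p = global_pointer;
	obj_get(p);		/* pin p before leaving the critical section */
	/* rcu_read_unlock() would go here. */

	/* Task 1: unpublish, wait for a grace period, drop its reference. */
	old = global_pointer;
	global_pointer = NULL;
	/* synchronize_rcu() would go here (assumed, not simulated). */
	obj_put(old);		/* count 2 -> 1: object is NOT freed */

	/* Task 0, later: p is still valid because it holds a reference. */
	seen = p->data;
	obj_put(p);		/* count 1 -> 0: object freed here */
	return seen;
}
```

With this ordering, run_interleaving() returns 42 and objs_freed ends at
1, the free happening at task 0's last obj_put() rather than at task 1's.
The plain increment in obj_get() is only safe because task 1 is assumed
to wait for a grace period before its obj_put(); an updater that dropped
its reference immediately would instead need a get that refuses a count
already at zero.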
> David gave some additional explanations (which BTW don't look to me
> like very "orthodox" RCU) in this thread:
> http://marc.info/?l=linux-netdev&m=121851847805942&w=2

It looks to me like Dave believes that there is in fact a problem:

http://marc.info/?l=linux-netdev&m=121851965707714&w=2

	But if it gets postponed into ksoftirqd... the RCU will pass too
	early.

	I'm still thinking about how to fix this without avoiding RCU
	and without adding new synchronization primitives.

The only change I would make to Dave's comment is to his first paragraph:

	But if it gets postponed into ksoftirqd or if the kernel has been
	built with CONFIG_PREEMPT_RCU... the RCU will pass too early.

My thought would be to use a reference count as noted earlier, on the
grounds that postponing to softirq via __netif_schedule() should be
relatively rare.  But again, I cannot really claim to understand this
code.

Or am I missing something here?

							Thanx, Paul

> Thanks,
> Jarek P.