From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: NAPI note Date: Tue, 18 Feb 2003 22:14:55 -0500 Sender: netdev-bounce@oss.sgi.com Message-ID: <3E52F6AF.7000004@pobox.com> References: <3E4D66DF.3040800@colorfullife.com> <3E4D8295.2050400@pobox.com> <20030217.185719.28797590.davem@redhat.com> <3E525FD8.1060009@colorfullife.com> <20030218212441.M25195@shell.cyberus.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: Manfred Spraul , "David S. Miller" , zaitcev@redhat.com, jbourne@mtroyal.ab.ca, netdev@oss.sgi.com Return-path: To: jamal In-Reply-To: <20030218212441.M25195@shell.cyberus.ca> Errors-to: netdev-bounce@oss.sgi.com List-Id: netdev.vger.kernel.org jamal wrote: > > On Tue, 18 Feb 2003, Manfred Spraul wrote: > > >>David S. Miller wrote: >> >> >>> From: Jeff Garzik >>> Date: Fri, 14 Feb 2003 18:58:13 -0500 >>> >>> Manfred Spraul wrote: >>> > It seems to be a generic NAPI restriction: >>> > The caller of netif_receive_skb() must not own a spinlock that is >>> > acquired from an interrupt handler. >>> >>> Thanks much for noticing this, Manfred. >>> >>>I think this logic is buggy. >>> >>>In the example I've seen posted, only a NAPI implementation bug >>>could cause the situation to occur. >>> >>>If cpu1 is in ->poll() for the driver, then by definition the >>>device shall not cause interrupts. The device's interrupts >>>are disabled before we enter the ->poll() handler, and as such >>>the "cpu2 take device interrupt and takes driver->lock" cannot >>>occur. >>> >>> >> >>No. I think the rule is that drivers that use the NAPI interface must >>not cause interrupts for packet receive and out-of-rx-buffers conditions. > > > Ah, but that is only one of two rules. > Theres other drivers which dont follow this rule and just shutdown > all interupt sources. I know that the e1000 for example does this. > I am not sure about the tg3. I think the doc says this but may not > emphasize it as strongly. > So if tg3 uses method 2 then its as Dave says - a bug. tg3 shuts down all interrupt sources, and handles all interrupt events in dev->poll(). David and I hashed it out a bit on IRC. The problem is that deliver_to_old_ones() waits, and thus the deadlock that Manfred described. For 2.4.x, the solution is simply to avoid the deadlock in the driver. For 2.5.x, David hinted that deliver_to_old_ones() may be going away. >>But what about media error interrupts, or tx interrupts? Or MIB counter >>overflow, etc. What about shared pci interrupts? > > > Shared interupts should be interesting actually. > However if you are in poll mode and you receive an interupt you should be > able to quickly determine its not yours without much effect on shared > locks, no? Normally, yes. However tg3 grabs a lock just about anytime it does anything. ;-) A long term project of mine is to slowly remove these locks, but that must wait until the driver stabilizes, and is overall a long process. Most of the locks _are_ removeable, but we keep hit deadlock bugs like this, and hardware bugs which need workarounds, so those come first. >>All of them could occur, and if they take a spinlock that is held across >>netif_receive_skb(), then it can deadlock. >> > > > yes this could happen with method 1 of programming the driver; however, > tx, receive, link are essentially separate threads and would hardly share > locks. They do in tg3's case. The locks can be removed eventually, but such is the state of life right now. Jeff