From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: [ofa-general] Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()
Date: Wed, 19 Sep 2007 09:05:57 -0700 (PDT)
Message-ID: <20070919.090557.24612742.davem@davemloft.net>
References: <20070919115403.19455.65941.sendpatchset@K50wks273871wss.in.ibm.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, rdreier@cisco.com, general@lists.openfabrics.org
To: krkumar2@in.ibm.com
In-Reply-To: <20070919115403.19455.65941.sendpatchset@K50wks273871wss.in.ibm.com>
Sender: general-bounces@lists.openfabrics.org
Errors-To: general-bounces@lists.openfabrics.org
List-Id: netdev.vger.kernel.org

From: Krishna Kumar
Date: Wed, 19 Sep 2007 17:24:03 +0530

> Note: during steps F-H and C-E, priv/napi is read/modified by both cpu's
> which is another bug relating to the same race.
>
> I guess the above patch is not required if this bug (in IPoIB) is fixed?

The NAPI_STATE_SCHED flag bit should provide all of the necessary
synchronization.

Only the setter of that bit should add the NAPI instance to the
polling list.

The polling loop runs atomically on the cpu where the NAPI instance
got added to the per-cpu polling list.  And therefore decisions to
complete NAPI are serialized too.

That serialized completion decision is also when the list deletion
occurs.

I'm starting to suspect the whole problem comes from the resched
facility, and now I really don't blame Stephen for trying to delete
it.  Semantically it really makes things very difficult, especially
wrt. the atomicity of the list handling.
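
To make the ownership rule above concrete, here is a rough userspace
sketch in C11.  It is not the kernel's code: every sketch_* name, the
SKETCH_STATE_SCHED value and the singly linked list are hypothetical
stand-ins, and it leaves out the per-cpu and interrupt handling a real
driver path would need.  It only shows the shape of the argument: an
atomic test-and-set picks exactly one "setter", only that setter
touches the poll list, and completion removes the entry before it
clears the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduled flag; not the kernel's value. */
#define SKETCH_STATE_SCHED 1u

struct sketch_napi {
	atomic_uint state;
	struct sketch_napi *next;	/* link in the owning cpu's poll list */
};

/* Hypothetical stand-in for one cpu's polling list. */
struct sketch_poll_list {
	struct sketch_napi *head;
	int length;
};

/* True only for the single caller that flips the bit from 0 to 1. */
static bool sketch_schedule_prep(struct sketch_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, SKETCH_STATE_SCHED);

	return !(old & SKETCH_STATE_SCHED);
}

/* Only the setter of the bit adds the instance to the list. */
static bool sketch_schedule(struct sketch_poll_list *list,
			    struct sketch_napi *n)
{
	if (!sketch_schedule_prep(n))
		return false;	/* already scheduled, someone else owns it */

	n->next = list->head;
	list->head = n;
	list->length++;
	return true;
}

/*
 * Meant to be called only from the poll loop that owns the list, so the
 * completion decision and the list deletion happen together.  The bit is
 * cleared last, so nobody can re-add the instance while it is still linked.
 */
static void sketch_complete(struct sketch_poll_list *list,
			    struct sketch_napi *n)
{
	struct sketch_napi **pp = &list->head;

	assert(atomic_load(&n->state) & SKETCH_STATE_SCHED);

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = n->next;
		n->next = NULL;
		list->length--;
	}

	atomic_fetch_and(&n->state, ~SKETCH_STATE_SCHED);
}
```

In this sketch a second sketch_schedule() on an already scheduled
instance returns false and leaves the list alone.  A resched path that
puts the instance back on a list without having won the bit this way
would fall outside that serialization, which is the difficulty
described above.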