From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755877AbXD0O2q (ORCPT ); Fri, 27 Apr 2007 10:28:46 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755907AbXD0O2q (ORCPT ); Fri, 27 Apr 2007 10:28:46 -0400 Received: from omta01sl.mx.bigpond.com ([144.140.92.153]:16488 "EHLO omta01sl.mx.bigpond.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755877AbXD0O2o (ORCPT ); Fri, 27 Apr 2007 10:28:44 -0400 Message-ID: <4632088C.8000509@bigpond.net.au> Date: Sat, 28 Apr 2007 00:28:28 +1000 From: Peter Williams User-Agent: Thunderbird 1.5.0.10 (X11/20070302) MIME-Version: 1.0 To: Neil Horman , Linus Torvalds , jgarzik@pobox.com CC: Linux Kernel Mailing List Subject: Re: Linux-2.6.21 hangs during post boot initialization phase References: <463166C2.6060106@bigpond.net.au> <46319297.905@bigpond.net.au> <20070427122247.GA19017@hmsreliant.homelinux.net> In-Reply-To: <20070427122247.GA19017@hmsreliant.homelinux.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH PLAIN at oaamta01sl.mx.bigpond.com from [138.130.231.4] using ID pwil3058@bigpond.net.au at Fri, 27 Apr 2007 14:28:41 +0000 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Neil Horman wrote: > On Fri, Apr 27, 2007 at 04:05:11PM +1000, Peter Williams wrote: >> Linus Torvalds wrote: >>> On Fri, 27 Apr 2007, Peter Williams wrote: >>>> The 2.6.21 kernel is hanging during the post boot phase where various >>>> daemons >>>> are being started (not always the same daemon unfortunately). >>>> >>>> This problem was not present in 2.6.21-rc7 and there is no oops or other >>>> unusual output in the system log at the time the hang occurs. >>> Can you use "git bisect" to narrow it down a bit more? It's only 125 >>> commits, so bisecting even just three or four kernels will narrow it down >>> to a handful. >> As the changes became, smaller the builds became quicker :-) and after 7 >> iterations we have: >> >> >> author Neil Horman >> Fri, 20 Apr 2007 13:54:58 +0000 (09:54 -0400) >> committer Jeff Garzik >> Tue, 24 Apr 2007 16:43:07 +0000 (12:43 -0400) >> commit b748d9e3b80dc7e6ce6bf7399f57964b99a4104c >> tree 887909e1f735bb444ef0e3e370f34401fa6eee02 tree | snapshot >> parent d91c088b39e3c66d309938de858775bb90fd1ead commit | diff >> sis900: Allocate rx replacement buffer before rx operation >> >> The sis900 driver appears to have a bug in which the receive routine >> passes the skbuff holding the received frame to the network stack before >> refilling the buffer in the rx ring. If a new skbuff cannot be >> allocated, the >> driver simply leaves a hole in the rx ring, which causes the driver to stop >> receiving frames and become non-recoverable without an rmmod/insmod >> according to >> reporters. This patch reverses that order, attempting to allocate a >> replacement >> buffer first, and receiving the new frame only if one can be allocated. >> If no >> skbuff can be allocated, the current skbuf in the rx ring is recycled, >> dropping >> the current frame, but keeping the NIC operational. >> >> Signed-off-by: Neil Horman >> Signed-off-by: Jeff Garzik >> >> Peter >> -- >> Peter Williams pwil3058@bigpond.net.au >> >> "Learning, n. The kind of ignorance distinguishing the studious." >> -- Ambrose Bierce > > This was reported to me last night, and I've posted a patch to fix it, its > available here: > http://marc.info/?l=linux-netdev&m=117761259222165&w=2 > > It applies on top of the previous patch, and should fix your problem. > > Here's a copy of the patch > > Thanks & Regards > Neil > > > diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c > index a6a0f09..7e44939 100644 > --- a/drivers/net/sis900.c > +++ b/drivers/net/sis900.c > @@ -1754,6 +1754,7 @@ static int sis900_rx(struct net_device *net_dev) > sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE; > } else { > struct sk_buff * skb; > + struct sk_buff * rx_skb; > > pci_unmap_single(sis_priv->pci_dev, > sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE, > @@ -1787,10 +1788,10 @@ static int sis900_rx(struct net_device *net_dev) > } > > /* give the socket buffer to upper layers */ > - skb = sis_priv->rx_skbuff[entry]; > - skb_put(skb, rx_size); > - skb->protocol = eth_type_trans(skb, net_dev); > - netif_rx(skb); > + rx_skb = sis_priv->rx_skbuff[entry]; > + skb_put(rx_skb, rx_size); > + skb->protocol = eth_type_trans(rx_skb, net_dev); > + netif_rx(rx_skb); > > /* some network statistics */ > if ((rx_status & BCAST) == MCAST) This patch fixes the problem for me. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce