From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Vrabel
Subject: Re: netback Oops then xenwatch stuck in D state
Date: Thu, 14 Feb 2013 12:47:29 +0000
Message-ID: <511CDCE1.8040409@citrix.com>
References: <510C3AA3.2090508@theshore.net>
 <50E3A390-C52B-476A-8B20-BADBA42F3775@theshore.net>
 <51181924.4050500@theshore.net>
 <1360583103.16636.29.camel@zion.uk.xensource.com>
 <1360663133.20449.123.camel@zakaz.uk.xensource.com>
 <511AFFC9.3050404@theshore.net>
 <1360779868.16636.92.camel@zion.uk.xensource.com>
 <1360780669.16636.94.camel@zion.uk.xensource.com>
 <511BE780.9000707@cantab.net>
 <20130213201725.GA1453@zion.uk.xensource.com>
 <511CB86702000078000BE1C5@nat28.tlf.novell.com>
 <1360840843.16636.109.camel@zion.uk.xensource.com>
 <511CDD2202000078000BE2F3@nat28.tlf.novell.com>
 <511CD69B.6050900@citrix.com>
 <1360844950.16636.123.camel@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1360844950.16636.123.camel@zion.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: Ian Campbell, Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 14/02/13 12:29, Wei Liu wrote:
> On Thu, 2013-02-14 at 12:20 +0000, David Vrabel wrote:
>> On 14/02/13 11:48, Jan Beulich wrote:
>>>>>> On 14.02.13 at 12:20, Wei Liu wrote:
>>>
>>>> If this is a bug, and, if my previous patch fixes Christopher's OOPS, he
>>>> will hit this bug soon when shutting down DomU.
>>>
>>> I don't think this patch will fix his problems, which - as described
>>> yesterday - I'm relatively certain result from the harsh action
>>> netbk_fatal_tx_err() does.
>>
>> I can't see anything broken in netbk_fatal_tx_err().
>>
>> However, a call to netbk_fatal_tx_err() may result in the vif's ref
>> count going to 1 which means a simultaneous attempt to shut down the vif
>> will free the net device.
>
>> Netback thread              Xenwatch thread
>>
>> netbk_fatal_tx_err()        netback_remove()
>>                               xenvif_disconnect()
>>                                 ...
>>                                 free_netdev()
>> netbk_tx_err() Oops!
>>
>
> This is not a problem. Reading comments and code of the commit,
> netbk_fatal_tx_err shuts down the vif entirely (at the moment the timer
> is not handled though) which should make sure it will never get
> scheduled again, so in practice it will never hit netbk_tx_err.

Without the fix to the error paths of netbk_count_requests(),
netbk_tx_err() may still be called if netbk_count_requests() returned 0,
e.g., if txreq.size < ETH_HLEN.

netbk_fatal_tx_err() should call del_timer_sync() on the credit timer
(vif->credit_timeout) as well, otherwise it may fire and attempt to
reschedule the vif, which will then oops as vif->netbk == NULL.

David
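
To illustrate the credit timer point, here is a minimal userspace sketch
of the ordering issue. It is not kernel code and not a patch: every
model_* name is a hypothetical stand-in, the timer is reduced to a
boolean flag, and the "oops" is reported as a false return value instead
of a NULL dereference. It only assumes what is described above, namely
that the shutdown path clears vif->netbk and that the timer callback
reschedules the vif.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the netback structures. */
struct model_netbk {
    int scheduled;               /* how often a vif was rescheduled */
};

struct model_vif {
    struct model_netbk *netbk;   /* NULL once the vif has been shut down */
    bool timer_pending;          /* models a pending credit timer */
};

/* Models the credit timer callback, which reschedules the vif.
 * Returns false where kernel code would dereference a NULL netbk. */
static bool model_credit_timeout(struct model_vif *vif)
{
    assert(vif != NULL);
    vif->timer_pending = false;
    if (vif->netbk == NULL)
        return false;            /* this is the oops case */
    vif->netbk->scheduled++;
    return true;
}

/* Models del_timer_sync(): afterwards the timer can no longer fire. */
static void model_del_timer_sync(struct model_vif *vif)
{
    assert(vif != NULL);
    vif->timer_pending = false;
}

/* Models the fatal-error shutdown. With cancel_timer set, the timer is
 * cancelled before the vif is detached, which is the ordering suggested. */
static void model_fatal_tx_err(struct model_vif *vif, bool cancel_timer)
{
    assert(vif != NULL);
    if (cancel_timer)
        model_del_timer_sync(vif);
    vif->netbk = NULL;
}

/* Fires the timer if it is still pending. Returns true if nothing
 * touched a detached vif. */
static bool model_fire_pending_timer(struct model_vif *vif)
{
    assert(vif != NULL);
    if (!vif->timer_pending)
        return true;
    return model_credit_timeout(vif);
}
```

Calling model_fatal_tx_err(&vif, false) and then
model_fire_pending_timer(&vif) reports the unsafe case; passing true
for cancel_timer leaves nothing pending, so the callback never runs
against a detached vif.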