From mboxrd@z Thu Jan 1 00:00:00 1970
From: Julien Grall
Subject: Re: rcu_sched self-detect stall when disable vif device
Date: Wed, 28 Jan 2015 16:45:29 +0000
Message-ID: <54C91229.8090104@linaro.org>
References: <54C7B6E8.9080106@linaro.org> <20150127164539.GJ24026@zion.uk.xensource.com> <54C7C131.9030502@linaro.org> <20150127165312.GK24026@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20150127165312.GK24026@zion.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: David Vrabel , Ian Campbell , xen-devel
List-Id: xen-devel@lists.xenproject.org

On 27/01/15 16:53, Wei Liu wrote:
> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>> On 27/01/15 16:45, Wei Liu wrote:
>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> While I'm working on support for 64K page in netfront, I got
>>>> an rcu_sched self-detect message. It happens when netback is
>>>> disabling the vif device due to an error.
>>>>
>>>> I'm using Linux 3.19-rc5 on seattle (ARM64). Any idea why
>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>
>>>
>>> When you try to release a SKB, core network driver need to enter some
>>> RCU critical region to clean up. dst_release for one, calls call_rcu.
>>
>> But this message shouldn't happen in normal condition or because of
>> netfront. Right?
>>
>
> Never saw report like this before, even in the case that netfront is
> buggy.

This only happens when preemption is not enabled in the backend kernel
(i.e. CONFIG_PREEMPT_NONE in the config file).

When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
into an infinite loop. In my case, the code executed looks like:

 1. for (;;) {
 2.         xenvif_wait_for_rx_work(queue);
 3.
 4.         if (kthread_should_stop())
 5.                 break;
 6.
 7.         if (unlikely(vif->disabled && queue->id == 0)) {
 8.                 xenvif_carrier_off(vif);
 9.                 xenvif_rx_queue_purge(queue);
10.                 continue;
11.         }
12. }

The wait on line 2 returns immediately because the vif is disabled
(see xenvif_have_rx_work).

We are on queue 0, so the condition on line 7 is true. Therefore we
reach the continue on line 10 and go straight back to the wait on
line 2, which again returns immediately. And so on...

On a kernel where preemption is not enabled, this thread never yields
the CPU to another thread (unless the domain is destroyed).

Regards,

-- 
Julien Grall
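
The spin described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions, not netback code: struct model_queue, model_wait_for_rx_work and model_rx_loop are hypothetical names invented for illustration, the wait is reduced to "blocks unless the vif is disabled", and the loop is capped at max_passes so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue state; not the real netback struct. */
struct model_queue {
    unsigned int id;        /* queue index; the purge branch only runs on queue 0 */
    bool vif_disabled;      /* models vif->disabled */
    bool stop_requested;    /* models kthread_should_stop() */
    unsigned long sleeps;   /* number of times the wait actually blocked */
    unsigned long purges;   /* number of times the purge branch ran */
};

/*
 * Models the wait on line 2. Assumption taken from the email: a disabled
 * vif counts as "work available", so the wait returns without sleeping.
 * Returns true if the thread would have slept.
 */
bool model_wait_for_rx_work(struct model_queue *q)
{
    if (q->vif_disabled)
        return false;
    q->sleeps++;
    return true;
}

/*
 * Runs the loop for at most max_passes passes and returns how many of
 * them went round without sleeping, i.e. without giving up the CPU.
 */
unsigned long model_rx_loop(struct model_queue *q, unsigned long max_passes)
{
    unsigned long busy_passes = 0;

    assert(q);

    for (unsigned long pass = 0; pass < max_passes; pass++) {
        if (!model_wait_for_rx_work(q))
            busy_passes++;

        if (q->stop_requested)
            break;

        if (q->vif_disabled && q->id == 0) {
            q->purges++;    /* carrier off + queue purge in the real loop */
            continue;       /* back to the wait, which will not sleep */
        }
    }

    return busy_passes;
}
```

With vif_disabled set on queue 0, every pass is a busy pass and sleeps stays at zero, which is the behaviour that starves other threads on a non-preemptible kernel. With vif_disabled clear, every pass sleeps in this model.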