From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Vrabel <david.vrabel@citrix.com>
Subject: Re: rcu_sched self-detect stall when disable vif device
Date: Wed, 28 Jan 2015 17:06:26 +0000
Message-ID: <54C91712.3020806@citrix.com>
References: <54C7B6E8.9080106@linaro.org>
	<20150127164539.GJ24026@zion.uk.xensource.com>
	<54C7C131.9030502@linaro.org>
	<20150127165312.GK24026@zion.uk.xensource.com>
	<54C91229.8090104@linaro.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <54C91229.8090104@linaro.org>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Julien Grall <julien.grall@linaro.org>, Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>, xen-devel <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 28/01/15 16:45, Julien Grall wrote:
> On 27/01/15 16:53, Wei Liu wrote:
>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>> On 27/01/15 16:45, Wei Liu wrote:
>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> While working on support for 64K pages in netfront, I got
>>>>> an rcu_sched self-detected stall message. It happens when netback
>>>>> is disabling the vif device due to an error.
>>>>>
>>>>> I'm using Linux 3.19-rc5 on seattle (ARM64). Any idea why
>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>
>>>>
>>>> When you try to release an SKB, the core network driver needs to enter
>>>> an RCU critical region to clean up. dst_release, for one, calls call_rcu.
>>>
>>> But this message shouldn't happen under normal conditions or because
>>> of netfront. Right?
>>>
>>
>> Never saw a report like this before, even in the case that netfront is
>> buggy.
> 
> This only happens when preemption is not enabled (i.e.
> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
> 
> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
> into an infinite loop. In my case, the code executed looks like:
> 
> 
>  1. for (;;) {
>  2.	xenvif_wait_for_rx_work(queue);
>  3.
>  4.	if (kthread_should_stop())
>  5.		break;
>  6.
>  7.	if (unlikely(vif->disabled && queue->id == 0)) {
>  8.		xenvif_carrier_off(vif);
>  9.		xenvif_rx_queue_purge(queue);
> 10.		continue;
> 11.	}
> 12. }
> 
> The wait on line 2 will return immediately because the vif is disabled
> (see xenvif_have_rx_work).
> 
> We are on queue 0, so the condition on line 7 is true. Therefore we will
> loop on line 10. And so on...
> 
> On a platform where preemption is not enabled, this thread will never
> yield to another thread (unless the domain is destroyed).

I'm not sure why we have a continue in the vif->disabled case and not
just a break.  Can you try that?
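
To make the difference concrete, here is a minimal userspace sketch of
the loop (not kernel code). Everything in it is an assumption made for
illustration: fake_vif, fake_queue, run_rx_loop and the max_iters bound
are hypothetical names, and the purge counter merely stands in for
xenvif_carrier_off() plus xenvif_rx_queue_purge(). It only models the
control flow described above, where the wait returns immediately once
the vif is disabled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the netback state; not the real kernel structs. */
struct fake_vif {
	bool disabled;
};

struct fake_queue {
	struct fake_vif *vif;
	unsigned int id;
	unsigned long purges;	/* counts simulated carrier_off + purge calls */
};

enum disabled_action { ACTION_CONTINUE, ACTION_BREAK };

/*
 * Model of the rx kthread loop. max_iters bounds the loop so that the
 * simulation terminates; the real loop has no such bound.
 * Returns the number of iterations executed.
 */
unsigned long run_rx_loop(struct fake_queue *queue,
			  enum disabled_action action,
			  unsigned long max_iters)
{
	unsigned long iters = 0;

	while (iters < max_iters) {
		iters++;

		/*
		 * Assumption: the wait returns immediately when the vif is
		 * disabled, so nothing blocks or yields at this point.
		 */
		if (queue->vif->disabled && queue->id == 0) {
			queue->purges++;
			if (action == ACTION_BREAK)
				break;		/* leave the loop once */
			continue;		/* spin back to the top */
		}

		/* In this model an enabled vif has no work, so just stop. */
		break;
	}

	return iters;
}
```

With ACTION_CONTINUE and a disabled vif on queue 0, the loop runs until
the artificial bound is hit, which corresponds to the never-yielding spin
on a non-preemptible kernel. With ACTION_BREAK it runs exactly once.
Whether leaving the loop is safe in the real thread depends on what
follows it, which this sketch does not model.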

David