From mboxrd@z Thu Jan  1 00:00:00 1970
From: Asias He <asias@redhat.com>
Subject: Re: [RFC PATCH 2/5] block: Do not stop draining if waitqueue is not
 empty.
Date: Fri, 25 May 2012 09:16:47 +0800
Message-ID: <4FBEDD7F.4090300@redhat.com>
References: <1337591313-26333-1-git-send-email-asias@redhat.com> <1337591313-26333-2-git-send-email-asias@redhat.com> <20120521153922.GA6549@google.com> <4FBB36D7.9030202@redhat.com> <20120522150742.GA14339@google.com> <4FBCFA13.5030401@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Jens Axboe <axboe@kernel.dk>, linux-fsdevel@vger.kernel.org
To: Tejun Heo <tj@kernel.org>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:47533 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752161Ab2EYBPz (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 24 May 2012 21:15:55 -0400
In-Reply-To: <4FBCFA13.5030401@redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Hi, Tejun and Jens

On 05/23/2012 10:54 PM, Asias He wrote:
> On 05/22/2012 11:07 PM, Tejun Heo wrote:
>> Hello, Asias.
>>
>> On Tue, May 22, 2012 at 02:48:55PM +0800, Asias He wrote:
>>> I actually saw this happen, though it should not. I have no
>>> idea why. Maybe unbalanced prepare_to_wait_exclusive()
>>> in get_request_wait() and wake_up() in __freed_request()?
>>
>> Hmm.... unbalanced how? I think bugs in this area are much more
>> likely to show up as live queue hang rather than issues during queue
>> shutdown.
>
> I added some debug code to count the number of sleeps and wakeups in
> get_request_wait() and __freed_request().
>
> After queue cleanup, I found that rl->wait[] is not empty while
> rl->count[] == 0. Exactly nr_sleep - nr_wakeup processes are in D
> state. So are wakeups being missed? Any ideas on how to debug this
> further to find the root cause?

Ping.

> [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
>
> $ vmstat 1
> 1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0
> 0 173 0 712764 24292 96180 0 0 0 0 472 725 0 1 0 97 2
>
>
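
For reference, the counting described above can be sketched in userspace
C roughly as follows. This is only a sketch, not the actual debug patch:
struct wait_stats, note_sleep(), note_wakeup(), stuck_waiters() and
report() are hypothetical names, and in the kernel the two counters
would be bumped in get_request_wait() and __freed_request().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters mirroring the nr_sleep/nr_wakeup debug output. */
struct wait_stats {
	unsigned long nr_sleep;		/* bumped just before a waiter sleeps */
	unsigned long nr_wakeup;	/* bumped each time a waiter is woken */
};

/* Would be called where get_request_wait() puts the task to sleep. */
void note_sleep(struct wait_stats *s)
{
	s->nr_sleep++;
}

/* Would be called where __freed_request() wakes a waiter. */
void note_wakeup(struct wait_stats *s)
{
	s->nr_wakeup++;
}

/* Waiters that went to sleep and were never woken. */
unsigned long stuck_waiters(const struct wait_stats *s)
{
	/* A wakeup is only counted for a task that slept first. */
	assert(s->nr_wakeup <= s->nr_sleep);
	return s->nr_sleep - s->nr_wakeup;
}

/* Print the counters in the same format as the log line above. */
void report(const struct wait_stats *s)
{
	printf("---> nr_sleep=%lu, nr_wakeup=%lu, delta=%lu\n",
	       s->nr_sleep, s->nr_wakeup, stuck_waiters(s));
}
```

With nr_sleep=1046 and nr_wakeup=873, stuck_waiters() returns 173, which
matches the second vmstat column (b, processes in uninterruptible sleep)
in the output above.
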
>>> When this happened, I saw some fio threads in D state sleeping in
>>> get_request_wait(). If I wake up the threads on the wait queue in
>>> the q->abort_queue_fn() callback, which I proposed in 1/5 of this
>>> patch set, the queue cleanup and thus the hot-unplug went pretty
>>> well. (It passed ~3000 rounds of testing; without this, the test
>>> would fail within ~2 rounds.) See [RFC PATCH 4/5] virtio-blk: Use
>>> q->abort_queue_fn() to abort.
>>
>> If the problem is that easily reproducible (you mean that you can
>> reproduce it every other time, right?), it would be immensely helpful
>> if you can root cause the issue properly. As it currently stands,
>> this series seems to work around the problem by adding extra API
>> without properly root-causing it. Workarounds without proper
>> root-causing are already pretty bad and adding extra API for that is
>> rather silly, IMHO.
>
> Yes, it is very easy to reproduce. I am trying to figure out the root cause.
>
-- 
Asias