From mboxrd@z Thu Jan  1 00:00:00 1970
From: Asias He <asias@redhat.com>
Subject: Re: [RFC PATCH 2/5] block: Do not stop draining if waitqueue is not
 empty.
Date: Wed, 23 May 2012 22:54:11 +0800
Message-ID: <4FBCFA13.5030401@redhat.com>
References: <1337591313-26333-1-git-send-email-asias@redhat.com> <1337591313-26333-2-git-send-email-asias@redhat.com> <20120521153922.GA6549@google.com> <4FBB36D7.9030202@redhat.com> <20120522150742.GA14339@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Jens Axboe <axboe@kernel.dk>, linux-fsdevel@vger.kernel.org
To: Tejun Heo <tj@kernel.org>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:18515 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933502Ab2EWOxU (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 23 May 2012 10:53:20 -0400
In-Reply-To: <20120522150742.GA14339@google.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 05/22/2012 11:07 PM, Tejun Heo wrote:
> Hello, Asias.
>
> On Tue, May 22, 2012 at 02:48:55PM +0800, Asias He wrote:
>> I actually saw this happened though it should not happen. I have no
>> idea why this happens. Maybe unbalanced prepare_to_wait_exclusive()
>> in get_request_wait() and wake_up() in __freed_request()?
>
> Hmm.... unbalanced how?  I think bugs in this area are much more
> likely to show up as live queue hang rather than issues during queue
> shutdown.

I added some debug code to count the number of sleep and wakeup in 
get_request_wait() and __freed_request().

I found this after queue cleanup. rl->wait[] is not empty while 
rl->count[] == 0. There are exactly nr_sleep - nr_wakeup of process in D 
state. So missed wakeup happens? Any ideas to do more debug to find the 
root-cause?

[   52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173

$ vmstat 1
  1 173  0 712640  24292  96172 0 0  0  0  419  757  0  0  0 100  0
  0 173  0 712764  24292  96180 0 0  0  0  472  725  0  1  0 97  2


>> With this happened, I saw some fio threads in D state which are
>> sleeping on get_request_wait(). If I wake up the threads in the wait
>> queue in q->abort_queue_fn() callback which i proposed in the 1/5 of
>> this patch set, the queue cleanup and thus hot-unplug went pretty
>> well. (Passed 3000~ rounds of test, without this 2~ round of test
>> would fail). See this patch [RFC PATCH 4/5] virtio-blk: Use
>> q->abort_queue_fn() to abort.
>
> If the problem is that easily reproducible (you mean that you can
> reproduce it every other time, right?), it would be immensely helpful
> if you can root cause the issue properly.  As it currently stands,
> this series seems to work around the problem by adding extra API
> without properly root-causing it.  Workarounds without proper
> root-causing are already pretty bad and adding extra API for that is
> rather silly, IMHO.

Yes. it is very easy to reproduce. /me Trying to figure the root cause out.

-- 
Asias