From mboxrd@z Thu Jan 1 00:00:00 1970 From: Asias He Subject: Re: [RFC PATCH 2/5] block: Do not stop draining if waitqueue is not empty. Date: Fri, 25 May 2012 09:16:47 +0800 Message-ID: <4FBEDD7F.4090300@redhat.com> References: <1337591313-26333-1-git-send-email-asias@redhat.com> <1337591313-26333-2-git-send-email-asias@redhat.com> <20120521153922.GA6549@google.com> <4FBB36D7.9030202@redhat.com> <20120522150742.GA14339@google.com> <4FBCFA13.5030401@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Jens Axboe , linux-fsdevel@vger.kernel.org To: Tejun Heo Return-path: Received: from mx1.redhat.com ([209.132.183.28]:47533 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752161Ab2EYBPz (ORCPT ); Thu, 24 May 2012 21:15:55 -0400 In-Reply-To: <4FBCFA13.5030401@redhat.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi, Tejun and Jens On 05/23/2012 10:54 PM, Asias He wrote: > On 05/22/2012 11:07 PM, Tejun Heo wrote: >> Hello, Asias. >> >> On Tue, May 22, 2012 at 02:48:55PM +0800, Asias He wrote: >>> I actually saw this happened though it should not happen. I have no >>> idea why this happens. Maybe unbalanced prepare_to_wait_exclusive() >>> in get_request_wait() and wake_up() in __freed_request()? >> >> Hmm.... unbalanced how? I think bugs in this area are much more >> likely to show up as live queue hang rather than issues during queue >> shutdown. > > I added some debug code to count the number of sleep and wakeup in > get_request_wait() and __freed_request(). > > I found this after queue cleanup. rl->wait[] is not empty while > rl->count[] == 0. There are exactly nr_sleep - nr_wakeup of process in D > state. So missed wakeup happens? Any ideas to do more debug to find the > root-cause? Ping. > [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173 > > $ vmstat 1 > 1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0 > 0 173 0 712764 24292 96180 0 0 0 0 472 725 0 1 0 97 2 > > >>> With this happened, I saw some fio threads in D state which are >>> sleeping on get_request_wait(). If I wake up the threads in the wait >>> queue in q->abort_queue_fn() callback which i proposed in the 1/5 of >>> this patch set, the queue cleanup and thus hot-unplug went pretty >>> well. (Passed 3000~ rounds of test, without this 2~ round of test >>> would fail). See this patch [RFC PATCH 4/5] virtio-blk: Use >>> q->abort_queue_fn() to abort. >> >> If the problem is that easily reproducible (you mean that you can >> reproduce it every other time, right?), it would be immensely helpful >> if you can root cause the issue properly. As it currently stands, >> this series seems to work around the problem by adding extra API >> without properly root-causing it. Workarounds without proper >> root-causing are already pretty bad and adding extra API for that is >> rather silly, IMHO. > > Yes. it is very easy to reproduce. /me Trying to figure the root cause out. > -- Asias