From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752547AbcHKRg6 (ORCPT <rfc822;w@1wt.eu>);
	Thu, 11 Aug 2016 13:36:58 -0400
Received: from mx1.redhat.com ([209.132.183.28]:35946 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752328AbcHKRg4 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 11 Aug 2016 13:36:56 -0400
Date: Thu, 11 Aug 2016 19:36:52 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
        "mingo@kernel.org" <mingo@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Johannes Weiner <hannes@cmpxchg.org>, Neil Brown <neilb@suse.de>,
        Michael Shaver <jmshaver@gmail.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs
Message-ID: <20160811173651.GA31803@redhat.com>
References: <20160804140938.GB24652@twins.programming.kicks-ass.net>
 <16207b90-2e6c-fe23-1b4b-3763e5cf0384@sandisk.com>
 <20160808102213.GA6879@twins.programming.kicks-ass.net>
 <4091e252-18d9-1795-de63-9fbc678aa6b1@acm.org>
 <20160808162038.GA25927@redhat.com>
 <78fafdc1-d4ae-a9a2-169c-1d456b6e6e41@sandisk.com>
 <20160809171459.GA13840@redhat.com>
 <3cec7657-caa9-92ca-9f0e-34f073a6ed8c@sandisk.com>
 <20160810104555.GA3333@redhat.com>
 <4d2e02f8-c7da-ee1a-1068-25492cbffebe@sandisk.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4d2e02f8-c7da-ee1a-1068-25492cbffebe@sandisk.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Thu, 11 Aug 2016 17:36:55 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Bart,

On 08/10, Bart Van Assche wrote:
>
> That's an excellent catch. With your previous patch and this patch applied I
> can't reproduce the hang in truncate_inode_pages_range() anymore.

Great, thanks.

I'll send another debugging patch tomorrow; I was a bit busy today. The next
step is obvious: we need to know the caller.

But to be clear, this doesn't necessarily mean that the usage of
__ClearPageLocked() is actually buggy; we don't really know that so far...

There is another oddity I can't understand. Your test case hangs in the
kill_bdev() path, which sleeps with bdev->bd_openers == 0 under
bdev->bd_mutex, so the bdev can't be re-opened. However, since your change in
abort_exclusive_wait() helped, there should be readers sleeping in
lock_killable(), and thus bd_openers can't be zero.

Never mind, I don't understand this code even remotely; we will see later
who should be asked.

> I still
> see some other wait_on_page_bit() hangs after an I/O error has occurred.
> However, the hangs that I still see are related to waiting on buffer head
> state changes and not on the PG_locked page flag.

I don't know if this is right or not... let's discuss this later.

Thanks!

Oleg.