From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752547AbcHKRg6 (ORCPT ); Thu, 11 Aug 2016 13:36:58 -0400
Received: from mx1.redhat.com ([209.132.183.28]:35946 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752328AbcHKRg4 (ORCPT ); Thu, 11 Aug 2016 13:36:56 -0400
Date: Thu, 11 Aug 2016 19:36:52 +0200
From: Oleg Nesterov
To: Bart Van Assche
Cc: Peter Zijlstra, "mingo@kernel.org", Andrew Morton, Johannes Weiner,
        Neil Brown, Michael Shaver, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs
Message-ID: <20160811173651.GA31803@redhat.com>
References: <20160804140938.GB24652@twins.programming.kicks-ass.net>
        <16207b90-2e6c-fe23-1b4b-3763e5cf0384@sandisk.com>
        <20160808102213.GA6879@twins.programming.kicks-ass.net>
        <4091e252-18d9-1795-de63-9fbc678aa6b1@acm.org>
        <20160808162038.GA25927@redhat.com>
        <78fafdc1-d4ae-a9a2-169c-1d456b6e6e41@sandisk.com>
        <20160809171459.GA13840@redhat.com>
        <3cec7657-caa9-92ca-9f0e-34f073a6ed8c@sandisk.com>
        <20160810104555.GA3333@redhat.com>
        <4d2e02f8-c7da-ee1a-1068-25492cbffebe@sandisk.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4d2e02f8-c7da-ee1a-1068-25492cbffebe@sandisk.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.30]); Thu, 11 Aug 2016 17:36:55 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Bart,

On 08/10, Bart Van Assche wrote:
>
> That's an excellent catch. With your previous patch and this patch applied I
> can't reproduce the hang in truncate_inode_pages_range() anymore.

Great, thanks.

I'll send another debugging patch tomorrow, I was a bit busy today. The next
step is obvious, we need to know the caller.

But just in case, this doesn't necessarily mean that the usage of
__ClearPageLocked() is actually buggy, we don't really know this so far...

And I can't understand another oddity. Your test-case hangs in the kill_bdev()
path, which sleeps with bdev->bd_openers == 0 under bdev->bd_mutex, so it can't
be re-opened. However, since your change in abort_exclusive_wait() helped,
there should be readers sleeping in lock_killable(), and thus bd_openers
can't be zero.

Never mind, I don't understand this code even remotely, we will see later who
should be asked.

> I still
> see some other wait_on_page_bit() hangs after an I/O error has occurred.
> However, the hangs that I still see are related to waiting on buffer head
> state changes and not on the PG_locked page flag.

I don't know if this is right or not... let's discuss this later.

Thanks!

Oleg.