From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mingming Cao <cmm@us.ibm.com>
Subject: Re: kjournald() with DIO
Date: Tue, 13 Sep 2005 10:53:01 -0700
Message-ID: <1126633981.4012.31.camel@localhost.localdomain>
References: <1126567387.14837.36.camel@dyn9047017102.beaverton.ibm.com>
	 <20050912163732.036b2971.akpm@osdl.org>
	 <1126569984.14837.47.camel@dyn9047017102.beaverton.ibm.com>
	 <20050912172935.19907edf.akpm@osdl.org>
Reply-To: cmm@us.ibm.com
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Badari Pulavarty <pbadari@us.ibm.com>,
	linux-fsdevel@vger.kernel.org, sct@redhat.com
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from e33.co.us.ibm.com ([32.97.110.131]:16612 "EHLO
	e33.co.us.ibm.com") by vger.kernel.org with ESMTP id S964941AbVIMRyV
	(ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 13 Sep 2005 13:54:21 -0400
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11])
	by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j8DHrbvd144404
	for <linux-fsdevel@vger.kernel.org>; Tue, 13 Sep 2005 13:53:40 -0400
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
	by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j8DHrQtC412028
	for <linux-fsdevel@vger.kernel.org>; Tue, 13 Sep 2005 11:53:26 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
	by d03av01.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j8DHr3tX026267
	for <linux-fsdevel@vger.kernel.org>; Tue, 13 Sep 2005 11:53:03 -0600
To: Andrew Morton <akpm@osdl.org>
In-Reply-To: <20050912172935.19907edf.akpm@osdl.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 2005-09-12 at 17:29 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > Didn't help. How can this help ?
> 
> Oh well.  Different bug.  See http://bugzilla.kernel.org/show_bug.cgi?id=4964
> 
> > DIO code is kicking back to buffered mode, since its trying to
> > fill-in the holes. The fix you suggested might have helped, if
> > IO is beyond file size or file size not ending on page boundary.
> > What am I missing ?
> 
> Have you spoken with Mingming about this?  She's chasing a similar race. 
> She's currently working on getting the ext3-debug patch working so we can
> find out how those buffers (which apparently aren't even attached to the
> journal) came to have an elevated refcount.
> 
Hi Andrew, Badari was looking at the race with me and he came up with
a hack patch that "remembers" the last two callers of get_bh(). Once
we find a buffer whose reference count is >1 but whose journal head is
NULL, the patch starts tracing that buffer head (it dumps the last two
callers of get_bh()).
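
A minimal userspace sketch of that tracing idea follows. It is not
Badari's patch: the struct, the traced_* names and the string "caller"
tags are all hypothetical stand-ins, and a real kernel patch would
more likely record return addresses inside struct buffer_head.

```c
#include <stddef.h>

/* Toy stand-in for struct buffer_head; NOT the kernel structure. */
struct traced_bh {
	int b_count;                 /* reference count */
	void *jh;                    /* attached journal head, or NULL */
	const char *last_callers[2]; /* [0] = most recent get, [1] = the one before */
};

/* Take a reference and remember who took it (hypothetical helper). */
void traced_get_bh(struct traced_bh *bh, const char *caller)
{
	bh->last_callers[1] = bh->last_callers[0];
	bh->last_callers[0] = caller;
	bh->b_count++;
}

/* The condition described above: refcount > 1 but no journal head. */
int traced_is_suspicious(const struct traced_bh *bh)
{
	return bh->b_count > 1 && bh->jh == NULL;
}
```

When traced_is_suspicious() returns nonzero, last_callers[] names the
two most recent reference takers, which is the information the hack
patch dumps.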

The analysis shows that there is a race between
journal_try_to_free_buffers() and kjournald.  It is possible that when
kjournald is waiting for a buffer to be unlocked, it has released
journal->j_list_lock (commit.c, line 398) but is still holding a
reference count on that buffer during wait_on_buffer().  This allows
__journal_try_to_free_buffer() to proceed (it was waiting for the
j_list_lock and will eventually unlink the journal head from the
buffer). journal_try_to_free_buffers() will call try_to_free_buffers()
if the journal head is NULL. At that point, since kjournald is still
holding a reference count on that buffer,
try_to_free_buffers()->drop_buffers() fails because the buffer is busy.
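
The interleaving described above can be replayed deterministically in
a small single-threaded model. This is only a sketch under stated
assumptions: the toy_* types and functions are hypothetical stand-ins
for the kernel ones, locking is reduced to comments, and the "race" is
chosen by a flag rather than by real scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's journal_head / buffer_head. */
struct toy_jh { int unused; };

struct toy_bh {
	int b_count;        /* reference count */
	struct toy_jh *jh;  /* attached journal head, or NULL */
};

void toy_get_bh(struct toy_bh *bh) { bh->b_count++; }

void toy_put_bh(struct toy_bh *bh)
{
	assert(bh->b_count > 0);
	bh->b_count--;
}

/* Models drop_buffers(): a buffer that still has references is busy. */
bool toy_drop_buffers(const struct toy_bh *bh)
{
	return bh->b_count == 0;
}

/* Models the free path: unlink the journal head, then try to free. */
bool toy_try_to_free(struct toy_bh *bh)
{
	bh->jh = NULL;  /* done under j_list_lock in the real code */
	return toy_drop_buffers(bh);
}

/*
 * Replays one ordering of the two paths and returns whether the free
 * attempt succeeded. free_during_wait == true is the racy ordering.
 */
bool toy_replay_race(struct toy_bh *bh, bool free_during_wait)
{
	bool freed = false;

	toy_get_bh(bh);  /* kjournald pins the buffer */
	/* kjournald drops j_list_lock and sleeps waiting for the buffer */
	if (free_during_wait)
		freed = toy_try_to_free(bh);  /* sees b_count == 1: busy */
	toy_put_bh(bh);  /* kjournald wakes up and drops its reference */
	if (!free_during_wait)
		freed = toy_try_to_free(bh);  /* no references left */
	return freed;
}
```

With free_during_wait set, the free attempt fails while the journal
head is already NULL and a reference is still held; with it clear, the
same buffer is freed without trouble.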


This race happens in SLES9 SP1/SP2 and also in 2.6.11, but I could not
reproduce it in mainline 2.6.13. SLES9 SP1/SP2 and 2.6.11 all have the
patch that allows invalidate_inode_page() to return -EIO to deal with
the DIO-with-dirty-mapped-write case.  But 2.6.13 invalidates using a
range, which is probably why mainline doesn't show the problem (the
chance of calling invalidate_inode_pages2_range() on a busy page is
reduced when a range is passed as a parameter).

> Apparently the debug patch is currently oopsing.  We need to fix that.

Sorry about the lack of response on this; it is certainly a very
powerful and useful tool. The last time I tried it, the kernel would
not boot. I am working on it, slowly :)

Mingming