From: Mingming Cao
Subject: Re: kjournald() with DIO
Date: Tue, 13 Sep 2005 10:53:01 -0700
Message-ID: <1126633981.4012.31.camel@localhost.localdomain>
References: <1126567387.14837.36.camel@dyn9047017102.beaverton.ibm.com> <20050912163732.036b2971.akpm@osdl.org> <1126569984.14837.47.camel@dyn9047017102.beaverton.ibm.com> <20050912172935.19907edf.akpm@osdl.org>
In-Reply-To: <20050912172935.19907edf.akpm@osdl.org>
Reply-To: cmm@us.ibm.com
To: Andrew Morton
Cc: Badari Pulavarty, linux-fsdevel@vger.kernel.org, sct@redhat.com
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 2005-09-12 at 17:29 -0700, Andrew Morton wrote:
> Badari Pulavarty wrote:
> >
> > Didn't help. How can this help ?
>
> Oh well. Different bug. See http://bugzilla.kernel.org/show_bug.cgi?id=4964
>
> > DIO code is kicking back to buffered mode, since its trying to
> > fill-in the holes. The fix you suggested might have helped, if
> > IO is beyond file size or file size not ending on page boundary.
> > What am I missing ?
>
> Have you spoken with Mingming about this? She's chasing a similar race.
> She's currently working on getting the ext3-debug patch working so we can
> find out how those buffers (which apparently aren't even attached to the
> journal) came to have an elevated refcount.
>

Hi Andrew,

Badari was looking at the race with me, and he came up with a hack patch
that "remembers" the last two callers of get_bh(). Once we find a buffer
whose reference count is >1 but whose journal head is NULL, it starts
tracing that buffer head (dumping the last two callers of get_bh()).

The analysis shows that there is a race between
journal_try_to_free_buffers() and kjournald. It is possible that when
kjournald is waiting for a buffer to be unlocked, it has released
journal->j_list_lock (commit.c, line 398) but is still holding a
reference on that buffer during wait_on_buffer(). This allows
__journal_try_to_free_buffer(), which was waiting for j_list_lock, to
proceed and eventually unlink the journal head from the buffer.
journal_try_to_free_buffers() then calls try_to_free_buffers() if the
journal head is NULL; at that point, since kjournald is still holding a
reference on that buffer, try_to_free_buffers()->drop_buffers() fails
because the buffer is busy.

This race happened in SLES9 SP1/SP2 and in 2.6.11, but we could not
reproduce it in mainline 2.6.13. SLES9 SP1/SP2 and 2.6.11 all have the
patch that allows invalidate_inode_page() to return -EIO to deal with the
DIO-with-dirty-mapped-write case. But 2.6.13 invalidates using a range,
which is probably why mainline doesn't show the problem (the chance of
calling invalidate_inode_pages2_range() on a busy page is reduced when a
range is passed as a parameter).

> Apparently the debug patch is currently oopsing. We need to fix that.

Sorry about the lack of response on this; it is certainly a very powerful
and useful tool. The last time I tried it, the kernel would not boot. I am
working on it, slowly :)

Mingming
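
For reference, the interleaving described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
kernel code: every model_* name is hypothetical, j_list_lock is represented
by comments only, and the busy check is reduced to "reference count is
nonzero" (the real drop_buffers() check looks at more state than that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for journal_head and buffer_head. */
struct model_jh { int dummy; };

struct model_bh {
    int b_count;            /* reference count, bumped by get_bh() */
    struct model_jh *jh;    /* attached journal head, or NULL */
};

/* State observed right after the free attempt. */
struct model_result {
    bool freed;             /* did the free attempt succeed? */
    int b_count;            /* reference count seen by the free path */
    bool jh_attached;       /* was a journal head still attached? */
};

static void model_get_bh(struct model_bh *bh) { bh->b_count++; }

static void model_put_bh(struct model_bh *bh)
{
    assert(bh->b_count > 0);
    bh->b_count--;
}

/* Models the freeing side: unlink the journal head (done under the list
 * lock in the description), then try to drop the buffer. The buffer counts
 * as busy while anyone holds a reference (assumed simplification). */
static bool model_free_path(struct model_bh *bh)
{
    bh->jh = NULL;
    return bh->b_count == 0;
}

/* Runs one deterministic interleaving.
 * free_during_wait == true:  the freeing path runs while the committing
 *                            thread still holds its reference.
 * free_during_wait == false: the committing thread drops its reference
 *                            before the freeing path runs. */
struct model_result model_race(bool free_during_wait)
{
    struct model_jh jh = { 0 };
    struct model_bh bh = { .b_count = 0, .jh = &jh };
    struct model_result res;

    /* Committing thread: take a reference, release the list lock, and
     * start waiting for the buffer to be unlocked. */
    model_get_bh(&bh);

    if (!free_during_wait)
        model_put_bh(&bh);  /* wait finished, reference dropped first */

    res.freed = model_free_path(&bh);
    res.b_count = bh.b_count;
    res.jh_attached = (bh.jh != NULL);
    return res;
}
```

Calling model_race(true) leaves the model buffer with a nonzero reference
count and no journal head, and the free attempt fails; with
model_race(false) the free attempt succeeds. The first case corresponds to
the kind of state the tracing patch was looking for.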