From mboxrd@z Thu Jan 1 00:00:00 1970
From: Badari Pulavarty
Subject: Re: kjournald() with DIO
Date: Thu, 15 Sep 2005 13:49:31 -0700
Message-ID: <1126817371.14837.155.camel@dyn9047017102.beaverton.ibm.com>
References: <20050912172935.19907edf.akpm@osdl.org>
	<1126630370.14837.60.camel@dyn9047017102.beaverton.ibm.com>
	<20050913160701.355cd46a.akpm@osdl.org>
	<1126718583.4010.6.camel@localhost.localdomain>
	<20050914111809.41c5b395.akpm@osdl.org>
	<1126734025.4010.21.camel@localhost.localdomain>
	<20050914150224.3b6d7051.akpm@osdl.org>
	<1126796604.14837.111.camel@dyn9047017102.beaverton.ibm.com>
	<20050915192225.GJ4122@opteron.random>
	<20050915130018.287270e4.akpm@osdl.org>
	<20050915202019.GK4122@opteron.random>
	<20050915133500.754a8b4d.akpm@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Andrea Arcangeli, cmm@us.ibm.com, linux-fsdevel@vger.kernel.org, sct@redhat.com
Return-path:
Received: from e35.co.us.ibm.com ([32.97.110.133]:62181 "EHLO e35.co.us.ibm.com")
	by vger.kernel.org with ESMTP id S932595AbVIOUvT (ORCPT );
	Thu, 15 Sep 2005 16:51:19 -0400
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11])
	by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j8FKpGLZ229776
	for ; Thu, 15 Sep 2005 16:51:16 -0400
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])
	by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j8FKnxab485708
	for ; Thu, 15 Sep 2005 14:49:59 -0600
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])
	by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j8FKnxd7001686
	for ; Thu, 15 Sep 2005 14:49:59 -0600
To: Andrew Morton
In-Reply-To: <20050915133500.754a8b4d.akpm@osdl.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, 2005-09-15 at 13:35 -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > > Probably we could simulate ->releasepage(page) by using
> > > ->invalidatepage(page,
> > > PAGE_CACHE_SIZE), but that's relying on
> > > side-effects, or would need new semantics defined, or something.
> >
> > The problem is that one thing is the memory reclaim, one thing is the
> > invalidate_inode_pages and one thing is the truncate.
> >
> > They need three different behaviours.
>
> Yup.
>
> > But we've only two API to ask the fs to get rid of the buffers, one is
> > destroying dirty buffers, the other is non-blocking, we might need one
> > that is blocking and it isn't destroying dirty buffers.
> >
> > The "wait/gfpmask" parameter to releasepage is ignored by ext3, perhaps
> > we should change the semantics of that bit, and have it passed as
> > GFP_KERNEL only by the invalidate_inode_pages.
>
> That would work.
>
> > > I made releasepage() nonblocking to avoid blocking processes in the memory
> > > reclaim paths. Given that it's best-effort, it may as well just trylock
> > > everything it needs and give up if that doesn't work out.
> >
> > I agree the memory reclaim can remain nonblocking.
> >
> > However note that the failure here is in try_to_free_buffers because the
> > buffer is pinned by the journal code (it's not a trylock failing).
> > Buffer is BH_Mapped, BH_Req, BH_Uptodate (0x19) - nicely tracked by IBM.
>
> I'll need reminding - why was the buffer unfreeable?  Because kjournald had
> a ref on it?  Whereabouts is that happening?
>
>

kjournald() is committing the transaction and doing IO to the buffers,
since the previous DIO write kicked it back to buffered mode (due to
hole filling). Here is the race:

DIO Process                             kjournald()

                                        journal_commit_transaction()
                                        ...
                                        /* submitted buffers for IO */
                                        /* Waiting for IO to complete */
                                        while (t_locked_list) {
                                          ...
                                          get_bh(bh);
                                          if (buffer_locked(bh)) {
                                            spin_unlock(&journal->j_list_lock);
                                            wait_on_buffer(bh);

invalidate_complete_page()
..
ext3_releasepage()
  journal_try_to_free_buffers()
    journal_put_journal_head()
    __journal_try_to_free_buffer()      <--- freed jh
  try_to_free_buffers()
    drop_buffers()
      if (buffer_busy(bh))
        goto failed;                    <<--- returns EIO due to b_count