From: Mingming Cao
Subject: Re: kjournald() with DIO
Date: Wed, 14 Sep 2005 14:40:24 -0700
Message-ID: <1126734025.4010.21.camel@localhost.localdomain>
In-Reply-To: <20050914111809.41c5b395.akpm@osdl.org>
Reply-To: cmm@us.ibm.com
To: Andrew Morton
Cc: pbadari@us.ibm.com, linux-fsdevel@vger.kernel.org, sct@redhat.com, andrea@suse.de
List-Id: linux-fsdevel.vger.kernel.org

On Wed, 2005-09-14 at 11:18 -0700, Andrew Morton wrote:
> Mingming Cao wrote:
> >
> > On Tue, 2005-09-13 at 16:07 -0700, Andrew Morton wrote:
> >
> > > Or simply ignore the invalidate_inode_pages2_range() return value in
> > > generic_file_direct_IO().
> > >
> > Could we simply do that?
> >
> > I found some discussions about why we check the return value of
> > invalidate_inode_pages2_range() in generic_file_direct_IO():
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=109850054025709&w=2
>
> Well found. That brings it back.
>
> > It seems the check for EIO was added to 2.6.11 to handle the case of
> > parallel direct IO and mapped IO. It is possible that the mapped IO
> > dirty the pages after the a_ops->direct_IO. In that case, an error will
> > return back to the caller of DIO to indicate the race.
>
> According to the logic we discussed last year,
> invalidate_inode_pages2_range() only needs to return -EIO if it failed to
> invalidate a page, and that page was dirty.
>
> The -EIO is there to tell the caller that another process dirtied pagecache
> against the file (within the range of the direct-io write()) after
> generic_file_direct_IO() has synced the pagecache to disk.
>
> The -EIO is telling the direct-io write()r "hey, the data which you wrote
> was overwritten by a racing buffered-write() or mmapped-write". It's not
> obvious to me _why_ we should tell the direct-io write()r this - after all,
> we assume that's what the application developer wanted to do.
>
> Still, we don't have to worry about that at present because
> invalidate_inode_pages2_range() is just doing the wrong thing: it's
> treating this elevated-refcount buffer_head as if it was a dirty page, and
> it's not.
>
> How about this?

I proposed a similar idea to Andrea in the bug report earlier. Andrea
raised this concern: with this change, try_to_free_buffers() will still
fail to drop the buffer because of the refcount elevated by kjournald.
Since the buffer stays marked uptodate, block_read_full_page() will not
re-read it from disk the next time a buffered read touches that page
after the direct-io has completed.

How could we handle this?

Mingming
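
To make the concern concrete, here is a toy user-space model of the rule under discussion. It is only a sketch: struct toy_page, struct toy_buffer and every toy_* function are hypothetical stand-ins invented for illustration, not the kernel's structures or the actual patch. It assumes that invalidation fails when the page is dirty or when its buffer has an elevated refcount, and that a buffered read skips the disk whenever an uptodate buffer is still cached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model. These types are hypothetical stand-ins,
 * not the kernel's struct page or struct buffer_head. */
struct toy_buffer {
    int  refcount;   /* > 0 models a reference held by a journal thread */
    bool uptodate;   /* contents are believed to match what is on disk */
};

struct toy_page {
    bool dirty;
    bool in_pagecache;
    struct toy_buffer buf;
};

/* Models try_to_free_buffers(): fails while someone else holds a reference. */
static bool toy_try_to_free_buffers(const struct toy_page *p)
{
    return p->buf.refcount == 0;
}

/* Rule quoted above: report -EIO only when the page could not be
 * invalidated AND it is dirty. A pinned but clean page is not an error. */
static int toy_invalidate_page(struct toy_page *p)
{
    if (!p->dirty && toy_try_to_free_buffers(p)) {
        p->in_pagecache = false;
        p->buf.uptodate = false;
        return 0;
    }
    return p->dirty ? -EIO : 0;
}

/* Models the buffered read path (assumption): the disk is read only
 * when no uptodate buffer is cached for the page. */
static bool toy_read_goes_to_disk(const struct toy_page *p)
{
    return !p->in_pagecache || !p->buf.uptodate;
}

void toy_demo(void)
{
    struct toy_page clean = { .dirty = false, .in_pagecache = true,
                              .buf = { .refcount = 0, .uptodate = true } };
    struct toy_page redirtied = { .dirty = true, .in_pagecache = true,
                                  .buf = { .refcount = 0, .uptodate = true } };
    struct toy_page pinned_clean = { .dirty = false, .in_pagecache = true,
                                     .buf = { .refcount = 1, .uptodate = true } };

    /* Normal case: page dropped, next buffered read goes to disk. */
    assert(toy_invalidate_page(&clean) == 0);
    assert(toy_read_goes_to_disk(&clean));

    /* A racing writer dirtied the page: the caller is told. */
    assert(toy_invalidate_page(&redirtied) == -EIO);

    /* The concern: no error is reported, yet the uptodate buffer
     * survives, so the next buffered read is served from the cache. */
    assert(toy_invalidate_page(&pinned_clean) == 0);
    assert(!toy_read_goes_to_disk(&pinned_clean));
}
```

Calling toy_demo() from a main() runs the three cases. The last one corresponds to the situation Andrea described: the error is suppressed, but in this model nothing forces the later buffered read back to disk.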