From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [RFC PATCH 0/2] O_DIRECT locking rework Date: Fri, 20 Oct 2006 12:30:58 -0700 Message-ID: <20061020123058.8b422339.akpm@osdl.org> References: <20061020183237.GA8674@think.oraclecorp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org, zach.brown@oracle.com Return-path: Received: from smtp.osdl.org ([65.172.181.4]:35017 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S2992732AbWJTTbX (ORCPT ); Fri, 20 Oct 2006 15:31:23 -0400 To: Chris Mason In-Reply-To: <20061020183237.GA8674@think.oraclecorp.com> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Fri, 20 Oct 2006 14:32:37 -0400 Chris Mason wrote: > Hello everyone, > > O_DIRECT locking currently uses a few different per-inode locks to > prevent races between buffered io and direct io. This is awkward, and > sometimes grows races where we expose old data on disk. > > For example, I can't quite see how we protect from an mmap triggered > writepage from filling a hole in the middle of an O_DIRECT read. > > This patch set changes O_DIRECT to use page locks instead of > mutex/semaphores. It looks in the radix tree for pages affected by this > O_DIRECT read/wrte and locks any pages it finds. > > For any pages not present, a stub page struct is inserted into the > radix tree. The page cache routines are changed to either wait on this > place holder page or ignore it as appropriate. Place holders are not > valid pages at all, you can't trust page->index or any other field. > > The first patch introduces these place holder pages. The second patch > changes fs/direct-io.c to use them. Patch #2 needs work, > direct-io.c:lock_page_range can be made much faster, and it needs to be > changed to work in chunks instead of pinning down the whole range at > once. > > But, this is enough for people to comment on the basic idea. Testing > has been very light. I'm not sure I've covered all of the buffered vs > direct races yet. The main goal of posting now is to talk about the > place holder pages and possible optimizations. > > For the XFS guys, you probably want to avoid the page locking steps as > well, a later version will honor that. > Boy, it doesn't do much to simplify the code, does it? An opportunity for Linus to again share with us his opinions on direct-io. Conceptually, sticking locked pages into the mapping is a good thing to do, because it meshes in with all our existing synchronisation design. I think the fake placeholder page can be a kernel-wide thing rather than per-dio? That would be most desirable because then we have #define PagePlaceHolder(page) (page == global_placeholder_page) which saves a precious page flag.