From: Chris Mason
To: Christoph Hellwig
Cc: linux-fsdevel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH RFC] O_DIRECT reads and writes without i_sem
Date: Mon, 01 Nov 2004 11:56:49 -0500
Message-ID: <1099328209.23475.98.camel@watt.suse.com>
In-Reply-To: <20041101160844.GA29251@infradead.org>
References: <1099323127.23475.80.camel@watt.suse.com> <20041101160844.GA29251@infradead.org>

On Mon, 2004-11-01 at 16:08 +0000, Christoph Hellwig wrote:
> On Mon, Nov 01, 2004 at 10:32:07AM -0500, Chris Mason wrote:
> > Right now, O_DIRECT reads and writes on regular files have to take i_sem
> > while reading file metadata in order to make sure we don't race with
> > hole filling.
[ ... ]
> This gets too complicated for its own sake. What about going down the
> XFS route and making i_sem a r/w semaphore that's taken only shared
> during read and write I/O, but exclusive while setting up write I/O
> outside of i_size? Alternatively, just move the I/O locking into the
> filesystem.

Nod, it is too complex; this is why I posted early ;)

If the locking I added to fs/direct-io.c moved into filesystem-specific
code, the only place for it would be each filesystem's get_blocks call.
I put it in direct-io.c because that's where all the other locking
already was. Shrug.

If we do down_read(i_rw_sem) for all cases except growing the file,
then we still have no locking for O_DIRECT while we are filling holes
in the file. The filesystem write function could do this:

	down_read(i_rw_sem)
retry:
	read fs metadata
	if (filling hole && i_rw_sem held shared) {
		up_read(i_rw_sem)
		down_write(i_rw_sem)
		goto retry	/* metadata may have changed while unlocked */
	}

But that also gets nasty in a hurry.
The second alternative is to make O_DIRECT lock pages (or stub pages, if
they aren't in the cache) in the file's address space, but this might add
significant overhead from radix tree operations during O_DIRECT.

-chris
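The page-locking alternative can be modelled roughly as follows, assuming a flat array (struct model_mapping, hypothetical) as a stand-in for the radix tree and a pthread mutex per slot as a stand-in for the page lock. Pages are locked in ascending index order so overlapping requests cannot deadlock, missing pages get a stub inserted, and each page is re-checked after its lock is taken because a stub can vanish while a waiter sleeps. The tree_ops counter is only there to make the per-page overhead visible; all names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64

struct model_page {
	bool present;          /* slot is occupied in the modelled page cache */
	bool stub;             /* placeholder inserted only for O_DIRECT */
	pthread_mutex_t lock;  /* stand-in for the per-page lock */
};

struct model_mapping {
	pthread_mutex_t tree_lock;        /* protects present, stub, tree_ops */
	struct model_page pages[NPAGES];  /* flat array standing in for the radix tree */
	unsigned long tree_ops;           /* lookups + insertions + deletions */
};

void mapping_init(struct model_mapping *m)
{
	pthread_mutex_init(&m->tree_lock, NULL);
	for (size_t i = 0; i < NPAGES; i++) {
		m->pages[i].present = false;
		m->pages[i].stub = false;
		pthread_mutex_init(&m->pages[i].lock, NULL);
	}
	m->tree_ops = 0;
}

/* Lock pages [first, last] in ascending index order so that two
 * overlapping O_DIRECT requests cannot deadlock on each other. */
void dio_lock_range(struct model_mapping *m, size_t first, size_t last)
{
	assert(first <= last && last < NPAGES);
	for (size_t i = first; i <= last; i++) {
		struct model_page *p = &m->pages[i];
		bool present;

		for (;;) {
			pthread_mutex_lock(&m->tree_lock);
			m->tree_ops++;                 /* lookup */
			if (!p->present) {             /* not cached: insert a stub */
				p->present = true;
				p->stub = true;
				m->tree_ops++;         /* insertion */
			}
			pthread_mutex_unlock(&m->tree_lock);

			pthread_mutex_lock(&p->lock);
			/* The stub may have been removed while we waited for
			 * the page lock; if so, look the page up again. */
			pthread_mutex_lock(&m->tree_lock);
			present = p->present;
			pthread_mutex_unlock(&m->tree_lock);
			if (present)
				break;
			pthread_mutex_unlock(&p->lock);
		}
	}
}

/* Unlock pages [first, last], removing any stubs inserted for the I/O. */
void dio_unlock_range(struct model_mapping *m, size_t first, size_t last)
{
	assert(first <= last && last < NPAGES);
	for (size_t i = first; i <= last; i++) {
		struct model_page *p = &m->pages[i];

		pthread_mutex_lock(&m->tree_lock);
		if (p->stub) {
			p->present = false;
			p->stub = false;
			m->tree_ops++;                 /* deletion */
		}
		pthread_mutex_unlock(&m->tree_lock);
		pthread_mutex_unlock(&p->lock);
	}
}
```

With nothing cached, an 8-page request in this model costs 24 tree operations (a lookup, a stub insertion and a stub deletion per page), while a page already in the cache costs a single lookup. That per-page cost, paid even when the cache is empty, is the overhead the paragraph above is worried about.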