From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [patch][rfc] fs: fix nobh error handling
Date: Tue, 7 Aug 2007 19:33:47 -0700
Message-ID: <20070807193347.fbcd7f38.akpm@linux-foundation.org>
References: <20070807055129.GE17986@wotan.suse.de>
	<20070807180903.3cf36b77.akpm@linux-foundation.org>
	<20070808021838.GA11018@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org, Dave Kleikamp , Badari Pulavarty
To: Nick Piggin
Return-path:
Received: from smtp2.linux-foundation.org ([207.189.120.14]:59759
	"EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1758738AbXHHCee (ORCPT );
	Tue, 7 Aug 2007 22:34:34 -0400
In-Reply-To: <20070808021838.GA11018@wotan.suse.de>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, 8 Aug 2007 04:18:38 +0200 Nick Piggin wrote:

> On Tue, Aug 07, 2007 at 06:09:03PM -0700, Andrew Morton wrote:
> > On Tue, 7 Aug 2007 07:51:29 +0200
> > Nick Piggin wrote:
> >
> > > nobh mode error handling is not just pretty slack, it's wrong.
> > >
> > > One cannot zero out the whole page to ensure new blocks are zeroed,
> > > because it just brings the whole page "uptodate" with zeroes even if
> > > that may not be the correct uptodate data. Also, other parts of the
> > > page may already contain dirty data which would get lost by zeroing
> > > it out. Thirdly, the writeback of zeroes to the new blocks will also
> > > erase existing blocks. All these conditions are pagecache and/or
> > > filesystem corruption.
> > >
> > > The problem comes about because we didn't keep track of which buffers
> > > actually are new or old.
> > > However it is not enough just to keep only
> > > this state, because at the point we start dirtying parts of the page
> > > (new blocks, with zeroes), the handling of IO errors becomes impossible
> > > without buffers because the page may only be partially uptodate, in
> > > which case the page flags alone cannot capture the state of the parts
> > > of the page.
> > >
> > > So allocate all buffers for the page upfront, but leave them unattached
> > > so that they don't pick up any other references and can be freed when
> > > we're done. If the error path is hit, then zero the new buffers as the
> > > regular buffer path does, then attach the buffers to the page so that
> > > it can actually be written out correctly and be subject to the normal
> > > IO error handling paths.
> > >
> > > As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
> > > systems.
> > >
> > > Signed-off-by: Nick Piggin
> > >
> >
> > With this change, nobh_prepare_write() can magically attach buffers to the
> > page. But a filesystem which is running in nobh mode wouldn't expect that,
> > and could quite legitimately go BUG, or leak data or, more seriously and
> > much less fixably, just go and overwrite page->private, because it "knows"
> > that nobody else is using ->private.
>
> I was fairly sure that a filesystem can not assume buffers won't be
> attached, because there are various error case paths that do exactly
> the same thing (eg. nobh_writepage can call __block_write_full_page
> which will attach buffers).

oh crap, that's sad. Either we broke it later on or I misremembered.

> Does any filesystem assume this? Is it not already broken?

Yes, it would be broken.

>
> > I'd have thought that it would be better to not attach the buffers and to
> > go ahead and do whatever synchronous IO is needed in the error recovery
> > code, then free those buffers again.
>
> It is hard because if the synchronous IO fails, then what do you do?

Do what we usually do when an IO error happens: crash the kernel?

(Sorry, have been spending too long at bugzilla.kernel.org)

> You could try making it up as you go along, but of course if we _can_
> attach the buffers here then it would be preferable to do that. IMO.
>
>
> > Also, you have a couple of (cheerily uncommented) PagePrivate() tests in
> > there which should be page_has_buffers().
>
> Yeah, I guess the whole thing needs more commenting :P
> page_has_buffers... right, I'll change that.

Did it get much testing?
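
The per-block tracking argued for in the quoted patch description can be
illustrated with a small userspace model. This is only a sketch, not kernel
code: struct toy_page, the toy_* helpers and the 4096/1024 sizes are all
made up for illustration, and it ignores IO, locking and page flags
entirely. It just contrasts zeroing the whole page with zeroing only the
blocks that were flagged as new.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096   /* assumed page size for the model */
#define TOY_BLOCK_SIZE 1024   /* assumed filesystem block size */
#define TOY_NBLOCKS    (TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

/* Hypothetical stand-in for per-buffer state: one flag per block. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool block_new[TOY_NBLOCKS];  /* block was allocated by the failed write */
};

/* Set up a page where block 0 holds existing data and block 1 is new. */
static void toy_page_init(struct toy_page *p)
{
	memset(p, 0, sizeof(*p));
	memset(p->data, 0xAA, TOY_BLOCK_SIZE);                  /* existing data */
	memset(p->data + TOY_BLOCK_SIZE, 0x55, TOY_BLOCK_SIZE); /* junk in new block */
	p->block_new[1] = true;
}

/* The approach criticised in the thread: wipe the whole page. */
static void toy_zero_whole_page(struct toy_page *p)
{
	memset(p->data, 0, TOY_PAGE_SIZE);
}

/* Per-block recovery: zero only the blocks flagged as new. */
static void toy_zero_new_blocks(struct toy_page *p)
{
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		if (!p->block_new[i])
			continue;
		memset(p->data + i * TOY_BLOCK_SIZE, 0, TOY_BLOCK_SIZE);
		p->block_new[i] = false;
	}
}

/* First byte of block idx, for inspecting the result. */
static unsigned char toy_block_byte(const struct toy_page *p, int idx)
{
	assert(idx >= 0 && idx < TOY_NBLOCKS);
	return p->data[idx * TOY_BLOCK_SIZE];
}
```

Calling toy_page_init() and then toy_zero_whole_page() loses the 0xAA data
in block 0, whereas toy_zero_new_blocks() clears only block 1 and leaves
block 0 alone.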