From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Piggin
Subject: Re: [patch 6/6] mm: fsync livelock avoidance
Date: Fri, 12 Dec 2008 00:43:35 +0100
Message-ID: <20081211234335.GG8294@wotan.suse.de>
References: <20081210072454.GB27096@wotan.suse.de>
 <20081210074209.GG27096@wotan.suse.de>
 <20081211142347.2546b16c.akpm@linux-foundation.org>
 <20081211224514.GE8294@wotan.suse.de>
 <20081211151407.bacd44e5.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org, mpatocka@redhat.com
To: Andrew Morton
Return-path:
Received: from ns2.suse.de ([195.135.220.15]:56399 "EHLO mx2.suse.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1757378AbYLKXnj (ORCPT ); Thu, 11 Dec 2008 18:43:39 -0500
Content-Disposition: inline
In-Reply-To: <20081211151407.bacd44e5.akpm@linux-foundation.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu, Dec 11, 2008 at 03:14:07PM -0800, Andrew Morton wrote:
> On Thu, 11 Dec 2008 23:45:14 +0100
> Nick Piggin wrote:
>
> > > > For simplicity, I have removed the "don't wait for writeout if we hit -EIO"
> > > > logic from a couple of places. I don't know if this is really worth the added
> > > > complexity (EIO will still get reported, but it will just take a bit longer;
> > > > an app can't rely in specific behaviour or timeliness here).
> > >
> > > This is ungood. The device layer likes to twiddle thumbs for 30
> > > seconds or more when it hits an IO error. We went and made that 30,000
> > > or more..
> >
> > It isn't really a good solution anyway,
>
> what isn't a good solution to what?

To the problem of long waits on IO errors.

> > because I think it's much
> > less likely for writepage to return -EIO directly. Usually they
> > would come back via data IO completion asynchronously.
>
> umm, maybe. If all the file metadata is in pagecache. Often it is not.

I'd say it often *is*, because the buffer layer allocates/maps/reserves
blocks for the page when it gets dirtied. Of course these checks will
catch some cases for some filesystems, but IMO they're not a good
general solution to the problem of EIO errors taking a long time,
because the error can also arrive in other ways, e.g. asynchronously
via data IO completion.
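
To make the argument concrete, here is a rough user-space toy model. It
is not kernel code, and every name in it (toy_sync, page_mode,
sync_result) is made up for illustration. It assumes a page can fail
either when it is submitted (writepage returning -EIO directly) or only
later, when its IO completes. The "skip the wait on -EIO" shortcut can
only see the first kind.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: how a page's write fails. */
enum page_mode {
    PAGE_OK,
    PAGE_FAIL_AT_SUBMIT,     /* error returned directly at submission */
    PAGE_FAIL_AT_COMPLETION  /* error only visible once IO completes */
};

struct sync_result {
    int err;    /* 0 or -EIO */
    int waits;  /* pages we had to wait on (each wait may be slow) */
};

static struct sync_result toy_sync(const enum page_mode *pages, size_t n,
                                   bool skip_wait_on_eio)
{
    struct sync_result r = { 0, 0 };

    /* Submission pass: only submit-time failures are visible here. */
    for (size_t i = 0; i < n; i++)
        if (pages[i] == PAGE_FAIL_AT_SUBMIT)
            r.err = -EIO;

    /* The shortcut under discussion: return before waiting at all. */
    if (r.err == -EIO && skip_wait_on_eio)
        return r;

    /* Wait pass: completion-time failures are only discovered here. */
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == PAGE_FAIL_AT_SUBMIT)
            continue;        /* nothing was submitted for this page */
        r.waits++;
        if (pages[i] == PAGE_FAIL_AT_COMPLETION)
            r.err = -EIO;
    }
    return r;
}
```

In this model the shortcut saves all the waits when a page fails at
submission, but saves nothing when the failure only shows up at
completion: the caller still waits on every page before it sees -EIO.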