From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Layton <jlayton@redhat.com>
Subject: Re: flush and EIO errors when writepages fails
Date: Sat, 21 Jun 2008 08:27:42 -0400
Message-ID: <20080621082742.4b311132@tleilax.poochiereds.net>
References: <524f69650806201534g67704841u9f942a71b6d9caa3@mail.gmail.com>
	<20080621070556.GA1155@2ka.mipt.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Shirish Pargaonkar <shirishpargaonkar@gmail.com>,
	Dave Kleikamp <shaggy@linux.vnet.ibm.com>,
	Jody French <jfrench@austin.rr.com>
To: Steve French <smfrench@gmail.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([66.187.233.31]:59837 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750800AbYFUMbR (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Sat, 21 Jun 2008 08:31:17 -0400
In-Reply-To: <20080621070556.GA1155@2ka.mipt.ru>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat, 21 Jun 2008 11:05:56 +0400
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Hi.
> 
> On Fri, Jun 20, 2008 at 05:34:21PM -0500, Steve French (smfrench@gmail.com) wrote:
> > > Right, but with the current implementation, once filemap_fdatawrite
> > > returns, any pages that that run touched are either written out or
> > > discarded.
> 
> Depending on writepages() implementation, it is not always the case.
> 

Right. I meant with the current cifs_writepages() implementation. It
looks like when the write fails we walk the pagevec and do:

                                if (rc)
                                        SetPageError(page);
                                kunmap(page);
                                unlock_page(page);
                                end_page_writeback(page);
                                page_cache_release(page);

So I'm not certain that the data is actually discarded. I'm no expert
in page accounting, but I'm pretty sure that we won't attempt to write
it out again from CIFS.

> > That could explain some problems if true.  When writepages fails, we
> > make the pages as in error (PG_error flag?) and presumably they are
> > still dirty.   Why in the world would anyone free the pages just
> > because we failed the first time and need to write them again later?
> > Do you where (presumably in /mm) pages could be freed that are still
> > dirty (it is hard to find where the PG_error flag is checked etc)?
> 
> You can clear writeback bit but leave/set dirty bit in completion
> callback for given request. You can manually insert page into radix tree
> with dirty tag. You can also lock page and do not allow to unlock it
> until you resent your data.
> 
> So, there is plenty of possibilities to break page accounting in own
> writepages() method :)
> 

I still think that if you get a hard error back from the server then
you're not likely to have much success on a second attempt to write out
the pages. Returning an error to userspace as soon as you realize that
it didn't work seems reasonable to me. A non-responding server, on the
other hand, may be a place that's recoverable.

Either way, if we really want to do a second attempt to write out the
pagevec, then adding some code to cifs_writepages that sleeps for a bit
and implements this seems like the thing to do. I'm not convinced that
it will actually make much difference, but it seems unlikely to hurt
anything.

-- 
Jeff Layton <jlayton@redhat.com>