From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Layton
Subject: Re: how to handle failed writes in the middle of a set?
Date: Sat, 28 Jan 2012 09:59:37 -0500
Message-ID: <20120128095937.29a6ac16@tlielax.poochiereds.net>
References: <20120128064422.5d5e4022@tlielax.poochiereds.net>
	<1327761391.2924.9.camel@dabdike.int.hansenpartnership.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: James Bottomley
In-Reply-To: <1327761391.2924.9.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
Sender: linux-cifs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-fsdevel.vger.kernel.org

On Sat, 28 Jan 2012 08:36:31 -0600
James Bottomley wrote:

> On Sat, 2012-01-28 at 06:44 -0500, Jeff Layton wrote:
> > The SMB protocol specifies that if you don't have an oplock, then writes
> > and reads to/from the server are not supposed to use the cache. Currently
> > cifs does this sort of write serially. I'd like to change it to do them
> > in parallel for better performance, but I'm not sure what to do in the
> > following situation:
> > 
> > Suppose we have a wsize of 64k. An application opens a file for write
> > and does not get an oplock. It sends down a 192k write from userspace.
> > cifs breaks that up into 3 SMB_COM_WRITE_AND_X calls on the wire,
> > fires them off in parallel, and waits for them to return. The first and
> > third writes succeed, but the second one (the one in the middle) fails
> > with a hard error.
> > 
> > How should we return from the write at that point? The alternatives I
> > see are:
> > 
> > 1/ return -EIO for the whole thing, even though part of it was
> > successfully written?
> 
> This would be the safest return. Whether it's optimal depends on how
> the writes are issued (and by what) and whether the error handling is
> sophisticated enough.
> 
> > 2/ pretend only the first write succeeded, even though the part
> > afterward might have been corrupted?
> 
> This is what the current Linux SCSI behaviour is today (assuming the
> underlying storage reports it). We mark the sectors up to the failure
> good and then error the rest. Assuming the cifs client is sophisticated
> enough, it should be OK to do this, and it would represent the most
> accurate information.
> 
> > 3/ do something else?
> 
> Like what? I'm assuming from the way you phrased the question that the
> error returns in cifs aren't sophisticated enough to do one per chunk
> (or sector)? In Linux, we could, in theory, return OK for writes 1 and 3
> and error write 2, but that's because we can carry one error per bio.
> However, we never do this, because disk errors are always sequential and
> we'd have to have the bio boundary aligned correctly with your chunks
> (because a bio always completes partially, beginning with good and
> ending with bad).
> 
> No idea what else we could do...

We have to return something to the application on (for instance) a
write(2) syscall, and I don't see how we can represent the situation
any more granularly in that context.

FWIW, if we assume that the 2nd write failed, then we'll end up with a
sparse file or a zero-filled gap in the file on the server. I guess
you're correct that returning -EIO for the whole thing would be
safest...

-- 
Jeff Layton
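
To make the two alternatives concrete, here is a minimal user-space
sketch (hypothetical names, not actual cifs code) of how the per-chunk
results of the parallel SMB_COM_WRITE_AND_X calls might be collapsed
into a single write(2) return value. When "strict" is set it implements
option 1 (fail the whole write); otherwise it gives the short-write
behaviour of option 2:

    /*
     * Hypothetical illustration only -- not actual cifs code.  The names
     * (struct chunk_result, collect_write_result) are made up.  Each
     * entry describes one wsize-sized SMB_COM_WRITE_AND_X call, in file
     * order.
     */
    #include <stddef.h>
    #include <sys/types.h>

    struct chunk_result {
            size_t  len;    /* bytes this chunk was asked to write */
            int     error;  /* 0 on success, negative errno on failure */
    };

    /*
     * Option 1 (strict): any failed chunk fails the whole write (e.g. -EIO).
     * Option 2: report the bytes written before the first failure, and
     * only return the error itself if nothing was written at all.
     */
    static ssize_t collect_write_result(const struct chunk_result *res,
                                        int nchunks, int strict)
    {
            ssize_t written = 0;
            int i;

            for (i = 0; i < nchunks; i++) {
                    if (res[i].error) {
                            if (strict)
                                    return res[i].error;
                            return written ? written : res[i].error;
                    }
                    written += res[i].len;
            }
            return written;
    }

In the 192k example from the thread (wsize of 64k, second chunk fails),
the strict path would return -EIO and the other path would return 65536,
which correspond to alternatives 1 and 2 above.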