* how to handle failed writes in the middle of a set?
From: Jeff Layton @ 2012-01-28 11:44 UTC (permalink / raw)
  To: linux-fsdevel, linux-cifs

The SMB protocol specifies that if you don't have an oplock then writes
and reads to/from the server are not supposed to use the cache. Currently
cifs does this sort of write serially. I'd like to change it to do them
in parallel for better performance, but I'm not sure what to do in the
following situation:

Suppose we have a wsize of 64k. An application opens a file for write
and does not get an oplock. It sends down a 192k write from userspace.
cifs breaks that up into 3 SMB_COM_WRITE_AND_X calls on the wire,
fires them off in parallel and waits for them to return. The first and
third write succeed, but the second one (the one in the middle) fails
with a hard error.
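
To make that concrete, the issue path would look roughly like this
(illustrative sketch only; issue_smb_write() and wait_for_writes() are
made-up stand-ins for the real async SMB machinery, not actual cifs
functions):

    #include <stddef.h>
    #include <sys/types.h>

    #define WSIZE   (64 * 1024)

    /* hypothetical helpers: fire one SMB_COM_WRITE_AND_X and return
     * immediately; then collect the results of all in-flight calls */
    extern void issue_smb_write(int fd, const char *buf, size_t len,
                                off_t off);
    extern ssize_t wait_for_writes(size_t nchunks);

    static ssize_t parallel_write(int fd, const char *buf, size_t len,
                                  off_t off)
    {
            size_t nchunks = (len + WSIZE - 1) / WSIZE; /* 192k -> 3 */
            size_t i;

            for (i = 0; i < nchunks; i++) {
                    size_t this_len = len - i * WSIZE;

                    if (this_len > WSIZE)
                            this_len = WSIZE;

                    /* one wire call per chunk, all in flight at once */
                    issue_smb_write(fd, buf + i * WSIZE, this_len,
                                    off + i * WSIZE);
            }

            /* the chunks may complete in any order, with any mix of
             * success and failure */
            return wait_for_writes(nchunks);
    }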

How should we return from the write at that point? The alternatives I
see are:

1/ return -EIO for the whole thing, even though part of it was
successfully written?

2/ pretend only the first write succeeded, even though the part
afterward might have been corrupted?

3/ do something else?

-- 
Jeff Layton <jlayton@redhat.com>


* Re: how to handle failed writes in the middle of a set?
From: James Bottomley @ 2012-01-28 14:36 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-cifs

On Sat, 2012-01-28 at 06:44 -0500, Jeff Layton wrote:
> The SMB protocol specifies that if you don't have an oplock then writes
> and reads to/from the server are not supposed to use the cache. Currently
> cifs does this sort of write serially. I'd like to change it to do them
> in parallel for better performance, but I'm not sure what to do in the
> following situation:
> 
> Suppose we have a wsize of 64k. An application opens a file for write
> and does not get an oplock. It sends down a 192k write from userspace.
> cifs breaks that up into 3 SMB_COM_WRITE_AND_X calls on the wire,
> fires them off in parallel and waits for them to return. The first and
> third write succeed, but the second one (the one in the middle) fails
> with a hard error.
> 
> How should we return from the write at that point? The alternatives I
> see are:
> 
> 1/ return -EIO for the whole thing, even though part of it was
> successfully written?

This would be the safest return.  Whether it's optimal depends on how
the writes are issued (and by what) and whether the error handling is
sophisticated enough.

> 2/ pretend only the first write succeeded, even though the part
> afterward might have been corrupted?

This is what Linux SCSI does today (assuming the underlying storage
reports it).  We mark the sectors up to the failure as good and then
error the rest.  Assuming the cifs client is sophisticated enough, it
should be OK to do this, and it would represent the most accurate
information.
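
Concretely, settling the per-chunk results would look something like
this (just a sketch; rc[] and settle_writes() are made up, not real
cifs or SCSI code):

    #include <stddef.h>
    #include <sys/types.h>

    /*
     * Sketch of SCSI-like semantics applied to the cifs chunks:
     * rc[i] is the result of chunk i (0 on success, -errno on
     * failure).  Count the contiguous good bytes from the front;
     * everything at or past the first failure is reported as bad,
     * even if a later chunk actually made it to the server.
     */
    static ssize_t settle_writes(const int *rc, size_t nchunks,
                                 size_t chunk_size, size_t total)
    {
            size_t done = 0;
            size_t i;

            for (i = 0; i < nchunks; i++) {
                    size_t this_len = total - done;

                    if (this_len > chunk_size)
                            this_len = chunk_size;
                    if (rc[i] < 0)
                            return done ? (ssize_t)done : rc[i];
                    done += this_len;
            }

            return done;
    }

With your example (rc = { 0, -EIO, 0 }) that returns 64k and simply
pretends the third write never happened.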

> 3/ do something else?

Like what?  I'm assuming from the way you phrased the question that
the error returns in cifs aren't sophisticated enough to do one per
chunk (or sector)?  In Linux we could, in theory, return OK for writes
1 and 3 and an error for write 2, because we can carry one error per
bio.  However, we never do this, because disk errors are always
sequential and the bio boundaries would have to be aligned correctly
with your chunks (a bio that completes partially always begins with
good sectors and ends with bad ones).
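
Reduced to its essence, the completion model is (toy illustration, not
the real struct bio):

    #include <stddef.h>

    /*
     * Toy illustration -- not the real struct bio.  Block I/O
     * completion carries a single error, so a failed request can only
     * be described as a contiguous good prefix followed by one error
     * covering everything after it.
     */
    struct completion_model {
            size_t good_bytes;      /* completed from the start */
            int error;              /* 0, or one errno for the rest */
    };

There is no way to express "middle bad, ends good" in that model.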

James


* Re: how to handle failed writes in the middle of a set?
From: Jeff Layton @ 2012-01-28 14:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-fsdevel, linux-cifs

On Sat, 28 Jan 2012 08:36:31 -0600
James Bottomley wrote:

> On Sat, 2012-01-28 at 06:44 -0500, Jeff Layton wrote:
> > The SMB protocol specifies that if you don't have an oplock then writes
> > and reads to/from the server are not supposed to use the cache. Currently
> > cifs does this sort of write serially. I'd like to change it to do them
> > in parallel for better performance, but I'm not sure what to do in the
> > following situation:
> > 
> > Suppose we have a wsize of 64k. An application opens a file for write
> > and does not get an oplock. It sends down a 192k write from userspace.
> > cifs breaks that up into 3 SMB_COM_WRITE_AND_X calls on the wire,
> > fires them off in parallel and waits for them to return. The first and
> > third write succeed, but the second one (the one in the middle) fails
> > with a hard error.
> > 
> > How should we return from the write at that point? The alternatives I
> > see are:
> > 
> > 1/ return -EIO for the whole thing, even though part of it was
> > successfully written?
> 
> This would be the safest return.  Whether it's optimal depends on how
> the writes are issued (and by what) and whether the error handling is
> sophisticated enough.
> 
> > 2/ pretend only the first write succeeded, even though the part
> > afterward might have been corrupted?
> 
> This is what Linux SCSI does today (assuming the underlying storage
> reports it).  We mark the sectors up to the failure as good and then
> error the rest.  Assuming the cifs client is sophisticated enough, it
> should be OK to do this, and it would represent the most accurate
> information.
> 
> > 3/ do something else?
> 
> Like what?  I'm assuming from the way you phrased the question that
> the error returns in cifs aren't sophisticated enough to do one per
> chunk (or sector)?  In Linux we could, in theory, return OK for writes
> 1 and 3 and an error for write 2, because we can carry one error per
> bio.  However, we never do this, because disk errors are always
> sequential and the bio boundaries would have to be aligned correctly
> with your chunks (a bio that completes partially always begins with
> good sectors and ends with bad ones).
>

No idea what else we could do...

We have to return something to the application there, on (for
instance) a write(2) syscall, and I don't see how we can represent
the situation with any more granularity in that context.
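
From the application's side it boils down to this (userspace sketch):

    #include <errno.h>
    #include <unistd.h>

    /*
     * Sketch of what the caller sees: write(2) yields exactly one
     * value, so partial success can only be expressed as a short
     * count -- and a short count implies everything past it is
     * unwritten, which isn't true in the middle-write-failed case.
     */
    static ssize_t app_write(int fd, const char *buf, size_t len)
    {
            ssize_t ret = write(fd, buf, len);

            if (ret < 0)            /* alternative 1: whole thing fails */
                    return -errno;

            /* alternative 2: ret == 64k; the caller will typically
             * retry from buf + ret, never suspecting that bytes past
             * the failure may also have reached the server */
            return ret;
    }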

FWIW, if the 2nd write fails while the 3rd succeeds, we'll end up with
a sparse file or a zero-filled gap in the file on the server. I guess
you're correct that returning -EIO for the whole thing would be
safest...

--
Jeff Layton <jlayton@redhat.com>

