linux-man.vger.kernel.org archive mirror
* SuSE O_DIRECT|O_NONBLOCK overload
       [not found]               ` <53203BE5.402-l3A5Bk7waGM@public.gmane.org>
@ 2014-03-12 11:00                 ` Christoph Hellwig
       [not found]                   ` <20140312110015.GA29907-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2014-03-12 11:00 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Alexander Viro, Linus Torvalds,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

The SLES12 tree has various patches to implement special
O_DIRECT|O_NONBLOCK semantics for block devices:

	https://gitorious.org/opensuse/kernel-source/source/806eab3e4b02e798c1ae942440051f81c822ca35:patches.suse/block-nonblock-causes-failfast

This seems genuinely useful, and I'd be really happy if people would do
this work upstream, for two reasons:

 a) Implementing different semantics only in a vendor kernel is a
    nightmare.  There is no proper way to document it in the man pages,
    for example, and applications break silently when they expect the
    feature to be present or, even nastier, expect it to be absent.
 b) Which brings us to: we had various issues with adding O_NONBLOCK to
    files that didn't support it before.  How well was this whole feature
    tested?

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: SuSE O_DIRECT|O_NONBLOCK overload
       [not found]                   ` <20140312110015.GA29907-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-03-13  0:15                     ` NeilBrown
       [not found]                       ` <20140313111555.2f15f19f-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2014-03-13  0:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Alexander Viro, Linus Torvalds,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA


On Wed, 12 Mar 2014 04:00:15 -0700 Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
wrote:

> The SLES12 tree has various patches to implement special
> O_DIRECT|O_NONBLOCK semantics for block devices:
> 
> 	https://gitorious.org/opensuse/kernel-source/source/806eab3e4b02e798c1ae942440051f81c822ca35:patches.suse/block-nonblock-causes-failfast
> 
> This seems genuinely useful, and I'd be really happy if people would do
> this work upstream, for two reasons:
> 
>  a) Implementing different semantics only in a vendor kernel is a
>     nightmare.  There is no proper way to document it in the man pages,
>     for example, and applications break silently when they expect the
>     feature to be present or, even nastier, expect it to be absent.
>  b) Which brings us to: we had various issues with adding O_NONBLOCK to
>     files that didn't support it before.  How well was this whole feature
>     tested?


This "feature" was really just a hack because a particular customer needed
something in a particular situation.

At the core of this in my thinking is the 'failfast' BIO flag ... or 'flags'
really because there are now three of them.

They don't seem to be documented, uniformly supported, or used much at
all.  dm-multipath uses one, and btrfs uses another.  There could be value in
using one or more of them in md, but as they aren't documented and could
mean almost anything, I have stayed away.
I tried adding some sort of 'failfast' support to md once, and I would get
occasional failures from regular SATA devices which otherwise appeared to be
working perfectly well.  So it seemed that "fast" was altogether *too* fast.

For a particular customer with some particular hardware there were issues
where that hardware could choose not to respond for extended periods.  So we
modified the driver to accept a 'timeout' module parameter and to cause
REQ_FAILFAST_DEV (I think) requests to fail with -ETIMEDOUT if they could not
be serviced in that time.

We then modified md to cope with that particular well-defined semantic, and
hacked "O_NONBLOCK" support in so that mdadm could access the device without
the risk of hanging indefinitely.
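
From the user-space side, the arrangement described above might be exercised
roughly as follows.  This is only a sketch under stated assumptions: it
assumes a kernel carrying the vendor patches, assumes the -ETIMEDOUT from the
driver surfaces to the caller as errno ETIMEDOUT, and assumes 4096 bytes is a
suitable O_DIRECT alignment for the device.  The names probe_device,
classify_errno and PROBE_ALIGN are made up for illustration and are not taken
from mdadm.  On a kernel without the patches, O_NONBLOCK would not be expected
to have this effect on a block device.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: 4096 bytes satisfies the device's O_DIRECT alignment rules. */
#define PROBE_ALIGN 4096

enum probe_result { PROBE_OK, PROBE_TIMED_OUT, PROBE_ERROR };

/* Map an errno value from a failed call to a coarse outcome. */
static enum probe_result classify_errno(int err)
{
    return err == ETIMEDOUT ? PROBE_TIMED_OUT : PROBE_ERROR;
}

/*
 * Hypothetical helper: read the first block of `path` through
 * O_DIRECT|O_NONBLOCK, so that (on a patched kernel) an unresponsive
 * device yields a timeout instead of an indefinite hang.
 */
static enum probe_result probe_device(const char *path)
{
    enum probe_result res;
    void *buf = NULL;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY | O_DIRECT | O_NONBLOCK);
    if (fd < 0)
        return classify_errno(errno);

    /* O_DIRECT needs an aligned buffer; posix_memalign provides one. */
    if (posix_memalign(&buf, PROBE_ALIGN, PROBE_ALIGN) != 0) {
        close(fd);
        return PROBE_ERROR;
    }

    n = pread(fd, buf, PROBE_ALIGN, 0);
    /* Capture the outcome before free/close can disturb errno. */
    res = n < 0 ? classify_errno(errno) : PROBE_OK;

    free(buf);
    close(fd);
    return res;
}
```

A caller would treat PROBE_TIMED_OUT as "device not answering right now" and
PROBE_ERROR as any other failure, including a path that cannot be opened.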

I would be happy to bring at least some of this functionality into mainline,
but I would need a "FAILFAST" flag that actually meant something useful and
was sufficiently well documented so that if some driver got it wrong, I would
be justified in blaming the driver for not meeting the expectations that I
encoded into md.

I think that the FAILFAST flag that I need would do some error recovery but
would be time limited.  Maybe a software TLER (Time Limited Error Recovery).
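
As a rough illustration of what "time-limited error recovery" could mean in
software, the sketch below retries an operation until it succeeds or a time
budget runs out, and only then reports -ETIMEDOUT.  It is a user-space model,
not kernel code; recover_with_deadline, io_attempt_fn and flaky_attempt are
hypothetical names, and the 1 ms pause between attempts is an arbitrary
choice.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime, nanosleep */
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or a negative errno. */
typedef int (*io_attempt_fn)(void *ctx);

/* Monotonic clock in milliseconds, immune to wall-clock adjustments. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Keep retrying `attempt` until it succeeds or `budget_ms` has elapsed.
 * Recovery is attempted, but never for longer than the budget.
 */
static int recover_with_deadline(io_attempt_fn attempt, void *ctx,
                                 long long budget_ms)
{
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
    long long deadline = now_ms() + budget_ms;

    assert(attempt != NULL);
    do {
        if (attempt(ctx) == 0)
            return 0;
        nanosleep(&pause, NULL);   /* brief back-off between attempts */
    } while (now_ms() < deadline);

    return -ETIMEDOUT;
}

/* Example attempt: fails with -EIO until the counter in ctx reaches zero. */
static int flaky_attempt(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -EIO;
    }
    return 0;
}
```

With a counter of 2 and a generous budget the call returns 0 after two failed
attempts; with a counter far larger than the budget allows, it gives up with
-ETIMEDOUT.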

I also think there should probably be just one FAILFAST flag.  Whether it was
the DEV or the TRANSPORT or the DRIVER that failed could be returned in the
error code for any caller that cared.  But as I don't know why the one became
three, I could well be missing something important.


As for testing, only basic "does it function as expected" testing.
Part of the reason for only modifying O_NONBLOCK behaviour where O_DIRECT was
also set was to make it extremely unlikely that any code would use this
feature except code that specifically needed it.

NeilBrown


* Re: SuSE O_DIRECT|O_NONBLOCK overload
       [not found]                       ` <20140313111555.2f15f19f-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2014-03-14 17:46                         ` Mike Christie
  0 siblings, 0 replies; 3+ messages in thread
From: Mike Christie @ 2014-03-14 17:46 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Jens Axboe, Alexander Viro, Linus Torvalds,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On 03/12/2014 07:15 PM, NeilBrown wrote:
> I also think there should probably be just one FAILFAST flag.  Whether it was
> the DEV or the TRANSPORT or the DRIVER that failed could be returned in the
> error code for any caller that cared.  But as I don't know why the one became
> three, I could well be missing something important.

It was for multipath. The problem was that dm-multipath did not know what to
do with low-level device errors, but wanted transport errors returned
quickly. Other drivers, like the scsi_dh (formerly dm hardware handlers)
modules, want all errors fast failed.
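
The split can be pictured as a per-request bitmask that is checked against
the origin of an error.  The following is a small user-space model of that
idea, not the kernel's implementation: the FF_* names and bit values are
invented for the sketch and are not taken from the kernel headers, and the
two *_POLICY constants simply encode the two preferences described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flags only; names and values are made up for this sketch. */
enum failfast_flag {
    FF_DEV       = 1u << 0,   /* fail fast on device errors */
    FF_TRANSPORT = 1u << 1,   /* fail fast on transport errors */
    FF_DRIVER    = 1u << 2,   /* fail fast on driver errors */
    FF_ALL       = FF_DEV | FF_TRANSPORT | FF_DRIVER,
};

enum error_origin { ORIGIN_DEV, ORIGIN_TRANSPORT, ORIGIN_DRIVER };

/* Multipath-style caller: only transport errors should come back quickly. */
#define MULTIPATH_POLICY ((unsigned)FF_TRANSPORT)
/* scsi_dh-style caller: every kind of error should be fast failed. */
#define SCSI_DH_POLICY   ((unsigned)FF_ALL)

/* Translate where an error came from into the flag that governs it. */
static unsigned origin_to_flag(enum error_origin origin)
{
    switch (origin) {
    case ORIGIN_DEV:       return FF_DEV;
    case ORIGIN_TRANSPORT: return FF_TRANSPORT;
    case ORIGIN_DRIVER:    return FF_DRIVER;
    }
    return 0;
}

/* true: report the error at once; false: let lower layers retry first. */
static bool should_fail_fast(unsigned request_flags, enum error_origin origin)
{
    return (request_flags & origin_to_flag(origin)) != 0;
}
```

Under this model a single flag would not be enough for the multipath case,
because the caller has to say in advance which class of error it wants to see
immediately.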

I can add documentation, or we can just change the code to better suit
your needs.