* Re: [patch|rfc] add support for I/O scheduler tuning
From: Jeff Moyer @ 2010-11-10 17:03 UTC
To: linux-hotplug
Jeff Moyer <jmoyer@redhat.com> writes:
> Hi,
>
> From within the block layer in the kernel, it is difficult to
> automatically detect the performance characteristics of the underlying
> storage. It was suggested by Jens Axboe at LSF2010 that we write a udev
> rule to tune the I/O scheduler properly for most cases. The basic
> approach is to leave CFQ's default tunings alone for SATA disks. For
> everything else, turn off slice idling and bump the quantum in order to
> drive higher queue depths. This patch is an attempt to implement this.
>
> I've tested it in a variety of configurations:
> - cciss devices
> - SATA disks
> - SATA SSDs
> - enterprise storage (single path)
> - enterprise storage (multi-path)
> - multiple paths to a SATA disk (yes, you can actually do that!)
>
> The tuning works as expected in all of those scenarios. I look forward
> to your comments.
>
I forgot to mention that Harald Hoyer provided a great deal of help in
getting me up to speed on udev, so thanks are indeed due to him.
Cheers,
Jeff
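A rule along the lines Jeff describes might look roughly like the
sketch below. This is illustrative only, not the patch posted in this
thread: the file name, the quantum value of 32, and the use of ID_BUS
to detect SATA are assumptions, and the iosched attributes are only
present while CFQ is the active scheduler.
  # 60-io-scheduler.rules (hypothetical name): leave CFQ's defaults
  # alone for SATA disks; for everything else, turn off slice idling
  # and raise the dispatch quantum to drive deeper queue depths.
  ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_BUS}!="ata", \
    ATTR{queue/scheduler}=="*cfq*", \
    ATTR{queue/iosched/slice_idle}="0", \
    ATTR{queue/iosched/quantum}="32"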
* Re: [patch|rfc] add support for I/O scheduler tuning
From: David Zeuthen @ 2010-11-10 18:26 UTC
To: linux-hotplug
Hi,
On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> [...]
This looks useful, but I really think the kernel driver creating the
block device should choose/change the defaults for the created block
device - it seems really backwards to do this in user-space as an
afterthought.
David
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Vivek Goyal @ 2010-11-10 20:03 UTC
To: linux-hotplug
On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
> Hi,
>
> [...]
>
> This looks useful, but I really think the kernel driver creating the
> block device should choose/change the defaults for the created block
> device - it seems really backwards to do this in user-space as an
> afterthought.
I think it just becomes a little easier to implement in user space, so
that if things don't work as expected, somebody can easily disable the
rules or refine them further to better suit their needs, instead of
the driver hardcoding this decision.
Thanks
Vivek
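Disabling such a rule is indeed cheap for an administrator: a file of
the same name in /etc/udev/rules.d overrides one shipped in
/lib/udev/rules.d, so an empty override file masks the packaged rule
entirely. The rule file name below is hypothetical:
  # mask the packaged rule with an empty file, then reload
  touch /etc/udev/rules.d/60-io-scheduler.rules
  udevadm control --reload-rules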
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Jens Axboe @ 2010-11-10 20:08 UTC
To: linux-hotplug
On 2010-11-10 21:03, Vivek Goyal wrote:
> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
>> [...]
>>
>> This looks useful, but I really think the kernel driver creating the
>> block device should choose/change the defaults for the created block
>> device - it seems really backwards to do this in user-space as an
>> afterthought.
>
> I think it just becomes a little easier to implement in user space, so
> that if things don't work as expected, somebody can easily disable the
> rules or refine them further to better suit their needs, instead of
> the driver hardcoding this decision.
That's the primary reason why I suggested doing this in user space.
Plus, we don't always know in the kernel; at least this provides an
easier way to auto-tune things.
--
Jens Axboe
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Jeff Moyer @ 2010-11-11 20:07 UTC
To: linux-hotplug
Jens Axboe <axboe@kernel.dk> writes:
> On 2010-11-10 21:03, Vivek Goyal wrote:
>> [...]
>>
>> I think it just becomes a little easier to implement in user space, so
>> that if things don't work as expected, somebody can easily disable the
>> rules or refine them further to better suit their needs, instead of
>> the driver hardcoding this decision.
>
> That's the primary reason why I suggested doing this in user space.
> Plus, we don't always know in the kernel; at least this provides an
> easier way to auto-tune things.
Right, so given the above, is there still opposition to doing this in
udev?
Thanks!
Jeff
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Kay Sievers @ 2010-11-12 14:36 UTC
To: linux-hotplug
On Thu, Nov 11, 2010 at 21:07, Jeff Moyer <jmoyer@redhat.com> wrote:
> [...]
>
> Right, so given the above, is there still opposition to doing this in
> udev?
Not in general. Udev can do such things; that's what it's there for.
It can do quirks, custom setups, and support tweaked configs that way.
But it's usually not meant to set common defaults for every box. The
last time we got into this business and set timeouts for SCSI devices
from udev, we broke more recent kernels that no longer liked the
specified values, and we had to remove all of that in released
versions to be able to safely run newer kernels. And we were told not
to do such a thing again.
And all your rules are doing is unconditionally applying
kernel-internal knowledge to kernel devices -- which, if you look at
it from one step back, is a bit weird.
So I guess this should be done from the multipath package, the dm
setup, some 'tweak.rpm', ... I'm not sure we can do that for everybody
from the main udev sources, for the same reasons the SCSI timeout was
wrong to do from udev. At the time we added it, it seemed to be the
right thing, but two years later it wasn't, because the kernel evolved
and we got in its way.
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Vivek Goyal @ 2010-11-15 14:57 UTC
To: linux-hotplug
On Fri, Nov 12, 2010 at 03:36:47PM +0100, Kay Sievers wrote:
> [...]
>
> Not in general. Udev can do such things; that's what it's there for.
> It can do quirks, custom setups, and support tweaked configs that way.
>
> But it's usually not meant to set common defaults for every box. The
> last time we got into this business and set timeouts for SCSI devices
> from udev, we broke more recent kernels that no longer liked the
> specified values, and we had to remove all of that in released
> versions to be able to safely run newer kernels. And we were told not
> to do such a thing again.
>
> And all your rules are doing is unconditionally applying
> kernel-internal knowledge to kernel devices -- which, if you look at
> it from one step back, is a bit weird.
>
> So I guess this should be done from the multipath package, the dm
> setup, some 'tweak.rpm', ... I'm not sure we can do that for everybody
> from the main udev sources, for the same reasons the SCSI timeout was
> wrong to do from udev. At the time we added it, it seemed to be the
> right thing, but two years later it wasn't, because the kernel evolved
> and we got in its way.
Hi Kay,
I can understand the issue of a rule no longer being valid once the
kernel evolves. But what's wrong with that? Why can't we keep updating
the udev rules as the kernel and hardware evolve? Are they supposed to
be set in stone once a rule has been written?
Even if we move the rule to some other user-space package, that
package will face the same issue of the rule no longer being valid
when the kernel evolves. That would just shift the problem from one
user-space package to another.
To me, the key thing here is whether udev should try to set up
reasonable I/O scheduler defaults for the system, or whether that
should be left entirely to the kernel.
The deadline I/O scheduler generally works very well with enterprise
storage. CFQ primarily cuts down seeks on very seeky media like SATA
drives. The kernel keeps CFQ as the default for all devices, and we
are trying to improve the out-of-the-box experience for the user
instead of imposing CFQ on everybody and expecting them to switch to
deadline later where appropriate.
Because the rules are not very clear yet, and we are not sure how well
this notion of CFQ-for-SATA is going to play with everybody, it still
might not be a bad idea to start with a udev rule; if it works
reasonably well, or the kernel evolves, we can modify the rule
accordingly.
Thanks
Vivek
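For concreteness, the interface such a rule (or an administrator)
would be driving is the per-queue sysfs scheduler file; the device
name below is illustrative:
  # see which schedulers are available and which one is active
  $ cat /sys/block/sda/queue/scheduler
  noop deadline [cfq]
  # switch this queue to deadline at runtime
  $ echo deadline > /sys/block/sda/queue/scheduler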
* Re: [patch|rfc] add support for I/O scheduler tuning
From: Kay Sievers @ 2010-11-15 15:43 UTC
To: linux-hotplug
On Mon, Nov 15, 2010 at 15:57, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Nov 12, 2010 at 03:36:47PM +0100, Kay Sievers wrote:
>> [...]
>
> Hi Kay,
>
> I can understand the issue of a rule no longer being valid once the
> kernel evolves. But what's wrong with that? Why can't we keep updating
> the udev rules as the kernel and hardware evolve? Are they supposed to
> be set in stone once a rule has been written?
>
> Even if we move the rule to some other user-space package, that
> package will face the same issue of the rule no longer being valid
> when the kernel evolves. That would just shift the problem from one
> user-space package to another.
>
> To me, the key thing here is whether udev should try to set up
> reasonable I/O scheduler defaults for the system, or whether that
> should be left entirely to the kernel.
>
> The deadline I/O scheduler generally works very well with enterprise
> storage. CFQ primarily cuts down seeks on very seeky media like SATA
> drives. The kernel keeps CFQ as the default for all devices, and we
> are trying to improve the out-of-the-box experience for the user
> instead of imposing CFQ on everybody and expecting them to switch to
> deadline later where appropriate.
>
> Because the rules are not very clear yet, and we are not sure how well
> this notion of CFQ-for-SATA is going to play with everybody, it still
> might not be a bad idea to start with a udev rule; if it works
> reasonably well, or the kernel evolves, we can modify the rule
> accordingly.
Udev can be the engine to change stuff on demand, but it should not
ship *common defaults* that are gathered only from kernel information.
If that's the goal, and it should be done for all systems, please
change the kernel defaults directly, and don't put that into udev.
It would be a different picture if userspace were involved in some
sense, like persistently storing the results of 'disk tests' or
something similar, and applying calculated values based on those
earlier results to the actual disk when it is re-discovered. That
would probably also involve permanent monitoring and updating of these
values.
Retrieving simple kernel values and re-applying them to the kernel
does not make much sense in general -- not for block devices, not for
other subsystems -- and things like that should not go into the udev
repository, for the reasons mentioned in the earlier mail.
Kay
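A scheme like the one Kay sketches might pair a udev rule with a
helper that re-applies previously measured, per-disk values; every
name and path below is hypothetical:
  # when a known disk reappears, let a helper re-apply tuning values
  # it stored earlier (e.g. under /var/lib/disk-tuning/<ID_SERIAL>)
  ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL}=="?*", \
    RUN+="/usr/local/lib/apply-disk-tuning $kernel $env{ID_SERIAL}"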