linux-hotplug.vger.kernel.org archive mirror
* [patch|rfc] add support for I/O scheduler tuning
@ 2010-11-10 16:47 Jeff Moyer
  2010-11-10 17:03 ` Jeff Moyer
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Jeff Moyer @ 2010-11-10 16:47 UTC (permalink / raw)
  To: linux-hotplug

Hi,

From within the block layer in the kernel, it is difficult to
automatically detect the performance characteristics of the underlying
storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
rule to tune the I/O scheduler properly for most cases.  The basic
approach is to leave CFQ's default tunings alone for SATA disks.  For
everything else, turn off slice idling and bump the quantum in order to
drive higher queue depths.  This patch is an attempt to implement this.
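For reference, the bracketed token in the sysfs `queue/scheduler` attribute names the active scheduler, which is what the rules below key off. A minimal sketch of the same check and tuning done by hand, using a sample attribute string rather than a live device (the `/sys/block/sda` paths in the comments are illustrative):

```shell
#!/bin/sh
# The scheduler attribute (e.g. /sys/block/sda/queue/scheduler) reads
# like "noop deadline [cfq]"; the bracketed entry is the active one,
# which is what the rule's *\[cfq\] match tests.
sched_line="noop anticipatory deadline [cfq]"   # sample attribute contents
active=$(echo "$sched_line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "active scheduler: $active"

# The rule's tuning for non-SATA devices, done by hand, would be:
#   echo 0  > /sys/block/sda/queue/iosched/slice_idle
#   echo 32 > /sys/block/sda/queue/iosched/quantum
```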

I've tested it in a variety of configurations:
- cciss devices
- sata disks
- sata ssds
- enterprise storage (single path)
- enterprise storage (multi-path)
- multiple paths to a sata disk (yes, you can actually do that!)

The tuning works as expected in all of those scenarios.  I look forward
to your comments.

Thanks in advance!

-Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/Makefile.am b/Makefile.am
index 032eb28..673c371 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -622,6 +622,16 @@ keymaps-distcheck-hook: extras/keymap/keys.txt
 	$(top_srcdir)/extras/keymap/check-keymaps.sh $(top_srcdir) $^
 DISTCHECK_HOOKS += keymaps-distcheck-hook
 
+# ------------------------------------------------------------------------------
+# iosched - optimize I/O scheduler tunings
+# ------------------------------------------------------------------------------
+EXTRA_DIST += extras/iosched/80-iosched.rules \
+	extras/iosched/80-mpath-iosched.rules extras/iosched/mpath-iosched.sh
+dist_udevrules_DATA += extras/iosched/80-iosched.rules \
+	extras/iosched/80-mpath-iosched.rules
+dist_libexec_SCRIPTS += extras/iosched/mpath-iosched.sh
+
+
 endif # ENABLE_EXTRAS
 
 # ------------------------------------------------------------------------------
diff --git a/extras/iosched/80-iosched.rules b/extras/iosched/80-iosched.rules
new file mode 100644
index 0000000..163f240
--- /dev/null
+++ b/extras/iosched/80-iosched.rules
@@ -0,0 +1,14 @@
+#
+# CFQ's default tunings are geared towards slow SATA disks.  If we detect
+# anything else, we change the tunings to drive deeper queue depths and
+# keep the device busy.
+#
+SUBSYSTEM!="block", GOTO="end_iosched"
+KERNEL=="dm-*", GOTO="end_iosched"
+ENV{DEVTYPE}=="partition", GOTO="end_iosched"
+ACTION!="add|change", GOTO="end_iosched"
+ENV{ID_BUS}=="ata", GOTO="end_iosched"
+ATTR{queue/scheduler}!="*\[cfq\]", GOTO="end_iosched"
+ATTR{queue/iosched/slice_idle}="0"
+ATTR{queue/iosched/quantum}="32"
+LABEL="end_iosched"
diff --git a/extras/iosched/80-mpath-iosched.rules b/extras/iosched/80-mpath-iosched.rules
new file mode 100644
index 0000000..ece9e78
--- /dev/null
+++ b/extras/iosched/80-mpath-iosched.rules
@@ -0,0 +1,9 @@
+SUBSYSTEM!="block", GOTO="end_mpath_iosched"
+ENV{DEVTYPE}=="partition", GOTO="end_mpath_iosched"
+KERNEL!="dm-*", GOTO="end_mpath_iosched"
+ACTION!="change", GOTO="end_mpath_iosched"
+ATTR{queue/scheduler}!="*\[cfq\]", GOTO="end_mpath_iosched"
+ENV{DM_UUID}!="mpath-?*", GOTO="end_mpath_iosched"
+ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath_iosched"
+RUN+="mpath-iosched.sh"
+LABEL="end_mpath_iosched"
diff --git a/extras/iosched/mpath-iosched.sh b/extras/iosched/mpath-iosched.sh
new file mode 100755
index 0000000..51fb292
--- /dev/null
+++ b/extras/iosched/mpath-iosched.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+#
+# For the request-based multipath driver, the I/O scheduler runs on the
+# multipath device, not the underlying "slave" devices.  This script
+# checks the ID_BUS attribute for each of the slave devices.  If it finds
+# an ata device, it leaves the I/O scheduler tunings alone.  For any other
+# device, we tune the I/O scheduler to try to keep the device busy.
+#
+PATH=/sbin:$PATH
+
+needs_tuning=1
+for slave in /sys${DEVPATH}/slaves/*; do
+	bus_type=$(udevadm info --query=property --path="$slave" | grep '^ID_BUS=' | awk -F= '{print $2}')
+	if [ "$bus_type" = "ata" ]; then
+		needs_tuning=0
+		break
+	fi
+done
+
+if [ $needs_tuning -eq 1 ]; then
+	echo 0 > /sys${DEVPATH}/queue/iosched/slice_idle
+	echo 32 > /sys${DEVPATH}/queue/iosched/quantum
+fi
+
+exit 0


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
@ 2010-11-10 17:03 ` Jeff Moyer
  2010-11-10 18:26 ` David Zeuthen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Moyer @ 2010-11-10 17:03 UTC (permalink / raw)
  To: linux-hotplug

Jeff Moyer <jmoyer@redhat.com> writes:

> Hi,
>
> From within the block layer in the kernel, it is difficult to
> automatically detect the performance characteristics of the underlying
> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
> rule to tune the I/O scheduler properly for most cases.  The basic
> approach is to leave CFQ's default tunings alone for SATA disks.  For
> everything else, turn off slice idling and bump the quantum in order to
> drive higher queue depths.  This patch is an attempt to implement this.
>
> I've tested it in a variety of configurations:
> - cciss devices
> - sata disks
> - sata ssds
> - enterprise storage (single path)
> - enterprise storage (multi-path)
> - multiple paths to a sata disk (yes, you can actually do that!)
>
> The tuning works as expected in all of those scenarios.  I look forward
> to your comments.
>

I forgot to mention that Harald Hoyer provided a great deal of help in
getting me up to speed on udev, so thanks are indeed due to him.

Cheers,
Jeff


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
  2010-11-10 17:03 ` Jeff Moyer
@ 2010-11-10 18:26 ` David Zeuthen
  2010-11-10 20:03 ` Vivek Goyal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: David Zeuthen @ 2010-11-10 18:26 UTC (permalink / raw)
  To: linux-hotplug

Hi,

On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi,
>
> From within the block layer in the kernel, it is difficult to
> automatically detect the performance characteristics of the underlying
> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
> rule to tune the I/O scheduler properly for most cases.  The basic
> approach is to leave CFQ's default tunings alone for SATA disks.  For
> everything else, turn off slice idling and bump the quantum in order to
> drive higher queue depths.  This patch is an attempt to implement this.
>
> I've tested it in a variety of configurations:
> - cciss devices
> - sata disks
> - sata ssds
> - enterprise storage (single path)
> - enterprise storage (multi-path)
> - multiple paths to a sata disk (yes, you can actually do that!)
>
> The tuning works as expected in all of those scenarios.  I look forward
> to your comments.

This looks useful, but I really think the kernel driver creating the
block device should choose/change the defaults for the created block
device - it seems really backwards to do this in user-space as an
afterthought.

     David


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
  2010-11-10 17:03 ` Jeff Moyer
  2010-11-10 18:26 ` David Zeuthen
@ 2010-11-10 20:03 ` Vivek Goyal
  2010-11-10 20:08 ` Jens Axboe
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2010-11-10 20:03 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
> Hi,
> 
> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Hi,
> >
> > From within the block layer in the kernel, it is difficult to
> > automatically detect the performance characteristics of the underlying
> > storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
> > rule to tune the I/O scheduler properly for most cases.  The basic
> > approach is to leave CFQ's default tunings alone for SATA disks.  For
> > everything else, turn off slice idling and bump the quantum in order to
> > drive higher queue depths.  This patch is an attempt to implement this.
> >
> > I've tested it in a variety of configurations:
> > - cciss devices
> > - sata disks
> > - sata ssds
> > - enterprise storage (single path)
> > - enterprise storage (multi-path)
> > - multiple paths to a sata disk (yes, you can actually do that!)
> >
> > The tuning works as expected in all of those scenarios.  I look forward
> > to your comments.
> 
> This looks useful, but I really think the kernel driver creating the
> block device should choose/change the defaults for the created block
> device - it seems really backwards to do this in user-space as an
> afterthought.

I think it just becomes a little easier to implement in user space, so
that if things don't work as expected, somebody can easily disable the
rules or refine them further to better suit their needs, instead of the
driver hardcoding this decision.

Thanks
Vivek


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
                   ` (2 preceding siblings ...)
  2010-11-10 20:03 ` Vivek Goyal
@ 2010-11-10 20:08 ` Jens Axboe
  2010-11-11 20:07 ` Jeff Moyer
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2010-11-10 20:08 UTC (permalink / raw)
  To: linux-hotplug

On 2010-11-10 21:03, Vivek Goyal wrote:
> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
>> Hi,
>>
>> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>> Hi,
>>>
>>> From within the block layer in the kernel, it is difficult to
>>> automatically detect the performance characteristics of the underlying
>>> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
>>> rule to tune the I/O scheduler properly for most cases.  The basic
>>> approach is to leave CFQ's default tunings alone for SATA disks.  For
>>> everything else, turn off slice idling and bump the quantum in order to
>>> drive higher queue depths.  This patch is an attempt to implement this.
>>>
>>> I've tested it in a variety of configurations:
>>> - cciss devices
>>> - sata disks
>>> - sata ssds
>>> - enterprise storage (single path)
>>> - enterprise storage (multi-path)
>>> - multiple paths to a sata disk (yes, you can actually do that!)
>>>
>>> The tuning works as expected in all of those scenarios.  I look forward
>>> to your comments.
>>
>> This looks useful, but I really think the kernel driver creating the
>> block device should choose/change the defaults for the created block
>> device - it seems really backwards to do this in user-space as an
>> afterthought.
> 
> I think it just becomes a little easier to implement in user space, so
> that if things don't work as expected, somebody can easily disable the
> rules or refine them further to better suit their needs, instead of the
> driver hardcoding this decision.

That's the primary reason why I suggested doing this in user space. Plus
we don't always know in the kernel, at least this provides an easier way
to auto-tune things.

-- 
Jens Axboe



* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
                   ` (3 preceding siblings ...)
  2010-11-10 20:08 ` Jens Axboe
@ 2010-11-11 20:07 ` Jeff Moyer
  2010-11-12 14:36 ` Kay Sievers
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Moyer @ 2010-11-11 20:07 UTC (permalink / raw)
  To: linux-hotplug

Jens Axboe <axboe@kernel.dk> writes:

> On 2010-11-10 21:03, Vivek Goyal wrote:
>> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
>>> Hi,
>>>
>>> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> From within the block layer in the kernel, it is difficult to
>>>> automatically detect the performance characteristics of the underlying
>>>> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
>>>> rule to tune the I/O scheduler properly for most cases.  The basic
>>>> approach is to leave CFQ's default tunings alone for SATA disks.  For
>>>> everything else, turn off slice idling and bump the quantum in order to
>>>> drive higher queue depths.  This patch is an attempt to implement this.
>>>>
>>>> I've tested it in a variety of configurations:
>>>> - cciss devices
>>>> - sata disks
>>>> - sata ssds
>>>> - enterprise storage (single path)
>>>> - enterprise storage (multi-path)
>>>> - multiple paths to a sata disk (yes, you can actually do that!)
>>>>
>>>> The tuning works as expected in all of those scenarios.  I look forward
>>>> to your comments.
>>>
>>> This looks useful, but I really think the kernel driver creating the
>>> block device should choose/change the defaults for the created block
>>> device - it seems really backwards to do this in user-space as an
>>> afterthought.
>> 
>> I think it just becomes a little easier to implement in user space, so
>> that if things don't work as expected, somebody can easily disable the
>> rules or refine them further to better suit their needs, instead of the
>> driver hardcoding this decision.
>
> That's the primary reason why I suggested doing this in user space. Plus
> we don't always know in the kernel, at least this provides an easier way
> to auto-tune things.

Right, so given the above, is there still opposition to doing this in
udev?

Thanks!
Jeff


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
                   ` (4 preceding siblings ...)
  2010-11-11 20:07 ` Jeff Moyer
@ 2010-11-12 14:36 ` Kay Sievers
  2010-11-15 14:57 ` Vivek Goyal
  2010-11-15 15:43 ` Kay Sievers
  7 siblings, 0 replies; 9+ messages in thread
From: Kay Sievers @ 2010-11-12 14:36 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Nov 11, 2010 at 21:07, Jeff Moyer <jmoyer@redhat.com> wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>> On 2010-11-10 21:03, Vivek Goyal wrote:
>>> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
>>>> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>>>> From within the block layer in the kernel, it is difficult to
>>>>> automatically detect the performance characteristics of the underlying
>>>>> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
>>>>> rule to tune the I/O scheduler properly for most cases.  The basic
>>>>> approach is to leave CFQ's default tunings alone for SATA disks.  For
>>>>> everything else, turn off slice idling and bump the quantum in order to
>>>>> drive higher queue depths.  This patch is an attempt to implement this.
>>>>>
>>>>> I've tested it in a variety of configurations:
>>>>> - cciss devices
>>>>> - sata disks
>>>>> - sata ssds
>>>>> - enterprise storage (single path)
>>>>> - enterprise storage (multi-path)
>>>>> - multiple paths to a sata disk (yes, you can actually do that!)
>>>>>
>>>>> The tuning works as expected in all of those scenarios.  I look forward
>>>>> to your comments.
>>>>
>>>> This looks useful, but I really think the kernel driver creating the
>>>> block device should choose/change the defaults for the created block
>>>> device - it seems really backwards to do this in user-space as an
>>>> afterthought.
>>>
>>> I think it just becomes a little easier to implement in user space, so
>>> that if things don't work as expected, somebody can easily disable the
>>> rules or refine them further to better suit their needs, instead of the
>>> driver hardcoding this decision.
>>
>> That's the primary reason why I suggested doing this in user space. Plus
>> we don't always know in the kernel, at least this provides an easier way
>> to auto-tune things.
>
> Right, so given the above, is there still opposition to doing this in
> udev?

Not in general. Udev can do such things, that's what it's there for.
It can do quirks, custom setups, and support tweaked configs that way.

But it's usually not meant to set common defaults for every box. The
last time we got into this business, and set timeouts for scsi devices
from udev, we broke more recent kernels that did not like the
specified values anymore, and we needed to remove all that in released
versions, to be able to safely run newer kernels. And we've been told
not to do such a thing in the future.

And all your rules are doing is unconditionally applying
kernel-internal knowledge to kernel devices -- which, if you look at it
from one step back, is a bit weird.

So I guess this should be done from the multipath package, the dm
setup, some 'tweak.rpm', ...  I'm not sure we can do that for
everybody from the main udev sources, for the same reasons the scsi
timeout was wrong to do from udev. At the time we added it, it seemed
to be the right thing, but two years later it wasn't, because the
kernel evolved and we got in its way.

Kay


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
                   ` (5 preceding siblings ...)
  2010-11-12 14:36 ` Kay Sievers
@ 2010-11-15 14:57 ` Vivek Goyal
  2010-11-15 15:43 ` Kay Sievers
  7 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2010-11-15 14:57 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Nov 12, 2010 at 03:36:47PM +0100, Kay Sievers wrote:
> On Thu, Nov 11, 2010 at 21:07, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Jens Axboe <axboe@kernel.dk> writes:
> >> On 2010-11-10 21:03, Vivek Goyal wrote:
> >>> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
> >>>> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> >>>>> From within the block layer in the kernel, it is difficult to
> >>>>> automatically detect the performance characteristics of the underlying
> >>>>> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
> >>>>> rule to tune the I/O scheduler properly for most cases.  The basic
> >>>>> approach is to leave CFQ's default tunings alone for SATA disks.  For
> >>>>> everything else, turn off slice idling and bump the quantum in order to
> >>>>> drive higher queue depths.  This patch is an attempt to implement this.
> >>>>>
> >>>>> I've tested it in a variety of configurations:
> >>>>> - cciss devices
> >>>>> - sata disks
> >>>>> - sata ssds
> >>>>> - enterprise storage (single path)
> >>>>> - enterprise storage (multi-path)
> >>>>> - multiple paths to a sata disk (yes, you can actually do that!)
> >>>>>
> >>>>> The tuning works as expected in all of those scenarios.  I look forward
> >>>>> to your comments.
> >>>>
> >>>> This looks useful, but I really think the kernel driver creating the
> >>>> block device should choose/change the defaults for the created block
> >>>> device - it seems really backwards to do this in user-space as an
> >>>> afterthought.
> >>>
> >>> I think it just becomes a little easier to implement in user space, so
> >>> that if things don't work as expected, somebody can easily disable the
> >>> rules or refine them further to better suit their needs, instead of the
> >>> driver hardcoding this decision.
> >>
> >> That's the primary reason why I suggested doing this in user space. Plus
> >> we don't always know in the kernel, at least this provides an easier way
> >> to auto-tune things.
> >
> > Right, so given the above, is there still opposition to doing this in
> > udev?
> 
> Not in general. Udev can do such things, that's what it's there for.
> It can do quirks, custom setups, and support tweaked configs that way.
> 
> But it's usually not meant to set common defaults for every box. The
> last time we got into this business, and set timeouts for scsi devices
> from udev, we broke more recent kernels that did not like the
> specified values anymore, and we needed to remove all that in released
> versions, to be able to safely run newer kernels. And we've been told
> not to do such a thing in the future.
> 
> And all your rules are doing is unconditionally applying
> kernel-internal knowledge to kernel devices -- which, if you look at it
> from one step back, is a bit weird.
> 
> So I guess this should be done from the multipath package, the dm
> setup, some 'tweak.rpm', ...  I'm not sure we can do that for
> everybody from the main udev sources, for the same reasons the scsi
> timeout was wrong to do from udev. At the time we added it, it seemed
> to be the right thing, but two years later it wasn't, because the
> kernel evolved and we got in its way.

Hi Kay,

I can understand the issue of a rule no longer being valid as the
kernel evolves. But the question is: what's wrong with that? Why can't
we keep updating the udev rules as the kernel and hardware evolve? Are
they supposed to be set in stone once written?

Even if we move the rule to some other user space package, that
package will face the same issue of the rule becoming invalid as the
kernel evolves. That would just shift the problem from one user space
package to another.

To me, the key thing here is whether udev should try to set up some
reasonable IO scheduler defaults for the system, or whether that
should be left entirely to the kernel.

The deadline IO scheduler generally works very well with enterprise
storage. CFQ primarily cuts down seeks for very seeky media like SATA
drives. The kernel keeps CFQ as the default for all devices, and we
are trying to improve the out-of-the-box experience for the user
instead of imposing CFQ on everybody and expecting them to change it
to deadline where appropriate.
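The policy described above can be sketched as a small helper; the bus values are the udev `ID_BUS` ones, and applying the result would be a sysfs write as shown in the comment (the device path is illustrative):

```shell
#!/bin/sh
# Sketch of the policy: keep CFQ for rotational SATA media, prefer
# deadline for everything else (enterprise/multipath storage).
pick_scheduler() {
	bus="$1"
	if [ "$bus" = "ata" ]; then
		echo cfq       # seeky SATA: keep CFQ's seek-avoidance
	else
		echo deadline  # enterprise storage: favor deep queues
	fi
}

pick_scheduler ata       # -> cfq
pick_scheduler scsi      # -> deadline
# Applying it on a real system would be:
#   echo deadline > /sys/block/<dev>/queue/scheduler
```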

Because the rules are not very clear yet, and we are not sure how well
this notion of CFQ for SATA is going to play with everybody, to me it
still might not be a bad idea to start with a udev rule; if it works
reasonably well, or the kernel evolves, we can modify the rule
accordingly.

Thanks
Vivek


* Re: [patch|rfc] add support for I/O scheduler tuning
  2010-11-10 16:47 [patch|rfc] add support for I/O scheduler tuning Jeff Moyer
                   ` (6 preceding siblings ...)
  2010-11-15 14:57 ` Vivek Goyal
@ 2010-11-15 15:43 ` Kay Sievers
  7 siblings, 0 replies; 9+ messages in thread
From: Kay Sievers @ 2010-11-15 15:43 UTC (permalink / raw)
  To: linux-hotplug

On Mon, Nov 15, 2010 at 15:57, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Nov 12, 2010 at 03:36:47PM +0100, Kay Sievers wrote:
>> On Thu, Nov 11, 2010 at 21:07, Jeff Moyer <jmoyer@redhat.com> wrote:
>> > Jens Axboe <axboe@kernel.dk> writes:
>> >> On 2010-11-10 21:03, Vivek Goyal wrote:
>> >>> On Wed, Nov 10, 2010 at 01:26:21PM -0500, David Zeuthen wrote:
>> >>>> On Wed, Nov 10, 2010 at 11:47 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> >>>>> From within the block layer in the kernel, it is difficult to
>> >>>>> automatically detect the performance characteristics of the underlying
>> >>>>> storage.  It was suggested by Jens Axboe at LSF2010 that we write a udev
>> >>>>> rule to tune the I/O scheduler properly for most cases.  The basic
>> >>>>> approach is to leave CFQ's default tunings alone for SATA disks.  For
>> >>>>> everything else, turn off slice idling and bump the quantum in order to
>> >>>>> drive higher queue depths.  This patch is an attempt to implement this.
>> >>>>>
>> >>>>> I've tested it in a variety of configurations:
>> >>>>> - cciss devices
>> >>>>> - sata disks
>> >>>>> - sata ssds
>> >>>>> - enterprise storage (single path)
>> >>>>> - enterprise storage (multi-path)
>> >>>>> - multiple paths to a sata disk (yes, you can actually do that!)
>> >>>>>
>> >>>>> The tuning works as expected in all of those scenarios.  I look forward
>> >>>>> to your comments.
>> >>>>
>> >>>> This looks useful, but I really think the kernel driver creating the
>> >>>> block device should choose/change the defaults for the created block
>> >>>> device - it seems really backwards to do this in user-space as an
>> >>>> afterthought.
>> >>>
>> >>> I think it just becomes a little easier to implement in user space, so
>> >>> that if things don't work as expected, somebody can easily disable the
>> >>> rules or refine them further to better suit their needs, instead of the
>> >>> driver hardcoding this decision.
>> >>
>> >> That's the primary reason why I suggested doing this in user space. Plus
>> >> we don't always know in the kernel, at least this provides an easier way
>> >> to auto-tune things.
>> >
>> > Right, so given the above, is there still opposition to doing this in
>> > udev?
>>
>> Not in general. Udev can do such things, that's what it's there for.
>> It can do quirks, custom setups, and support tweaked configs that way.
>>
>> But it's usually not meant to set common defaults for every box. The
>> last time we got into this business, and set timeouts for scsi devices
>> from udev, we broke more recent kernels that did not like the
>> specified values anymore, and we needed to remove all that in released
>> versions, to be able to safely run newer kernels. And we've been told
>> not to do such a thing in the future.
>>
>> And all your rules are doing is unconditionally applying
>> kernel-internal knowledge to kernel devices -- which, if you look at it
>> from one step back, is a bit weird.
>>
>> So I guess this should be done from the multipath package, the dm
>> setup, some 'tweak.rpm', ...  I'm not sure we can do that for
>> everybody from the main udev sources, for the same reasons the scsi
>> timeout was wrong to do from udev. At the time we added it, it seemed
>> to be the right thing, but two years later it wasn't, because the
>> kernel evolved and we got in its way.
>
> Hi Kay,
>
> I can understand the issue of a rule no longer being valid as the
> kernel evolves. But the question is: what's wrong with that? Why can't
> we keep updating the udev rules as the kernel and hardware evolve? Are
> they supposed to be set in stone once written?
>
> Even if we move the rule to some other user space package, that
> package will face the same issue of the rule becoming invalid as the
> kernel evolves. That would just shift the problem from one user space
> package to another.
>
> To me, the key thing here is whether udev should try to set up some
> reasonable IO scheduler defaults for the system, or whether that
> should be left entirely to the kernel.
>
> The deadline IO scheduler generally works very well with enterprise
> storage. CFQ primarily cuts down seeks for very seeky media like SATA
> drives. The kernel keeps CFQ as the default for all devices, and we
> are trying to improve the out-of-the-box experience for the user
> instead of imposing CFQ on everybody and expecting them to change it
> to deadline where appropriate.
>
> Because the rules are not very clear yet, and we are not sure how well
> this notion of CFQ for SATA is going to play with everybody, to me it
> still might not be a bad idea to start with a udev rule; if it works
> reasonably well, or the kernel evolves, we can modify the rule
> accordingly.

Udev can be the engine to change stuff on demand, but it should not
ship *common defaults* which are only gathered from kernel
information. If that's the goal, and it should be done for all
systems, please change the kernel directly, and don't put that into
udev.

It would be a different picture if userspace were involved in some
sense, like persistently storing the results of 'disk tests', or
something similar, and applying calculated values based on those
earlier results to the actual disk when it is re-discovered. It would
probably also involve permanent monitoring and updating of these
values.

Retrieving simple kernel values and re-applying them to the kernel
does not make much sense in general -- not for block devices, not for
other subsystems -- and things like that should not go into the udev
repository, for the reasons mentioned in the earlier mail.

Kay

