* [RFC PATCH] block: Change default IO scheduler to deadline except SATA
@ 2012-04-10 13:37 Vivek Goyal
2012-04-10 13:56 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 13:37 UTC (permalink / raw)
To: linux kernel mailing list, Jens Axboe; +Cc: Jeff Moyer
Hi,
I am wondering if CFQ as the default scheduler is still the right choice. CFQ
generally works well on slow rotational media (SATA?), but often
underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
disks in Linux guests, etc.). People often put logic in user space to tune
their systems and change the IO scheduler to deadline to get better
performance on faster storage.
Though there is no one good answer for all kinds of storage and all kinds of
workloads, I am wondering if we can provide a better default, and that is to
change the default IO scheduler to "deadline" for everything except SATA.
One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
but the default IO scheduler choice is not perfect anyway; it just tries to
cater to a wide variety of use cases out of the box.
So I am throwing this patch out to see if it flies. Personally, I think it
might turn out to be a more reasonable default.
Thanks
Vivek
Change the default IO scheduler to deadline for everything except SATA disks.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
block/Kconfig.iosched | 2 +-
block/elevator.c | 2 +-
drivers/ata/libata-scsi.c | 4 ++++
include/linux/elevator.h | 2 ++
4 files changed, 8 insertions(+), 2 deletions(-)
Index: linux-2.6/block/Kconfig.iosched
===================================================================
--- linux-2.6.orig/block/Kconfig.iosched 2012-04-09 22:18:30.941885325 -0400
+++ linux-2.6/block/Kconfig.iosched 2012-04-09 22:18:51.982885971 -0400
@@ -45,7 +45,7 @@ config CFQ_GROUP_IOSCHED
choice
prompt "Default I/O scheduler"
- default DEFAULT_CFQ
+ default DEFAULT_DEADLINE
help
Select the I/O scheduler which will be used by default for all
block devices.
Index: linux-2.6/drivers/ata/libata-scsi.c
===================================================================
--- linux-2.6.orig/drivers/ata/libata-scsi.c 2012-04-09 22:18:30.946885325 -0400
+++ linux-2.6/drivers/ata/libata-scsi.c 2012-04-10 01:09:10.529292695 -0400
@@ -1146,6 +1146,10 @@ static int ata_scsi_dev_config(struct sc
blk_queue_flush_queueable(q, false);
+ /* Switch ATA devices to CFQ unless the user passed "elevator=" on boot */
+ if (!(*chosen_elevator))
+ elevator_change(q, "cfq");
+
dev->sdev = sdev;
return 0;
}
Index: linux-2.6/block/elevator.c
===================================================================
--- linux-2.6.orig/block/elevator.c 2012-04-09 22:18:30.000000000 -0400
+++ linux-2.6/block/elevator.c 2012-04-10 20:11:10.296866631 -0400
@@ -130,7 +130,7 @@ static int elevator_init_queue(struct re
return -ENOMEM;
}
-static char chosen_elevator[ELV_NAME_MAX];
+char chosen_elevator[ELV_NAME_MAX];
static int __init elevator_setup(char *str)
{
Index: linux-2.6/include/linux/elevator.h
===================================================================
--- linux-2.6.orig/include/linux/elevator.h 2012-03-13 01:07:29.000000000 -0400
+++ linux-2.6/include/linux/elevator.h 2012-04-10 01:07:44.303289797 -0400
@@ -204,5 +204,7 @@ enum {
INIT_LIST_HEAD(&(rq)->csd.list); \
} while (0)
+extern char chosen_elevator[];
+
#endif /* CONFIG_BLOCK */
#endif
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 13:37 Vivek Goyal
@ 2012-04-10 13:56 ` Jens Axboe
2012-04-10 14:21 ` Vivek Goyal
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 13:56 UTC (permalink / raw)
To: Vivek Goyal; +Cc: linux kernel mailing list, Jeff Moyer
On 2012-04-10 15:37, Vivek Goyal wrote:
> Hi,
>
> I am wondering if CFQ as the default scheduler is still the right choice. CFQ
> generally works well on slow rotational media (SATA?), but often
> underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
> disks in Linux guests, etc.). People often put logic in user space to tune
> their systems and change the IO scheduler to deadline to get better
> performance on faster storage.
>
> Though there is no one good answer for all kinds of storage and all kinds of
> workloads, I am wondering if we can provide a better default, and that is to
> change the default IO scheduler to "deadline" for everything except SATA.
>
> One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
> but the default IO scheduler choice is not perfect anyway; it just tries to
> cater to a wide variety of use cases out of the box.
>
> So I am throwing this patch out to see if it flies. Personally, I think it
> might turn out to be a more reasonable default.
I think it'd be a lot more sane to just use CFQ on rotational single
devices, and default to deadline on raid or non-rotational devices. This
still isn't perfect, since less worthy SSDs still benefit from the
read/write separation, and some multi device configs will be faster as
well. But it's better.
The below patch is not a good idea. There's no clear distinction as to
when CFQ is now the default.
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 13:56 ` Jens Axboe
@ 2012-04-10 14:21 ` Vivek Goyal
2012-04-10 15:10 ` Vivek Goyal
0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 14:21 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux kernel mailing list, Jeff Moyer
On Tue, Apr 10, 2012 at 03:56:39PM +0200, Jens Axboe wrote:
> On 2012-04-10 15:37, Vivek Goyal wrote:
> > Hi,
> >
> > I am wondering if CFQ as the default scheduler is still the right choice. CFQ
> > generally works well on slow rotational media (SATA?), but often
> > underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
> > disks in Linux guests, etc.). People often put logic in user space to tune
> > their systems and change the IO scheduler to deadline to get better
> > performance on faster storage.
> >
> > Though there is no one good answer for all kinds of storage and all kinds of
> > workloads, I am wondering if we can provide a better default, and that is to
> > change the default IO scheduler to "deadline" for everything except SATA.
> >
> > One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
> > but the default IO scheduler choice is not perfect anyway; it just tries to
> > cater to a wide variety of use cases out of the box.
> >
> > So I am throwing this patch out to see if it flies. Personally, I think it
> > might turn out to be a more reasonable default.
>
> I think it'd be a lot more sane to just use CFQ on rotational single
> devices, and default to deadline on raid or non-rotational devices. This
> still isn't perfect, since less worthy SSDs still benefit from the
> read/write separation, and some multi device configs will be faster as
> well. But it's better.
Hi Jens,
Thanks. Taking a decision based on the rotational flag makes sense. I am
not sure, though, how one gets the information about whether a block device
is a single device or not, especially with HBAs, SCSI LUNs over Fibre
Channel, iSCSI LUNs, etc. I have a few SCSI LUNs exported to me backed by a
storage array. Everything runs CFQ by default. And though the disks in the
array are rotational, they are RAIDed and, AFAIK, this information is not
available to the driver.
I am not sure if there is an easy way to get similar info for dm/md devices.
>
> The below patch is not a good idea. There's no clear distinction as to
> when CFQ is now the default.
Can it be thought of as the block layer default being "deadline", which a
driver can then override? Yes, I agree that it is not very clear, though.
Thanks
Vivek
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 14:21 ` Vivek Goyal
@ 2012-04-10 15:10 ` Vivek Goyal
2012-04-10 16:13 ` Mike Snitzer
2012-04-10 18:41 ` Jens Axboe
0 siblings, 2 replies; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 15:10 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux kernel mailing list, Jeff Moyer
On Tue, Apr 10, 2012 at 10:21:48AM -0400, Vivek Goyal wrote:
> On Tue, Apr 10, 2012 at 03:56:39PM +0200, Jens Axboe wrote:
> > On 2012-04-10 15:37, Vivek Goyal wrote:
> > > Hi,
> > >
> > > I am wondering if CFQ as the default scheduler is still the right choice. CFQ
> > > generally works well on slow rotational media (SATA?), but often
> > > underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
> > > disks in Linux guests, etc.). People often put logic in user space to tune
> > > their systems and change the IO scheduler to deadline to get better
> > > performance on faster storage.
> > >
> > > Though there is no one good answer for all kinds of storage and all kinds of
> > > workloads, I am wondering if we can provide a better default, and that is to
> > > change the default IO scheduler to "deadline" for everything except SATA.
> > >
> > > One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
> > > but the default IO scheduler choice is not perfect anyway; it just tries to
> > > cater to a wide variety of use cases out of the box.
> > >
> > > So I am throwing this patch out to see if it flies. Personally, I think it
> > > might turn out to be a more reasonable default.
> >
> > I think it'd be a lot more sane to just use CFQ on rotational single
> > devices, and default to deadline on raid or non-rotational devices. This
> > still isn't perfect, since less worthy SSDs still benefit from the
> > read/write separation, and some multi device configs will be faster as
> > well. But it's better.
>
> Hi Jens,
>
> Thanks. Taking a decision based on the rotational flag makes sense. I am
> not sure, though, how one gets the information about whether a block device
> is a single device or not, especially with HBAs, SCSI LUNs over Fibre
> Channel, iSCSI LUNs, etc. I have a few SCSI LUNs exported to me backed by a
> storage array. Everything runs CFQ by default. And though the disks in the
> array are rotational, they are RAIDed and, AFAIK, this information is not
> available to the driver.
>
> I am not sure if there is an easy way to get similar info for dm/md devices.
Thinking more about it, even if we have a way to define a request queue
flag for multi-device configurations (QUEUE_FLAG_MULTI_DEVICE), when can
the block layer take the decision to change the IO scheduler? At queue
alloc and init time the driver might not yet have called add_disk() or set
all the flags/properties of the queue, so doing it at queue alloc/init
time might not be best.
And later we get control only when actual IO happens on the queue, and
doing one more check or trying to change the elevator in the IO path is
not a good idea.
Maybe when the driver tries to set the ROTATIONAL or MULTI_DEVICE flag, we
can check and change the elevator then.
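A minimal sketch of that idea (QUEUE_FLAG_MULTI_DEVICE is hypothetical;
elevator_change() and the exported chosen_elevator are from the patch above):

/* Hypothetical helper; uses <linux/blkdev.h> and <linux/elevator.h>. */
static void blk_queue_set_multi_device(struct request_queue *q)
{
	/* QUEUE_FLAG_MULTI_DEVICE does not exist in mainline. */
	queue_flag_set_unlocked(QUEUE_FLAG_MULTI_DEVICE, q);

	/* Respect an explicit "elevator=" boot parameter. */
	if (!*chosen_elevator)
		elevator_change(q, "deadline");
}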
So we are back to the question of whether SCSI devices can find out if a
LUN is backed by a single disk or multiple disks.
Thanks
Vivek
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 15:10 ` Vivek Goyal
@ 2012-04-10 16:13 ` Mike Snitzer
2012-04-10 17:28 ` Vivek Goyal
` (2 more replies)
2012-04-10 18:41 ` Jens Axboe
1 sibling, 3 replies; 18+ messages in thread
From: Mike Snitzer @ 2012-04-10 16:13 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Jens Axboe, linux kernel mailing list, martin.petersen
On Tue, Apr 10, 2012 at 11:10 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Apr 10, 2012 at 10:21:48AM -0400, Vivek Goyal wrote:
> > On Tue, Apr 10, 2012 at 03:56:39PM +0200, Jens Axboe wrote:
> > > I think it'd be a lot more sane to just use CFQ on rotational single
> > > devices, and default to deadline on raid or non-rotational devices.
> > > This
> > > still isn't perfect, since less worthy SSDs still benefit from the
> > > read/write separation, and some multi device configs will be faster as
> > > well. But it's better.
> >
> > Hi Jens,
> >
> > Thanks. Taking a decision based on the rotational flag makes sense. I am
> > not sure, though, how one gets the information about whether a block
> > device is a single device or not, especially with HBAs, SCSI LUNs over
> > Fibre Channel, iSCSI LUNs, etc. I have a few SCSI LUNs exported to me
> > backed by a storage array. Everything runs CFQ by default. And though the
> > disks in the array are rotational, they are RAIDed and, AFAIK, this
> > information is not available to the driver.
> >
> > I am not sure if there is an easy way to get similar info for dm/md
> > devices.
>
> Thinking more about it, even if we have a way to define a request queue
> flag for multi-device configurations (QUEUE_FLAG_MULTI_DEVICE), when can
> the block layer take the decision to change the IO scheduler? At queue
> alloc and init time the driver might not yet have called add_disk() or set
> all the flags/properties of the queue, so doing it at queue alloc/init
> time might not be best.
>
> And later we get control only when actual IO happens on the queue, and
> doing one more check or trying to change the elevator in the IO path is
> not a good idea.
>
> Maybe when the driver tries to set the ROTATIONAL or MULTI_DEVICE flag, we
> can check and change the elevator then.
>
> So we are back to the question of whether SCSI devices can find out if a
> LUN is backed by a single disk or multiple disks.
I'm not aware of any discrete attribute (comparable to 'rotational'
flag) that SCSI devices will advertise that indicates "I'm a raid
array".
That said, we can have a _very_ good hint that a SCSI device is a raid array if:
1) optimal_io_size is not zero, minimum_io_size is not equal to
optimal_io_size, and optimal_io_size is a multiple of minimum_io_size
2) WCE=0 (higher-end arrays with a writeback cache)
Determining 1 could be enough; we should probably ignore 2, as it isn't
an absolute indication that a device is composed of multiple devices
(especially not if considered independently of 1).
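As a sketch, the check could look something like this (not mainline code;
queue_io_min() and queue_io_opt() from <linux/blkdev.h> report the limits
that, when present, come from the BLOCK LIMITS VPD):

static bool queue_looks_like_raid(struct request_queue *q)
{
	unsigned int io_min = queue_io_min(q);
	unsigned int io_opt = queue_io_opt(q);

	/* Optimal size set, different from the minimum, and a multiple of it. */
	return io_min && io_opt && io_opt != io_min && !(io_opt % io_min);
}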
Mike
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 16:13 ` Mike Snitzer
@ 2012-04-10 17:28 ` Vivek Goyal
2012-04-10 17:40 ` Mike Snitzer
2012-04-10 18:36 ` Jens Axboe
2012-04-11 16:25 ` Martin K. Petersen
2 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 17:28 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jens Axboe, linux kernel mailing list, martin.petersen
On Tue, Apr 10, 2012 at 12:13:07PM -0400, Mike Snitzer wrote:
[..]
> > So we are back to the question of whether SCSI devices can find out if
> > a LUN is backed by a single disk or multiple disks.
>
> I'm not aware of any discrete attribute (comparable to 'rotational'
> flag) that SCSI devices will advertise that indicates "I'm a raid
> array".
>
> That said, we can have a _very_ good hint that a SCSI device is a raid array if:
>
> 1) optimal_io_size is not zero, minimum_io_size is not equal to
> optimal_io_size, and optimal_io_size is a multiple of minimum_io_size
>
> 2) WCE=0 (higher-end arrays with a writeback cache)
>
> Determining 1 could be enough; we should probably ignore 2, as it isn't
> an absolute indication that a device is composed of multiple devices
> (especially not if considered independently of 1).
Umm..., somehow relying on optimal_io_size != minimum_io_size sounds odd
to me (assuming it works).
I checked a bunch of LUNs exported to me and all of them have
optimal_io_size=0.
I have a few FC LUNs exported from two array vendors, and a few iSCSI LUNs
exported from two separate vendors; all of these LUNs have
optimal_io_size=0.
Thanks
Vivek
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 17:28 ` Vivek Goyal
@ 2012-04-10 17:40 ` Mike Snitzer
0 siblings, 0 replies; 18+ messages in thread
From: Mike Snitzer @ 2012-04-10 17:40 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Jens Axboe, linux kernel mailing list, martin.petersen
On Tue, Apr 10 2012 at 1:28pm -0400,
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 10, 2012 at 12:13:07PM -0400, Mike Snitzer wrote:
>
> [..]
> > > So we are back to the question of whether SCSI devices can find out
> > > if a LUN is backed by a single disk or multiple disks.
> >
> > I'm not aware of any discrete attribute (comparable to 'rotational'
> > flag) that SCSI devices will advertise that indicates "I'm a raid
> > array".
> >
> > That said, we can have a _very_ good hint that a SCSI device is a raid array if:
> >
> > 1) optimal_io_size is not zero, minimum_io_size is not equal to
> > optimal_io_size, and optimal_io_size is a multiple of minimum_io_size
> >
> > 2) WCE=0 (higher-end arrays with a writeback cache)
> >
> > Determining 1 could be enough; we should probably ignore 2, as it isn't
> > an absolute indication that a device is composed of multiple devices
> > (especially not if considered independently of 1).
>
> Umm..., somehow relying on optimal_io_size != minimum_io_size sounds odd
> to me (assuming it works).
>
> I checked a bunch of LUNs exported to me and all of them have
> optimal_io_size=0.
As I said above, this would only apply if "optimal_io_size is not zero, ..."
> I have a few FC LUNs exported from two array vendors, and a few iSCSI LUNs
> exported from two separate vendors; all of these LUNs have
> optimal_io_size=0.
Seems all those LUNs haven't exported their limits via the BLOCK LIMITS
VPD, but more recent firmware could rectify that.
Anyway, there is no one-size-fits-all here, sorry to disappoint you!
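For reference, a quick way to check which limits a LUN reports (sdX is a
placeholder; sg_vpd comes from sg3_utils):

$ cat /sys/block/sdX/queue/minimum_io_size
$ cat /sys/block/sdX/queue/optimal_io_size
# or read the BLOCK LIMITS VPD page directly:
$ sg_vpd --page=bl /dev/sdX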
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
@ 2012-04-10 17:44 Xose Vazquez Perez
0 siblings, 0 replies; 18+ messages in thread
From: Xose Vazquez Perez @ 2012-04-10 17:44 UTC (permalink / raw)
To: linux-kernel
Vivek Goyal wrote:
> I am wondering if CFQ as the default scheduler is still the right choice. CFQ
> generally works well on slow rotational media (SATA?), but often
> underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
> disks in Linux guests, etc.). People often put logic in user space to tune
> their systems and change the IO scheduler to deadline to get better
> performance on faster storage.
>
> Though there is no one good answer for all kinds of storage and all kinds of
> workloads, I am wondering if we can provide a better default, and that is to
> change the default IO scheduler to "deadline" for everything except SATA.
>
> One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
> but the default IO scheduler choice is not perfect anyway; it just tries to
> cater to a wide variety of use cases out of the box.
>
> So I am throwing this patch out to see if it flies. Personally, I think it
> might turn out to be a more reasonable default.
This was done some time ago for dasd devices.
chuchi:~/curre/linux-2.6 $ grep -ri deadline drivers/s390/block/*
drivers/s390/block/dasd.c: rc = elevator_init(block->request_queue, "deadline");
drivers/s390/block/Kconfig: select IOSCHED_DEADLINE
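For a driver that wants the same, the call sits in its queue setup path; a
minimal sketch (the driver/function name is illustrative):

/* Override the kernel-wide default elevator for this driver's queues. */
static int mydrv_setup_queue(struct request_queue *q)
{
	return elevator_init(q, "deadline");
}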
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 16:13 ` Mike Snitzer
2012-04-10 17:28 ` Vivek Goyal
@ 2012-04-10 18:36 ` Jens Axboe
2012-04-11 16:25 ` Martin K. Petersen
2 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 18:36 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Vivek Goyal, linux kernel mailing list, martin.petersen
On 2012-04-10 18:13, Mike Snitzer wrote:
> That said, we can have a _very_ good hint that a SCSI device is a raid array if:
>
> 1) optimal_io_size is not zero, minimum_io_size is not equal to
> optimal_io_size, and optimal_io_size is a multiple of minimum_io_size
>
> 2) WCE=0 (higher-end arrays with a writeback cache)
Any device other than a basic SATA disk will have WCE=0, so I don't
think that gives you a very good hint about its nature,
unfortunately.
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 15:10 ` Vivek Goyal
2012-04-10 16:13 ` Mike Snitzer
@ 2012-04-10 18:41 ` Jens Axboe
2012-04-10 18:53 ` Vivek Goyal
1 sibling, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 18:41 UTC (permalink / raw)
To: Vivek Goyal; +Cc: linux kernel mailing list, Jeff Moyer
On 2012-04-10 17:10, Vivek Goyal wrote:
> On Tue, Apr 10, 2012 at 10:21:48AM -0400, Vivek Goyal wrote:
>> On Tue, Apr 10, 2012 at 03:56:39PM +0200, Jens Axboe wrote:
>>> On 2012-04-10 15:37, Vivek Goyal wrote:
>>>> Hi,
>>>>
>>>> I am wondering if CFQ as the default scheduler is still the right choice. CFQ
>>>> generally works well on slow rotational media (SATA?), but often
>>>> underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
>>>> disks in Linux guests, etc.). People often put logic in user space to tune
>>>> their systems and change the IO scheduler to deadline to get better
>>>> performance on faster storage.
>>>>
>>>> Though there is no one good answer for all kinds of storage and all kinds of
>>>> workloads, I am wondering if we can provide a better default, and that is to
>>>> change the default IO scheduler to "deadline" for everything except SATA.
>>>>
>>>> One can argue that some SAS disks can be slow too and benefit from CFQ. Yes,
>>>> but the default IO scheduler choice is not perfect anyway; it just tries to
>>>> cater to a wide variety of use cases out of the box.
>>>>
>>>> So I am throwing this patch out to see if it flies. Personally, I think it
>>>> might turn out to be a more reasonable default.
>>>
>>> I think it'd be a lot more sane to just use CFQ on rotational single
>>> devices, and default to deadline on raid or non-rotational devices. This
>>> still isn't perfect, since less worthy SSDs still benefit from the
>>> read/write separation, and some multi device configs will be faster as
>>> well. But it's better.
>>
>> Hi Jens,
>>
>> Thanks. Taking a decision based on the rotational flag makes sense. I am
>> not sure, though, how one gets the information about whether a block device
>> is a single device or not, especially with HBAs, SCSI LUNs over Fibre
>> Channel, iSCSI LUNs, etc. I have a few SCSI LUNs exported to me backed by a
>> storage array. Everything runs CFQ by default. And though the disks in the
>> array are rotational, they are RAIDed and, AFAIK, this information is not
>> available to the driver.
>>
>> I am not sure if there is an easy way to get similar info for dm/md devices.
>
> Thinking more about it, even if we have a way to define a request queue
> flag for multi-device configurations (QUEUE_FLAG_MULTI_DEVICE), when can
> the block layer take the decision to change the IO scheduler? At queue
> alloc and init time the driver might not yet have called add_disk() or set
> all the flags/properties of the queue, so doing it at queue alloc/init
> time might not be best.
>
> And later we get control only when actual IO happens on the queue, and
> doing one more check or trying to change the elevator in the IO path is
> not a good idea.
>
> Maybe when the driver tries to set the ROTATIONAL or MULTI_DEVICE flag, we
> can check and change the elevator then.
>
> So we are back to the question of whether SCSI devices can find out if a
> LUN is backed by a single disk or multiple disks.
The cleanest would be to have the driver signal these attributes at
probe time. You could even adjust CFQ properties based on this, driving
the queue depth harder etc. Realistically, going forward, most fast
flash devices will be driven by a noop-like scheduler on multiqueue. So
CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
even big RAIDs isn't an issue due to the low IOPS rates.
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 18:41 ` Jens Axboe
@ 2012-04-10 18:53 ` Vivek Goyal
2012-04-10 18:56 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 18:53 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux kernel mailing list, Jeff Moyer
On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
[..]
> > So we are back to the question of whether SCSI devices can find out if
> > a LUN is backed by a single disk or multiple disks.
>
> The cleanest would be to have the driver signal these attributes at
> probe time. You could even adjust CFQ properties based on this, driving
> the queue depth harder etc. Realistically, going forward, most fast
> flash devices will be driven by a noop-like scheduler on multiqueue. So
> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
> even big RAIDs isn't an issue due to the low IOPS rates.
Agreed that on RAID, CPU cost is not a problem. Just that idling and low
queue depth kill the performance.
So apart from "rotational", if the driver can give some hints about the
underlying devices being RAID (or multi-device), it will help. Just that it
looks like SCSI does not have a way to determine that.
Thanks
Vivek
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 18:53 ` Vivek Goyal
@ 2012-04-10 18:56 ` Jens Axboe
2012-04-10 19:11 ` Vivek Goyal
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 18:56 UTC (permalink / raw)
To: Vivek Goyal; +Cc: linux kernel mailing list, Jeff Moyer
On 2012-04-10 20:53, Vivek Goyal wrote:
> On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
>
> [..]
>>> So we are back to the question of whether SCSI devices can find out if
>>> a LUN is backed by a single disk or multiple disks.
>>
>> The cleanest would be to have the driver signal these attributes at
>> probe time. You could even adjust CFQ properties based on this, driving
>> the queue depth harder etc. Realistically, going forward, most fast
>> flash devices will be driven by a noop-like scheduler on multiqueue. So
>> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
>> even big RAIDs isn't an issue due to the low IOPS rates.
>
> Agreed that on RAID, CPU cost is not a problem. Just that idling and low
> queue depth kill the performance.
Exactly, and both of these are trivially adjustable as long as we know
when to do it.
> So apart from "rotational", if the driver can give some hints about the
> underlying devices being RAID (or multi-device), it will help. Just that it
> looks like SCSI does not have a way to determine that.
This sort of thing should be done with a udev rule. It should not be too
hard to match for the most popular arrays, catching the majority of the
setups by default. Or you could ask the SCSI folks for some heuristics,
it's not unlikely that a few different attributes could make that bullet
proof, pretty much.
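Such a rule might look like this (an untested sketch; the match criteria
would need refining for particular arrays):

# sketch: default non-rotational block devices to deadline
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", \
    ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"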
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 18:56 ` Jens Axboe
@ 2012-04-10 19:11 ` Vivek Goyal
2012-04-10 19:19 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2012-04-10 19:11 UTC (permalink / raw)
To: Jens Axboe
Cc: linux kernel mailing list, Jeff Moyer, linux-scsi,
kay.sievers
On Tue, Apr 10, 2012 at 08:56:19PM +0200, Jens Axboe wrote:
> On 2012-04-10 20:53, Vivek Goyal wrote:
> > On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
> >
> > [..]
> >>> So we are back to the question of whether SCSI devices can find out
> >>> if a LUN is backed by a single disk or multiple disks.
> >>
> >> The cleanest would be to have the driver signal these attributes at
> >> probe time. You could even adjust CFQ properties based on this, driving
> >> the queue depth harder etc. Realistically, going forward, most fast
> >> flash devices will be driven by a noop-like scheduler on multiqueue. So
> >> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
> >> even big RAIDs isn't an issue due to the low IOPS rates.
> >
> > Agreed that on RAID, CPU cost is not a problem. Just that idling and low
> > queue depth kill the performance.
>
> Exactly, and both of these are trivially adjustable as long as we know
> when to do it.
>
> > So apart from "rotational", if the driver can give some hints about the
> > underlying devices being RAID (or multi-device), it will help. Just that
> > it looks like SCSI does not have a way to determine that.
>
> This sort of thing should be done with a udev rule.
[CCing kay]
Kay does not like the idea of doing something along this line in udev.
He thinks that the kernel changes over time, making udev rules stale, and
hence it should be done in the kernel. :-) I think he has had some
not-so-good experiences in the past.
Though personally I think that anything which is not set in stone should
go to udev. It at least allows for an easy change if the user does not like
the setting (disable the rule, modify the rule, etc.). And then rules evolve
as things change in the kernel.
Anyway, this point can be debated later once we figure out the set of
attributes to look at.
> It should not be too
> hard to match for the most popular arrays, catching the majority of the
> setups by default. Or you could ask the SCSI folks for some heuristics,
> it's not unlikely that a few different attributes could make that bullet
> proof, pretty much.
I am wondering what will happen to request-based multipath targets in
this scheme. I guess there will have to be additional rules to look at the
underlying paths and then change the IO scheduler accordingly.
CCing linux-scsi, in case the SCSI folks have ideas on what we can look at
to determine which scheduler to use.
Thanks
Vivek
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 19:11 ` Vivek Goyal
@ 2012-04-10 19:19 ` Jens Axboe
2012-04-10 19:43 ` Mike Snitzer
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 19:19 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux kernel mailing list, Jeff Moyer, linux-scsi,
kay.sievers
On 2012-04-10 21:11, Vivek Goyal wrote:
> On Tue, Apr 10, 2012 at 08:56:19PM +0200, Jens Axboe wrote:
>> On 2012-04-10 20:53, Vivek Goyal wrote:
>>> On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
>>>
>>> [..]
>>>>> So we are back to the question of whether SCSI devices can find out
>>>>> if a LUN is backed by a single disk or multiple disks.
>>>>
>>>> The cleanest would be to have the driver signal these attributes at
>>>> probe time. You could even adjust CFQ properties based on this, driving
>>>> the queue depth harder etc. Realistically, going forward, most fast
>>>> flash devices will be driven by a noop-like scheduler on multiqueue. So
>>>> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
>>>> even big RAIDs isn't an issue due to the low IOPS rates.
>>>
>>> Agreed that on RAID, CPU cost is not a problem. Just that idling and low
>>> queue depth kill the performance.
>>
>> Exactly, and both of these are trivially adjustable as long as we know
>> when to do it.
>>
>>> So apart from "rotational", if the driver can give some hints about the
>>> underlying devices being RAID (or multi-device), it will help. Just that
>>> it looks like SCSI does not have a way to determine that.
>>
>> This sort of thing should be done with a udev rule.
>
> [CCing kay]
>
> Kay does not like the idea of doing something along this line in udev.
> He thinks that the kernel changes over time, making udev rules stale, and
> hence it should be done in the kernel. :-) I think he has had some
> not-so-good experiences in the past.
>
> Though personally I think that anything which is not set in stone should
> go to udev. It at least allows for an easy change if the user does not like
> the setting (disable the rule, modify the rule, etc.). And then rules evolve
> as things change in the kernel.
>
> Anyway, this point can be debated later once we figure out the set of
> attributes to look at.
It's a bit tricky. But supposedly sysfs files are part of the ABI, no
matter how silly that may be. For these particular tunables, that means
that some parts of the ABI are only valid/there if others contain a
specific value. So I'm assuming that udev does not want to rely on that.
Now I don't know a lot about udev or udev rules, but if you could make
it depend on the value of <dev>/queue/scheduler, then it should
(supposedly) be stable and safe to rely on. It all depends on what kind
of logic you can stuff into the rules.
In any case, I'm sure that udev does not want to ship with those rules.
It would have to be a separate package. Which is fine, in my opinion.
>> It should not be too
>> hard to match for the most popular arrays, catching the majority of the
>> setups by default. Or you could ask the SCSI folks for some heuristics,
>> it's not unlikely that a few different attributes could make that bullet
>> proof, pretty much.
>
> I am wondering what will happen to request-based multipath targets in
> this scheme. I guess there will have to be additional rules to look at the
> underlying paths and then change the IO scheduler accordingly.
If each path is a device, each device should get caught and matched.
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 19:19 ` Jens Axboe
@ 2012-04-10 19:43 ` Mike Snitzer
2012-04-10 19:55 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Mike Snitzer @ 2012-04-10 19:43 UTC (permalink / raw)
To: Jens Axboe
Cc: Vivek Goyal, linux kernel mailing list, Jeff Moyer,
linux-scsi, kay.sievers
On Tue, Apr 10, 2012 at 3:19 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2012-04-10 21:11, Vivek Goyal wrote:
>> On Tue, Apr 10, 2012 at 08:56:19PM +0200, Jens Axboe wrote:
>>> On 2012-04-10 20:53, Vivek Goyal wrote:
>>>> On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
>>>>
>>>> [..]
>>>>>> So we are back to the question of whether SCSI devices can find out
>>>>>> if a LUN is backed by a single disk or multiple disks.
>>>>>
>>>>> The cleanest would be to have the driver signal these attributes at
>>>>> probe time. You could even adjust CFQ properties based on this, driving
>>>>> the queue depth harder etc. Realistically, going forward, most fast
>>>>> flash devices will be driven by a noop-like scheduler on multiqueue. So
>>>>> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
>>>>> even big RAIDs isn't an issue due to the low IOPS rates.
>>>>
>>>> Agreed that on RAID, CPU cost is not a problem. Just that idling and low
>>>> queue depth kill the performance.
>>>
>>> Exactly, and both of these are trivially adjustable as long as we know
>>> when to do it.
>>>
>>>> So apart from "rotational", if the driver can give some hints about the
>>>> underlying devices being RAID (or multi-device), it will help. Just that
>>>> it looks like SCSI does not have a way to determine that.
>>>
>>> This sort of thing should be done with a udev rule.
>>
>> [CCing kay]
>>
>> Kay does not like the idea of doing something along this line in udev.
>> He thinks that the kernel changes over time, making udev rules stale, and
>> hence it should be done in the kernel. :-) I think he has had some
>> not-so-good experiences in the past.
>>
>> Though personally I think that anything which is not set in stone should
>> go to udev. It at least allows for an easy change if the user does not like
>> the setting (disable the rule, modify the rule, etc.). And then rules evolve
>> as things change in the kernel.
>>
>> Anyway, this point can be debated later once we figure out the set of
>> attributes to look at.
>
> It's a bit tricky. But supposedly sysfs files are part of the ABI, no
> matter how silly that may be. For these particular tunables, that means
> that some parts of the ABI are only valid/there if others contain a
> specific value. So I'm assuming that udev does not want to rely on that.
> Now I don't know a lot about udev or udev rules, but if you could make
> it depend on the value of <dev>/queue/scheduler, then it should
> (supposedly) be stable and safe to rely on. It all depends on what kind
> of logic you can stuff into the rules.
>
> In any case, I'm sure that udev does not want to ship with those rules.
> It would have to be a separate package. Which is fine, in my opinion.
>
>>> It should not be too
>>> hard to match for the most popular arrays, catching the majority of the
>>> setups by default. Or you could ask the SCSI folks for some heuristics,
>>> it's not unlikely that a few different attributes could make that bullet
>>> proof, pretty much.
>>
>> I am wondering what will happen to request-based multipath targets in
>> this scheme. I guess there will have to be additional rules to look at the
>> underlying paths and then change the IO scheduler accordingly.
>
> If each path is a device, each device should get caught and matched.
I'm still missing your position (other than you now wanting to make it
a userspace concern).
Put differently: why should CFQ still be the default?
It is pretty widely held that deadline is the more sane default
(multiple distros are now using it, deadline is default for guests,
etc). CFQ has become more niche. The Linux default really should
reflect this.
The only case where defaulting to CFQ seems to make sense is
rotational SATA (and USB).
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 19:43 ` Mike Snitzer
@ 2012-04-10 19:55 ` Jens Axboe
2012-04-10 20:12 ` Mike Snitzer
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2012-04-10 19:55 UTC (permalink / raw)
To: Mike Snitzer
Cc: Vivek Goyal, linux kernel mailing list, Jeff Moyer,
linux-scsi, kay.sievers
On 2012-04-10 21:43, Mike Snitzer wrote:
> On Tue, Apr 10, 2012 at 3:19 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 2012-04-10 21:11, Vivek Goyal wrote:
>>> On Tue, Apr 10, 2012 at 08:56:19PM +0200, Jens Axboe wrote:
>>>> On 2012-04-10 20:53, Vivek Goyal wrote:
>>>>> On Tue, Apr 10, 2012 at 08:41:08PM +0200, Jens Axboe wrote:
>>>>>
>>>>> [..]
>>>>>>> So we are back to the question of whether SCSI devices can find out
>>>>>>> if a LUN is backed by a single disk or multiple disks.
>>>>>>
>>>>>> The cleanest would be to have the driver signal these attributes at
>>>>>> probe time. You could even adjust CFQ properties based on this, driving
>>>>>> the queue depth harder etc. Realistically, going forward, most fast
>>>>>> flash devices will be driven by a noop-like scheduler on multiqueue. So
>>>>>> CPU cost of the IO scheduler can mostly be ignored, since CFQ cost on
>>>>>> even big RAIDs isn't an issue due to the low IOPS rates.
>>>>>
>>>>> Agreed that on RAID, CPU cost is not a problem. Just that idling and low
>>>>> queue depth kill the performance.
>>>>
>>>> Exactly, and both of these are trivially adjustable as long as we know
>>>> when to do it.
>>>>
>>>>> So apart from "rotational", if the driver can give some hints about the
>>>>> underlying devices being RAID (or multi-device), it will help. Just that
>>>>> it looks like SCSI does not have a way to determine that.
>>>>
>>>> This sort of thing should be done with a udev rule.
>>>
>>> [CCing kay]
>>>
>>> Kay does not like the idea of doing something along this line in udev.
>>> He thinks that the kernel changes over time, making udev rules stale, and
>>> hence it should be done in the kernel. :-) I think he has had some
>>> not-so-good experiences in the past.
>>>
>>> Though personally I think that anything which is not set in stone should
>>> go to udev. It at least allows for an easy change if the user does not like
>>> the setting (disable the rule, modify the rule, etc.). And then rules evolve
>>> as things change in the kernel.
>>>
>>> Anyway, this point can be debated later once we figure out the set of
>>> attributes to look at.
>>
>> It's a bit tricky. But supposedly sysfs files are part of the ABI, no
>> matter how silly that may be. For these particular tunables, that means
>> that some parts of the ABI are only valid/there if others contain a
>> specific value. So I'm assuming that udev does not want to rely on that.
>> Now I don't know a lot about udev or udev rules, but if you could make
>> it depend on the value of <dev>/queue/scheduler, then it should
>> (supposedly) be stable and safe to rely on. It all depends on what kind
>> of logic you can stuff into the rules.
>>
>> In any case, I'm sure that udev does not want to ship with those rules.
>> It would have to be a separate package. Which is fine, in my opinion.
>>
>>>> It should not be too
>>>> hard to match for the most popular arrays, catching the majority of the
>>>> setups by default. Or you could ask the SCSI folks for some heuristics,
>>>> it's not unlikely that a few different attributes could make that bullet
>>>> proof, pretty much.
>>>
>>> I am wondering what will happen to request-based multipath targets in
>>> this scheme. I guess there will have to be additional rules to look at the
>>> underlying paths and then change the IO scheduler accordingly.
>>
>> If each path is a device, each device should get caught and matched.
>
> I'm still missing your position (other than you now wanting to make it
> a userspace concern).
>
> Put differently: why should CFQ still be the default?
>
> It is pretty widely held that deadline is the more sane default
> (multiple distros are now using it, deadline is default for guests,
> etc). CFQ has become more niche. The Linux default really should
> reflect this.
>
> The only case where defaulting to CFQ seems to make sense is
> rotational SATA (and USB).
That's precisely the reason it should still be the default. The
default settings should reflect a good user experience out of the box.
Most desktop machines are still using SATA drives. And even among those
that made the leap to SSD, lots of the drives are still pretty sucky at
high queue depths or without read/write separation. So I'm quite sure the
default still makes a lot of sense.
Punt tuning to the server side. If you absolutely want the best
performance out of your _particular_ workload, you are expected and
required to tune things anyway. Not just the IO scheduler, but in
general. You can't make the same requirements for the desktop.
As to kernel vs user, I just see little reason for doing it in the
kernel if we can put that policy in user space.
--
Jens Axboe
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 19:55 ` Jens Axboe
@ 2012-04-10 20:12 ` Mike Snitzer
0 siblings, 0 replies; 18+ messages in thread
From: Mike Snitzer @ 2012-04-10 20:12 UTC (permalink / raw)
To: Jens Axboe
Cc: Vivek Goyal, linux kernel mailing list, Jeff Moyer,
linux-scsi, kay.sievers
On Tue, Apr 10 2012 at 3:55pm -0400,
Jens Axboe <axboe@kernel.dk> wrote:
> On 2012-04-10 21:43, Mike Snitzer wrote:
> > I'm still missing your position (other than you now wanting to make it
> > a userspace concern).
> >
> > Put differently: why should CFQ still be the default?
> >
> > It is pretty widely held that deadline is the more sane default
> > (multiple distros are now using it, deadline is default for guests,
> > etc). CFQ has become more niche. The Linux default really should
> > reflect this.
> >
> > The only case where defaulting to CFQ seems to make sense is
> > rotational SATA (and USB).
>
> That's precisely the reason it should still be the default. The
> default settings should reflect a good user experience out of the box.
> Most desktop machines are still using SATA drives. And even among those
> that made the leap to SSD, lots of the drives are still pretty sucky at
> high queue depths or without read/write separation. So I'm quite sure the
> default still makes a lot of sense.
I agree that a default of CFQ still makes sense for SATA and USB.
But why can't there be multiple defaults?
default: deadline
SATA and USB default: cfq
> Punt tuning to the server side. If you absolutely want the best
> performance out of your _particular_ workload, you are expected and
> required to tune things anyway. Not just the IO scheduler, but in
> general. You can't make the same requirements for the desktop.
Just because server admins are more used to tuning doesn't mean all
server admins do it -- especially not on first evaluation.
> As to kernel vs user, I just see little reason for doing it in the
> kernel if we can put that policy in user space.
There are distro packages that are shipped to control such knobs,
e.g. tuned.
They don't help _at all_ if the user doesn't know about them. Knob
tuning is tedious on multiple levels. Much like this thread ;)
* Re: [RFC PATCH] block: Change default IO scheduler to deadline except SATA
2012-04-10 16:13 ` Mike Snitzer
2012-04-10 17:28 ` Vivek Goyal
2012-04-10 18:36 ` Jens Axboe
@ 2012-04-11 16:25 ` Martin K. Petersen
2 siblings, 0 replies; 18+ messages in thread
From: Martin K. Petersen @ 2012-04-11 16:25 UTC (permalink / raw)
To: Mike Snitzer
Cc: Vivek Goyal, Jens Axboe, linux kernel mailing list,
martin.petersen
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> I'm not aware of any discrete attribute (comparable to
Mike> 'rotational' flag) that SCSI devices will advertise that indicates
Mike> "I'm a raid array".
Sadly, no.
Mike> That said, we can have a _very_ good hint that a SCSI device is a
Mike> raid array if:
Mike> 1) optimal_io_size is not zero, minimum_io_size is not equal to
Mike> optimal_io_size, and optimal_io_size is a multiple of
Mike> minimum_io_size
Unfortunately there are still a lot of arrays out there that don't
export the relevant VPDs.
I know there's a lot of resistance to doing stuff in the kernel that can
be done in udev. But with my distro hat on we update the kernel much,
much more frequently than we update udev rules. For a multitude of
reasons.
Also, when it comes to disk arrays we already have a significant portion
of them hardwired in the kernel anyway (quirks, LUN discovery, device
handlers). So I'd personally be fine with having a BLIST_ARRAY flag that
we could trigger off of.
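For illustration, such a flag would slot into the existing scsi_devinfo
machinery; the BLIST_* flags and device table in drivers/scsi/scsi_devinfo.c
are real, while BLIST_ARRAY itself is hypothetical:

/* Hypothetical entry in scsi_static_device_list: */
{"VENDOR", "MODEL", NULL, BLIST_ARRAY},	/* LUN is array-backed */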
--
Martin K. Petersen Oracle Linux Engineering
end of thread, other threads: [~2012-04-11 16:25 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-10 17:44 [RFC PATCH] block: Change default IO scheduler to deadline except SATA Xose Vazquez Perez
-- strict thread matches above, loose matches on Subject: below --
2012-04-10 13:37 Vivek Goyal
2012-04-10 13:56 ` Jens Axboe
2012-04-10 14:21 ` Vivek Goyal
2012-04-10 15:10 ` Vivek Goyal
2012-04-10 16:13 ` Mike Snitzer
2012-04-10 17:28 ` Vivek Goyal
2012-04-10 17:40 ` Mike Snitzer
2012-04-10 18:36 ` Jens Axboe
2012-04-11 16:25 ` Martin K. Petersen
2012-04-10 18:41 ` Jens Axboe
2012-04-10 18:53 ` Vivek Goyal
2012-04-10 18:56 ` Jens Axboe
2012-04-10 19:11 ` Vivek Goyal
2012-04-10 19:19 ` Jens Axboe
2012-04-10 19:43 ` Mike Snitzer
2012-04-10 19:55 ` Jens Axboe
2012-04-10 20:12 ` Mike Snitzer