* [PATCH] block: disallow changing max_sectors_kb on a request stacking device
@ 2016-10-28 19:45 Mike Snitzer
2016-11-07 16:40 ` Mike Snitzer
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2016-10-28 19:45 UTC (permalink / raw)
To: axboe; +Cc: linux-block, dm-devel
Otherwise users can easily shoot themselves in the foot by creating the
situation where the top-level stacked device (e.g. DM multipath) has a
larger max_sectors_kb than the underlying device(s). Which will
certainly lead to IO errors due to the "over max size limit" check in
blk_cloned_rq_check_limits().
Use of WARN_ON_ONCE() gives users visibility into which application
attempted to perform this unsupported max_sectors_kb change.
This is a crude, yet effective, solution that forces the use of system
software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
underlying devices _before_ a layer like DM multipath is layered ontop.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-sysfs.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..cce43d7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -199,8 +199,12 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
unsigned long max_sectors_kb,
max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
page_kb = 1 << (PAGE_SHIFT - 10);
- ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
+ ssize_t ret;
+
+ if (WARN_ON_ONCE(blk_queue_stackable(q)))
+ return -EINVAL;
+ ret = queue_var_store(&max_sectors_kb, page, count);
if (ret < 0)
return ret;
--
2.8.4 (Apple Git-73)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: block: disallow changing max_sectors_kb on a request stacking device
2016-10-28 19:45 [PATCH] block: disallow changing max_sectors_kb on a request stacking device Mike Snitzer
@ 2016-11-07 16:40 ` Mike Snitzer
2016-11-07 19:26 ` [PATCH v2] " Mike Snitzer
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2016-11-07 16:40 UTC (permalink / raw)
To: axboe; +Cc: linux-block, dm-devel
On Fri, Oct 28 2016 at 3:45pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> Otherwise users can easily shoot themselves in the foot by creating the
> situation where the top-level stacked device (e.g. DM multipath) has a
> larger max_sectors_kb than the underlying device(s). Which will
> certainly lead to IO errors due to the "over max size limit" check in
> blk_cloned_rq_check_limits().
>
> Use of WARN_ON_ONCE() gives users visibility into which application
> attempted to perform this unsupported max_sectors_kb change.
>
> This is a crude, yet effective, solution that forces the use of system
> software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> underlying devices _before_ a layer like DM multipath is layered ontop.
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> block/blk-sysfs.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9cc8d7c..cce43d7 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -199,8 +199,12 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
> unsigned long max_sectors_kb,
> max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
> page_kb = 1 << (PAGE_SHIFT - 10);
> - ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
> + ssize_t ret;
> +
> + if (WARN_ON_ONCE(blk_queue_stackable(q)))
> + return -EINVAL;
>
> + ret = queue_var_store(&max_sectors_kb, page, count);
> if (ret < 0)
> return ret;
>
> --
> 2.8.4 (Apple Git-73)
Jens,
Never heard from you on this patch. Would you like me to resend without
the WARN_ON_ONCE? (Failing is enough, the userspace code that attempts
to set max_sectors_kb on DM multipath will get the user's attention)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-07 16:40 ` Mike Snitzer
@ 2016-11-07 19:26 ` Mike Snitzer
2016-11-07 19:32 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2016-11-07 19:26 UTC (permalink / raw)
To: axboe; +Cc: linux-block, dm-devel
Otherwise users can easily shoot themselves in the foot by creating the
situation where the top-level stacked device (e.g. DM multipath) has a
larger max_sectors_kb than the underlying device(s). Which will
certainly lead to IO errors due to the "over max size limit" check in
blk_cloned_rq_check_limits().
This is a crude, yet effective, solution that forces the use of system
software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
underlying devices _before_ a layer like DM multipath is layered ontop.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-sysfs.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..934f326 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -199,8 +199,12 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
unsigned long max_sectors_kb,
max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
page_kb = 1 << (PAGE_SHIFT - 10);
- ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
+ ssize_t ret;
+
+ if (blk_queue_stackable(q))
+ return -EINVAL;
+ ret = queue_var_store(&max_sectors_kb, page, count);
if (ret < 0)
return ret;
--
2.9.3 (Apple Git-75)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-07 19:26 ` [PATCH v2] " Mike Snitzer
@ 2016-11-07 19:32 ` Jens Axboe
2016-11-07 21:27 ` Mike Snitzer
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-11-07 19:32 UTC (permalink / raw)
To: Mike Snitzer; +Cc: linux-block, dm-devel
On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> Otherwise users can easily shoot themselves in the foot by creating the
> situation where the top-level stacked device (e.g. DM multipath) has a
> larger max_sectors_kb than the underlying device(s). Which will
> certainly lead to IO errors due to the "over max size limit" check in
> blk_cloned_rq_check_limits().
>
> This is a crude, yet effective, solution that forces the use of system
> software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> underlying devices _before_ a layer like DM multipath is layered ontop.
Maybe I'm missing something, but the code we have in place splits it
into max sectors for software and hardware. Shouldn't the stacked
devices have max_hw_sectors capped to what the lower levels support? If
that was done, we would not have to worry about a user fiddling with
max_sectors_kb, since it could only be smaller (or equal to) the max
size of the lower level.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-07 19:32 ` Jens Axboe
@ 2016-11-07 21:27 ` Mike Snitzer
2016-11-08 2:46 ` Martin K. Petersen
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2016-11-07 21:27 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, dm-devel, Martin K. Petersen
On Mon, Nov 07 2016 at 2:32pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:
> On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> >Otherwise users can easily shoot themselves in the foot by creating the
> >situation where the top-level stacked device (e.g. DM multipath) has a
> >larger max_sectors_kb than the underlying device(s). Which will
> >certainly lead to IO errors due to the "over max size limit" check in
> >blk_cloned_rq_check_limits().
> >
> >This is a crude, yet effective, solution that forces the use of system
> >software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> >underlying devices _before_ a layer like DM multipath is layered ontop.
>
> Maybe I'm missing something, but the code we have in place splits it
> into max sectors for software and hardware. Shouldn't the stacked
> devices have max_hw_sectors capped to what the lower levels support? If
> that was done, we would not have to worry about a user fiddling with
> max_sectors_kb, since it could only be smaller (or equal to) the max
> size of the lower level.
DM multipath just uses blk_stack_limits() to stack limits, which has:
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
But I assume you realize that.. I'm just missing the relation you're
saying exists, or should exist, between max_hw_sectors and max_sectors
(other than the obvious: max_sectors cannot be greater than
max_hw_sectors) as they relate to stacking.
You're suggesting that when the DM multipath device's limits are stacked
up from the underlying devices: cap the mpath's max_hw_sectors to the
underlying devices' max_sectors?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-07 21:27 ` Mike Snitzer
@ 2016-11-08 2:46 ` Martin K. Petersen
2016-11-08 3:34 ` Mike Snitzer
0 siblings, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2016-11-08 2:46 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jens Axboe, linux-block, dm-devel, Martin K. Petersen
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike,
Mike> You're suggesting that when the DM multipath device's limits are
Mike> stacked up from the underlying devices: cap the mpath's
Mike> max_hw_sectors to the underlying devices' max_sectors?
That will break SG_IO, firmware upgrades, etc. (things you probably
shouldn't do on the MP device in the first place but which users often
do).
However, doesn't it make more sense to tweak limits on DM device instead
of the underlying ones? It seems a bit counter-intuitive to me to change
max_sectors_kb on a different device than the one where the filesystem
whose max I/O size you want to change is located.
I guess this brings us back to the desire to be able to restack the
limits at will when something changes somewhere...
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-08 2:46 ` Martin K. Petersen
@ 2016-11-08 3:34 ` Mike Snitzer
2016-11-08 21:10 ` Martin K. Petersen
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2016-11-08 3:34 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: Jens Axboe, linux-block, dm-devel
On Mon, Nov 07 2016 at 9:46pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike,
>
> Mike> You're suggesting that when the DM multipath device's limits are
> Mike> stacked up from the underlying devices: cap the mpath's
> Mike> max_hw_sectors to the underlying devices' max_sectors?
>
> That will break SG_IO, firmware upgrades, etc. (things you probably
> shouldn't do on the MP device in the first place but which users often
> do).
>
> However, doesn't it make more sense to tweak limits on DM device instead
> of the underlying ones? It seems a bit counter-intuitive to me to change
> max_sectors_kb on a different device than the one where the filesystem
> whose max I/O size you want to change is located.
As you know, the limits are stacked from the bottom-up.
> I guess this brings us back to the desire to be able to restack the
> limits at will when something changes somewhere...
Tough to do that in a race-free way when DM multipath requests are
cloned for submission to the underlying devices.
As the commit header from this thread mentioned, what I've arrived at is
to have multipathd establish a desired max_sectors_kb (if configured to
do so in multipath.conf) on the underlying paths _before_ the multipath
device is created. Yet to check with Ben Marzinski or others but it
seems like a reasonable feature (and really the cleaniset way to deal
with this issue IMHO.. no need for lots of kernel code changes for
something so niche).
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] block: disallow changing max_sectors_kb on a request stacking device
2016-11-08 3:34 ` Mike Snitzer
@ 2016-11-08 21:10 ` Martin K. Petersen
0 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2016-11-08 21:10 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Martin K. Petersen, Jens Axboe, linux-block, dm-devel
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike,
>> However, doesn't it make more sense to tweak limits on DM device
>> instead of the underlying ones? It seems a bit counter-intuitive to
>> me to change max_sectors_kb on a different device than the one where
>> the filesystem whose max I/O size you want to change is located.
Mike> As you know, the limits are stacked from the bottom-up.
Yep, but since max_sectors_kb is a performance tunable and not something
enforced by the hardware, maybe we should consider treating it
differently?
Mike> As the commit header from this thread mentioned, what I've arrived
Mike> at is to have multipathd establish a desired max_sectors_kb (if
Mike> configured to do so in multipath.conf) on the underlying paths
Mike> _before_ the multipath device is created. Yet to check with Ben
Mike> Marzinski or others but it seems like a reasonable feature (and
Mike> really the cleaniset way to deal with this issue IMHO.. no need
Mike> for lots of kernel code changes for something so niche).
That's fine. Another option would be to use the max_dev_sectors queue
limit to contain the minimum max_sectors from below. It was really only
envisioned as a LLD limit but it may be useful in this case.
queue_max_sectors_store() already enforces it.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-11-08 21:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-28 19:45 [PATCH] block: disallow changing max_sectors_kb on a request stacking device Mike Snitzer
2016-11-07 16:40 ` Mike Snitzer
2016-11-07 19:26 ` [PATCH v2] " Mike Snitzer
2016-11-07 19:32 ` Jens Axboe
2016-11-07 21:27 ` Mike Snitzer
2016-11-08 2:46 ` Martin K. Petersen
2016-11-08 3:34 ` Mike Snitzer
2016-11-08 21:10 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).