* [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
@ 2025-07-09 18:10 Arnd Bergmann
2025-07-09 18:27 ` Darrick J. Wong
2025-07-10 8:00 ` Christian Brauner
0 siblings, 2 replies; 11+ messages in thread
From: Arnd Bergmann @ 2025-07-09 18:10 UTC (permalink / raw)
To: linux-fsdevel, linux-block, Anuj Gupta, Martin K . Petersen,
Kanchan Joshi
Cc: ltp, dan.carpenter, benjamin.copeland, rbm, Arnd Bergmann,
Naresh Kamboju, Anders Roxell, Jens Axboe, Pavel Begunkov,
Christian Brauner, Alexey Dobriyan, Darrick J. Wong, Eric Biggers,
linux-kernel
From: Arnd Bergmann <arnd@arndb.de>
Anders and Naresh found that the addition of the FS_IOC_GETLBMD_CAP
handling in the blockdev ioctl handler breaks all ioctls with
_IOC_NR==2, as the new command is not added to the switch but only
a few of the command bits are check.
Refine the check to also validate the direction/type/length bits,
but still allow all supported sizes for future extensions.
Move the new command to the end of the function to avoid slowing
down normal ioctl commands with the added branches.
Fixes: 9eb22f7fedfc ("fs: add ioctl to query metadata and protection info capabilities")
Link: https://lore.kernel.org/all/CA+G9fYvk9HHE5UJ7cdJHTcY6P5JKnp+_e+sdC5U-ZQFTP9_hqQ@mail.gmail.com/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
It seems that we have a lot of drivers with the same bug, as the
large majority of all _IOC_NR() users in the kernel fail to also
check the other bits of the ioctl command code. There are currently
55 files referencing _IOC_NR, and they all need to be manually
checked for this problem.
---
block/ioctl.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index 9ad403733e19..5e5a422bd09f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -567,9 +567,6 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
{
unsigned int max_sectors;
- if (_IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP))
- return blk_get_meta_cap(bdev, cmd, argp);
-
switch (cmd) {
case BLKFLSBUF:
return blkdev_flushbuf(bdev, cmd, arg);
@@ -647,9 +644,16 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
return blkdev_pr_preempt(bdev, mode, argp, true);
case IOC_PR_CLEAR:
return blkdev_pr_clear(bdev, mode, argp);
- default:
- return -ENOIOCTLCMD;
}
+
+ if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
+ _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
+ _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
+ _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
+ _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
+ return blk_get_meta_cap(bdev, cmd, argp);
+
+ return -ENOIOCTLCMD;
}
/*
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-09 18:10 [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl() Arnd Bergmann
@ 2025-07-09 18:27 ` Darrick J. Wong
2025-07-09 20:30 ` Arnd Bergmann
2025-07-10 8:00 ` Christian Brauner
1 sibling, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2025-07-09 18:27 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-fsdevel, linux-block, Anuj Gupta, Martin K . Petersen,
Kanchan Joshi, ltp, dan.carpenter, benjamin.copeland, rbm,
Arnd Bergmann, Naresh Kamboju, Anders Roxell, Jens Axboe,
Pavel Begunkov, Christian Brauner, Alexey Dobriyan, Eric Biggers,
linux-kernel
On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> Anders and Naresh found that the addition of the FS_IOC_GETLBMD_CAP
> handling in the blockdev ioctl handler breaks all ioctls with
> _IOC_NR==2, as the new command is not added to the switch but only
> a few of the command bits are check.
>
> Refine the check to also validate the direction/type/length bits,
> but still allow all supported sizes for future extensions.
>
> Move the new command to the end of the function to avoid slowing
> down normal ioctl commands with the added branches.
>
> Fixes: 9eb22f7fedfc ("fs: add ioctl to query metadata and protection info capabilities")
> Link: https://lore.kernel.org/all/CA+G9fYvk9HHE5UJ7cdJHTcY6P5JKnp+_e+sdC5U-ZQFTP9_hqQ@mail.gmail.com/
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> It seems that we have a lot of drivers with the same bug, as the
> large majority of all _IOC_NR() users in the kernel fail to also
> check the other bits of the ioctl command code. There are currently
> 55 files referencing _IOC_NR, and they all need to be manually
> checked for this problem.
> ---
> block/ioctl.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 9ad403733e19..5e5a422bd09f 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -567,9 +567,6 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
> {
> unsigned int max_sectors;
>
> - if (_IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP))
> - return blk_get_meta_cap(bdev, cmd, argp);
> -
> switch (cmd) {
> case BLKFLSBUF:
> return blkdev_flushbuf(bdev, cmd, arg);
> @@ -647,9 +644,16 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
> return blkdev_pr_preempt(bdev, mode, argp, true);
> case IOC_PR_CLEAR:
> return blkdev_pr_clear(bdev, mode, argp);
> - default:
> - return -ENOIOCTLCMD;
> }
> +
> + if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
> + _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
> + _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
I think this problem was introduced by brauner trying to persuade people
to perform size independent dispatch of ioctls:
switch (_IOC_NR(cmd)) {
case _IOC_NR(FS_IOC_FSGETXATTR):
if (WARN_ON_ONCE(_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_FSGETXATTR)))
return SOMETHING_SOMETHING;
/* Only handle original size. */
return ioctl_fsgetxattr(filp, argp);
https://lore.kernel.org/linux-xfs/20250515-bedarf-absagen-464773be3e72@brauner/
though we probably want a helper or something to encapsulate those three
comparisons to avoid the SOMETHING_SOMETHING part:
#define IOC_DISPATCH(c) \
((c) & ~(_IOC(0, 0, 0, _IOC_SIZE(_IOC_SIZEMASK))))
switch (IOC_DISPATCH(cmd)) {
case IOC_DISPATCH(FS_IOC_FSGETXATTR):
return ioctl_fsgetxattr(filp, cmd, argp);
Assuming that ioctl_fsgetxattr derives size from @cmd and rejects values
that it doesn't like. Hrm?
> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
blk_get_meta_cap already checks this.
--D
> + return blk_get_meta_cap(bdev, cmd, argp);
> +
> + return -ENOIOCTLCMD;
> }
>
> /*
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-09 18:27 ` Darrick J. Wong
@ 2025-07-09 20:30 ` Arnd Bergmann
0 siblings, 0 replies; 11+ messages in thread
From: Arnd Bergmann @ 2025-07-09 20:30 UTC (permalink / raw)
To: Darrick J. Wong, Arnd Bergmann
Cc: linux-fsdevel, linux-block, Anuj Gupta, Martin K. Petersen,
Kanchan Joshi, LTP List, Dan Carpenter, Benjamin Copeland, rbm,
Naresh Kamboju, Anders Roxell, Jens Axboe, Pavel Begunkov,
Christian Brauner, Alexey Dobriyan, Eric Biggers, linux-kernel
On Wed, Jul 9, 2025, at 20:27, Darrick J. Wong wrote:
> On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
> though we probably want a helper or something to encapsulate those three
> comparisons to avoid the SOMETHING_SOMETHING part:
>
> #define IOC_DISPATCH(c) \
> ((c) & ~(_IOC(0, 0, 0, _IOC_SIZE(_IOC_SIZEMASK))))
>
> switch (IOC_DISPATCH(cmd)) {
> case IOC_DISPATCH(FS_IOC_FSGETXATTR):
> return ioctl_fsgetxattr(filp, cmd, argp);
>
> Assuming that ioctl_fsgetxattr derives size from @cmd and rejects values
> that it doesn't like. Hrm?
This may work in specific cases, but it adds a lot of complexity
and room for error if we try to do this in more places:
Ignoring the 'size' argument as above would mean that
each case now has to add an extra size check in each 'case',
which then defeats the entire purpose.
I should maybe dig out my notes for table-driver ioctl
handlers, if we want to improve the way that drivers define
their ioctl implementations, I'm sure there is some
infrastructure we can come up with that can help here,
but I don't think 'same as before but more macros' is the
answer.
joydev_ioctl_common() is an existing example doing something
like it and gets it right, while snd_compr_ioctl() is an
example that looks completely broken to me.
>> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
>> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
>
> blk_get_meta_cap already checks this.
I had thought about removing it there, but decided against that.
Maybe a better way would be to have the full check inside of
blk_get_meta_cap() and use the -ENOIOCTLCMD return code
to keep the caller simple:
switch(cmd) {
...
default:
break;
}
ret = blk_get_meta_cap(bdev, cmd, argp);
if (ret != -ENOIOCTLCMD)
return ret;
...
return -ENOIOCTLCMD;
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-09 18:10 [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl() Arnd Bergmann
2025-07-09 18:27 ` Darrick J. Wong
@ 2025-07-10 8:00 ` Christian Brauner
2025-07-10 8:14 ` Christoph Hellwig
2025-07-10 10:11 ` Arnd Bergmann
1 sibling, 2 replies; 11+ messages in thread
From: Christian Brauner @ 2025-07-10 8:00 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-fsdevel, linux-block, Anuj Gupta, Martin K . Petersen,
Kanchan Joshi, ltp, dan.carpenter, benjamin.copeland, rbm,
Arnd Bergmann, Naresh Kamboju, Anders Roxell, Jens Axboe,
Pavel Begunkov, Alexey Dobriyan, Darrick J. Wong, Eric Biggers,
linux-kernel
On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> Anders and Naresh found that the addition of the FS_IOC_GETLBMD_CAP
> handling in the blockdev ioctl handler breaks all ioctls with
> _IOC_NR==2, as the new command is not added to the switch but only
> a few of the command bits are check.
>
> Refine the check to also validate the direction/type/length bits,
> but still allow all supported sizes for future extensions.
>
> Move the new command to the end of the function to avoid slowing
> down normal ioctl commands with the added branches.
>
> Fixes: 9eb22f7fedfc ("fs: add ioctl to query metadata and protection info capabilities")
> Link: https://lore.kernel.org/all/CA+G9fYvk9HHE5UJ7cdJHTcY6P5JKnp+_e+sdC5U-ZQFTP9_hqQ@mail.gmail.com/
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
Thanks!
> It seems that we have a lot of drivers with the same bug, as the
> large majority of all _IOC_NR() users in the kernel fail to also
> check the other bits of the ioctl command code. There are currently
> 55 files referencing _IOC_NR, and they all need to be manually
> checked for this problem.
> ---
The current documentation in Documentation/dev-tools/checkuapi.rst needs
updating too then.
I want this to work. So as a start we should have a common static inline
helper that encapsulates the barrage of checks.
> block/ioctl.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 9ad403733e19..5e5a422bd09f 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -567,9 +567,6 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
> {
> unsigned int max_sectors;
>
> - if (_IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP))
> - return blk_get_meta_cap(bdev, cmd, argp);
> -
> switch (cmd) {
> case BLKFLSBUF:
> return blkdev_flushbuf(bdev, cmd, arg);
> @@ -647,9 +644,16 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
> return blkdev_pr_preempt(bdev, mode, argp, true);
> case IOC_PR_CLEAR:
> return blkdev_pr_clear(bdev, mode, argp);
> - default:
> - return -ENOIOCTLCMD;
> }
> +
> + if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
> + _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
> + _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
This part is wrong as we handle larger sizes just fine via
copy_struct_{from,to}_user().
Arnd, objections to writing it as follows?:
diff --git a/block/ioctl.c b/block/ioctl.c
index 9ad403733e19..9887ec55f8ce 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -567,9 +567,6 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
{
unsigned int max_sectors;
- if (_IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP))
- return blk_get_meta_cap(bdev, cmd, argp);
-
switch (cmd) {
case BLKFLSBUF:
return blkdev_flushbuf(bdev, cmd, arg);
@@ -647,9 +644,25 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
return blkdev_pr_preempt(bdev, mode, argp, true);
case IOC_PR_CLEAR:
return blkdev_pr_clear(bdev, mode, argp);
- default:
- return -ENOIOCTLCMD;
}
+
+ /* extensible ioctls */
+ switch (_IOC_NR(cmd)) {
+ case _IOC_NR(FS_IOC_GETLBMD_CAP):
+ if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
+ break;
+ if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
+ break;
+ if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
+ break;
+ if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
+ break;
+ if (_IOC_SIZE(cmd) > PAGE_SIZE)
+ break;
+ return blk_get_meta_cap(bdev, cmd, argp);
+ }
+
+ return -ENOIOCTLCMD;
}
/*
And can I ask you to please take a look at fs/pidfs.c:pidfd_ioctl() and
fs/nsfs.c:ns_ioctl()?
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 8:00 ` Christian Brauner
@ 2025-07-10 8:14 ` Christoph Hellwig
2025-07-10 10:50 ` Arnd Bergmann
2025-07-10 10:11 ` Arnd Bergmann
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-07-10 8:14 UTC (permalink / raw)
To: Christian Brauner
Cc: Arnd Bergmann, linux-fsdevel, linux-block, Anuj Gupta,
Martin K . Petersen, Kanchan Joshi, ltp, dan.carpenter,
benjamin.copeland, rbm, Arnd Bergmann, Naresh Kamboju,
Anders Roxell, Jens Axboe, Pavel Begunkov, Alexey Dobriyan,
Darrick J. Wong, Eric Biggers, linux-kernel
On Thu, Jul 10, 2025 at 10:00:48AM +0200, Christian Brauner wrote:
> + switch (_IOC_NR(cmd)) {
> + case _IOC_NR(FS_IOC_GETLBMD_CAP):
> + if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
> + break;
> + if (_IOC_SIZE(cmd) > PAGE_SIZE)
> + break;
> + return blk_get_meta_cap(bdev, cmd, argp);
> + }
Yikes. I really don't get why we're trying change the way how ioctls
worked forever. We can and usually do use the size based macro already.
And when we introduce a new size (which should happen rarely), we add a
new entry to the switch using the normal _IO* macros, and either
rename the struct, or use offsetofend in the _IO* entry for the old one.
Just in XFS which I remember in detail we've done that to extend
structures in backwards compatible ways multiple times.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 8:00 ` Christian Brauner
2025-07-10 8:14 ` Christoph Hellwig
@ 2025-07-10 10:11 ` Arnd Bergmann
2025-07-10 12:03 ` Christian Brauner
1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2025-07-10 10:11 UTC (permalink / raw)
To: Christian Brauner, Arnd Bergmann
Cc: linux-fsdevel, linux-block, Anuj Gupta, Martin K. Petersen,
Kanchan Joshi, LTP List, Dan Carpenter, Benjamin Copeland, rbm,
Naresh Kamboju, Anders Roxell, Jens Axboe, Pavel Begunkov,
Alexey Dobriyan, Darrick J. Wong, Eric Biggers, linux-kernel
On Thu, Jul 10, 2025, at 10:00, Christian Brauner wrote:
> On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
>> + if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
>> + _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
>> + _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
>> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
>> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
>
> This part is wrong as we handle larger sizes just fine via
> copy_struct_{from,to}_user().
I feel that is still an open question. As I understand it,
you want to make it slightly easier for userspace callers
to use future versions of an ioctl command by allowing them in
old kernels as well, by moving that complexity into the kernel.
Checking against _IOC_SIZE(FS_IOC_GETLBMD_CAP) would keep the
behavior consistent with the traditional model where userspace
needs to have a fallback to previous ABI versions.
> Arnd, objections to writing it as follows?:
> + /* extensible ioctls */
> + switch (_IOC_NR(cmd)) {
> + case _IOC_NR(FS_IOC_GETLBMD_CAP):
> + if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
> + break;
> + if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
> + break;
> + if (_IOC_SIZE(cmd) > PAGE_SIZE)
> + break;
> + return blk_get_meta_cap(bdev, cmd, argp);
The 'PAGE_SIZE' seems arbitrary here, especially since that is often
larger than the maximum size that can be encoded in an ioctl command
number (8KB or 16KB depending on the architecture). If we do need
an upper bound larger than _IOC_SIZE(FS_IOC_GETLBMD_CAP), I think it
should be a fixed number rather than a configuration dependent
one, and I would prefer a smaller one over a larger one. Anything
over a few dozen bytes is certainly suspicious, and once it gets
to thousands of bytes, you need a dynamic allocation to avoid stack
overflow in the kernel.
I had already updated my patch to move the checks into
blk_get_meta_cap() itself and keep the caller simpler:
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 9d9dc9c32083..2909ebf27dc2 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -62,10 +62,13 @@ int blk_get_meta_cap(struct block_device *bdev, unsigned int cmd,
struct logical_block_metadata_cap meta_cap = {};
size_t usize = _IOC_SIZE(cmd);
- if (!argp)
- return -EINVAL;
- if (usize < LBMD_SIZE_VER0)
- return -EINVAL;
+ if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP) ||
+ _IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP) ||
+ _IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP) ||
+ _IOC_SIZE(cmd) < LBMD_SIZE_VER0 ||
+ _IOC_SIZE(cmd) > _IOC_SIZE(FS_IOC_GETLBMD_CAP))
+ return -ENOIOCTLCMD;
+
if (!bi)
goto out;
diff --git a/block/ioctl.c b/block/ioctl.c
index 9ad403733e19..af2e22e5533c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -566,9 +566,11 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
void __user *argp)
{
unsigned int max_sectors;
+ int ret;
- if (_IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP))
- return blk_get_meta_cap(bdev, cmd, argp);
+ ret = blk_get_meta_cap(bdev, cmd, argp);
+ if (ret != -ENOIOCTLCMD)
+ return ret;
switch (cmd) {
case BLKFLSBUF:
Regardless of what upper bound we pick, that at least limits
the complexity to the one function that actually wants it.
> And can I ask you to please take a look at fs/pidfs.c:pidfd_ioctl() and
PIDFD_GET_INFO has part of the same problem, as it still fails to
check the _IOC_DIR() bits. I see you added a check for _IOC_TYPE()
in commit 091ee63e36e8 ("pidfs: improve ioctl handling"), but
the comment you added describes an unrelated issue and the fix
was incomplete.
> fs/nsfs.c:ns_ioctl()?
You tried to add a similar validation in commit 7fd511f8c911
("nsfs: validate ioctls"), but it seems you got that wrong
both by missing the _IOC_DIR check and by having a typo in the
'_IOC_TYPE(cmd) == _IOC_TYPE(cmd)' line that means this is
always true rather than comparing against 'NSIO'.
Overall my feeling is similar to Christoph's, that the added
complexity in any of these three cases was a mistake, as it's
too easy to mess it up.
Arnd
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 8:14 ` Christoph Hellwig
@ 2025-07-10 10:50 ` Arnd Bergmann
2025-07-10 10:59 ` Christoph Hellwig
2025-07-10 12:11 ` Christian Brauner
0 siblings, 2 replies; 11+ messages in thread
From: Arnd Bergmann @ 2025-07-10 10:50 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner
Cc: Arnd Bergmann, linux-fsdevel, linux-block, Anuj Gupta,
Martin K. Petersen, Kanchan Joshi, LTP List, Dan Carpenter,
Benjamin Copeland, rbm, Naresh Kamboju, Anders Roxell, Jens Axboe,
Pavel Begunkov, Alexey Dobriyan, Darrick J. Wong, Eric Biggers,
linux-kernel
On Thu, Jul 10, 2025, at 10:14, Christoph Hellwig wrote:
> On Thu, Jul 10, 2025 at 10:00:48AM +0200, Christian Brauner wrote:
>> + switch (_IOC_NR(cmd)) {
>> + case _IOC_NR(FS_IOC_GETLBMD_CAP):
>> + if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
>> + break;
>> + if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
>> + break;
>> + if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
>> + break;
>> + if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
>> + break;
>> + if (_IOC_SIZE(cmd) > PAGE_SIZE)
>> + break;
>> + return blk_get_meta_cap(bdev, cmd, argp);
>> + }
>
> Yikes. I really don't get why we're trying change the way how ioctls
> worked forever. We can and usually do use the size based macro already.
> And when we introduce a new size (which should happen rarely), we add a
> new entry to the switch using the normal _IO* macros, and either
> rename the struct, or use offsetofend in the _IO* entry for the old one.
>
> Just in XFS which I remember in detail we've done that to extend
> structures in backwards compatible ways multiple times.
There are multiple methods we've used to do this in the past,
but I don't think any of them are great, including the version
that Christian is trying to push now:
The most common variant is to leave extra room at the end of
a structure and use that as in your 1fd8159e7ca4 ("xfs: export zoned
geometry via XFS_FSOP_GEOM") and many other examples.
This is probably the easiest and it only fails once you run out of
spare room and have to pick a new command number. A common mistake
here is to forget checking the padding in the input data against
zero, so old kernels just ignore whatever new userspace tried
to pass.
I think the variant from commit 1b6d968de22b ("xfs: bump
XFS_IOC_FSGEOMETRY to v5 structures") where the old structure
gets renamed and the existing macro refers to a different
command code is more problematic. We used to always require
userspace to be built against the oldest kernel headers it could run
on. This worked fine in the past but it appears that userspace
(in particular glibc) has increasingly expected to also work
on older kernels when building against new headers.
Adding a new command code along with the new structure as in
cc68a8a5a433 ("btrfs: new ioctl TREE_SEARCH_V2") is probably
better here: While this does require userspace to have code
for calling either version, building an old program against
the new header still does the right thing and works on both
old and new kernels.
Christian's version using the copy_struct_{from,to}_user()
aims to avoid most of the problems. The main downside I see
here is the extra complexity in the kernel. As far as I can
tell, this has mainly led to extra kernel bugs but has not
actually resulted in any structure getting seamlessly
extended.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 10:50 ` Arnd Bergmann
@ 2025-07-10 10:59 ` Christoph Hellwig
2025-07-10 11:52 ` Arnd Bergmann
2025-07-10 12:11 ` Christian Brauner
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-07-10 10:59 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Christoph Hellwig, Christian Brauner, Arnd Bergmann,
linux-fsdevel, linux-block, Anuj Gupta, Martin K. Petersen,
Kanchan Joshi, LTP List, Dan Carpenter, Benjamin Copeland, rbm,
Naresh Kamboju, Anders Roxell, Jens Axboe, Pavel Begunkov,
Alexey Dobriyan, Darrick J. Wong, Eric Biggers, linux-kernel
On Thu, Jul 10, 2025 at 12:50:44PM +0200, Arnd Bergmann wrote:
> There are multiple methods we've used to do this in the past,
> but I don't think any of them are great, including the version
> that Christian is trying to push now:
>
> The most common variant is to leave extra room at the end of
> a structure and use that as in your 1fd8159e7ca4 ("xfs: export zoned
> geometry via XFS_FSOP_GEOM") and many other examples.
That's using the space. I had that discussion before in context of
this API, and I still think that reserving a small amount of space
that can be used for extensions is usually good practice. Often
we get some of that for free by 64-bit aligning anyway, and adding
a bit more tends to also be useful.
> This is probably the easiest and it only fails once you run out of
> spare room and have to pick a new command number. A common mistake
> here is to forget checking the padding in the input data against
> zero, so old kernels just ignore whatever new userspace tried
> to pass.
>
> I think the variant from commit 1b6d968de22b ("xfs: bump
> XFS_IOC_FSGEOMETRY to v5 structures") where the old structure
> gets renamed and the existing macro refers to a different
> command code is more problematic. We used to always require
> userspace to be built against the oldest kernel headers it could run
> on. This worked fine in the past but it appears that userspace
> (in particular glibc) has increasingly expected to also work
> on older kernels when building against new headers.
This is what I meant. Note that the userspace in this case also keeps a
case trying the old structure, but that does indeed require keeping the
userspace somewhat in lockstep if you do the renaming as in this example.
The better example would be one using a new new for the extended
structure, or requiring a feature macro to get the larger structure.
> Christian's version using the copy_struct_{from,to}_user()
> aims to avoid most of the problems. The main downside I see
> here is the extra complexity in the kernel. As far as I can
> tell, this has mainly led to extra kernel bugs but has not
> actually resulted in any structure getting seamlessly
> extended.
That is my (non-scientific) impression as well.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 10:59 ` Christoph Hellwig
@ 2025-07-10 11:52 ` Arnd Bergmann
0 siblings, 0 replies; 11+ messages in thread
From: Arnd Bergmann @ 2025-07-10 11:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Arnd Bergmann, linux-fsdevel, linux-block,
Anuj Gupta, Martin K. Petersen, Kanchan Joshi, LTP List,
Dan Carpenter, Benjamin Copeland, rbm, Naresh Kamboju,
Anders Roxell, Jens Axboe, Pavel Begunkov, Alexey Dobriyan,
Darrick J. Wong, Eric Biggers, linux-kernel
On Thu, Jul 10, 2025, at 12:59, Christoph Hellwig wrote:
>> I think the variant from commit 1b6d968de22b ("xfs: bump
>> XFS_IOC_FSGEOMETRY to v5 structures") where the old structure
>> gets renamed and the existing macro refers to a different
>> command code is more problematic. We used to always require
>> userspace to be built against the oldest kernel headers it could run
>> on. This worked fine in the past but it appears that userspace
>> (in particular glibc) has increasingly expected to also work
>> on older kernels when building against new headers.
>
> This is what I meant. Note that the userspace in this case also keeps a
> case trying the old structure, but that does indeed require keeping the
> userspace somewhat in lockstep if you do the renaming as in this example.
Right, it's fine for applications that keep a copy of the uapi
header file, because they can implement both versions when they
update to the new version of that file.
Redefining the ioctl command code does break if you have an
unmodified application source tree that unintentionally uses
the updated /usr/include/linux/*.h file. In this case there is
no benefit from the new header because it isn't aware of the
new struct member but it still ends up failing on old kernels.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 10:11 ` Arnd Bergmann
@ 2025-07-10 12:03 ` Christian Brauner
0 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2025-07-10 12:03 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, linux-fsdevel, linux-block, Anuj Gupta,
Martin K. Petersen, Kanchan Joshi, LTP List, Dan Carpenter,
Benjamin Copeland, rbm, Naresh Kamboju, Anders Roxell, Jens Axboe,
Pavel Begunkov, Alexey Dobriyan, Darrick J. Wong, Eric Biggers,
linux-kernel
On Thu, Jul 10, 2025 at 12:11:12PM +0200, Arnd Bergmann wrote:
> On Thu, Jul 10, 2025, at 10:00, Christian Brauner wrote:
> > On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
>
> >> + if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
> >> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
> >
> > This part is wrong as we handle larger sizes just fine via
> > copy_struct_{from,to}_user().
>
> I feel that is still an open question. As I understand it,
> you want to make it slightly easier for userspace callers
> to use future versions of an ioctl command by allowing them in
> old kernels as well, by moving that complexity into the kernel.
>
> Checking against _IOC_SIZE(FS_IOC_GETLBMD_CAP) would keep the
> behavior consistent with the traditional model where userspace
> needs to have a fallback to previous ABI versions.
>
> > Arnd, objections to writing it as follows?:
>
> > + /* extensible ioctls */
> > + switch (_IOC_NR(cmd)) {
> > + case _IOC_NR(FS_IOC_GETLBMD_CAP):
> > + if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
> > + break;
> > + if (_IOC_SIZE(cmd) > PAGE_SIZE)
> > + break;
> > + return blk_get_meta_cap(bdev, cmd, argp);
>
> The 'PAGE_SIZE' seems arbitrary here, especially since that is often
I used that as an illustration since we're capping nearly all (regular)
uapi structures at that as a somewhat reasonable upper bound. If that's
smaller that's fine.
> larger than the maximum size that can be encoded in an ioctl command
> number (8KB or 16KB depending on the architecture). If we do need
> an upper bound larger than _IOC_SIZE(FS_IOC_GETLBMD_CAP), I think it
> should be a fixed number rather than a configuration dependent
> one, and I would prefer a smaller one over a larger one. Anything
Sure, fine by me.
> over a few dozen bytes is certainly suspicious, and once it gets
> to thousands of bytes, you need a dynamic allocation to avoid stack
> overflow in the kernel.
Sure, we do that already in some cases because we have use-cases for
that.
>
> I had already updated my patch to move the checks into
> blk_get_meta_cap() itself and keep the caller simpler:
Ok.
> Regardless of what upper bound we pick, that at least limits
> the complexity to the one function that actually wants it.
Ok.
>
> > And can I ask you to please take a look at fs/pidfs.c:pidfd_ioctl() and
>
> PIDFD_GET_INFO has part of the same problem, as it still fails to
> check the _IOC_DIR() bits. I see you added a check for _IOC_TYPE()
> in commit 091ee63e36e8 ("pidfs: improve ioctl handling"), but
> the comment you added describes an unrelated issue and the fix
> was incomplete.
>
> > fs/nsfs.c:ns_ioctl()?
>
> You tried to add a similar validation in commit 7fd511f8c911
> ("nsfs: validate ioctls"), but it seems you got that wrong
> both by missing the _IOC_DIR check and by having a typo in the
> '_IOC_TYPE(cmd) == _IOC_TYPE(cmd)' line that means this is
> always true rather than comparing against 'NSIO'.
>
> Overall my feeling is similar to Christoph's, that the added
> complexity in any of these three cases was a mistake, as it's
> too easy to mess it up.
For pidfs and nsfs it will definitely be the way we're doing it.
Especially with structures we definitely want to grow. I have zero
interest in spamming userspace with either a fragmented set of 100
ioctls that never give consistent information because they need to be
called in sequence nor constantly revised V2-100 ioctl commands that all
use a new structure with a fixed layout.
The fact that the ioctl api currently lacks validation is more testament
to how unspecified the requirements for ioctls and their encoding are
and we better fix that in general because like it or not we already have
quite a few size-versioned ioctls anyway. This is nothing we cannot fix.
Tell me what work you need and we'll do it.
The alternative I see is that I will start being very liberal in adding
extensible system calls for stuff such as pidfd_info() and ns_info()
instead of making them ioctls on the file where they belong.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
2025-07-10 10:50 ` Arnd Bergmann
2025-07-10 10:59 ` Christoph Hellwig
@ 2025-07-10 12:11 ` Christian Brauner
1 sibling, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2025-07-10 12:11 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Christoph Hellwig, Arnd Bergmann, linux-fsdevel, linux-block,
Anuj Gupta, Martin K. Petersen, Kanchan Joshi, LTP List,
Dan Carpenter, Benjamin Copeland, rbm, Naresh Kamboju,
Anders Roxell, Jens Axboe, Pavel Begunkov, Alexey Dobriyan,
Darrick J. Wong, Eric Biggers, linux-kernel
> Christian's version using the copy_struct_{from,to}_user()
> aims to avoid most of the problems. The main downside I see
> here is the extra complexity in the kernel. As far as I can
> tell, this has mainly led to extra kernel bugs but has not
> actually resulted in any structure getting seamlessly
> extended.
We extended ioctls multiple times seemlessly and other than this bug
right here I'm not aware of anything serious. Not liking it is fine of
course but saying "this caused a bug so go away" I won't take all too
seriously, sorry.
I don't want to go down the road of structure revisions for stuff in the
generic layer. Others can do whatever they see fit ofc and userspace can
then have its usualy ifdeffery and structure layout detection party
instead of a clean generic solution. I'd rather clean up the necessary
vetting bits and properly document how this can be done.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-07-10 12:11 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09 18:10 [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl() Arnd Bergmann
2025-07-09 18:27 ` Darrick J. Wong
2025-07-09 20:30 ` Arnd Bergmann
2025-07-10 8:00 ` Christian Brauner
2025-07-10 8:14 ` Christoph Hellwig
2025-07-10 10:50 ` Arnd Bergmann
2025-07-10 10:59 ` Christoph Hellwig
2025-07-10 11:52 ` Arnd Bergmann
2025-07-10 12:11 ` Christian Brauner
2025-07-10 10:11 ` Arnd Bergmann
2025-07-10 12:03 ` Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).