From: "Günther Noack" <gnoack@google.com>
To: Bryam Vargas <hexlabsecurity@proton.me>
Cc: "Mickaël Salaün" <mic@digikod.net>,
	"Paul Moore" <paul@paul-moore.com>,
	"Jens Axboe" <axboe@kernel.dk>, "Keith Busch" <kbusch@kernel.org>,
	"Christoph Hellwig" <hch@lst.de>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	linux-security-module@vger.kernel.org, io-uring@vger.kernel.org,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
Date: Wed, 17 Jun 2026 11:47:56 +0200
Message-ID: <ajJtTHyqWTmX7lHo@google.com>
In-Reply-To: <20260616201633.275067-1-hexlabsecurity@proton.me>

Hello Bryam!

Thanks for the report!

On Tue, Jun 16, 2026 at 08:16:41PM +0000, Bryam Vargas wrote:
> Hello Mickaël, and Landlock / io_uring folks,
> 
> A task confined by a Landlock ruleset that grants READ_FILE/WRITE_FILE on a block
> or NVMe character device but withholds LANDLOCK_ACCESS_FS_IOCTL_DEV can still
> reach the device-command surface through io_uring IORING_OP_URING_CMD with the
> IOCTL_DEV check bypassed: the request enters the device-command handler (block
> discard, or the NVMe char-device passthrough) where the equivalent ioctl(2) is
> denied. The destructive completion and the NVMe-admin surface follow from the
> code -- see Impact.
> 
> Affected
> --------
> Any kernel with CONFIG_SECURITY_LANDLOCK=y and Landlock enabled that supports
> LANDLOCK_ACCESS_FS_IOCTL_DEV (Landlock ABI >= 5, since Linux 6.10) and io_uring
> uring_cmd for the device class (block BLOCK_URING_CMD_DISCARD; NVMe passthrough).
> Confirmed by source inspection on mainline (v7.1-rc7) and reproduced on Linux
> 7.0.11 (Landlock ABI 8). The confined task needs a writable fd to a device it is
> legitimately allowed to use (e.g. a partition/loop device or an NVMe namespace
> passed into a container or granted by the ruleset); no CAP is required to reach
> the io_uring path. The gap is structural -- Landlock has never registered a
> uring_cmd hook -- so it is present from ABI 5 (Linux 6.10) through current
> mainline (v7.1-rc7) and is not a regression tied to a single Fixes: commit.
> 
> Root cause
> ----------
> On the ioctl(2) path, the syscall handler in fs/ioctl.c calls
> security_file_ioctl() (its only call site on the ioctl(2) path) before
> dispatching to do_vfs_ioctl(); that reaches Landlock hook_file_ioctl_common(),
> which denies a device ioctl unless the file's
> allowed_access holds LANDLOCK_ACCESS_FS_IOCTL_DEV (BLKDISCARD/BLKSECDISCARD/
> BLKZEROOUT and NVMe passthrough are not in the is_masked_device_ioctl()
> allow-list, so they require the right).
> 
> io_uring reaches the same device-command surface by a different producer:
> 
>   IORING_OP_URING_CMD -> io_uring_cmd()   io_uring/uring_cmd.c
>    -> security_uring_cmd(ioucmd)          (the ONLY LSM gate on this path)
>    -> file->f_op->uring_cmd()             e.g. blkdev_uring_cmd() / nvme_ns_chr_uring_cmd()
> 
> Landlock's LSM_HOOK_INIT list (security/landlock/fs.c, net.c, task.c) registers
> file_ioctl/file_ioctl_compat but no uring_cmd hook -- only SELinux
> (selinux_uring_cmd) and Smack (smack_uring_cmd) gate this surface -- so
> security_uring_cmd() returns 0 for a Landlocked task and hook_file_ioctl /
> IOCTL_DEV is never consulted. For block, blkdev_cmd_discard() is then gated only
> by BLK_OPEN_WRITE; for NVMe, nvme_ns_chr_uring_cmd() reaches the admin/IO
> passthrough with no security_file_ioctl on the path. There is no shared helper
> that re-applies the IOCTL_DEV check.
> 
> SELinux and Smack hooking uring_cmd while Landlock does not is the coverage
> asymmetry; the Landlock documentation describes IOCTL_DEV as gating ioctl(2) but
> does not mention io_uring.
> 
> Reproducer
> ----------
> A self-contained PoC is available on request (it needs root only to set up a loop
> block device and open it; Landlock enforcement is uid-independent, so the
> confined child demonstrates the gap regardless of the setup uid). The child
> applies a Landlock ruleset handling READ_FILE|WRITE_FILE|IOCTL_DEV with a rule
> granting only READ_FILE|WRITE_FILE on the device, then:
> 
>   (1) ioctl(fd, BLKDISCARD, range)        -> -EACCES  (Landlock enforces IOCTL_DEV)
>   (2) IORING_OP_URING_CMD,
>       cmd_op = BLOCK_URING_CMD_DISCARD     -> reaches the block command handler
> 
> Observed on Linux 7.0.11 (Landlock ABI 8):
> 
>   [1] ioctl(BLKDISCARD)   -> ret=-1 errno=13 (Permission denied)
>   [2] uring_cmd(DISCARD)  -> cqe.res=-22 (Invalid argument)
> 
> A Landlock denial is always -EACCES; the io_uring path returned -EINVAL, which
> originates in a post-authorization check inside the block command handler
> (blk_validate_byte_range() in blkdev_cmd_discard()), reached only after
> security_uring_cmd() returned 0. So this run demonstrates the authorization
> bypass -- the request traversed the LSM gate into the block device-command
> handler with no IOCTL_DEV check -- and then failed a parameter check, not an
> authorization check. The destructive completion (an authorized discard with a
> granularity-aligned range) is the expected behaviour but was not exercised in
> this run.
> 
> Impact
> ------
> Demonstrated: the LANDLOCK_ACCESS_FS_IOCTL_DEV authorization is bypassed. The
> device-command request reaches the block command handler with no Landlock check;
> the only remaining gate is BLK_OPEN_WRITE (held, since the policy granted write).
> Inferred from the code, not exercised here: an authorized DISCARD with a valid
> range completes (DISCARD/secure-erase semantics, destroying on-device data), and
> the same missing hook leaves the NVMe char-device uring_cmd surface ungated --
> nvme_ns_chr_uring_cmd (namespace device /dev/nvmeXnY) -> nvme_ns_uring_cmd for
> NVME_URING_CMD_IO/IO_VEC passthrough, and nvme_dev_uring_cmd (controller device
> /dev/nvmeX) for NVME_URING_CMD_ADMIN (format, sanitize, firmware download,
> security send) -- both reach f_op->uring_cmd with no Landlock/IOCTL_DEV gate.
> 
> So the confirmed finding is a missing authorization (the confined task escapes
> its own IOCTL_DEV restriction); the destructive data effect and the NVMe-admin
> high-water-mark follow from the code but are not shown in the run above. The
> proven authorization bypass alone scores CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:N
> (6.5 Medium) -- S:C because the confined task crosses the Landlock policy
> boundary it was placed under, I:H because the bypassed path reaches a handler
> whose authorized completion modifies device data. With the device command
> completing destructively the projected ceiling is
> CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:H (8.4 High), the A:H component
> reasoned from the source rather than executed. No memory safety is involved.
> 
> Suggested direction
> -------------------
> Have Landlock register a uring_cmd hook that maps the device command to the same
> checks the ioctl path applies (IOCTL_DEV, and truncate where relevant), so a
> single chokepoint covers every f_op->uring_cmd provider (block, NVMe, ublk, and
> any future one). Mirrors how SELinux/Smack already gate this surface.
> 
> I am happy to send a patch for this if you would like.

I have read through the code a bit, but I am not sure I follow the argument of
this report. Let me paraphrase my understanding --

* LANDLOCK_ACCESS_FS_IOCTL_DEV is documented as blocking ioctl(2)
  commands on opened character and block devices.
  (cf. https://docs.kernel.org/userspace-api/landlock.html#filesystem-flags)

* One of many block-device IOCTL operations is BLKDISCARD.

* Block devices offer BLKDISCARD over io_uring as well,
  but io_uring does *not* offer a generic interface through which you
  can do IOCTLs.  It *only* implements BLOCK_URING_CMD_DISCARD in that
  place.  The header where that constant is defined happens to use one
  of the ioctl macros to construct the number, but points out that "It's
  a different number space from ioctl()" (see
  include/uapi/linux/blkdev.h).

So... while this is similar to IOCTL, and while this block device operation is
also available through ioctl(2), this is a different command multiplexer
than IOCTL and I am not convinced that that namespace should be guarded with
the same LANDLOCK_ACCESS_FS_IOCTL_DEV access right.

Do I understand correctly that the only operation affected in this report is
BLOCK_URING_CMD_DISCARD?  Or are there other operations affected by this
(through other devices)?  I saw you also mentioned the truncate right above,
but I assume that for this access right you have not found a way to side-step
it (assuming that this calls the more specific LSM hooks).

Thanks,
—Günther

Thread overview: 6+ messages
2026-06-16 20:16 Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD Bryam Vargas
2026-06-16 20:36 ` Jens Axboe
2026-06-17  2:25   ` Bryam Vargas
2026-06-17  2:44     ` Jens Axboe
2026-06-17 14:16       ` Mickaël Salaün
2026-06-17  9:47 ` Günther Noack [this message]
