* Re: [PATCH v7 13/43] btrfs: adapt readdir for encrypted and nokey names
From: Eric Biggers @ 2026-06-01 23:44 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, linux-block, linux-fscrypt, linux-btrfs,
linux-kernel, Omar Sandoval, Sweet Tea Dorminy
In-Reply-To: <20260513085340.3673127-14-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:47AM +0200, Daniel Vacek wrote:
> + /*
> + * TODO: This should maybe be using the crypto API, not the fallback,
> + * but fscrypt uses the fallback and this is only used in emulation of
> + * fscrypt's buffer sha256 method.
> + */
You can delete this TODO. The SHA-256 library functions just do the
right thing now.
- Eric
^ permalink raw reply
* Re: [PATCH v7 08/43] fscrypt: add documentation about extent encryption
From: Eric Biggers @ 2026-06-01 23:43 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, Jonathan Corbet, Shuah Khan,
linux-block, linux-fscrypt, linux-btrfs, linux-kernel, linux-doc
In-Reply-To: <20260513085340.3673127-9-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:42AM +0200, Daniel Vacek wrote:
> From: Josef Bacik <josef@toxicpanda.com>
>
> Add a couple of sections to the fscrypt documentation about per-extent
> encryption.
A few things that were missed:
- "Contents encryption" section should be updated to document how
extents are encrypted
- btrfs should be added to the list of filesystems in the introduction
section and in the "Tests" section
- "Read-only kernel memory compromise" should be updated to mention the
new limitation which extent-based encryption creates.
Maybe after the sentence "Per-file keys for in-use files will *not* be
removed or wiped.", insert "On filesystems that use per-extent
encryption, master keys for in-use files will not be wiped either."
The list of filesystems can also be found in the help text for the
CONFIG_FS_ENCRYPTION kconfig. That should be updated too.
> +For certain file systems, such as btrfs, it's desired to derive a
> +per-extent encryption key. This is to enable features such as snapshots
It's necessary, not just "desired". The per-file keys just don't work
when multiple files can share the same extents.
> +Currently the inode's master key and encryption policy must match the
> +extent, so you cannot share extents between inodes that were encrypted
> +differently.
Here's another instance of "the inode". It should say something like:
Currently, each extent's master key and encryption mode always match the
corresponding values from each inode that shares the extent.
> +The extent encryption context mirrors the important parts of the above
> +`Encryption context`_, with a few omissions. The struct is defined as
> +follows::
> +
> + struct fscrypt_extent_context {
> + u8 version;
> + u8 encryption_mode;
> + u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
> + u8 nonce[FSCRYPT_FILE_NONCE_SIZE];
> + };
> +
> +Currently all fields much match the containing inode's encryption
> +context, with the exception of the nonce.
> +
> +Additionally extent encryption is only supported with
> +FSCRYPT_EXTENT_CONTEXT_V2 using the standard policy; all other policies
> +are disallowed.
FSCRYPT_EXTENT_CONTEXT_V2 should say FSCRYPT_EXTENT_CONTEXT_V1.
Both version and nonce are exceptions, not just nonce.
Unclear what "the standard policy" is.
- Eric
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Ming Lei @ 2026-06-01 23:36 UTC (permalink / raw)
To: Hillf Danton
Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa,
Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton,
Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
Christian Brauner
In-Reply-To: <20260601231731.1403-1-hdanton@sina.com>
On Tue, Jun 02, 2026 at 07:17:30AM +0800, Hillf Danton wrote:
> On Mon, 1 Jun 2026 17:14:59 -0500 Ming Lei wrote:
> > On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote:
> > > On OnMon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote:
> > > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD.
> > > >
> > > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
> > > >
> > > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture
> > > > this bio submission context if this req isn't issued via wq.
> > > >
> > > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq.
> >
> > Can't lo->lo_state be updated after the check? It is totally lockless...
> >
> Sounds good hm... do you mean it is UNWISE to not flush the loop workqueue
> when closing disk?
Quite the opposite, it is wise to not flush wq in __loop_clr_fd(), please
see my previous comment.
Thanks,
Ming
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-06-01 23:17 UTC (permalink / raw)
To: Ming Lei
Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa,
Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton,
Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
Christian Brauner
In-Reply-To: <ah4EY2nhPxEETufZ@fedora>
On Mon, 1 Jun 2026 17:14:59 -0500 Ming Lei wrote:
> On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote:
> > On OnMon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote:
> > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD.
> > >
> > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
> > >
> > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture
> > > this bio submission context if this req isn't issued via wq.
> > >
> > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq.
>
> Can't lo->lo_state be updated after the check? It is totally lockless...
>
Sounds good hm... do you mean it is UNWISE to not flush the loop workqueue
when closing disk?
^ permalink raw reply
* Re: [PATCH v7 05/43] blk-crypto: add a process bio callback
From: Eric Biggers @ 2026-06-01 23:09 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, linux-block, linux-fscrypt, linux-btrfs,
linux-kernel
In-Reply-To: <20260513085340.3673127-6-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:39AM +0200, Daniel Vacek wrote:
> @@ -330,6 +330,17 @@ static void __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
> }
> }
>
> + /* Process the encrypted bio before we submit it. */
> + if (bc->bc_key->crypto_cfg.process_bio) {
> + blk_status_t status;
> +
> + status = bc->bc_key->crypto_cfg.process_bio(src_bio, enc_bio);
> + if (status != BLK_STS_OK) {
> + enc_bio->bi_status = status;
> + goto out_free_enc_bio;
> + }
> + }
This looks very out of place in this function, because this function
potentially submits multiple encrypted bios whereas this code assumes
there's just one. I think we could at least use a comment here, as well
as a WARN_ON_ONCE(!bc->bc_key->crypto_cfg.process_bio) in the case where
an additional bio is being submitted.
> + /*
> + * We cannot split bio's that have process_bio, as they require the original bio.
> + * The upper layer must make sure to limit the submitted bio's appropriately.
> + */
> + if (bio_segments(src_bio) > BIO_MAX_VECS && bc->bc_key->crypto_cfg.process_bio) {
Checks should be reversed so that the expensive bio_segments() check
only gets done if process_bio is non-NULL.
(Sashiko spotted this too, by the way)
> + * @process_bio: the call back if the upper layer needs to process the encrypted
> + * bio
or NULL if no such callback is needed.
> + * @proces_bio: optional callback to process encrypted bios.
Typo.
By the way, after writing kerneldoc you should run the kerneldoc
checker, as it will find things like this:
$ scripts/kernel-doc -v -none include/linux/blk-crypto.h
Info: include/linux/blk-crypto.h:79 Scanning doc for struct blk_crypto_config
Warning: include/linux/blk-crypto.h:95 struct member 'process_bio' not described in 'blk_crypto_config'
^ permalink raw reply
* Re: [PATCH v7 04/43] fscrypt: conditionally don't wipe mk secret until the last active user is done
From: Eric Biggers @ 2026-06-01 23:04 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, linux-block, linux-fscrypt, linux-btrfs,
linux-kernel
In-Reply-To: <20260513085340.3673127-5-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:38AM +0200, Daniel Vacek wrote:
> From: Josef Bacik <josef@toxicpanda.com>
>
> Previously we were wiping the master key secret when we do
> FS_IOC_REMOVE_ENCRYPTION_KEY, and then using the fact that it was
> cleared as the mechanism from keeping new users from being setup.
This seems to be describing the state prior to commit 15baf55481de, not
the current tip of tree. We do still wipe on
FS_IOC_REMOVE_ENCRYPTION_KEY, but there's a separate boolean that
maintains the status and everyone looks at that now.
> * FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED
> * Removal of this key has been initiated, but some inodes that were
> - * unlocked with it are still in-use. Like ABSENT, ->mk_secret is wiped,
> - * and the key can no longer be used to unlock inodes. Unlike ABSENT, the
> - * key is still in the keyring; ->mk_decrypted_inodes is nonempty; and
> + * unlocked with it are still in-use.
> + * For filesystems using per-extent encryption ->mk_secret is still
> + * being kept as the per-extent keys are derived at writeout time.
> + * Otherwise, like ABSENT, ->mk_secret is wiped, and the key can
> + * no longer be used to unlock inodes. Unlike ABSENT, the key is
> + * still in the keyring; ->mk_decrypted_inodes is nonempty; and
> * ->mk_active_refs > 0, being equal to the size of ->mk_decrypted_inodes.
This is saying that a bunch of things don't apply to extent-based
encryption, due to being after the "Otherwise," when in fact they
actually still do. So the wording here needs improvement. How about:
->mk_secret exists only if the filesystem uses extent-based
encryption, to support key derivation during file data writeback;
otherwise it is wiped. Either way, the key can no longer be used to
unlock inodes. Unlike ABSENT, the key is still in the keyring;
->mk_decrypted_inodes is nonempty; and ->mk_active_refs
> 0, being equal to the size of ->mk_decrypted_inodes.
> *
> * This state transitions to ABSENT if ->mk_decrypted_inodes becomes empty,
> diff --git a/fs/crypto/keyring.c b/fs/crypto/keyring.c
> index be8e6e8011f2..796e02a0db25 100644
> --- a/fs/crypto/keyring.c
> +++ b/fs/crypto/keyring.c
> @@ -110,6 +110,14 @@ void fscrypt_put_master_key_activeref(struct super_block *sb,
> WARN_ON_ONCE(mk->mk_present);
> WARN_ON_ONCE(!list_empty(&mk->mk_decrypted_inodes));
>
> + /* We can't wipe the master key secret until the last activeref is
> + * dropped on the master key with per-extent encryption since the key
> + * derivation continues to happen as long as there are active refs.
> + * Wipe it here now that we're done using it.
> + */
> + if (sb->s_cop->has_per_extent_encryption)
> + wipe_master_key_secret(&mk->mk_secret);
wipe_master_key_secret() is idempotent, so we might as well just do it
unconditionally here.
- Eric
^ permalink raw reply
* Re: [PATCH v7 02/43] fscrypt: allow inline encryption for extent based encryption
From: Eric Biggers @ 2026-06-01 22:49 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, linux-block, linux-fscrypt, linux-btrfs,
linux-kernel
In-Reply-To: <20260513085340.3673127-3-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:36AM +0200, Daniel Vacek wrote:
> From: Josef Bacik <josef@toxicpanda.com>
>
> Instead of requiring -o inlinecrypt to enable inline encryption, allow
> having s_cop->has_per_extent_encryption to indicate that this file
> system supports inline encryption.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Daniel Vacek <neelx@suse.com>
> ---
>
> v5: https://lore.kernel.org/linux-btrfs/ba0289bf103653d5d98ef576756c9a2a66192865.1706116485.git.josef@toxicpanda.com/
> * No changes since.
There are multiple places that check SB_INLINECRYPT, and just one was
updated. Perhaps we should just require that if a filesystem sets
has_per_extent_encryption, then it also sets SB_INLINECRYPT
unconditionally?
- Eric
^ permalink raw reply
* Re: [PATCH v7 01/43] fscrypt: add per-extent encryption support
From: Eric Biggers @ 2026-06-01 22:44 UTC (permalink / raw)
To: Daniel Vacek
Cc: Chris Mason, Josef Bacik, Theodore Y. Ts'o, Jaegeuk Kim,
Jens Axboe, David Sterba, linux-block, linux-fscrypt, linux-btrfs,
linux-kernel
In-Reply-To: <20260513085340.3673127-2-neelx@suse.com>
On Wed, May 13, 2026 at 10:52:35AM +0200, Daniel Vacek wrote:
> From: Josef Bacik <josef@toxicpanda.com>
>
> This adds the code necessary for per-extent encryption. We will store a
> nonce for every extent we create, and then use the inode's policy and
> the extents nonce to derive a per-extent key.
>
> This is meant to be flexible, if we choose to expand the on-disk extent
> information in the future we have a version number we can use to change
> what exists on disk.
>
> The file system indicates it wants to use per-extent encryption by
> setting s_cop->has_per_extent_encryption. This also requires the use of
> inline block encryption.
>
> The support is relatively straightforward, the only "extra" bit is we're
> deriving a per-extent key to use for the encryption, the inode still
> controls the policy and access to the master key.
>
> Since extent based encryption uses a lot of keys, we're requiring the
> use of inline block crypto if you use extent-based encryption. This
> enables us to take advantage of the built in pooling and reclamation of
> the crypto structures that underpin all of the encryption.
The whole reason for extent-based encryption is that extents can be
shared between inodes. So the repeated mentions of "the inode" are
really confusing. This shows up in a lot of different places.
What's actually implemented is that each extent stores its own
(encryption_mode, master_key_identifier, nonce), but for now the
invariant is maintained that all inodes that reference an extent share
the same (encryption_mode, master_key_identifier) as the extent.
It would be helpful to document this stuff accordingly.
> +/*
> + * fscrypt_extent_context - the encryption context of an extent
> + *
> + * This is the on-disk information stored for an extent. The nonce is used as a
> + * KDF input in conjuction with the inode context to derive a per-extent key for
> + * encryption. This is used only when the filesystem uses per-extent encryption.
> + *
Basically the same issue here. The master_key_identifier is actually
stored in the extent. Just the current implementation enforces that
when the filesystem accesses the extent through some inode, that inode
also has the same master_key_identifier. How about replacing the second
sentence with something like: "The nonce and master_key_identifier are
used to derive the key which encrypts the extent."
> + * With the current implementation, master_key_identifier and encryption mode
> + * must match the inode context. These are here for future expansion where we
> + * may want the option of mixing different keys and encryption modes for the
> + * same file.
> + */
Likewise. Something like: With the current implementation,
master_key_identifier and encryption_mode always match the corresponding
values from the fscrypt_context in each inode that shares the extent.
> +struct fscrypt_extent_context {
> + u8 version; /* FSCRYPT_EXTENT_CONTEXT_V1 */
> + u8 encryption_mode;
> + u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
> + u8 nonce[FSCRYPT_FILE_NONCE_SIZE];
> +};
Well, it's an extent nonce, not a file nonce. It seems it's handled
completely separately from the existing file nonce, so it probably
should get its own size constant FSCRYPT_EXTENT_NONCE_SIZE.
> +/**
> + * fscrypt_set_bio_crypt_ctx_from_extent() - prepare a file contents bio for
> + * inline crypto with extent
> + * encryption
> + * @bio: a bio which will eventually be submitted to the file
> + * @ei: the extent's crypto info
@ei: the extent's crypto info, or NULL if the extent is unencrypted
> + * If the contents of the file should be encrypted (or decrypted) with inline
> + * encryption, then assign the appropriate encryption context to the bio.
"If the contents of the file should be encrypted (or decrypted) with
inline encryption" => "If the extent should be encrypted (or decrypted)"
There's no "file" here. And inline encryption is the only option for
extents.
> +/**
> + * fscrypt_mergeable_extent_bio() - test whether data can be added to a bio
> + * @bio: the bio being built up
> + * @ei: the fscrypt_extent_info for this extent
@ei: the extent's crypto info, or NULL if the extent is unencrypted
> + * @pos: the next extent logical offset (in bytes) in the I/O
> + *
> + * When building a bio which may contain data which should undergo inline
> + * encryption (or decryption) via fscrypt,
When building a bio which may contain data which should undergo extent
encryption (or decryption)
> +static struct fscrypt_extent_info *
> +setup_extent_info(struct inode *inode, const u8 nonce[FSCRYPT_FILE_NONCE_SIZE])
> +{
> + struct fscrypt_extent_info *ei;
> + struct fscrypt_inode_info *ci;
> + struct fscrypt_master_key *mk;
> + u8 derived_key[FSCRYPT_MAX_RAW_KEY_SIZE];
> + int keysize;
> + int err;
> +
> + ci = *fscrypt_inode_info_addr(inode);
fscrypt_get_inode_info_raw()
> +/**
> + * fscrypt_prepare_new_extent() - prepare to create a new extent for a file
> + * @inode: the encrypted inode
> + *
> + * If the inode is encrypted, setup the fscrypt_extent_info for a new extent.
> +
> + * This will include the nonce and the derived key necessary for the extent to
> + * be encrypted. This is only meant to be used with inline crypto and on inodes
> + * that need their contents encrypted.
This is ambiguous and contradictory about what type of @inode is
required. It should be something like:
* @inode: an encrypted regular file with its key already set up, on a
* filesystem that uses per-extent encryption
*
* Prepare to encrypt a new extent by generating a new extent nonce,
* deriving an extent key, and allocating an fscrypt_extent_info.
> * This doesn't persist the new extents encryption context, this is done later
> * by calling fscrypt_set_extent_context().
There's no function with that name
> + if (WARN_ON_ONCE(!*fscrypt_inode_info_addr(inode)))
> + return ERR_PTR(-EOPNOTSUPP);
> + if (WARN_ON_ONCE(!fscrypt_inode_uses_inline_crypto(inode)))
> + return ERR_PTR(-EOPNOTSUPP);
I'm confused what these checks are trying to do. The first part checks
for the inode's encryption key, but setup_extent_info() does that
anyway. The second part is maybe intended to check that the file uses
extent encryption, but it doesn't do it correctly. That would require:
fscrypt_needs_contents_encryption(inode) &&
inode->i_sb->s_cop->has_per_extent_encryption.
It probably would make sense to check that directly in
setup_extent_info(), so that it's closer to the call to
fscrypt_hkdf_expand() which would has a *very* bad failure mode when
!has_per_extent_encryption.
> +/**
> + * fscrypt_load_extent_info() - create an fscrypt_extent_info from the context
> + * @inode: the inode
> + * @ctx: the context buffer
> + * @ctx_size: the size of the context buffer
> + *
> + * Create the fscrypt_extent_info and derive the key based on the
> + * fscrypt_extent_context buffer that is provided.
> + *
> + * Return: The newly allocated fscrypt_extent_info on success, -EOPNOTSUPP if
> + * we're not encrypted, or another -errno code
> + */
What context is this expected to be called in? I see the caller uses
memalloc_nofs_save(). This would require making ->mk_sem nofs-safe; is
there a plan to do that? (Sashiko noticed this too, by the way.)
> + const struct fscrypt_inode_info *ci = *fscrypt_inode_info_addr(inode);
fscrypt_get_inode_info_raw(inode)
> +/**
> + * fscrypt_set_extent_context() - Set the fscrypt extent context of a new extent
It seems the function name and semantics changed at some point, but the
kerneldoc wasn't updated.
> + * @inode: the inode this extent belongs to
The inode that the extent will initially belong to, I guess?
> +ssize_t fscrypt_context_for_new_extent(struct inode *inode,
> + struct fscrypt_extent_info *ei, u8 *buf)
> +{
> + struct fscrypt_extent_context *ctx = (struct fscrypt_extent_context *)buf;
> + const struct fscrypt_inode_info *ci = *fscrypt_inode_info_addr(inode);
fscrypt_get_inode_info_raw(inode)
- Eric
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Qu Wenruo @ 2026-06-01 22:27 UTC (permalink / raw)
To: Brian Foster, Christoph Hellwig
Cc: Damien Le Moal, Tetsuo Handa, Ming Lei, Jens Axboe,
Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds,
linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner
In-Reply-To: <ah2zdCQ0WA6ol2Gn@bfoster>
在 2026/6/2 01:59, Brian Foster 写道:
> On Mon, Jun 01, 2026 at 04:40:34PM +0200, Christoph Hellwig wrote:
>> On Thu, May 28, 2026 at 07:46:24PM +0930, Qu Wenruo wrote:
>>>> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
>>>> unmount")
>>> Considering the xfs fix is pretty old, it's before the fix hint thus no
>>> such mention in fstests.
>>>
>>> Do you happen to know which test case is for that fix?
>>> I'd like to adapt it for btrfs as a reproducer.
>>
>> No. Adding Brian who authored that commit.
>>
>
> I haven't followed through the full thread here... But if you're just
> looking for an existing test case associated with the commit above on
> XFS, I did some quick digging and xfs/311 is the original reproducer for
> that one.
Thanks a lot! I'll use the same delayed umount to verify the behavior of
btrfs.
Thanks,
Qu
>
> Brian
>
^ permalink raw reply
* Re: [PATCH] ublk: fix null-ptr-deref in ublk_queue_cmd
From: Ming Lei @ 2026-06-01 22:21 UTC (permalink / raw)
To: Yun Zhou; +Cc: axboe, linux-block, linux-kernel
In-Reply-To: <20260601185544.1180263-1-yun.zhou@windriver.com>
On Mon, Jun 1, 2026 at 1:55 PM Yun Zhou <yun.zhou@windriver.com> wrote:
>
> ublk_queue_cmd() dereferences ios[tag].cmd without NULL check. The cmd
> pointer can be NULL when ublk_cancel_cmd() races with IO dispatch during
> server teardown:
>
> CPU0 (partition scan work) CPU1 (io_uring cancel callback)
> ublk_queue_rq()
> ublk_prep_req() -> OK
> check canceling -> false
> ublk_start_cancel()
> quiesce, set canceling, unquiesce
> ublk_cancel_cmd()
> io->cmd = NULL
> ublk_queue_cmd()
> cmd = ios[tag].cmd -> NULL
> ublk_get_uring_cmd_pdu(cmd) -> null-ptr-deref
>
> The race window exists because ublk_cancel_cmd() can execute between the
> canceling flag check and the cmd dereference in ublk_queue_cmd(). This
> cannot be closed with simple synchronization since blk_mq_quiesce_queue
> only waits for in-flight dispatches, not requests already past the
> canceling check.
>
> Fix by checking cmd for NULL before dereferencing. When NULL, abort the
> request via __ublk_abort_rq() which handles both recovery (requeue) and
> non-recovery (end with IOERR) cases.
>
> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
> Reported-by: syzbot+415b9ec753cd2a196087@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=415b9ec753cd2a196087
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Hello Yun,
It should have been addressed by the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git/commit/?h=for-7.2/block&id=1133b93fc7f63defaa2c07d5f49873c14bb74681
Thanks,
Ming Lei
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Ming Lei @ 2026-06-01 22:14 UTC (permalink / raw)
To: Hillf Danton
Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa,
Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton,
Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
Christian Brauner
In-Reply-To: <20260601215129.1375-1-hdanton@sina.com>
On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote:
> On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote:
> >On Thu, May 28, 2026 at 5:16 AM Qu Wenruo <wqu@suse.com> wrote:
> >> 在 2026/5/28 18:08, Christoph Hellwig 写道:
> >> > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
> >> >> It sounds like the VFS unmount call needs to have something that waits for
> >> >> sync() to complete. Though, it really feels very strange that an FS can complete
> >> >
> >> > I don't think this is the VFS-controlled VFS file data writeback, which
> >> > we wait on, but some kind of fs controlled metadata. And yes, it looks
> >> > like those file systems are buggy in that area. We definitively had
> >> > such bugs in XFS before and fixed them.
> >> >
> >> > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
> >> > unmount")
> >> Considering the xfs fix is pretty old, it's before the fix hint thus no
> >> such mention in fstests.
> >>
> >> Do you happen to know which test case is for that fix?
> >> I'd like to adapt it for btrfs as a reproducer.
> >>
> >> This syzbot report doesn't provide a reproducer.
> >>
> >>
> >> Another thing is, if it's some btrfs bios on-the-fly after
> >> close_ctree(), the most common symptom should be NULL pointer
> >> dereference inside various btrfs endio functions.
> >> As all those end_bbio_*() functions are referring to either fs_info or
> >> inode/eb, thus if the fs is unmounted before the bio finished, they
> >> should all cause use-after-free.
> >>
> >> The only exception is discard, which is using blkdev_issue_discard()
> >> thus has no such reference to btrfs internal structure, but that's out
> >> of my understanding.
> >
> > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD.
> >
> > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
> >
> > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture
> > this bio submission context if this req isn't issued via wq.
> >
> I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq.
Can't lo->lo_state be updated after the check? It is totally lockless...
Thanks,
Ming
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-06-01 21:51 UTC (permalink / raw)
To: Ming Lei
Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa,
Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton,
Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
Christian Brauner
In-Reply-To: <CACVXFVOSm=VpYfxUEpYk4yC2N1uAoK=vbg_A=h51GXeeArzCpw@mail.gmail.com>
On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote:
>On Thu, May 28, 2026 at 5:16 AM Qu Wenruo <wqu@suse.com> wrote:
>> 在 2026/5/28 18:08, Christoph Hellwig 写道:
>> > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
>> >> It sounds like the VFS unmount call needs to have something that waits for
>> >> sync() to complete. Though, it really feels very strange that an FS can complete
>> >
>> > I don't think this is the VFS-controlled VFS file data writeback, which
>> > we wait on, but some kind of fs controlled metadata. And yes, it looks
>> > like those file systems are buggy in that area. We definitively had
>> > such bugs in XFS before and fixed them.
>> >
>> > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
>> > unmount")
>> Considering the xfs fix is pretty old, it's before the fix hint thus no
>> such mention in fstests.
>>
>> Do you happen to know which test case is for that fix?
>> I'd like to adapt it for btrfs as a reproducer.
>>
>> This syzbot report doesn't provide a reproducer.
>>
>>
>> Another thing is, if it's some btrfs bios on-the-fly after
>> close_ctree(), the most common symptom should be NULL pointer
>> dereference inside various btrfs endio functions.
>> As all those end_bbio_*() functions are referring to either fs_info or
>> inode/eb, thus if the fs is unmounted before the bio finished, they
>> should all cause use-after-free.
>>
>> The only exception is discard, which is using blkdev_issue_discard()
>> thus has no such reference to btrfs internal structure, but that's out
>> of my understanding.
>
> syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD.
>
> https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
>
> Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture
> this bio submission context if this req isn't issued via wq.
>
I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq.
static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct request *rq = bd->rq;
struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
struct loop_device *lo = rq->q->queuedata;
blk_mq_start_request(rq);
if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
return BLK_STS_IOERR;
^ permalink raw reply
* Re: [PATCH] blk-iocost: use irq-safe locking in cgroup handlers
From: Bart Van Assche @ 2026-06-01 21:50 UTC (permalink / raw)
To: Yu Kuai, tj, josef; +Cc: axboe, linux-block, linux-kernel
In-Reply-To: <20260601061312.896801-1-yukuai@fygo.io>
On 5/31/26 11:13 PM, Yu Kuai wrote:
> @@ -3378,14 +3378,14 @@ static u64 ioc_cost_model_prfill(struct seq_file *sf,
> if (!dname)
> return 0;
>
> - spin_lock(&ioc->lock);
> + spin_lock_irq(&ioc->lock);
> seq_printf(sf, "%s ctrl=%s model=linear "
> "rbps=%llu rseqiops=%llu rrandiops=%llu "
> "wbps=%llu wseqiops=%llu wrandiops=%llu\n",
> dname, ioc->user_cost_model ? "user" : "auto",
> u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
> u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS]);
> - spin_unlock(&ioc->lock);
> + spin_unlock_irq(&ioc->lock);
> return 0;
> }
This change is wrong. ioc_cost_model_prfill() only has one caller,
namely blkcg_print_blkgs(). blkcg_print_blkgs() calls the above function
with interrupts disabled. The spin_unlock_irq(&ioc->lock) at the end of
the above function enables interrupts while q->queue_lock is held. If an
interrupt happens on the same CPU core before q->queue_lock is unlocked,
and that interrupt tries to lock q->queue_lock, a deadlock will occur.
Bart.
^ permalink raw reply
* Re: [PATCH v7 00/43] btrfs: add fscrypt support
From: Eric Biggers @ 2026-06-01 20:09 UTC (permalink / raw)
To: David Sterba
Cc: Daniel Vacek, David Sterba, linux-block, linux-fscrypt,
linux-btrfs, linux-kernel, Chris Mason, Josef Bacik,
Theodore Y. Ts'o, Jaegeuk Kim, Jens Axboe
In-Reply-To: <20260601185730.GE880787@twin.jikos.cz>
On Mon, Jun 01, 2026 at 08:57:30PM +0200, David Sterba wrote:
> On Sat, May 30, 2026 at 05:28:12PM -0700, Eric Biggers wrote:
> > On Fri, May 22, 2026 at 09:00:46AM +0200, Daniel Vacek wrote:
>
> > It's been really hard to find time to review this huge patchset. I've
> > started going through it and will try to leave comments next week.
>
> Thank you. The patchset is huge but we'd like your feedback namely on
> the crypto layer changes, i.e. the first 8 patches.
I know, but properly reviewing those patches requires reviewing their
usage in btrfs too. Plus the whole feature needs to be solid: if
there's a problem with it that's my problem too. And some problems,
e.g. any in the on-disk format, would be unfixable later, so we have to
get them right from the beginning.
> It's too late for 7.2 also because you've asked for some changes.
I think the Sashiko reviews alone make it very clear that this needs
some more work. So if you're considering this to be ready and just
waiting for me, I don't think that's the best way to move this forwards.
> I'm not sure if it was mentioned before wrt how to get this merged. My
> preferred way is to get your Ack for the crypto changes and then we can
> add it to linux-next via our btrfs git.
That will be fine once it's ready.
- Eric
^ permalink raw reply
* Re: [PATCH v5 09/12] block/blk-mq-debugfs: Improve lock context annotations
From: Bart Van Assche @ 2026-06-01 20:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Nathan Chancellor
In-Reply-To: <20260601073337.GE7799@lst.de>
On 6/1/26 12:33 AM, Christoph Hellwig wrote:
> On Thu, May 28, 2026 at 12:45:46PM -0700, Bart Van Assche wrote:
>> --- a/block/blk-mq-debugfs.c
>> +++ b/block/blk-mq-debugfs.c
>> @@ -20,7 +20,7 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
>> }
>>
>> static void *queue_requeue_list_start(struct seq_file *m, loff_t *pos)
>> - __acquires(&q->requeue_lock)
>> + __acquires(&((struct request_queue *)m->private)->requeue_lock)
>
> I try to member where we got stuck on this, but isn't there a way
> to at least have a macro for this dereference that can have a comment?
>
> Also I guess most seq_file users will have a mess like this, but I
> don't really know a good way around it.
What was requested last time, to introduce macros for similar
expressions in the I/O schedulers, has been implemented. I will
introduce helper macros in blk-mq-debugfs.c too. Inline functions
can't be used for this purpose because Clang's alias analyzer
does not expand inline functions.
Thanks,
Bart.
^ permalink raw reply
* Re: [PATCH v5 08/12] block/blk-iocost: Add lock context annotations
From: Bart Van Assche @ 2026-06-01 19:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Damien Le Moal, Tejun Heo, Josef Bacik,
Marco Elver
In-Reply-To: <20260601073220.GD7799@lst.de>
On 6/1/26 12:32 AM, Christoph Hellwig wrote:
> On Thu, May 28, 2026 at 12:45:45PM -0700, Bart Van Assche wrote:
>> Since iocg_lock() and iocg_unlock() both use conditional locking,
>> annotate both with __no_context_analysis and use token_context_lock() to
>> introduce a new lock context.
>
> Both of these are only called from two funtions. Have you looked into
> merging the locking logic into these helpers to see if this improves
> the situation? __no_context_analysis just seems like a big hammer
> for something relatively simple like this.
This will require splitting ioc_rqos_throttle(). I will look into this.
Bart.
^ permalink raw reply
* [PATCH] ublk: fix null-ptr-deref in ublk_queue_cmd
From: Yun Zhou @ 2026-06-01 18:55 UTC (permalink / raw)
To: tom.leiming, axboe; +Cc: linux-block, linux-kernel, yun.zhou
ublk_queue_cmd() dereferences ios[tag].cmd without NULL check. The cmd
pointer can be NULL when ublk_cancel_cmd() races with IO dispatch during
server teardown:
CPU0 (partition scan work) CPU1 (io_uring cancel callback)
ublk_queue_rq()
ublk_prep_req() -> OK
check canceling -> false
ublk_start_cancel()
quiesce, set canceling, unquiesce
ublk_cancel_cmd()
io->cmd = NULL
ublk_queue_cmd()
cmd = ios[tag].cmd -> NULL
ublk_get_uring_cmd_pdu(cmd) -> null-ptr-deref
The race window exists because ublk_cancel_cmd() can execute between the
canceling flag check and the cmd dereference in ublk_queue_cmd(). This
cannot be closed with simple synchronization since blk_mq_quiesce_queue
only waits for in-flight dispatches, not requests already past the
canceling check.
Fix by checking cmd for NULL before dereferencing. When NULL, abort the
request via __ublk_abort_rq() which handles both recovery (requeue) and
non-recovery (end with IOERR) cases.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Reported-by: syzbot+415b9ec753cd2a196087@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=415b9ec753cd2a196087
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
drivers/block/ublk_drv.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6c041eaebdb9..656e0d4e92bb 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2090,8 +2090,14 @@ static void ublk_batch_queue_cmd(struct ublk_queue *ubq, struct request *rq, boo
static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
{
struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd;
- struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+ struct ublk_uring_cmd_pdu *pdu;
+
+ if (unlikely(!cmd)) {
+ __ublk_abort_rq(ubq, rq);
+ return;
+ }
+ pdu = ublk_get_uring_cmd_pdu(cmd);
pdu->req = rq;
io_uring_cmd_complete_in_task(cmd, ublk_cmd_tw_cb);
}
--
2.43.0
^ permalink raw reply related
* Re: [PATCH] block, bfq: release cgroup stats with bfq_group
From: Jens Axboe @ 2026-06-01 18:59 UTC (permalink / raw)
To: linux-block, Yu Kuai; +Cc: linux-kernel, jack
In-Reply-To: <20260601061502.899552-1-yukuai@fygo.io>
On Mon, 01 Jun 2026 14:15:02 +0800, Yu Kuai wrote:
> BFQ cgroup stats contain percpu counters embedded in struct bfq_group,
> but the old free path destroys them from bfq_pd_free(), which is tied
> to blkg policy-data teardown.
>
> That is not the same lifetime as struct bfq_group. BFQ pins bfq_group
> while bfq_queue entities refer to it, so bfq_pd_free() can drop the
> policy-data reference while other bfq_group references still exist. The
> following blkcg change also defers policy-data release through RCU and
> leaves BFQ to run the final bfqg_put() from an RCU callback. For that
> conversion, stats teardown must belong to the last bfq_group put, not to
> policy-data teardown.
>
> [...]
Applied, thanks!
[1/1] block, bfq: release cgroup stats with bfq_group
commit: 9e1183a465e4a912cce1e4b389d0d7394c127eb5
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH] blk-iocost: use irq-safe locking in cgroup handlers
From: Jens Axboe @ 2026-06-01 18:59 UTC (permalink / raw)
To: tj, josef, Yu Kuai; +Cc: linux-block, linux-kernel
In-Reply-To: <20260601061312.896801-1-yukuai@fygo.io>
On Mon, 01 Jun 2026 14:13:12 +0800, Yu Kuai wrote:
> ioc_timer_fn() acquires ioc->lock from timer softirq context. The
> io.weight, io.cost.qos and io.cost.model cgroup handlers can take the
> same lock from process context, and the direct handler paths must not do
> so with interrupts enabled.
>
> A blkcg policy configuration reproducer with lockdep reproduced the
> following report:
>
> [...]
Applied, thanks!
[1/1] blk-iocost: use irq-safe locking in cgroup handlers
commit: 1059a743064b873aaaf34ebcff89110c7753fa32
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH v7 00/43] btrfs: add fscrypt support
From: David Sterba @ 2026-06-01 18:57 UTC (permalink / raw)
To: Eric Biggers
Cc: Daniel Vacek, David Sterba, linux-block, linux-fscrypt,
linux-btrfs, linux-kernel, Chris Mason, Josef Bacik,
Theodore Y. Ts'o, Jaegeuk Kim, Jens Axboe
In-Reply-To: <20260531002812.GA2302@sol>
On Sat, May 30, 2026 at 05:28:12PM -0700, Eric Biggers wrote:
> On Fri, May 22, 2026 at 09:00:46AM +0200, Daniel Vacek wrote:
> It's been really hard to find time to review this huge patchset. I've
> started going through it and will try to leave comments next week.
Thank you. The patchset is huge but we'd like your feedback namely on
the crypto layer changes, i.e. the first 8 patches. As this is outside
of btrfs code we can't fix it incrementally later on. The current size
of btrfs-only can be slightly reduced but a lot of must stay so it's
clear how the fscrypt API is used.
I'm not sure if it was mentioned before wrt how to get this merged. My
preferred way is to get your Ack for the crypto changes and then we can
add it to linux-next via our btrfs git.
It's too late for 7.2 also because you've asked for some changes. So the
target is 7.3, hopefully giving all of us enough time to have the
mergeable version in ~7.2-rc3 timeframe.
If you have other ideas how to proceed with the merge process, please
let me know.
> In the mean time it would be really helpful if you went through the
> Sashiko reviews
> (https://sashiko.dev/#/patchset/20260513085340.3673127-1-neelx%40suse.com)
> and address the ones that make sense to. It found 93 issues including
> 16 critical ones, which is kind of a lot. Some of them are the same
> things I'm noticing already. Same for the issue that Christoph noticed
> where new devices can be added; Sashiko had already found that too.
>
> If I'm going to have to use my limited human review time to point out
> issues that were already found, that's not a great use of time.
>
> I also don't see any information about how this was tested (and will
> continue to be tested in the future).
The testing is not straightforward as it needs 3 projects to
synchronize, kernel, fstests and btrfs-progs. Testing may need to use
custom git branches for all of them. For btrfs-progs the changes will ge
it in soon. For fstests it can be a chicken-egg problem, as they don't
accept tests for unmerged code. We've been using our fstests [1] with
additional fixups (not upstreamable, related to CI workarounds). Though
I'm not sure if Daniel has updated the branch with his most recent
version.
[1] https://github.com/btrfs/fstests branch staging
^ permalink raw reply
* Re: [GIT PULL] md-7.2-20260531
From: Jens Axboe @ 2026-06-01 18:55 UTC (permalink / raw)
To: Yu Kuai
Cc: linux-raid, linux-block, Abd-Alrhman Masalkhi, Benjamin Marzinski,
Chen Cheng, Christoph Hellwig, Li Nan, Thorsten Blum
In-Reply-To: <20260531112618.160409-1-yukuai@fygo.io>
On 5/31/26 5:26 AM, Yu Kuai wrote:
> Hi Jens,
>
> Please consider pulling the following changes into your for-7.2/block
> branch.
>
> This pull request contains:
>
> Bug Fixes:
> - Only requeue dm-raid bios when dm is suspending. (Benjamin Marzinski)
> - Reset raid10 read_slot when reusing r10bio for discard. (Chen Cheng)
> - Fix raid1/raid10 deadlock in read error recovery path. (Abd-Alrhman Masalkhi)
> - Fix raid1/raid10 error-path detection with md_cloned_bio(). (Abd-Alrhman Masalkhi)
> - Fix raid1/raid10 bio accounting for split md cloned bios. (Abd-Alrhman Masalkhi)
> - Fix raid1 nr_pending leak in REQ_ATOMIC bad-block path. (Abd-Alrhman Masalkhi)
>
> Improvements:
> - Skip redundant raid_disks updates when the value is unchanged. (Abd-Alrhman Masalkhi)
>
> Cleanups:
> - Update MAINTAINERS email addresses. (Yu Kuai, Li Nan)
> - Clean up raid1 read error handling. (Christoph Hellwig)
> - Move the exceed_read_errors condition out of fix_read_error(). (Christoph Hellwig)
> - Use str_plural() in raid0 dump_zones(). (Thorsten Blum)
Pulled, thanks.
Side note - emails from you with the new email address all end up in
spam. I just noticed some of them this morning. But I think you want to
ensure that the DKIM etc part of your email is all kosher, gmail flags
all of it.
--
Jens Axboe
^ permalink raw reply
* bio_copy_from_iter
From: Matthew Wilcox @ 2026-06-01 17:10 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, linux-block
I'd like to remove copy_page_to_iter() (and only have
copy_folio_to_iter()). That led me to looking at bio_copy_to_iter().
At first glance, switching it to bio_for_each_folio_all() makes a lot of
sense -- if there are large folios involved, then we can copy an entire
folio at a time instead of a page.
But what I can't prove to my satisfaction is that every bio passed to
bio_copy_to_iter() necessarily contains folios. That's not currently
necessary, but will become necessary in the future. [1].
But I started thinking about what we could/should do in the future
when bvecs refer only to physical addresses and not pages. What we'll
have is a list of (phys, len) tuples and need to copy that to an iter.
We don't have an API for that today. Should we add:
copy_phys_to_iter(phys_addr_t phys, size_t len, struct iov_iter *i);
We arguably get even more benefit by doing this as we'd no longer break the
bio up into pages or folios, but can keep the contiguous chunks. Not that
optimising bio_uncopy_user() is going to be a particularly big win,
but this API might be useful in places that are more performance sensitive.
[1] I believe all these bvecs are constructed using
blk_rq_map_user_iov() which can end up calling bio_add_vmalloc(),
and vmalloc pages will not be folios.
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Brian Foster @ 2026-06-01 16:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Qu Wenruo, Damien Le Moal, Tetsuo Handa, Ming Lei, Jens Axboe,
Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds,
linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner
In-Reply-To: <20260601144034.GA4918@lst.de>
On Mon, Jun 01, 2026 at 04:40:34PM +0200, Christoph Hellwig wrote:
> On Thu, May 28, 2026 at 07:46:24PM +0930, Qu Wenruo wrote:
> >> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
> >> unmount")
> > Considering the xfs fix is pretty old, it's before the fix hint thus no
> > such mention in fstests.
> >
> > Do you happen to know which test case is for that fix?
> > I'd like to adapt it for btrfs as a reproducer.
>
> No. Adding Brian who authored that commit.
>
I haven't followed through the full thread here... But if you're just
looking for an existing test case associated with the commit above on
XFS, I did some quick digging and xfs/311 is the original reproducer for
that one.
Brian
^ permalink raw reply
* Re: [PATCH v3 4/4] dm crypt: batch all sectors of a bio per crypto request
From: Mikulas Patocka @ 2026-06-01 16:15 UTC (permalink / raw)
To: Leonid Ravich
Cc: Herbert Xu, Alasdair Kergon, Ard Biesheuvel, Eric Biggers,
Jens Axboe, Horia Geanta, Gilad Ben-Yossef, linux-crypto,
dm-devel, linux-block
In-Reply-To: <20260601085644.13026-5-lravich@amazon.com>
On Mon, 1 Jun 2026, Leonid Ravich wrote:
> When the underlying skcipher driver advertises support for multiple
> data units in a single request (CRYPTO_ALG_SKCIPHER_MULTI_DATA_UNIT),
> configure the cipher with cc->sector_size as data_unit_size and
> submit one request per bio instead of one request per sector. This
> removes per-sector overhead in the crypto API hot path: request
> allocation, callback dispatch, completion handling, and SG setup.
>
> The optimisation is enabled automatically at table load when all
> of the following hold:
>
> - the cipher is non-aead (i.e. skcipher);
> - tfms_count is 1 (interleaved per-sector keys would break batching);
> - the IV mode is plain or plain64 (the only modes whose generator
> produces a sequential 64-bit little-endian counter that the cipher
> can extend by adding the data-unit index, matching the convention
> documented in crypto_skcipher_set_data_unit_size());
> - the iv_gen_ops->post() hook is unset (lmk and tcw use it; both are
> already excluded by the IV-mode test, but the explicit check makes
> the assumption durable against future IV modes);
> - dm-integrity is not stacked (no integrity tag or integrity IV);
> - the cipher driver advertises multi-data-unit support.
>
> A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
> time, gates the multi-data-unit path. The existing per-sector path
> in crypt_convert_block_skcipher() is unchanged; the new
> crypt_convert_block_skcipher_multi() is reached from a small dispatch
> in crypt_convert() and shares the same backlog/-EBUSY/-EINPROGRESS
> flow control with the per-sector path.
>
> Heap-allocated scatterlists are stashed in dm_crypt_request and freed
> in crypt_free_req_skcipher() to avoid races between the synchronous-
> success free path and async-completion reuse from the request pool.
> On -ENOMEM during scatterlist allocation, the bio is requeued via
> BLK_STS_DEV_RESOURCE rather than failed, matching the behaviour of
> the existing -ENOMEM path for crypto request allocation.
>
> Verified end-to-end with a byte-equivalence test: encrypted output of
> plain64 dm-crypt with the multi-data-unit path matches output of the
> single-data-unit path bit-for-bit over a 256 MB device.
>
> Signed-off-by: Leonid Ravich <lravich@amazon.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
^ permalink raw reply
* Re: [PATCH] block, bfq: release cgroup stats with bfq_group
From: Jan Kara @ 2026-06-01 16:13 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, linux-block, linux-kernel, jack
In-Reply-To: <20260601061502.899552-1-yukuai@fygo.io>
On Mon 01-06-26 14:15:02, Yu Kuai wrote:
> BFQ cgroup stats contain percpu counters embedded in struct bfq_group,
> but the old free path destroys them from bfq_pd_free(), which is tied
> to blkg policy-data teardown.
>
> That is not the same lifetime as struct bfq_group. BFQ pins bfq_group
> while bfq_queue entities refer to it, so bfq_pd_free() can drop the
> policy-data reference while other bfq_group references still exist. The
> following blkcg change also defers policy-data release through RCU and
> leaves BFQ to run the final bfqg_put() from an RCU callback. For that
> conversion, stats teardown must belong to the last bfq_group put, not to
> policy-data teardown.
>
> Move stats teardown to bfqg_put() so the embedded counters are destroyed
> exactly when the last bfq_group reference is released, before kfree(bfqg).
>
> Without this preparatory change, the RCU-delayed policy-data free
> conversion reproduced the following KASAN report:
>
> BUG: KASAN: slab-use-after-free in percpu_counter_destroy_many+0xf1/0x2e0
> Write of size 8 at addr ffff88811d9409e0 by task test_blkcg/535
>
> CPU: 0 UID: 0 PID: 535 Comm: test_blkcg Not tainted 7.1.0-rc2-g1e14adca0199 #1 PREEMPT ea13f83d4b74a12510d20db4a7d9a0fe8275f05c
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x54/0x70
> print_address_description+0x77/0x200
> ? percpu_counter_destroy_many+0xf1/0x2e0
> print_report+0x64/0x70
> kasan_report+0x118/0x150
> ? percpu_counter_destroy_many+0xf1/0x2e0
> percpu_counter_destroy_many+0xf1/0x2e0
> __mmdrop+0x1d8/0x350
> finish_task_switch+0x3f5/0x570
> __schedule+0xe8e/0x18a0
> schedule+0xfe/0x1c0
> schedule_timeout+0x7f/0x1d0
> __wait_for_common+0x26c/0x3f0
> wait_for_completion_state+0x21/0x40
> call_usermodehelper_exec+0x271/0x2c0
> __request_module+0x296/0x410
> elv_iosched_store+0x1bc/0x2c0
> queue_attr_store+0x152/0x1c0
> kernfs_fop_write_iter+0x1d7/0x280
> vfs_write+0x580/0x630
> ksys_write+0xec/0x190
> do_syscall_64+0x156/0x490
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Allocated by task 535:
> kasan_save_track+0x3e/0x80
> __kasan_kmalloc+0x72/0x90
> bfq_pd_alloc+0x60/0x100 [bfq]
> blkg_create+0x3bb/0xbe0
> blkg_lookup_create+0x3a2/0x460
> blkg_conf_start+0x24a/0x2d0
> bfq_io_set_weight+0x17f/0x430 [bfq]
> cgroup_file_write+0x1c5/0x4b0
> kernfs_fop_write_iter+0x1d7/0x280
> vfs_write+0x580/0x630
> ksys_write+0xec/0x190
> do_syscall_64+0x156/0x490
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Freed by task 0:
> kasan_save_track+0x3e/0x80
> kasan_save_free_info+0x46/0x50
> __kasan_slab_free+0x3a/0x60
> kfree+0x14e/0x4f0
> rcu_core+0x6f3/0xcd0
> handle_softirqs+0x1a0/0x550
> __irq_exit_rcu+0x8c/0x150
> irq_exit_rcu+0xe/0x20
> sysvec_apic_timer_interrupt+0x6e/0x80
> asm_sysvec_apic_timer_interrupt+0x1a/0x20
>
> Last potentially related work creation:
> kasan_save_stack+0x3e/0x60
> kasan_record_aux_stack+0x99/0xb0
> call_rcu+0x55/0x5c0
> blkg_free_workfn+0x130/0x220
> process_scheduled_works+0x655/0xb60
> worker_thread+0x446/0x600
> kthread+0x1f4/0x230
> ret_from_fork+0x259/0x420
> ret_from_fork_asm+0x1a/0x30
>
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
Makes sense. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/bfq-cgroup.c | 43 ++++++++++++++++++++++---------------------
> 1 file changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index ac83b0668764..37ab70930c8d 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -300,6 +300,25 @@ static struct bfq_group *bfqg_parent(struct bfq_group *bfqg)
> return pblkg ? blkg_to_bfqg(pblkg) : NULL;
> }
>
> +static void bfqg_stats_exit(struct bfqg_stats *stats)
> +{
> + blkg_rwstat_exit(&stats->bytes);
> + blkg_rwstat_exit(&stats->ios);
> +#ifdef CONFIG_BFQ_CGROUP_DEBUG
> + blkg_rwstat_exit(&stats->merged);
> + blkg_rwstat_exit(&stats->service_time);
> + blkg_rwstat_exit(&stats->wait_time);
> + blkg_rwstat_exit(&stats->queued);
> + bfq_stat_exit(&stats->time);
> + bfq_stat_exit(&stats->avg_queue_size_sum);
> + bfq_stat_exit(&stats->avg_queue_size_samples);
> + bfq_stat_exit(&stats->dequeue);
> + bfq_stat_exit(&stats->group_wait_time);
> + bfq_stat_exit(&stats->idle_time);
> + bfq_stat_exit(&stats->empty_time);
> +#endif
> +}
> +
> struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
> {
> struct bfq_entity *group_entity = bfqq->entity.parent;
> @@ -321,8 +340,10 @@ static void bfqg_get(struct bfq_group *bfqg)
>
> static void bfqg_put(struct bfq_group *bfqg)
> {
> - if (refcount_dec_and_test(&bfqg->ref))
> + if (refcount_dec_and_test(&bfqg->ref)) {
> + bfqg_stats_exit(&bfqg->stats);
> kfree(bfqg);
> + }
> }
>
> static void bfqg_and_blkg_get(struct bfq_group *bfqg)
> @@ -433,25 +454,6 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg)
> entity->sched_data = &bfqg->sched_data;
> }
>
> -static void bfqg_stats_exit(struct bfqg_stats *stats)
> -{
> - blkg_rwstat_exit(&stats->bytes);
> - blkg_rwstat_exit(&stats->ios);
> -#ifdef CONFIG_BFQ_CGROUP_DEBUG
> - blkg_rwstat_exit(&stats->merged);
> - blkg_rwstat_exit(&stats->service_time);
> - blkg_rwstat_exit(&stats->wait_time);
> - blkg_rwstat_exit(&stats->queued);
> - bfq_stat_exit(&stats->time);
> - bfq_stat_exit(&stats->avg_queue_size_sum);
> - bfq_stat_exit(&stats->avg_queue_size_samples);
> - bfq_stat_exit(&stats->dequeue);
> - bfq_stat_exit(&stats->group_wait_time);
> - bfq_stat_exit(&stats->idle_time);
> - bfq_stat_exit(&stats->empty_time);
> -#endif
> -}
> -
> static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
> {
> if (blkg_rwstat_init(&stats->bytes, gfp) ||
> @@ -552,7 +554,6 @@ static void bfq_pd_free(struct blkg_policy_data *pd)
> {
> struct bfq_group *bfqg = pd_to_bfqg(pd);
>
> - bfqg_stats_exit(&bfqg->stats);
> bfqg_put(bfqg);
> }
>
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox