Linux block layer

* [PATCH v2] block: partitions: replace __get_free_page() with kmalloc()
From: Mike Rapoport (Microsoft) @ 2026-05-27 14:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Hannes Reinecke, Matthew Wilcox, Mike Rapoport,
	Vlastimil Babka, linux-block, linux-kernel, linux-mm

check_partition() allocates a buffer to use as backing memory for
seq_buf.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

For a single allocation on the cold path the performance difference between
kmalloc() and __get_free_pages() is not measurable as both allocators take
an object/page from a per-CPU list for fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
This is a (tiny) part of larger work of replacing page allocator calls
with kmalloc:

Also in git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git gfp-to-kmalloc/block

Signed-off-by: Mike Rapoport <rppt@kernel.org>
---
v2 changes:
* reword changelog

To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
v1: https://patch.msgid.link/20260520-block-v1-1-6463dc2cf042@kernel.org
---
 block/partitions/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5d5332ce586b..b5c59b79ca7c 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -124,7 +124,7 @@ static struct parsed_partitions *check_partition(struct gendisk *hd)
 	state = allocate_partitions(hd);
 	if (!state)
 		return NULL;
-	state->pp_buf.buffer = (char *)__get_free_page(GFP_KERNEL);
+	state->pp_buf.buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!state->pp_buf.buffer) {
 		free_partitions(state);
 		return NULL;
@@ -154,7 +154,7 @@ static struct parsed_partitions *check_partition(struct gendisk *hd)
 	if (res > 0) {
 		printk(KERN_INFO "%s", seq_buf_str(&state->pp_buf));
 
-		free_page((unsigned long)state->pp_buf.buffer);
+		kfree(state->pp_buf.buffer);
 		return state;
 	}
 	if (state->access_beyond_eod)
@@ -170,7 +170,7 @@ static struct parsed_partitions *check_partition(struct gendisk *hd)
 		printk(KERN_INFO "%s", seq_buf_str(&state->pp_buf));
 	}
 
-	free_page((unsigned long)state->pp_buf.buffer);
+	kfree(state->pp_buf.buffer);
 	free_partitions(state);
 	return ERR_PTR(res);
 }

---
base-commit: 5d6919055dec134de3c40167a490f33c74c12581
change-id: 20260520-block-25582753fd38

Best regards,
--  
Sincerely yours,
Mike.


^ permalink raw reply related
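
The API difference described in the changelog above can be sketched in userspace C. This is only a model under stated assumptions, not kernel code: `fake_get_free_page()`, `fake_free_page()`, `fake_kmalloc()` and `fake_kfree()` are hypothetical stand-ins built on `malloc()`/`free()`, `FAKE_PAGE_SIZE` is an assumed 4096, and `unsigned long` is assumed to be pointer-sized as it is in the kernel. The point is just that the page-style interface hands addresses around as integers, so both the allocation and the free need a cast, while the kmalloc-style interface works on pointers directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for __get_free_page(): the address comes back as an integer. */
static unsigned long fake_get_free_page(void)
{
	return (unsigned long)malloc(FAKE_PAGE_SIZE);
}

/* Hypothetical stand-in for free_page(): takes the address as an integer. */
static void fake_free_page(unsigned long addr)
{
	free((void *)addr);
}

/* Hypothetical stand-ins for kmalloc()/kfree(): plain pointers in and out. */
static void *fake_kmalloc(size_t size)
{
	return malloc(size);
}

static void fake_kfree(const void *ptr)
{
	free((void *)ptr);
}

/* Page-style usage: one cast to obtain the buffer, another to release it. */
static int format_with_page_api(const char *msg)
{
	char *buf = (char *)fake_get_free_page();
	int len;

	if (!buf)
		return -1;
	snprintf(buf, FAKE_PAGE_SIZE, "%s", msg);
	len = (int)strlen(buf);
	fake_free_page((unsigned long)buf);
	return len;
}

/* kmalloc-style usage: same buffer size, no casts at either end. */
static int format_with_kmalloc_api(const char *msg)
{
	char *buf = fake_kmalloc(FAKE_PAGE_SIZE);
	int len;

	if (!buf)
		return -1;
	snprintf(buf, FAKE_PAGE_SIZE, "%s", msg);
	len = (int)strlen(buf);
	fake_kfree(buf);
	return len;
}
```

Both helpers do the same work; only the allocation and release lines differ, which matches the three changed lines in the diff.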

* Re: [PATCH v2 0/2] zram: fix UAF in zram_bvec_write_partial() and drop dead bio plumbing
From: Cunlong Li @ 2026-05-27 14:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Jens Axboe, Andrew Morton, Christoph Hellwig,
	linux-block, linux-mm, linux-kernel, stable
In-Reply-To: <ahabSU6QcGJ4T4ZP@google.com>

On Wed, May 27, 2026 at 04:21:53PM +0900, Sergey Senozhatsky wrote:
> On (26/05/27 12:49), Cunlong Li wrote:
> > Patch 1 fixes a use-after-free in zram_bvec_write_partial() that
> > happens on PAGE_SIZE > 4K configurations when a partial write hits a
> > ZRAM_WB slot.
> > 
> > Patch 2 is a follow-up cleanup that drops the now-unused bio parameter
> > from zram_bvec_write_partial() and zram_bvec_write(), no functional
> > change.
> 
> Did you test it?

Compile-tested only so far; I haven't had a chance to run a
PAGE_SIZE > 4K reproducer yet.

Thanks for the review.

> 
> Looks reasonable (unless I'm missing something):
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply

* Re: Should the "loop" driver reject using pseudo files as backing file?
From: Christoph Hellwig @ 2026-05-27 14:14 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Jens Axboe, linux-block
In-Reply-To: <d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp>

On Wed, May 27, 2026 at 10:52:02AM +0900, Tetsuo Handa wrote:
> I noticed that /dev/loopX accepts pseudo files, for currently
> loop_validate_file() does
> 
> 	if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode))
> 		return -EINVAL;
> 
> and pseudo files are treated as S_ISREG().
> 
> Reading pseudo files via /dev/loopX causes bogus results (tries to
> repeatedly read the entire content up to the size visible to "ls"
> command). I think that allowing such usage will confuse userspace
> programs.

You get what you pay for.  Many other in-kernel and userspace users
will be just as confused by them.

^ permalink raw reply

* Re: [PATCH v2 1/2] zram: fix use-after-free in zram_bvec_write_partial()
From: Cunlong Li @ 2026-05-27 14:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton,
	linux-block, linux-mm, linux-kernel, stable
In-Reply-To: <20260527072414.GA17856@lst.de>

On Wed, May 27, 2026 at 09:24:14AM +0200, Christoph Hellwig wrote:
> On Wed, May 27, 2026 at 12:49:24PM +0800, Cunlong Li wrote:
> > zram_read_page() picks the sync or async backing device read path
> > based on whether the parent bio is NULL.  zram_bvec_write_partial()
> > passes its parent bio down, so for ZRAM_WB slots the read is
> > dispatched asynchronously and zram_read_page() returns 0 while the
> > bio is still in flight.  The caller then runs memcpy_from_bvec(),
> > zram_write_page() and __free_page() on the buffer, leaving the
> > async read to write into a freed page.
> > 
> > zram_bvec_read_partial() was switched to NULL in commit 4e3c87b9421d
> > ("zram: fix synchronous reads") for the same reason; the
> > write_partial counterpart was missed.
> > 
> > Fixes: 4e3c87b9421d ("zram: fix synchronous reads")
> 
> That's just the last patch touching the line.  This bio chaining goes
> further back.  AFAICS all the way to introducing backing device support
> in: 8e654f8fbff5 ("zram: read page from backing device")

You're right, thanks for catching this -- will fix in v3 with:

Fixes: 8e654f8fbff5 ("zram: read page from backing device")

> 
> The patch itself looks good, though:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply
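
The failure mode in the changelog above can be modelled in a few lines of userspace C. This is a toy under stated assumptions, not the zram code: `struct fake_bio`, `struct fake_read`, `fake_read_page()` and `buffer_safe_after_read()` are hypothetical names, and the only behaviour taken from the thread is the dispatch rule (NULL parent means a synchronous read, a non-NULL parent means an asynchronous read that is still in flight when 0 is returned).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parent bio; its contents do not matter here. */
struct fake_bio {
	int unused;
};

/* Tracks whether the modelled backing-device read has completed. */
struct fake_read {
	bool in_flight;
};

/*
 * Model of the dispatch rule: with no parent the read completes before the
 * function returns; with a parent it is only queued, yet 0 is still returned.
 */
static int fake_read_page(struct fake_read *rd, struct fake_bio *parent)
{
	rd->in_flight = (parent != NULL);
	return 0;
}

/*
 * True if a caller that copies into, writes out, and frees the buffer right
 * after fake_read_page() returns would be doing so with no read outstanding.
 */
static bool buffer_safe_after_read(struct fake_bio *parent)
{
	struct fake_read rd = { .in_flight = false };

	if (fake_read_page(&rd, parent) != 0)
		return false;
	return !rd.in_flight;
}
```

In this model, passing NULL is the only case where the caller can free the buffer immediately, which is the change the patch makes for the partial-write path.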

* Re: [PATCH v7] block: propagate in_flight to whole disk on partition I/O
From: Christoph Hellwig @ 2026-05-27 14:10 UTC (permalink / raw)
  To: Tang Yizhou; +Cc: axboe, kbusch, yukuai, linux-block, linux-kernel, Leon Hwang
In-Reply-To: <20260526021555.359500-1-yizhou.tang@shopee.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH] rust: block: mq: align init_request numa_node arg with C signature
From: Gary Guo @ 2026-05-27 14:06 UTC (permalink / raw)
  To: mateusz.nowicki, Andreas Hindborg
  Cc: Boqun Feng, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Jens Axboe, linux-block, rust-for-linux, linux-kernel
In-Reply-To: <1552076fb3201e5e47ead0989e793472@posteo.net>

On Wed May 27, 2026 at 12:57 PM BST, mateusz.nowicki wrote:
> Hello Andreas,
>
> How can I catch this earlier in the future? I verified the patch by
> compiling 'allyesconfig', but that did not catch the Rust issue.

Currently allyesconfig skips Rust if you don't have a suitable version of
the Rust compiler installed.

There are discussions about changing that:
https://lore.kernel.org/rust-for-linux/20260521-evolve-to-crab-v2-1-c18e0e98fc54@chaosmail.tech/

Best,
Gary

^ permalink raw reply

* Re: [PATCH RFC 5/5] block, nvme: add failed_bio callback for multipath bio failover
From: Christoph Hellwig @ 2026-05-27 14:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
	tom.leiming, coshi036, Igor.Achkinazi, dlemoal
In-Reply-To: <ag3Sweie8nWMpmqq@kbusch-mbp>

Sorry for the late reply, took me a bit to catch up after conference
travel last week and a public holiday on Monday.

On Wed, May 20, 2026 at 09:26:57AM -0600, Keith Busch wrote:
> > Yes, and in the case being addressed here, the "zero capacity" setting
> > is path specific, hence the driver wants to attempt a failover. I
> > imagine general capacity violations are not path specific though, so
> > this is kind of a weird case.
> 
> Oh, and it's not just the zero capacity IO error that multipath wants to
> handle. It's also that we've marked the path's disk dead, so there's a
> race if bio_queue_enter() will call bio_io_error() that this patch
> handles. I should have mentioned that case too, which wasn't handled
> with the BIO_REMAPPED flag suggestion.

Well, how do we know the bio actually is owned by the driver?  Callers
both in file systems and remapping block drivers can call bio_io_error
before calling into the driver that owns bi_bdev.

I guess we'd need some flag to designate it is owned by the driver,
and bio_set_dev would have to clear it.

Alternatively we could pass a bdev to bio_submit* and only update
bi_bdev just before the bio is passed to the driver.  Given that
the idea to set the bdev at allocation time didn't work out this
might be sensible, but it will cause a lot of churn.

^ permalink raw reply

* Re: [PATCHv2 1/2] block: export passthrough stats enabled
From: Christoph Hellwig @ 2026-05-27 13:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
	nilay
In-Reply-To: <ahbwAxG1Pj2z3Q21@kbusch-mbp>

On Wed, May 27, 2026 at 07:22:11AM -0600, Keith Busch wrote:
> On Wed, May 27, 2026 at 03:14:58PM +0200, Christoph Hellwig wrote:
> > > +static inline bool blk_rq_passthrough_stats(struct request *req,
> > > +					    struct request_queue *q)
> > 
> > The kerneldoc requested last time would be really nice to have.
> > Also, now that this is a public API killing the q argument and
> > just using req->q would make the API easier to understand.
> 
> Oh, my missing kerneldoc is clearly a problem. We need a separate
> argument for the request_queue because the q we want to count stats for
> may not be the one providing the request.

Even more argument for having the documentation :)


^ permalink raw reply

* Re: [PATCHv2 2/2] nvme: add support multipath passthrough iostats
From: Keith Busch @ 2026-05-27 13:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-block, linux-nvme, axboe, nilay
In-Reply-To: <20260527131657.GB10351@lst.de>

On Wed, May 27, 2026 at 03:16:57PM +0200, Christoph Hellwig wrote:
> On Tue, May 26, 2026 at 08:39:21AM -0700, Keith Busch wrote:
> > +	struct nvme_ns *ns = q->queuedata;
> >  	struct request *req;
> >  
> > +	if (ns && nvme_ns_head_multipath(ns->head))
> > +		rq_flags |= REQ_NVME_MPATH;
> 
> I just wanted to come up with reasons why this is wrong, but it actually
> seems alright - we only have a ns for multipath nodes if multipathing
> is enabled.  Maybe throw a little comment in on that?

Yes. An added subtlety here is that we unconditionally set
REQ_FAILFAST_DRIVER for passthrough requests, so it does not get
failover consideration. You can't steal bio's from a REQ_DRV_IN/OUT
request because all the necessary driver info is attached to the
original request, which doesn't follow the bio.

^ permalink raw reply

* Re: [PATCHv2 1/2] block: export passthrough stats enabled
From: Keith Busch @ 2026-05-27 13:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-block, linux-nvme, axboe, nilay
In-Reply-To: <20260527131458.GA10351@lst.de>

On Wed, May 27, 2026 at 03:14:58PM +0200, Christoph Hellwig wrote:
> > +static inline bool blk_rq_passthrough_stats(struct request *req,
> > +					    struct request_queue *q)
> 
> The kerneldoc requested last time would be really nice to have.
> Also, now that this is a public API killing the q argument and
> just using req->q would make the API easier to understand.

Oh, my missing kerneldoc is clearly a problem. We need a separate
argument for the request_queue because the q we want to count stats for
may not be the one providing the request. For nvme multipath, we want to
count stats against the head's request_queue, but the request comes from
the ns path, so we need both args.

^ permalink raw reply

* Re: [PATCHv2 2/2] nvme: add support multipath passthrough iostats
From: Christoph Hellwig @ 2026-05-27 13:16 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, linux-nvme, axboe, hch, nilay, Keith Busch
In-Reply-To: <20260526153921.2402015-3-kbusch@meta.com>

On Tue, May 26, 2026 at 08:39:21AM -0700, Keith Busch wrote:
> +	struct nvme_ns *ns = q->queuedata;
>  	struct request *req;
>  
> +	if (ns && nvme_ns_head_multipath(ns->head))
> +		rq_flags |= REQ_NVME_MPATH;

I just wanted to come up with reasons why this is wrong, but it actually
seems alright - we only have a ns for multipath nodes if multipathing
is enabled.  Maybe throw a little comment in on that?

Otherwise looks good.

^ permalink raw reply

* Re: [PATCH] block: partitions: replace __get_free_page() with kmalloc()
From: Hannes Reinecke @ 2026-05-27 13:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vlastimil Babka, Matthew Wilcox, Mike Rapoport, Jens Axboe,
	linux-block, linux-kernel, linux-mm
In-Reply-To: <ahbti6M8zRWR1oGa@infradead.org>

On 5/27/26 15:11, Christoph Hellwig wrote:
> On Wed, May 27, 2026 at 12:04:29PM +0200, Hannes Reinecke wrote:
>> Precisely my reasoning. In most cases, __get_free_page() is just a
>> lazy way of saying "I need some memory and the allocation should not fail".
> 
> Huh?  __get_free_page and kmalloc can and should fail.
> 
Even better, one argument less why it should not be converted :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply

* Re: [PATCHv2 1/2] block: export passthrough stats enabled
From: Christoph Hellwig @ 2026-05-27 13:14 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, linux-nvme, axboe, hch, nilay, Keith Busch
In-Reply-To: <20260526153921.2402015-2-kbusch@meta.com>

> +static inline bool blk_rq_passthrough_stats(struct request *req,
> +					    struct request_queue *q)

The kerneldoc requested last time would be really nice to have.
Also, now that this is a public API killing the q argument and
just using req->q would make the API easier to understand.

^ permalink raw reply

* Re: [PATCH] block: partitions: replace __get_free_page() with kmalloc()
From: Christoph Hellwig @ 2026-05-27 13:11 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Vlastimil Babka, Matthew Wilcox, Mike Rapoport, Christoph Hellwig,
	Jens Axboe, linux-block, linux-kernel, linux-mm
In-Reply-To: <17258981-d5b3-4ed1-9b3d-56a0b6ecb3f9@suse.de>

On Wed, May 27, 2026 at 12:04:29PM +0200, Hannes Reinecke wrote:
> Precisely my reasoning. In most cases, __get_free_page() is just a
> lazy way of saying "I need some memory and the allocation should not fail".

Huh?  __get_free_page and kmalloc can and should fail.


^ permalink raw reply

* Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
From: Christoph Hellwig @ 2026-05-27 13:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tal Zussman, Christoph Hellwig, Jens Axboe,
	Matthew Wilcox (Oracle), Christian Brauner, Darrick J. Wong,
	Carlos Maiolino, Alexander Viro, Dave Chinner, Bart Van Assche,
	linux-block, linux-kernel, linux-xfs, linux-fsdevel, linux-mm,
	Gao Xiang
In-Reply-To: <rkb5oei6cx2erbyininczz6ukbnquqheexhu7tznn5rslkkdn7@b4kpdyhdwdb6>

On Wed, May 27, 2026 at 11:42:28AM +0200, Jan Kara wrote:
> > I ran some experiments with fio on both XFS and a raw block device. Five
> > iterations each for 60s. Results below.
> > 
> > TLDR: Removing the delay doesn't significantly decrease user-visible
> > latency or otherwise improve performance, but does significantly reduce
> > throughput and increase context switches in some workloads (e.g. C).
> > I think it makes sense to leave the delay as-is. Thoughts?
> 
> Thanks for the test! One question below:

Thanks from me as well!

> 
> > Results:
> > 
> > Workloads (all `uncached=1`):
> >   A: rw=write     bs=128k iodepth=1   ioengine=pvsync2     # XFS
> >   B: rw=write     bs=128k iodepth=128 ioengine=io_uring    # XFS
> >   C: rw=randwrite bs=4k   iodepth=32  ioengine=io_uring    # XFS
> >   D: rw=rw 50/50  bs=64k  iodepth=32  ioengine=io_uring    # XFS
> >   E: rw=write     bs=128k iodepth=128 ioengine=io_uring    # raw /dev/nvmeXn1
> >   F: rw=write     bs=128k iodepth=128 numjobs=4
> >      + vm.dirty_bytes=64MB, vm.dirty_background_bytes=32MB # XFS
> > 
> > Mean ± stddev across 5 iterations:
> > 
> >     metric                     delay=1           delay=0     delta
> >     --------------------------------------------------------------
> > 
> >   A seq 128k qd1
> >     BW (MB/s)                4333 ± 27         4374 ± 34     +0.9%
> >     p99   (us)              36.2 ± 0.8        35.8 ± 0.4     -1.1%
> >     p999  (us)               3260 ± 75         3228 ± 29     -1.0%
> >     ctx-switches          184 k ± 59 k     3.68 M ± 65 k    +1903%
> >     cs / io                0.09 ± 0.03       1.86 ± 0.03    +1888%
> >     avg bios/run            80.4 ± 0.6         1.1 ± 0.0    -98.7%
> 
> So a 1 jiffy delay is (with default HZ=1000) 1ms. That means for this load
> the completion latency should be at least 1000us but your results show p99
> latency of 36. What am I missing?

Yes, this looks a bit odd.  Unless there's multiple threads submitting
and somehow the completions get batched this should complete one
bio at a time and be the worst case for the delay scheme.
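
Jan's arithmetic can be written out as a small helper. This is a sketch for checking the numbers only: `ticks_to_usecs()` is a hypothetical name (not the kernel's own conversion helper), and the tick rate is passed in explicitly because HZ=1000 is the default Jan assumes rather than something the test report states.

```c
#include <stdint.h>

/* Length of `ticks` timer ticks in microseconds at a tick rate of `hz` per second. */
static uint64_t ticks_to_usecs(uint64_t ticks, uint32_t hz)
{
	return ticks * 1000000ULL / hz;
}
```

With `hz` set to 1000, one tick is 1000 us, which is the lower bound Jan expects for a one-tick delay and is far above the reported p99 of 36.2 us for workload A.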

> >   C rand 4k qd32
> >     BW (MB/s)               66.2 ± 0.8        44.6 ± 7.4    -32.7%
> >     p99   (us)              8002 ± 174      17990 ± 6800   +124.8%
> >     p999  (us)             11390 ± 554     31890 ± 11076   +180.0%
> >     ctx-switches         3.67 M ± 45 k    3.59 M ± 106 k     -2.2%
> >     cs / io                3.78 ± 0.04       5.62 ± 0.83    +48.7%
> >     avg bios/run            32.3 ± 1.0         3.1 ± 0.3    -90.5%
> 
> I'm somewhat surprised how much larger the completion latency is here without
> the delay. Is that due to contention on the local lock between the IO completion
> interrupt and the worker? Or why is the completion latency so big here when
> the case B with more IOs in flight, less bios per run, still had significantly
> lower latency in the delay=0 case?

Note that in the past we had major problems with workqueue scheduling
latency.  At some point these got mitigated a lot, but if they are back
for this workload that might be one reason.


^ permalink raw reply

* Re: [LSF/MM/BPF RFC PATCH 00/13]
From: Haris Iqbal @ 2026-05-27 12:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: linux-block, linux-rdma, linux-kernel, axboe, bvanassche, hch,
	jgg, jinpu.wang
In-Reply-To: <20260512103424.GR15586@unreal>

On Tue, May 12, 2026 at 12:34 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, May 05, 2026 at 09:46:12AM +0200, Md Haris Iqbal wrote:
> > Following a conversation with Bart yesterday, I am sending the RMR+BRMR
> > code through patch for easier review.
> >
> > The patches apply over the for-next branch of the block tree over commit
> > 07dfa981ca3
> >
> > For context,
> > RMR (Reliable Multicast over RTRS) is a kernel module that provides
> > active-active block-level replication over RDMA. It guarantees delivery
> > of IO to a group of storage nodes and handles resynchronization of data
> > directly between storage nodes without involving the compute client.
> >
> > BRMR (Block device over RMR) sits on top of RMR and exposes a standard
> > Linux block device (/dev/brmrX) backed by an RMR pool. Together, RMR and
> > BRMR provide a single-hop replication and resynchronization solution for
> > RDMA-connected storage clusters.
> >
> > My session is on Wednesday, at 12 in the storage room (Istanbul).
>
> To summarize the discussion:
>
> 1. Move as much logic as possible into the block layer; RDMA should serve
>    strictly as a transport.
> 2. Identify another in‑kernel user of this functionality, and add support for
>    it if required. At least accommodate potential users elsewhere in the
>    kernel.

Thanks for the summary Leon.

The main logic which handles multicast/replication legs, missed I/O
tracking, re-synchronization, etc are the core parts of RMR.
If we move those to a separate module, there won't be much left in
RMR. RMR already uses RTRS from the RDMA subsystem as transport.

Having said that, I am not against moving RMR out of the RDMA layer.
It can serve as a reliable replication service/library for any other
user in the kernel to use.
Which subsystem (block or something else) would be a better fit then,
can be discussed.

PS: Would this be a good candidate for a session/discussion in the upcoming LPC?

>
> Thanks

^ permalink raw reply

* Re: [LSF/MM/BPF TOPIC] A block level, active-active replication solution
From: Haris Iqbal @ 2026-05-27 12:16 UTC (permalink / raw)
  To: Philipp Reisner; +Cc: lsf-pc, linux-block, Jia Li
In-Reply-To: <afm2PewS5Gi5QU61@ryzen9>

On Tue, May 5, 2026 at 11:20 AM Philipp Reisner
<philipp.reisner@linbit.com> wrote:
>
> Am Tue, Feb 03, 2026 at 04:09:59PM +0100 schrieb Haris Iqbal:
> > Hi Haris,
> >
> > We are working on a pair of kernel modules which would offer a new
> > replication solution in the Linux kernel. It would be a block level,
> > active-active replication solution for RDMA transport.
> >
> > The existing block level replication solution in the Linux kernel is
> > DRBD, which is an active-passive solution. The data replication in
> > DRBD happens through 2 network hops.
> >
> >
> > An active-active solution which one can build is by exporting block
> > devices, either through NVMeOF or RNBD/RTRS, over the network, and
> > then creating a raid1 device over it. It would provide a single hop
> > replication solution, but the synchronization during a degraded state
> > goes through 2 hops.
> >
> > The proposed solution would provide an active-active single hop
> > replication, and a single hop synchronization (directly between
> > storage nodes) in case of a degraded state.
> [...]
>
>
> I stumbled across this post because of the newer replies.
>
> I want to point out that we have significantly developed DRBD over the
> last 15 Years as an out-of-tree module. In the past months, we began
> the process of getting all those improvements back into Linux
> upstream.
>
> With that, DRBD9 became multi-node. It does the “active-active single
> hop replication” as it is. The networking part is now abstracted into
> transport modules. We have one for TCP, one for load balancing across
> multiple TCP connections, and one for RDMA.
>
> What you are doing here, in DRBD lingo, is a diskless primary
> connected to multiple storage nodes.
>
> Find everything here https://github.com/LINBIT.
> The latest edition of what we bring to the upstreaming discussion:
> https://github.com/LINBIT/linux-drbd/tree/drbd-next

Hi Philipp,

Interesting.
I looked into the diskless primary mode configuration for DRBD, and it
does look similar to what RMR/BRMR offers.
We plan to do comparison runs of DRBD diskless primary mode, and RMR/BRMR.

I see the DRBD version in the current kernel is still 8.x.x.
Do you have an ETA for when version 9 will be in the kernel?

>
> Philipp

^ permalink raw reply

* Re: [PATCH] block: skip sync_blockdev() on surprise removal in bdev_mark_dead()
From: Christian Brauner @ 2026-05-27 12:02 UTC (permalink / raw)
  To: Chao Shi
  Cc: Jens Axboe, Christoph Hellwig, Josef Bacik, linux-block,
	linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
In-Reply-To: <20260522220025.1770388-1-coshi036@gmail.com>

On Fri, May 22, 2026 at 06:00:25PM -0400, Chao Shi wrote:
> bdev_mark_dead()'s @surprise == true means the device is already gone.
> The filesystem callback fs_bdev_mark_dead() honours this and skips
> sync_filesystem(), but the bare block device path (no ->mark_dead op)
> lost its !surprise guard when the holder ->mark_dead callback was wired
> up (see Fixes), and now calls sync_blockdev() unconditionally, which can
> hang forever waiting on writeback that can no longer complete.
> 
> syzkaller hit this via nvme_reset_work()'s "I/O queues lost" path:
> nvme_mark_namespaces_dead() -> blk_mark_disk_dead() ->
> bdev_mark_dead(bdev, true) -> sync_blockdev() blocks in
> folio_wait_writeback(), wedging the reset worker and every task waiting
> on it.
> 
> Skip the sync on surprise removal, matching fs_bdev_mark_dead();
> invalidate_bdev() still runs. Orderly removal (surprise == false) is
> unchanged.
> 
> Fixes: d8530de5a6e8 ("block: call into the file system for bdev_mark_dead")
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply
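
The control flow that the changelog above describes for the bare block device path can be modelled as follows. This is a userspace sketch under stated assumptions, not the patch itself: `struct fake_bdev`, `fake_sync_blockdev()`, `fake_invalidate_bdev()` and `fake_mark_dead_no_holder()` are hypothetical names, the counters exist only so the behaviour can be observed, and the ordering of the two calls is assumed. What is taken from the thread is that the sync is skipped on surprise removal while the invalidate still runs.

```c
#include <stdbool.h>

/* Counts how often each modelled operation ran on this device. */
struct fake_bdev {
	int sync_calls;
	int invalidate_calls;
};

/* Stand-in for the writeback wait that can hang once the device is gone. */
static void fake_sync_blockdev(struct fake_bdev *bdev)
{
	bdev->sync_calls++;
}

/* Stand-in for dropping cached pages; does not wait on the device. */
static void fake_invalidate_bdev(struct fake_bdev *bdev)
{
	bdev->invalidate_calls++;
}

/* Modelled path for a block device with no holder ->mark_dead callback. */
static void fake_mark_dead_no_holder(struct fake_bdev *bdev, bool surprise)
{
	if (!surprise)
		fake_sync_blockdev(bdev);	/* orderly removal only */
	fake_invalidate_bdev(bdev);		/* runs in both cases */
}
```

In the model, a surprise removal never reaches the sync step, so nothing waits on writeback that can no longer complete.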

* Re: [PATCH] rust: block: mq: align init_request numa_node arg with C signature
From: mateusz.nowicki @ 2026-05-27 11:57 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Boqun Feng, Miguel Ojeda, Gary Guo, Björn Roy Baron,
	Benno Lossin, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Jens Axboe, linux-block, rust-for-linux, linux-kernel
In-Reply-To: <20260527-block-for-next-2026-05-26-2200-failure-v1-1-4865889e282c@kernel.org>

Hello Andreas,

How can I catch this earlier in the future? I verified the patch by
compiling 'allyesconfig', but that did not catch the Rust issue.

Regards,
Mateusz

On 27.05.2026 11:18, Andreas Hindborg wrote:
> Commit b040a1a4523d ("block: switch numa_node to int in
> blk_mq_hw_ctx and init_request") changed the type of the
> `numa_node` argument of `blk_mq_ops::init_request` from
> `unsigned int` to `int`. Update the Rust callback signature to
> match, so that the function item can be coerced to the C fn
> pointer type stored in `blk_mq_ops`.
> 
> Without this change the Rust block layer fails to build:
> 
>   error[E0308]: mismatched types
>      --> rust/kernel/block/mq/operations.rs:274:28
>       |
>   274 |         init_request: Some(Self::init_request_callback),
>       |                       ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       |                       expected fn pointer, found fn item
>       |
>       = note: expected fn pointer
>                 `unsafe extern "C" fn(_, _, _, i32) -> _`
>                     found fn item
>                 `unsafe extern "C" fn(_, _, _, u32) -> _ {...}`
> 
> The argument is unused on the Rust side, so this is a pure
> type-signature change with no functional impact.
> 
> Fixes: b040a1a4523d ("block: switch numa_node to int in blk_mq_hw_ctx and init_request")
> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
> ---
>  rust/kernel/block/mq/operations.rs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
> index 8ad46129a52c..861903e18fbf 100644
> --- a/rust/kernel/block/mq/operations.rs
> +++ b/rust/kernel/block/mq/operations.rs
> @@ -218,7 +218,7 @@ impl<T: Operations> OperationsVTable<T> {
>          _set: *mut bindings::blk_mq_tag_set,
>          rq: *mut bindings::request,
>          _hctx_idx: crate::ffi::c_uint,
> -        _numa_node: crate::ffi::c_uint,
> +        _numa_node: crate::ffi::c_int,
>      ) -> crate::ffi::c_int {
>          from_result(|| {
>              // SAFETY: By the safety requirements of this function, `rq` points
> 
> ---
> base-commit: 27236c051c01c1c1025e0e0d12a107082557e8f1
> change-id: 20260527-block-for-next-2026-05-26-2200-failure-64907085fc49
> 
> Best regards,

^ permalink raw reply

* Re: [PATCH] block: blk-zoned: fix zwplug refcount leak on write error path
From: Shin'ichiro Kawasaki @ 2026-05-27 11:47 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Haris Iqbal, Wentao Liang, Jens Axboe, linux-block, linux-kernel,
	stable
In-Reply-To: <90a1581d-9a3c-46db-bc7b-5fd1a9d9c0e1@kernel.org>

On May 27, 2026 / 08:15, Damien Le Moal wrote:
[...]
> Wentao,
> 
> You clearly did not test this at all because if you had, you would have seen
> all the warning splats that your patch triggers.

FYI, the blktests CI run for the patch caught failures at block/017, zbd/004,
zbd/009 and zbd/012.

# RUN_ZONED_TESTS=1 ./check block/017
block/017 (do I/O and check the inflight counter)            [passed]
    runtime  2.264s  ...  2.140s
block/017 (zoned) (do I/O and check the inflight counter)    [failed]
    runtime  2.107s  ...  2.080s
    something found in dmesg:
    [  207.429382] [   T1852] run blktests block/017 at 2026-05-27 20:43:45
    [  207.466894] [   T1852] null_blk: nullb1: using native zone append
    [  207.479158] [   T1852] null_blk: disk nullb1 created
    [  207.810531] [   T1956] null_blk: disk nullb0 created
    [  207.811528] [   T1956] null_blk: module loaded
    [  207.830801] [   T1852] null_blk: nullb1: using native zone append
    [  208.404359] [   T1852] null_blk: disk nullb1 created
    [  209.174141] [      C2] ------------[ cut here ]------------
    [  209.175354] [      C2] WARNING: block/blk-zoned.c:590 at disk_free_zone_wplug+0x30c/0x3b0, CPU#2: swapper/2/0
    [  209.176896] [      C2] Modules linked in: null_blk nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr sunrpc 9pnet_virtio 9pnet i2c_piix4 pcspkr netfs i2c_smbus dm_multipath nfnetlink zram vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock bochs drm_client_lib nvme drm_shmem_helper xfs drm_kms_helper sym53c8xx nvme_core floppy nvme_keyring nvme_auth scsi_transport_spi e1000 drm serio_raw ata_generic pata_acpi i2c_dev qemu_fw_cfg virtiofs fuse virtio_console [last unloaded: null_blk]
    ...
    (See '/home/shin/Blktests/blktests/results/nodev_zoned/block/017.dmesg' for the entire message)
# ./check zbd/004 zbd/009 zbd/012
zbd/004 => nullb1 (write split across sequential zones)      [failed]
    runtime  0.152s  ...  0.626s
    something found in dmesg:
    [  231.263084] [   T2067] run blktests zbd/004 at 2026-05-27 20:44:08
    [  231.714947] [   T2105] ------------[ cut here ]------------
    [  231.716700] [   T2105] refcount_t: underflow; use-after-free.
    [  231.717849] [   T2105] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xa9/0xe0, CPU#3: dd/2105
    [  231.720269] [   T2105] Modules linked in: null_blk nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr sunrpc 9pnet_virtio 9pnet i2c_piix4 pcspkr netfs i2c_smbus dm_multipath nfnetlink zram vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock bochs drm_client_lib nvme drm_shmem_helper xfs drm_kms_helper sym53c8xx nvme_core floppy nvme_keyring nvme_auth scsi_transport_spi e1000 drm serio_raw ata_generic pata_acpi i2c_dev qemu_fw_cfg virtiofs fuse virtio_console [last unloaded: null_blk]
    [  231.730390] [   T2105] CPU: 3 UID: 0 PID: 2105 Comm: dd Tainted: G        W           7.1.0-rc5+ #3 PREEMPT(full)
    [  231.732289] [   T2105] Tainted: [W]=WARN
    [  231.733281] [   T2105] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-10.fc44 06/10/2025
    [  231.735090] [   T2105] RIP: 0010:refcount_warn_saturate+0xa9/0xe0
    [  231.736514] [   T2105] Code: bd ee 5d 03 67 48 0f b9 3a 5b 5d c3 cc cc cc cc 48 8d 3d ba ee 5d 03 67 48 0f b9 3a 5b 5d e9 ce ea 85 01 48 8d 3d b7 ee 5d 03 <67> 48 0f b9 3a 5b 5d c3 cc cc cc cc 48 8d 3d b4 ee 5d 03 67 48 0f
    ...
    (See '/home/shin/Blktests/blktests/results/nullb1/zbd/004.dmesg' for the entire message)
zbd/009 (test gap zone support with BTRFS)                   [failed]
    runtime  11.646s  ...  1.424s
    --- tests/zbd/009.out	2023-04-06 10:11:07.928670527 +0900
    +++ /home/shin/Blktests/blktests/results/nodev/zbd/009.out.bad	2026-05-27 20:44:12.743034470 +0900
    @@ -1,2 +1,4 @@
     Running zbd/009
    -Test complete
    +mount: /home/shin/Blktests/blktests/results/tmpdir.zbd.009.xLW/mnt: wrong fs type, bad option, bad superblock on /dev/sdd, missing codepage or helper program, or other error.
    +       dmesg(1) may have more information after failed mount system call.
    +Test failed
zbd/012 (test requeuing of zoned writes and queue freezing)  [failed]
    runtime  42.181s  ...  23.791s
    --- tests/zbd/012.out	2025-03-06 19:32:02.536851507 +0900
    +++ /home/shin/Blktests/blktests/results/nodev/zbd/012.out.bad	2026-05-27 20:44:38.677211476 +0900
    @@ -2,6 +2,4 @@
     1
     2
     4
    -8
    -16
    -Test complete
    +Test failed


* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-05-27 11:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba, linux-fsdevel, Christian Brauner
In-Reply-To: <ahZeYQ0cLE1i8TGs@fedora>

On 2026/05/27 12:00, Ming Lei wrote:
> On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>> On 2026/05/27 10:20, Ming Lei wrote:
>>>> Of course we should try to figure out the root cause first, but how can we do?
>>>
>>> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>>> which may cause data loss, so CC btrfs list and maintainer.
>>

I had a conversation with Google AI mode, and received the following response.

--------------------------------------------------------------------------------
Technical Analysis: lo_rw_aio Null Pointer Dereference / UAF since v7.1-rc1

1. The Root Cause of the Timing Shift

This regression was introduced during the v7.1-rc1 merge window, primarily exposed by
Commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with
helper refactorings like Commit 92c3737a2473 ("block: add a bio_submit_or_kill helper").

Prior to v7.1, the synchronous I/O completion path inherently contained execution lags (due
to serialized completion handling and context switches) before notifying upper layers. This
latency accidentally acted as a natural safety barrier. It ensured that by the time a file
system completed its final sync_filesystem() and initiated umount, the loop driver's internal
workqueue (lo_rw_aio) had already finished processing everything.

In v7.1, the unification and optimization of bi_end_io significantly minimized this latency.
The filesystem now learns of "I/O completion" much faster. Consequently, highly-concurrent
execution pipelines like btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(),
ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker
is still in the middle of executing the last sub-millisecond asynchronous file-backed I/O
request.

2. Why the Block Layer's Built-in Quiesce/Freeze Fails

There is an implicit assumption that standard block layer freeze mechanisms (blk_mq_freeze_queue())
protect the device lifetime during release. However, the v7.1 BIO helper refactoring introduced
a synchronization gap:

  1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or
     delayed refcount updates in btrfs) right during the unmount/close boundary.
  2. Due to the optimized execution path, these requests bypass the block layer's active
     request-tracking metrics at the exact moment blk_mq_freeze_queue() or state validation
     checks evaluated them as zero.
  3. The block layer assumes the queue is safe and silent, allowing __loop_clr_fd() to
     progress and nullify lo->lo_backing_file (or trigger fput()).
  4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late, attempts
     to access lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to
     either a general protection fault (Null pointer dereference) or a Use-After-Free (UAF).

3. Why This Isn't Just an "Unexpected FS Bug"

While the write I/O originates from file systems like btrfs and jfs post-close, blaming the
file systems entirely ignores the underlying infrastructure change. The core issue is that the
block layer altered its synchronization behavior, breaking the barrier contract that
VFS and file systems historically relied on during the device release path.

Papering over this inside individual file systems would require adding heavy, duplicated
barriers inside every single filesystem's unmount path.


* Re: [PATCH] rust: block: mq: align init_request numa_node arg with C signature
From: Gary Guo @ 2026-05-27 11:09 UTC (permalink / raw)
  To: Alice Ryhl, Gary Guo
  Cc: Andreas Hindborg, Boqun Feng, Miguel Ojeda, Björn Roy Baron,
	Benno Lossin, Trevor Gross, Danilo Krummrich, Jens Axboe,
	Mateusz Nowicki, linux-block, rust-for-linux, linux-kernel
In-Reply-To: <CAH5fLgiobJ2yCbDM56H8rzMeAWv0eaqt8-vPchU1EV3VEYb=jA@mail.gmail.com>

On Wed May 27, 2026 at 11:59 AM BST, Alice Ryhl wrote:
> On Wed, May 27, 2026 at 12:57 PM Gary Guo <gary@garyguo.net> wrote:
>>
>> On Wed May 27, 2026 at 10:18 AM BST, Andreas Hindborg wrote:
>> > Commit b040a1a4523d ("block: switch numa_node to int in
>> > blk_mq_hw_ctx and init_request") changed the type of the
>> > `numa_node` argument of `blk_mq_ops::init_request` from
>> > `unsigned int` to `int`. Update the Rust callback signature to
>> > match, so that the function item can be coerced to the C fn
>> > pointer type stored in `blk_mq_ops`.
>> >
>> > Without this change the Rust block layer fails to build:
>> >
>> >   error[E0308]: mismatched types
>> >      --> rust/kernel/block/mq/operations.rs:274:28
>> >       |
>> >   274 |         init_request: Some(Self::init_request_callback),
>> >       |                       ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >       |                       expected fn pointer, found fn item
>> >       |
>> >       = note: expected fn pointer
>> >                 `unsafe extern "C" fn(_, _, _, i32) -> _`
>> >                     found fn item
>> >                 `unsafe extern "C" fn(_, _, _, u32) -> _ {...}`
>> >
>> > The argument is unused on the Rust side, so this is a pure
>> > type-signature change with no functional impact.
>> >
>> > Fixes: b040a1a4523d ("block: switch numa_node to int in blk_mq_hw_ctx and init_request")
>> > Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
>>
>> You could also just use `i32` instead of `ffi::c_int`. But it doesn't really
>> matter for this patch.
>
> By the way, all these constants are in the prelude.

I wonder if we should actually encourage people, by removing them from the prelude,
to use i32/u32 instead of c_int/c_uint, isize/usize instead of c_long/c_ulong
and similar, given that all Linux ABIs have a consistent mapping for them.
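
As a rough illustration of the mapping mentioned above, the following C sketch
checks at compile time that `int` is 32 bits wide and that `long` is
pointer-sized. The helper name `demo_abi_matches_fixed_width()` is hypothetical,
and the checks are assumed to be built for a Linux target; other ABIs (for
example LLP64) would fail the `long` assertion.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: built for a Linux target, where these widths are expected to hold. */
static_assert(sizeof(int) * CHAR_BIT == 32, "int is expected to be 32 bits");
static_assert(sizeof(unsigned int) == sizeof(uint32_t),
	      "unsigned int is expected to match uint32_t in size");
static_assert(sizeof(long) == sizeof(void *),
	      "long is expected to be pointer-sized");

/*
 * Hypothetical helper: returns 1 when int and long have the same size and
 * range as the fixed-width types that i32 and isize correspond to.
 */
static int demo_abi_matches_fixed_width(void)
{
	return sizeof(int) == sizeof(int32_t) &&
	       sizeof(long) == sizeof(intptr_t) &&
	       INT_MAX == INT32_MAX;
}
```

If the static assertions pass, the C-named types and the fixed-width types
describe the same storage on that target, which is the property the suggestion
relies on.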

On a different note, perhaps it might be worth adding a lint to klint to check for
path references to types that are available via the prelude, for other types.

Best,
Gary


* [PATCH] scsi: bsg: copy uring_cmd payload to prevent double-fetch from shared SQE
From: Rahul Chandelkar @ 2026-05-27 10:59 UTC (permalink / raw)
  To: rc, James E . J . Bottomley, Martin K . Petersen, Jens Axboe,
	FUJITA Tomonori
  Cc: linux-scsi, linux-block, io-uring, linux-kernel, stable

scsi_bsg_uring_cmd() and scsi_bsg_map_user_buffer() read bsg_uring_cmd
fields directly from the shared mmap'd io_uring submission ring via
io_uring_sqe128_cmd().  On the inline execution path, io_uring has not
yet copied the SQE to kernel memory, so a concurrent userspace thread
can modify fields between reads.

cmd->request_len is read for the bounds check, for the cmd_len
assignment, and for the copy_from_user length.  A racing thread can
change request_len between the bounds check (passes with <= 32) and
copy_from_user (uses the enlarged value), overflowing the 32-byte
scmd->cmnd[] buffer into subsequent struct scsi_cmnd fields.

scsi_bsg_map_user_buffer() independently re-derives its cmd pointer
from the same shared SQE, re-reading dout_xfer_len, din_xfer_len,
dout_xferp, and din_xferp, enabling direction confusion and buffer
length races.

Copy struct bsg_uring_cmd to a stack-local variable before use in both
functions.  The pointer variable 'cmd' is redirected to the local copy
so the rest of each function is unchanged.
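
The snapshot-then-validate pattern described above can be sketched in plain
userspace C. This is only an illustration: `struct demo_cmd`, `CDB_MAX`, and
`demo_copy_cdb()` are hypothetical names and do not correspond to the real
`struct bsg_uring_cmd` layout or to the code in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CDB_MAX 32	/* destination size, mirroring the 32-byte buffer above */

/* Hypothetical stand-in for a command living in memory shared with another thread. */
struct demo_cmd {
	uint32_t request_len;
	uint8_t  request[64];
};

/*
 * Copy the shared command once, then validate and use only the private copy,
 * so the length that was checked is the length that is used.
 * Returns the number of bytes copied into dst, or -1 if the length is too large.
 */
static int demo_copy_cdb(const struct demo_cmd *shared, uint8_t dst[CDB_MAX])
{
	struct demo_cmd local;

	assert(shared && dst);

	/* Single fetch: later changes to *shared cannot affect 'local'. */
	memcpy(&local, shared, sizeof(local));

	if (local.request_len > CDB_MAX)
		return -1;

	memcpy(dst, local.request, local.request_len);
	return (int)local.request_len;
}
```

A version that checked `shared->request_len` and then read it again for the
copy length would be exposed to the race described above, because the two reads
could observe different values.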

Tested with KASAN on QEMU (virtio-scsi, 2 vCPUs).  Without this fix,
a two-thread race produces:

  BUG: KASAN: wild-memory-access in scsi_queue_rq+0x4a3/0x58a0
  Write of size 96 at addr dead000000001000 by task poc/67
  Call Trace:
   kasan_report+0xce/0x100
   __asan_memset+0x23/0x50
   scsi_queue_rq+0x4a3/0x58a0
   scsi_bsg_uring_cmd+0x942/0x1570
   io_uring_cmd+0x2f6/0x950
   io_issue_sqe+0xe5/0x22d0

Fixes: 7b6d3255e7f8 ("scsi: bsg: add io_uring passthrough handler")
Cc: stable@vger.kernel.org
Signed-off-by: Rahul Chandelkar <rc@rexion.ai>
---
 drivers/scsi/scsi_bsg.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_bsg.c b/drivers/scsi/scsi_bsg.c
index e80dec53174e..244740655eb0 100644
--- a/drivers/scsi/scsi_bsg.c
+++ b/drivers/scsi/scsi_bsg.c
@@ -78,13 +78,21 @@ static int scsi_bsg_map_user_buffer(struct request *req,
 				    struct io_uring_cmd *ioucmd,
 				    unsigned int issue_flags, gfp_t gfp_mask)
 {
-	const struct bsg_uring_cmd *cmd = io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd);
-	bool is_write = cmd->dout_xfer_len > 0;
-	u64 buf_addr = is_write ? cmd->dout_xferp : cmd->din_xferp;
-	unsigned long buf_len = is_write ? cmd->dout_xfer_len : cmd->din_xfer_len;
+	struct bsg_uring_cmd local_cmd;
+	const struct bsg_uring_cmd *cmd;
+	bool is_write;
+	u64 buf_addr;
+	unsigned long buf_len;
 	struct iov_iter iter;
 	int ret;
 
+	memcpy(&local_cmd, io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd),
+	       sizeof(local_cmd));
+	cmd = &local_cmd;
+	is_write = cmd->dout_xfer_len > 0;
+	buf_addr = is_write ? cmd->dout_xferp : cmd->din_xferp;
+	buf_len = is_write ? cmd->dout_xfer_len : cmd->din_xfer_len;
+
 	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
 		ret = io_uring_cmd_import_fixed(buf_addr, buf_len,
 						is_write ? WRITE : READ,
@@ -104,13 +112,18 @@ static int scsi_bsg_uring_cmd(struct request_queue *q, struct io_uring_cmd *iouc
 			       unsigned int issue_flags, bool open_for_write)
 {
 	struct scsi_bsg_uring_cmd_pdu *pdu = scsi_bsg_uring_cmd_pdu(ioucmd);
-	const struct bsg_uring_cmd *cmd = io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd);
+	struct bsg_uring_cmd local_cmd;
+	const struct bsg_uring_cmd *cmd;
 	struct scsi_cmnd *scmd;
 	struct request *req;
 	blk_mq_req_flags_t blk_flags = 0;
 	gfp_t gfp_mask = GFP_KERNEL;
 	int ret;
 
+	memcpy(&local_cmd, io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd),
+	       sizeof(local_cmd));
+	cmd = &local_cmd;
+
 	if (cmd->protocol != BSG_PROTOCOL_SCSI ||
 	    cmd->subprotocol != BSG_SUB_PROTOCOL_SCSI_CMD)
 		return -EINVAL;
-- 
2.54.0



* Re: [PATCH] rust: block: mq: align init_request numa_node arg with C signature
From: Alice Ryhl @ 2026-05-27 10:59 UTC (permalink / raw)
  To: Gary Guo
  Cc: Andreas Hindborg, Boqun Feng, Miguel Ojeda, Björn Roy Baron,
	Benno Lossin, Trevor Gross, Danilo Krummrich, Jens Axboe,
	Mateusz Nowicki, linux-block, rust-for-linux, linux-kernel
In-Reply-To: <DITELF5LNOB6.2IHJU3443SMVH@garyguo.net>

On Wed, May 27, 2026 at 12:57 PM Gary Guo <gary@garyguo.net> wrote:
>
> On Wed May 27, 2026 at 10:18 AM BST, Andreas Hindborg wrote:
> > Commit b040a1a4523d ("block: switch numa_node to int in
> > blk_mq_hw_ctx and init_request") changed the type of the
> > `numa_node` argument of `blk_mq_ops::init_request` from
> > `unsigned int` to `int`. Update the Rust callback signature to
> > match, so that the function item can be coerced to the C fn
> > pointer type stored in `blk_mq_ops`.
> >
> > Without this change the Rust block layer fails to build:
> >
> >   error[E0308]: mismatched types
> >      --> rust/kernel/block/mq/operations.rs:274:28
> >       |
> >   274 |         init_request: Some(Self::init_request_callback),
> >       |                       ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >       |                       expected fn pointer, found fn item
> >       |
> >       = note: expected fn pointer
> >                 `unsafe extern "C" fn(_, _, _, i32) -> _`
> >                     found fn item
> >                 `unsafe extern "C" fn(_, _, _, u32) -> _ {...}`
> >
> > The argument is unused on the Rust side, so this is a pure
> > type-signature change with no functional impact.
> >
> > Fixes: b040a1a4523d ("block: switch numa_node to int in blk_mq_hw_ctx and init_request")
> > Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
>
> You could also just use `i32` instead of `ffi::c_int`. But it doesn't really
> matter for this patch.

By the way, all these constants are in the prelude.

Alice


* Re: [PATCH] rust: block: mq: align init_request numa_node arg with C signature
From: Gary Guo @ 2026-05-27 10:56 UTC (permalink / raw)
  To: Andreas Hindborg, Boqun Feng, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Jens Axboe
  Cc: Mateusz Nowicki, linux-block, rust-for-linux, linux-kernel
In-Reply-To: <20260527-block-for-next-2026-05-26-2200-failure-v1-1-4865889e282c@kernel.org>

On Wed May 27, 2026 at 10:18 AM BST, Andreas Hindborg wrote:
> Commit b040a1a4523d ("block: switch numa_node to int in
> blk_mq_hw_ctx and init_request") changed the type of the
> `numa_node` argument of `blk_mq_ops::init_request` from
> `unsigned int` to `int`. Update the Rust callback signature to
> match, so that the function item can be coerced to the C fn
> pointer type stored in `blk_mq_ops`.
> 
> Without this change the Rust block layer fails to build:
> 
>   error[E0308]: mismatched types
>      --> rust/kernel/block/mq/operations.rs:274:28
>       |
>   274 |         init_request: Some(Self::init_request_callback),
>       |                       ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       |                       expected fn pointer, found fn item
>       |
>       = note: expected fn pointer
>                 `unsafe extern "C" fn(_, _, _, i32) -> _`
>                     found fn item
>                 `unsafe extern "C" fn(_, _, _, u32) -> _ {...}`
> 
> The argument is unused on the Rust side, so this is a pure
> type-signature change with no functional impact.
> 
> Fixes: b040a1a4523d ("block: switch numa_node to int in blk_mq_hw_ctx and init_request")
> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>

You could also just use `i32` instead of `ffi::c_int`. But it doesn't really
matter for this patch.

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/block/mq/operations.rs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


