* [RFC PATCH 0/1] add fixed file table support
@ 2025-04-14 18:54 Brian Song
From: Brian Song @ 2025-04-14 18:54 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha, Brian Song
Hi everyone,
I am a GSoC QEMU community applicant this year, and I have just
completed this contribution task suggested by the project mentors
Kevin and Stefan. The task is to register the file descriptor of each
block file that uses io_uring as its AIO method with the io_uring
instance. The kernel can then find the file through its index in the
registered table instead of doing a file lookup (fdget()) for every
I/O request. This is expected to improve I/O performance.
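As a rough illustration of the mechanism described above, the following sketch models a ring's fixed file table in plain C. Everything named `model_*` or `MODEL_*` is hypothetical and stands in for kernel behaviour; it is not liburing or kernel code, and the error codes other than -EBUSY are only illustrative. The model also makes one detail explicit: a ring accepts a full table registration once, and a second full registration on the same ring is rejected with -EBUSY until the table is unregistered, so files are normally added one at a time by updating a slot (liburing provides io_uring_register_files_update() for this).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SLOTS 4            /* hypothetical table size */
#define MODEL_FIXED_FILE 0x1u    /* stand-in for IOSQE_FIXED_FILE */

/* Hypothetical model of one ring's fixed file table. */
struct model_ring {
    int has_table;               /* non-zero once a table is registered */
    int slots[MODEL_SLOTS];      /* fd stored per slot, -1 means empty */
};

/* Register an empty table; only allowed once per ring. */
static int model_register_sparse(struct model_ring *r)
{
    assert(r != NULL);
    if (r->has_table) {
        return -EBUSY;
    }
    for (int i = 0; i < MODEL_SLOTS; i++) {
        r->slots[i] = -1;
    }
    r->has_table = 1;
    return 0;
}

/* Store one fd in an already registered table. */
static int model_update_slot(struct model_ring *r, unsigned idx, int fd)
{
    assert(r != NULL);
    if (!r->has_table) {
        return -ENXIO;
    }
    if (idx >= MODEL_SLOTS) {
        return -EINVAL;
    }
    r->slots[idx] = fd;
    return 0;
}

/*
 * Resolve the file a request refers to. With the fixed-file flag the
 * request's fd field is an index into the table; without it the field
 * is an ordinary fd that needs a per-request lookup.
 */
static int model_resolve(const struct model_ring *r, unsigned flags, int fd)
{
    assert(r != NULL);
    if (!(flags & MODEL_FIXED_FILE)) {
        return fd;
    }
    if (!r->has_table || fd < 0 || fd >= MODEL_SLOTS || r->slots[fd] < 0) {
        return -EBADF;
    }
    return r->slots[fd];
}
```

For example, after model_register_sparse() and model_update_slot(&r, 0, 42), a request carrying MODEL_FIXED_FILE and fd 0 resolves to 42, while a second model_register_sparse() call returns -EBUSY.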
Note that this is only a proof of concept to enable benchmarking, so
scenarios such as block file removal are not handled yet. I tested
with fio random reads (4k blocks, iodepth 64), and the results show no
significant I/O performance improvement.
Please feel free to share any thoughts!
Thanks,
Brian
The specific testing method and results are as follows:
guest $ sudo fio --filename=/dev/vda \
--ioengine=io_uring \
--direct=1 \
--ramp_time=5 \
--name=randread \
--readwrite=randread \
--iodepth=64 \
--numjobs=1 \
--blocksize=4k \
--runtime=30 \
--time_based=1
**Guest with fixed file table support**
**vda (guest.img)**
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.39
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=502MiB/s][r=128k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=1208: Fri Apr 11 23:18:26 2025
read: IOPS=127k, BW=496MiB/s (520MB/s)(14.5GiB/30001msec)
slat (usec): min=2, max=3541, avg= 5.89, stdev= 3.71
clat (usec): min=8, max=24149, avg=496.40, stdev=149.85
lat (usec): min=11, max=24161, avg=502.29, stdev=149.89
clat percentiles (usec):
| 1.00th=[ 375], 5.00th=[ 433], 10.00th=[ 449], 20.00th=[ 461],
| 30.00th=[ 469], 40.00th=[ 474], 50.00th=[ 482], 60.00th=[ 486],
| 70.00th=[ 494], 80.00th=[ 502], 90.00th=[ 515], 95.00th=[ 537],
| 99.00th=[ 1287], 99.50th=[ 1516], 99.90th=[ 1827], 99.95th=[ 1958],
| 99.99th=[ 2573]
bw ( KiB/s): min=484856, max=530928, per=100.00%, avg=508499.90, stdev=8880.33, samples=60
iops : min=121214, max=132732, avg=127124.98, stdev=2220.10, samples=60
lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=0.05%
lat (usec) : 500=79.75%, 750=18.05%, 1000=0.44%
lat (msec) : 2=1.66%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=44.52%, sys=55.43%, ctx=199, majf=0, minf=36
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=3810630,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=496MiB/s (520MB/s), 496MiB/s-496MiB/s (520MB/s-520MB/s), io=14.5GiB (15.6GB), run=30001-30001msec
Disk stats (read/write):
vda: ios=4422643/234, sectors=35381152/14793, merge=0/20, ticks=120202/328, in_queue=120535, util=95.02%
**Guest without fixed file table support**
**vda**
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.39
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=459MiB/s][r=118k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=1217: Fri Apr 11 23:16:24 2025
read: IOPS=127k, BW=498MiB/s (522MB/s)(14.6GiB/30001msec)
slat (usec): min=2, max=246, avg= 5.91, stdev= 3.19
clat (usec): min=10, max=21817, avg=494.55, stdev=149.50
lat (usec): min=17, max=21827, avg=500.46, stdev=149.59
clat percentiles (usec):
| 1.00th=[ 318], 5.00th=[ 392], 10.00th=[ 433], 20.00th=[ 457],
| 30.00th=[ 469], 40.00th=[ 478], 50.00th=[ 482], 60.00th=[ 490],
| 70.00th=[ 494], 80.00th=[ 502], 90.00th=[ 529], 95.00th=[ 562],
| 99.00th=[ 1270], 99.50th=[ 1516], 99.90th=[ 1827], 99.95th=[ 1958],
| 99.99th=[ 2376]
bw ( KiB/s): min=441768, max=568144, per=100.00%, avg=510363.83, stdev=23076.31, samples=60
iops : min=110442, max=142036, avg=127590.88, stdev=5769.07, samples=60
lat (usec) : 20=0.01%, 50=0.01%, 100=0.02%, 250=0.10%, 500=76.37%
lat (usec) : 750=21.30%, 1000=0.52%
lat (msec) : 2=1.65%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=43.71%, sys=56.26%, ctx=133, majf=0, minf=36
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=3824929,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=498MiB/s (522MB/s), 498MiB/s-498MiB/s (522MB/s-522MB/s), io=14.6GiB (15.7GB), run=30001-30001msec
Disk stats (read/write):
vda: ios=4468557/140, sectors=35748456/8817, merge=0/18, ticks=129894/244, in_queue=130143, util=95.00%
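To put the two runs side by side, here is a small sketch that computes the relative change in the averages fio reports above (avg IOPS and avg clat). The function names are made up for illustration; the numbers are copied from the two outputs. The change works out to roughly -0.37% for IOPS and +0.37% for completion latency with the fixed file table, and the gap of about 466 IOPS is smaller than the per-sample stdev of either run (2220.10 and 5769.07), which is consistent with no measurable difference.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `value` against `baseline`, in percent. */
static double percent_change(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* Print the comparison for the averages reported by the two fio runs. */
static void print_fio_comparison(void)
{
    /* avg IOPS: with / without the fixed file table */
    const double iops_fixed = 127124.98;
    const double iops_plain = 127590.88;
    /* avg completion latency in usec: with / without */
    const double clat_fixed = 496.40;
    const double clat_plain = 494.55;

    printf("IOPS change: %+.2f%%\n", percent_change(iops_plain, iops_fixed));
    printf("clat change: %+.2f%%\n", percent_change(clat_plain, clat_fixed));
}
```

Calling print_fio_comparison() from a main() prints both percentages; the run without the fixed file table is used as the baseline.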
Brian Song (1):
This work adds support for registering block file descriptors to the
io_uring instance and uses IOSQE_FIXED_FILE in I/O requests (SQEs)
to avoid the cost of fdget() in the kernel. It is a basic
implementation for testing, and does not yet handle cases where
block devices are removed.
block/io_uring.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
--
2.43.0
* [RFC PATCH 1/1] This work adds support for registering block file descriptors to the io_uring instance and uses IOSQE_FIXED_FILE in I/O requests (SQEs) to avoid the cost of fdget() in the kernel. It is a basic implementation for testing, and does not yet handle cases where block devices are removed.
From: Brian Song @ 2025-04-14 18:54 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha, Brian Song
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
block/io_uring.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/block/io_uring.c b/block/io_uring.c
index dd4f304910..94a875fbae 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -58,6 +58,11 @@ struct LuringState {
     LuringQueue io_q;
     QEMUBH *completion_bh;
+
+    /* fixed file support */
+    int *registered_fds;
+    int nr_registered_fds;
+    int max_registered_fds; /* size of registered_fds */
 };
 /**
@@ -323,6 +328,41 @@ static void luring_deferred_fn(void *opaque)
     }
 }
+static int luring_register_fd(LuringState *s, int fd)
+{
+    int idx;
+    int *new_fds;
+    int ret;
+
+    for (idx = 0; idx < s->nr_registered_fds; idx++) {
+        if (s->registered_fds[idx] == fd) {
+            return idx;
+        }
+    }
+
+    /* Grow the array if needed */
+    if (s->nr_registered_fds >= s->max_registered_fds) {
+        int new_max = s->max_registered_fds * 2;
+        new_fds = g_try_realloc(s->registered_fds, sizeof(int) * new_max);
+        if (!new_fds) {
+            return -ENOMEM;
+        }
+        s->registered_fds = new_fds;
+        s->max_registered_fds = new_max;
+    }
+
+    idx = s->nr_registered_fds++;
+    s->registered_fds[idx] = fd;
+
+    ret = io_uring_register_files(&s->ring, s->registered_fds, s->nr_registered_fds);
+    if (ret < 0) {
+        s->nr_registered_fds--;
+        return ret;
+    }
+
+    return idx;
+}
+
 /**
  * luring_do_submit:
  * @fd: file descriptor for I/O
@@ -339,6 +379,15 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 {
     int ret;
     struct io_uring_sqe *sqes = &luringcb->sqeq;
+    int fixed_fd_idx;
+
+    fixed_fd_idx = luring_register_fd(s, fd);
+    if (fixed_fd_idx < 0) {
+        return fixed_fd_idx;
+    }
+
+    sqes->flags |= IOSQE_FIXED_FILE;
+    sqes->fd = fixed_fd_idx;
     switch (type) {
     case QEMU_AIO_WRITE:
@@ -447,6 +496,11 @@ LuringState *luring_init(Error **errp)
         return NULL;
     }
+    /* Initialize fixed file support */
+    s->max_registered_fds = 1024;
+    s->registered_fds = g_new0(int, s->max_registered_fds);
+    s->nr_registered_fds = 0;
+
     ioq_init(&s->io_q);
     return s;
@@ -454,6 +508,12 @@ LuringState *luring_init(Error **errp)
 void luring_cleanup(LuringState *s)
 {
+    if (s->registered_fds) {
+        if (s->nr_registered_fds > 0) {
+            io_uring_unregister_files(&s->ring);
+        }
+        g_free(s->registered_fds);
+    }
     io_uring_queue_exit(&s->ring);
     trace_luring_cleanup_state(s);
     g_free(s);
--
2.43.0
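One thing that may be worth checking in the luring_do_submit() hunk is ordering. The hunk sets sqes->flags and sqes->fd before the switch (type). If the switch cases then call liburing prep helpers such as io_uring_prep_writev(sqes, fd, ...) with the raw fd (the hunk does not show the case bodies, so this is an assumption), those helpers assign sqe->fd themselves, and the fixed index set earlier would be overwritten. The sketch below models that effect with hypothetical `model_*` names; it is not liburing code. The model helper also clears flags, which some liburing versions do in their prep helpers and others may not.

```c
#include <assert.h>
#include <string.h>

#define MODEL_FIXED_FILE 0x1u    /* stand-in for IOSQE_FIXED_FILE */

/* Hypothetical, cut-down stand-in for struct io_uring_sqe. */
struct model_sqe {
    unsigned char opcode;
    unsigned char flags;
    int fd;
};

/* Mimics a prep helper: it writes opcode and fd, and clears flags. */
static void model_prep_rw(struct model_sqe *sqe, unsigned char opcode, int fd)
{
    assert(sqe != NULL);
    sqe->opcode = opcode;
    sqe->flags = 0;
    sqe->fd = fd;
}

/* Ordering as in the hunk: fixed-file fields first, prep helper second. */
static struct model_sqe submit_flag_before_prep(int raw_fd, int fixed_idx)
{
    struct model_sqe sqe;

    memset(&sqe, 0, sizeof(sqe));
    sqe.flags |= MODEL_FIXED_FILE;
    sqe.fd = fixed_idx;
    model_prep_rw(&sqe, 1, raw_fd);    /* overwrites fd and flags */
    return sqe;
}

/* Alternative ordering: prep with the index, then set the flag. */
static struct model_sqe submit_flag_after_prep(int fixed_idx)
{
    struct model_sqe sqe;

    memset(&sqe, 0, sizeof(sqe));
    model_prep_rw(&sqe, 1, fixed_idx);
    sqe.flags |= MODEL_FIXED_FILE;
    return sqe;
}
```

In this model, submit_flag_before_prep(42, 0) ends up with fd 42 and no fixed-file flag, so the request would take the ordinary fd path, whereas submit_flag_after_prep(0) keeps both the index and the flag. Whether the real code behaves this way depends on the parts of luring_do_submit() that the diff does not show.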