* Re: ublk-qcow2: ublk-qcow2 is available [not found] <Yza1u1KfKa7ycQm0@T590> @ 2022-10-03 19:53 ` Stefan Hajnoczi 2022-10-03 23:57 ` Denis V. Lunev 2022-10-04 9:43 ` Ming Lei 0 siblings, 2 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-03 19:53 UTC (permalink / raw) To: Ming Lei Cc: io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella [-- Attachment #1: Type: text/plain, Size: 4485 bytes --] On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > ublk-qcow2 is available now. Cool, thanks for sharing! > > So far it provides basic read/write function, and compression and snapshot > aren't supported yet. The target/backend implementation is completely > based on io_uring, and share the same io_uring with ublk IO command > handler, just like what ublk-loop does. > > Follows the main motivations of ublk-qcow2: > > - building one complicated target from scratch helps libublksrv APIs/functions > become mature/stable more quickly, since qcow2 is complicated and needs more > requirement from libublksrv compared with other simple ones(loop, null) > > - there are several attempts of implementing qcow2 driver in kernel, such as > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > might useful be for covering requirement in this field > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > is started > > - help to abstract common building block or design pattern for writing new ublk > target/backend > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > device as TEST_DEV, and kernel building workload is verified too. Also > soft update approach is applied in meta flushing, and meta data > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > test, and only cluster leak is reported during this test. > > The performance data looks much better compared with qemu-nbd, see > details in commit log[1], README[5] and STATUS[6]. And the test covers both > empty image and pre-allocated image, for example of pre-allocated qcow2 > image(8GB): > > - qemu-nbd (make test T=qcow2/002) Single queue? > randwrite(4k): jobs 1, iops 24605 > randread(4k): jobs 1, iops 30938 > randrw(4k): jobs 1, iops read 13981 write 14001 > rw(512k): jobs 1, iops read 724 write 728 Please try qemu-storage-daemon's VDUSE export type as well. The command-line should be similar to this: # modprobe virtio_vdpa # attaches vDPA devices to host kernel # modprobe vduse # qemu-storage-daemon \ --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ --blockdev qcow2,file=file,node-name=qcow2 \ --object iothread,id=iothread0 \ --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 # vdpa dev add name vduse0 mgmtdev vduse A virtio-blk device should appear and xfstests can be run on it (typically /dev/vda unless you already have other virtio-blk devices). Afterwards you can destroy the device using: # vdpa dev del vduse0 > > - ublk-qcow2 (make test T=qcow2/022) There are a lot of other factors not directly related to NBD vs ublk. In order to get an apples-to-apples comparison with qemu-* a ublk export type is needed in qemu-storage-daemon. 
That way only the difference is the ublk interface and the rest of the code path is identical, making it possible to compare NBD, VDUSE, ublk, etc more precisely. I think that comparison is interesting before comparing different qcow2 implementations because qcow2 sits on top of too much other code. It's hard to know what should be accounted to configuration differences, implementation differences, or fundamental differences that cannot be overcome (this is the interesting part!). > randwrite(4k): jobs 1, iops 104481 > randread(4k): jobs 1, iops 114937 > randrw(4k): jobs 1, iops read 53630 write 53577 > rw(512k): jobs 1, iops read 1412 write 1423 > > Also ublk-qcow2 aligns queue's chunk_sectors limit with qcow2's cluster size, > which is 64KB at default, this way simplifies backend io handling, but > it could be increased to 512K or more proper size for improving sequential > IO perf, just need one coroutine to handle more than one IOs. > > > [1] https://github.com/ming1/ubdsrv/commit/9faabbec3a92ca83ddae92335c66eabbeff654e7 > [2] https://upcommons.upc.edu/bitstream/handle/2099.1/9619/65757.pdf?sequence=1&isAllowed=y > [3] https://lwn.net/Articles/889429/ > [4] https://lab.ks.uni-freiburg.de/projects/kernel-qcow2/repository > [5] https://github.com/ming1/ubdsrv/blob/master/qcow2/README.rst > [6] https://github.com/ming1/ubdsrv/blob/master/qcow2/STATUS.rst > > Thanks, > Ming > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
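For reference, the chunk_sectors/cluster-size alignment described in the message above boils down to a small amount of arithmetic when the ublk device parameters are set up. A minimal sketch in C, assuming the UBLK_CMD_SET_PARAMS interface and the struct ublk_param_basic layout from recent include/uapi/linux/ublk_cmd.h (treat the exact field names as my assumption, not a quote of the ublk-qcow2 source):

    /*
     * Sketch: align the ublk queue's chunk_sectors with the qcow2 cluster
     * size so no single request crosses a cluster boundary.  Field and
     * constant names follow my reading of include/uapi/linux/ublk_cmd.h.
     */
    #include <linux/ublk_cmd.h>
    #include <stdint.h>

    static void qcow2_set_chunk_sectors(struct ublk_params *p,
                                        uint32_t cluster_bits)  /* 16 => 64KB */
    {
            uint32_t cluster_size = 1U << cluster_bits;

            p->types |= UBLK_PARAM_TYPE_BASIC;
            /* chunk_sectors is in 512-byte sectors: 64KB -> 128 */
            p->basic.chunk_sectors = cluster_size >> 9;
    }

Raising the limit to 512KB, as suggested above, would only change the value written here; the extra work is in letting one coroutine handle a request that spans several clusters.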
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-03 19:53 ` ublk-qcow2: ublk-qcow2 is available Stefan Hajnoczi @ 2022-10-03 23:57 ` Denis V. Lunev 2022-10-05 15:11 ` Stefan Hajnoczi 2022-10-04 9:43 ` Ming Lei 1 sibling, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2022-10-03 23:57 UTC (permalink / raw) To: Stefan Hajnoczi, Ming Lei Cc: io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella On 10/3/22 21:53, Stefan Hajnoczi wrote: > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: >> ublk-qcow2 is available now. > Cool, thanks for sharing! yep >> So far it provides basic read/write function, and compression and snapshot >> aren't supported yet. The target/backend implementation is completely >> based on io_uring, and share the same io_uring with ublk IO command >> handler, just like what ublk-loop does. >> >> Follows the main motivations of ublk-qcow2: >> >> - building one complicated target from scratch helps libublksrv APIs/functions >> become mature/stable more quickly, since qcow2 is complicated and needs more >> requirement from libublksrv compared with other simple ones(loop, null) >> >> - there are several attempts of implementing qcow2 driver in kernel, such as >> ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 >> might useful be for covering requirement in this field There is one important thing to keep in mind about all partly-userspace implementations though: * any single allocation happened in the context of the userspace daemon through try_to_free_pages() in kernel has a possibility to trigger the operation, which will require userspace daemon action, which is inside the kernel now. * the probability of this is higher in the overcommitted environment This was the main motivation of us in favor for the in-kernel implementation. >> - performance comparison with qemu-nbd, and it was my 1st thought to evaluate >> performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv >> is started >> >> - help to abstract common building block or design pattern for writing new ublk >> target/backend >> >> So far it basically passes xfstest(XFS) test by using ublk-qcow2 block >> device as TEST_DEV, and kernel building workload is verified too. Also >> soft update approach is applied in meta flushing, and meta data >> integrity is guaranteed, 'make test T=qcow2/040' covers this kind of >> test, and only cluster leak is reported during this test. >> >> The performance data looks much better compared with qemu-nbd, see >> details in commit log[1], README[5] and STATUS[6]. And the test covers both >> empty image and pre-allocated image, for example of pre-allocated qcow2 >> image(8GB): >> >> - qemu-nbd (make test T=qcow2/002) > Single queue? > >> randwrite(4k): jobs 1, iops 24605 >> randread(4k): jobs 1, iops 30938 >> randrw(4k): jobs 1, iops read 13981 write 14001 >> rw(512k): jobs 1, iops read 724 write 728 > Please try qemu-storage-daemon's VDUSE export type as well. 
The > command-line should be similar to this: > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > # modprobe vduse > # qemu-storage-daemon \ > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > --blockdev qcow2,file=file,node-name=qcow2 \ > --object iothread,id=iothread0 \ > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > # vdpa dev add name vduse0 mgmtdev vduse > > A virtio-blk device should appear and xfstests can be run on it > (typically /dev/vda unless you already have other virtio-blk devices). > > Afterwards you can destroy the device using: > > # vdpa dev del vduse0 but this would be anyway limited by a single thread doing AIO in qemu-storage-daemon, I believe. >> - ublk-qcow2 (make test T=qcow2/022) > There are a lot of other factors not directly related to NBD vs ublk. In > order to get an apples-to-apples comparison with qemu-* a ublk export > type is needed in qemu-storage-daemon. That way only the difference is > the ublk interface and the rest of the code path is identical, making it > possible to compare NBD, VDUSE, ublk, etc more precisely. > > I think that comparison is interesting before comparing different qcow2 > implementations because qcow2 sits on top of too much other code. It's > hard to know what should be accounted to configuration differences, > implementation differences, or fundamental differences that cannot be > overcome (this is the interesting part!). > >> randwrite(4k): jobs 1, iops 104481 >> randread(4k): jobs 1, iops 114937 >> randrw(4k): jobs 1, iops read 53630 write 53577 >> rw(512k): jobs 1, iops read 1412 write 1423 >> >> Also ublk-qcow2 aligns queue's chunk_sectors limit with qcow2's cluster size, >> which is 64KB at default, this way simplifies backend io handling, but >> it could be increased to 512K or more proper size for improving sequential >> IO perf, just need one coroutine to handle more than one IOs. >> >> >> [1] https://github.com/ming1/ubdsrv/commit/9faabbec3a92ca83ddae92335c66eabbeff654e7 >> [2] https://upcommons.upc.edu/bitstream/handle/2099.1/9619/65757.pdf?sequence=1&isAllowed=y >> [3] https://lwn.net/Articles/889429/ >> [4] https://lab.ks.uni-freiburg.de/projects/kernel-qcow2/repository >> [5] https://github.com/ming1/ubdsrv/blob/master/qcow2/README.rst >> [6] https://github.com/ming1/ubdsrv/blob/master/qcow2/STATUS.rst interesting... Den ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-03 23:57 ` Denis V. Lunev @ 2022-10-05 15:11 ` Stefan Hajnoczi 2022-10-06 10:26 ` Ming Lei 0 siblings, 1 reply; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-05 15:11 UTC (permalink / raw) To: Denis V. Lunev Cc: Ming Lei, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik [-- Attachment #1: Type: text/plain, Size: 1848 bytes --] On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > ublk-qcow2 is available now. > > Cool, thanks for sharing! > yep > > > > So far it provides basic read/write function, and compression and snapshot > > > aren't supported yet. The target/backend implementation is completely > > > based on io_uring, and share the same io_uring with ublk IO command > > > handler, just like what ublk-loop does. > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > might useful be for covering requirement in this field > There is one important thing to keep in mind about all partly-userspace > implementations though: > * any single allocation happened in the context of the > userspace daemon through try_to_free_pages() in > kernel has a possibility to trigger the operation, > which will require userspace daemon action, which > is inside the kernel now. > * the probability of this is higher in the overcommitted > environment > > This was the main motivation of us in favor for the in-kernel > implementation. CCed Josef Bacik because the Linux NBD driver has dealt with memory reclaim hangs in the past. Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and how to avoid hangs in memory reclaim? Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-05 15:11 ` Stefan Hajnoczi @ 2022-10-06 10:26 ` Ming Lei 2022-10-06 13:59 ` Stefan Hajnoczi 0 siblings, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-06 10:26 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Denis V. Lunev, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote: > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > ublk-qcow2 is available now. > > > Cool, thanks for sharing! > > yep > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > aren't supported yet. The target/backend implementation is completely > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > handler, just like what ublk-loop does. > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > might useful be for covering requirement in this field > > There is one important thing to keep in mind about all partly-userspace > > implementations though: > > * any single allocation happened in the context of the > > userspace daemon through try_to_free_pages() in > > kernel has a possibility to trigger the operation, > > which will require userspace daemon action, which > > is inside the kernel now. > > * the probability of this is higher in the overcommitted > > environment > > > > This was the main motivation of us in favor for the in-kernel > > implementation. > > CCed Josef Bacik because the Linux NBD driver has dealt with memory > reclaim hangs in the past. > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and > how to avoid hangs in memory reclaim? If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim"). Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
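For readers unfamiliar with that commit: PR_SET_IO_FLUSHER is applied per thread with a single prctl() call, roughly as in the sketch below (CAP_SYS_RESOURCE is required; the constant is in linux/prctl.h since v5.6):

    /* Minimal sketch: mark the calling thread as an IO flusher so that
     * memory reclaim entered from this context avoids recursing into
     * filesystem/block I/O.  Requires CAP_SYS_RESOURCE. */
    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_IO_FLUSHER
    #define PR_SET_IO_FLUSHER 57    /* linux/prctl.h, kernel >= 5.6 */
    #endif

    int mark_io_flusher(void)
    {
            if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0) != 0) {
                    perror("prctl(PR_SET_IO_FLUSHER)");
                    return -1;
            }
            return 0;
    }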
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-06 10:26 ` Ming Lei @ 2022-10-06 13:59 ` Stefan Hajnoczi 2022-10-06 15:09 ` Ming Lei 0 siblings, 1 reply; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-06 13:59 UTC (permalink / raw) To: Denis V. Lunev Cc: Ming Lei, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik [-- Attachment #1: Type: text/plain, Size: 2769 bytes --] On Thu, Oct 06, 2022 at 06:26:15PM +0800, Ming Lei wrote: > On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote: > > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > > > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > ublk-qcow2 is available now. > > > > Cool, thanks for sharing! > > > yep > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > aren't supported yet. The target/backend implementation is completely > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > handler, just like what ublk-loop does. > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > might useful be for covering requirement in this field > > > There is one important thing to keep in mind about all partly-userspace > > > implementations though: > > > * any single allocation happened in the context of the > > > userspace daemon through try_to_free_pages() in > > > kernel has a possibility to trigger the operation, > > > which will require userspace daemon action, which > > > is inside the kernel now. > > > * the probability of this is higher in the overcommitted > > > environment > > > > > > This was the main motivation of us in favor for the in-kernel > > > implementation. > > > > CCed Josef Bacik because the Linux NBD driver has dealt with memory > > reclaim hangs in the past. > > > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and > > how to avoid hangs in memory reclaim? > > If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock > in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER > to support controlling memory reclaim"). Denis: I'm trying to understand the problem you described. Is this correct: Due to memory pressure, the kernel reclaims pages and submits a write to a ublk block device. The userspace process attempts to allocate memory in order to service the write request, but it gets stuck because there is no memory available. As a result reclaim gets stuck, the system is unable to free more memory and therefore it hangs? Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-06 13:59 ` Stefan Hajnoczi @ 2022-10-06 15:09 ` Ming Lei 2022-10-06 18:29 ` Stefan Hajnoczi 0 siblings, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-06 15:09 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Denis V. Lunev, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik On Thu, Oct 06, 2022 at 09:59:40AM -0400, Stefan Hajnoczi wrote: > On Thu, Oct 06, 2022 at 06:26:15PM +0800, Ming Lei wrote: > > On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote: > > > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > > > > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > ublk-qcow2 is available now. > > > > > Cool, thanks for sharing! > > > > yep > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > might useful be for covering requirement in this field > > > > There is one important thing to keep in mind about all partly-userspace > > > > implementations though: > > > > * any single allocation happened in the context of the > > > > userspace daemon through try_to_free_pages() in > > > > kernel has a possibility to trigger the operation, > > > > which will require userspace daemon action, which > > > > is inside the kernel now. > > > > * the probability of this is higher in the overcommitted > > > > environment > > > > > > > > This was the main motivation of us in favor for the in-kernel > > > > implementation. > > > > > > CCed Josef Bacik because the Linux NBD driver has dealt with memory > > > reclaim hangs in the past. > > > > > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and > > > how to avoid hangs in memory reclaim? > > > > If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock > > in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER > > to support controlling memory reclaim"). > > Denis: I'm trying to understand the problem you described. Is this > correct: > > Due to memory pressure, the kernel reclaims pages and submits a write to > a ublk block device. The userspace process attempts to allocate memory > in order to service the write request, but it gets stuck because there > is no memory available. As a result reclaim gets stuck, the system is > unable to free more memory and therefore it hangs? The process should be killed in this situation if PR_SET_IO_FLUSHER is applied since the page allocation is done in VM fault handler. 
Firstly, in theory the userspace part should provide a forward-progress guarantee in its IO handling path, for example by reserving/mlocking pages for this situation. However, this issue isn't unique to nbd or ublk; every userspace block device has this potential risk, and vduse is no exception, IMO. Secondly, with proper/enough swap space configured, I think it is hard to trigger this kind of issue. Finally, the ublk driver has added user recovery commands for recovering from a crash, and ublksrv will support them soon. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
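One concrete way to read "reserving/mlocking pages" is to pre-fault and lock a fixed IO buffer pool at daemon start-up so the hot path never allocates. A rough sketch, with pool size and names chosen purely for illustration:

    /* Rough sketch: pre-fault and mlock a fixed buffer pool so servicing
     * writeback never has to allocate memory.  NR_BUFS/BUF_SIZE are
     * illustrative values, not ublksrv defaults. */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define NR_BUFS   64
    #define BUF_SIZE  (512 * 1024)

    static void *io_bufs[NR_BUFS];

    int reserve_io_buffers(void)
    {
            for (int i = 0; i < NR_BUFS; i++) {
                    if (posix_memalign(&io_bufs[i], 4096, BUF_SIZE))
                            return -1;
                    memset(io_bufs[i], 0, BUF_SIZE);   /* pre-fault pages */
            }
            /* keep current and future mappings resident */
            return mlockall(MCL_CURRENT | MCL_FUTURE);
    }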
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-06 15:09 ` Ming Lei @ 2022-10-06 18:29 ` Stefan Hajnoczi 2022-10-07 11:21 ` Ming Lei 0 siblings, 1 reply; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-06 18:29 UTC (permalink / raw) To: Ming Lei Cc: Denis V. Lunev, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik, Mike Christie [-- Attachment #1: Type: text/plain, Size: 4166 bytes --] On Thu, Oct 06, 2022 at 11:09:48PM +0800, Ming Lei wrote: > On Thu, Oct 06, 2022 at 09:59:40AM -0400, Stefan Hajnoczi wrote: > > On Thu, Oct 06, 2022 at 06:26:15PM +0800, Ming Lei wrote: > > > On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote: > > > > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > > > > > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > ublk-qcow2 is available now. > > > > > > Cool, thanks for sharing! > > > > > yep > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > might useful be for covering requirement in this field > > > > > There is one important thing to keep in mind about all partly-userspace > > > > > implementations though: > > > > > * any single allocation happened in the context of the > > > > > userspace daemon through try_to_free_pages() in > > > > > kernel has a possibility to trigger the operation, > > > > > which will require userspace daemon action, which > > > > > is inside the kernel now. > > > > > * the probability of this is higher in the overcommitted > > > > > environment > > > > > > > > > > This was the main motivation of us in favor for the in-kernel > > > > > implementation. > > > > > > > > CCed Josef Bacik because the Linux NBD driver has dealt with memory > > > > reclaim hangs in the past. > > > > > > > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and > > > > how to avoid hangs in memory reclaim? > > > > > > If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock > > > in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER > > > to support controlling memory reclaim"). > > > > Denis: I'm trying to understand the problem you described. Is this > > correct: > > > > Due to memory pressure, the kernel reclaims pages and submits a write to > > a ublk block device. The userspace process attempts to allocate memory > > in order to service the write request, but it gets stuck because there > > is no memory available. As a result reclaim gets stuck, the system is > > unable to free more memory and therefore it hangs? 
> > The process should be killed in this situation if PR_SET_IO_FLUSHER > is applied since the page allocation is done in VM fault handler. Thanks for mentioning PR_SET_IO_FLUSHER. There is more info in commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463 ("prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim"). It requires CAP_SYS_RESOURCE :/. This makes me wonder whether unprivileged ublk will ever be possible. I think this addresses Denis' concern about hangs, but it doesn't solve them because I/O will fail. The real solution is probably what you mentioned... > Firstly in theory the userspace part should provide forward progress > guarantee in code path for handling IO, such as reserving/mlock pages > for such situation. However, this issue isn't unique for nbd or ublk, > all userspace block device should have such potential risk, and vduse > is no exception, IMO. ...here. Userspace needs to minimize memory allocations in the I/O code path and reserve sufficient resources to make forward progress. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-06 18:29 ` Stefan Hajnoczi @ 2022-10-07 11:21 ` Ming Lei 0 siblings, 0 replies; 22+ messages in thread From: Ming Lei @ 2022-10-07 11:21 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Denis V. Lunev, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Josef Bacik, Mike Christie On Thu, Oct 06, 2022 at 02:29:55PM -0400, Stefan Hajnoczi wrote: > On Thu, Oct 06, 2022 at 11:09:48PM +0800, Ming Lei wrote: > > On Thu, Oct 06, 2022 at 09:59:40AM -0400, Stefan Hajnoczi wrote: > > > On Thu, Oct 06, 2022 at 06:26:15PM +0800, Ming Lei wrote: > > > > On Wed, Oct 05, 2022 at 11:11:32AM -0400, Stefan Hajnoczi wrote: > > > > > On Tue, Oct 04, 2022 at 01:57:50AM +0200, Denis V. Lunev wrote: > > > > > > On 10/3/22 21:53, Stefan Hajnoczi wrote: > > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > > ublk-qcow2 is available now. > > > > > > > Cool, thanks for sharing! > > > > > > yep > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > > might useful be for covering requirement in this field > > > > > > There is one important thing to keep in mind about all partly-userspace > > > > > > implementations though: > > > > > > * any single allocation happened in the context of the > > > > > > userspace daemon through try_to_free_pages() in > > > > > > kernel has a possibility to trigger the operation, > > > > > > which will require userspace daemon action, which > > > > > > is inside the kernel now. > > > > > > * the probability of this is higher in the overcommitted > > > > > > environment > > > > > > > > > > > > This was the main motivation of us in favor for the in-kernel > > > > > > implementation. > > > > > > > > > > CCed Josef Bacik because the Linux NBD driver has dealt with memory > > > > > reclaim hangs in the past. > > > > > > > > > > Josef: Any thoughts on userspace block drivers (whether NBD or ublk) and > > > > > how to avoid hangs in memory reclaim? > > > > > > > > If I remember correctly, there isn't new report after the last NBD(TCMU) deadlock > > > > in memory reclaim was addressed by 8d19f1c8e193 ("prctl: PR_{G,S}ET_IO_FLUSHER > > > > to support controlling memory reclaim"). > > > > > > Denis: I'm trying to understand the problem you described. Is this > > > correct: > > > > > > Due to memory pressure, the kernel reclaims pages and submits a write to > > > a ublk block device. The userspace process attempts to allocate memory > > > in order to service the write request, but it gets stuck because there > > > is no memory available. 
As a result reclaim gets stuck, the system is > > > unable to free more memory and therefore it hangs? > > > > The process should be killed in this situation if PR_SET_IO_FLUSHER > > is applied since the page allocation is done in VM fault handler. > > Thanks for mentioning PR_SET_IO_FLUSHER. There is more info in commit > 8d19f1c8e1937baf74e1962aae9f90fa3aeab463 ("prctl: PR_{G,S}ET_IO_FLUSHER > to support controlling memory reclaim"). > > It requires CAP_SYS_RESOURCE :/. This makes me wonder whether > unprivileged ublk will ever be possible. IMO, it shouldn't be one blocker, there might be lots of choices for us - unprivileged ublk can simply not call it, if such io hang is triggered, ublksrv is capable of figuring out this problem, then kill & recover the device. - set PR_IO_FLUSHER for current task in ublk_ch_uring_cmd(UBLK_IO_FETCH_REQ) - ... > > I think this addresses Denis' concern about hangs, but it doesn't solve > them because I/O will fail. The real solution is probably what you > mentioned... So far, not see real report yet, and it may be never one issue if proper swap device/file is configured. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
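If the second option above were pursued, the kernel-side change would presumably set the same task flags that PR_SET_IO_FLUSHER sets today. A very rough, untested kernel-context sketch; the placement in ublk_ch_uring_cmd()/UBLK_IO_FETCH_REQ is the suggestion above, not existing mainline code:

    /* Very rough kernel-context sketch, untested: apply the IO_FLUSHER
     * task flags to the ublk daemon task when it fetches requests, so an
     * unprivileged daemon does not need CAP_SYS_RESOURCE for prctl().
     * These are the flags PR_SET_IO_FLUSHER sets in kernel/sys.c. */
    #include <linux/sched.h>

    static void ublk_mark_daemon_io_flusher(void)
    {
            current->flags |= PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE;
    }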
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-03 19:53 ` ublk-qcow2: ublk-qcow2 is available Stefan Hajnoczi 2022-10-03 23:57 ` Denis V. Lunev @ 2022-10-04 9:43 ` Ming Lei 2022-10-04 13:53 ` Stefan Hajnoczi 1 sibling, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-04 9:43 UTC (permalink / raw) To: Stefan Hajnoczi Cc: io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > ublk-qcow2 is available now. > > Cool, thanks for sharing! > > > > > So far it provides basic read/write function, and compression and snapshot > > aren't supported yet. The target/backend implementation is completely > > based on io_uring, and share the same io_uring with ublk IO command > > handler, just like what ublk-loop does. > > > > Follows the main motivations of ublk-qcow2: > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > become mature/stable more quickly, since qcow2 is complicated and needs more > > requirement from libublksrv compared with other simple ones(loop, null) > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > might useful be for covering requirement in this field > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > is started > > > > - help to abstract common building block or design pattern for writing new ublk > > target/backend > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > device as TEST_DEV, and kernel building workload is verified too. Also > > soft update approach is applied in meta flushing, and meta data > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > test, and only cluster leak is reported during this test. > > > > The performance data looks much better compared with qemu-nbd, see > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > image(8GB): > > > > - qemu-nbd (make test T=qcow2/002) > > Single queue? Yeah. > > > randwrite(4k): jobs 1, iops 24605 > > randread(4k): jobs 1, iops 30938 > > randrw(4k): jobs 1, iops read 13981 write 14001 > > rw(512k): jobs 1, iops read 724 write 728 > > Please try qemu-storage-daemon's VDUSE export type as well. The > command-line should be similar to this: > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel Not found virtio_vdpa module even though I enabled all the following options: --- vDPA drivers <M> vDPA device simulator core <M> vDPA simulator for networking device <M> vDPA simulator for block device <M> VDUSE (vDPA Device in Userspace) support <M> Intel IFC VF vDPA driver <M> Virtio PCI bridge vDPA driver <M> vDPA driver for Alibaba ENI BTW, my test environment is VM and the shared data is done in VM too, and can virtio_vdpa be used inside VM? 
> # modprobe vduse > # qemu-storage-daemon \ > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > --blockdev qcow2,file=file,node-name=qcow2 \ > --object iothread,id=iothread0 \ > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > # vdpa dev add name vduse0 mgmtdev vduse > > A virtio-blk device should appear and xfstests can be run on it > (typically /dev/vda unless you already have other virtio-blk devices). > > Afterwards you can destroy the device using: > > # vdpa dev del vduse0 > > > > > - ublk-qcow2 (make test T=qcow2/022) > > There are a lot of other factors not directly related to NBD vs ublk. In > order to get an apples-to-apples comparison with qemu-* a ublk export > type is needed in qemu-storage-daemon. That way only the difference is > the ublk interface and the rest of the code path is identical, making it > possible to compare NBD, VDUSE, ublk, etc more precisely. Maybe not true. ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, and so far single io_uring/pthread is for handling all qcow2 IOs and IO command. thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
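To make the single-ring design above concrete, the event loop amounts to multiplexing ublk URING_CMD completions and backend read/write completions on one io_uring. A rough liburing sketch; the tagging scheme and handler names are invented for illustration, and the real ublksrv loop is more involved:

    /* Rough sketch of a single-ring loop multiplexing ublk io-command
     * CQEs and qcow2 backend IO CQEs.  TAG_IO_CMD and the two handlers
     * are hypothetical, for illustration only. */
    #include <liburing.h>

    #define TAG_IO_CMD  (1ULL << 63)   /* high bit of user_data marks io commands */

    static void handle_io_cmd_cqe(struct io_uring_cqe *cqe);   /* hypothetical */
    static void handle_backend_cqe(struct io_uring_cqe *cqe);  /* hypothetical */

    static void event_loop(struct io_uring *ring)
    {
            struct io_uring_cqe *cqe;
            unsigned int head, handled;

            for (;;) {
                    /* one syscall flushes queued SQEs and waits: io commands
                     * and backend IOs complete in batches, often dozens at a
                     * time, which is where much of the efficiency comes from */
                    io_uring_submit_and_wait(ring, 1);

                    handled = 0;
                    io_uring_for_each_cqe(ring, head, cqe) {
                            if (cqe->user_data & TAG_IO_CMD)
                                    handle_io_cmd_cqe(cqe);
                            else
                                    handle_backend_cqe(cqe);
                            handled++;
                    }
                    io_uring_cq_advance(ring, handled);
            }
    }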
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-04 9:43 ` Ming Lei @ 2022-10-04 13:53 ` Stefan Hajnoczi 2022-10-05 4:18 ` Ming Lei 2022-10-06 10:14 ` Richard W.M. Jones 0 siblings, 2 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-04 13:53 UTC (permalink / raw) To: Ming Lei Cc: Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > ublk-qcow2 is available now. > > > > Cool, thanks for sharing! > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > aren't supported yet. The target/backend implementation is completely > > > based on io_uring, and share the same io_uring with ublk IO command > > > handler, just like what ublk-loop does. > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > might useful be for covering requirement in this field > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > is started > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > target/backend > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > soft update approach is applied in meta flushing, and meta data > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > test, and only cluster leak is reported during this test. > > > > > > The performance data looks much better compared with qemu-nbd, see > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > image(8GB): > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > Single queue? > > Yeah. > > > > > > randwrite(4k): jobs 1, iops 24605 > > > randread(4k): jobs 1, iops 30938 > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > rw(512k): jobs 1, iops read 724 write 728 > > > > Please try qemu-storage-daemon's VDUSE export type as well. The > > command-line should be similar to this: > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > Not found virtio_vdpa module even though I enabled all the following > options: > > --- vDPA drivers > <M> vDPA device simulator core > <M> vDPA simulator for networking device > <M> vDPA simulator for block device > <M> VDUSE (vDPA Device in Userspace) support > <M> Intel IFC VF vDPA driver > <M> Virtio PCI bridge vDPA driver > <M> vDPA driver for Alibaba ENI > > BTW, my test environment is VM and the shared data is done in VM too, and > can virtio_vdpa be used inside VM? I hope Xie Yongji can help explain how to benchmark VDUSE. virtio_vdpa is available inside guests too. 
Please check that VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio drivers" menu. > > > # modprobe vduse > > # qemu-storage-daemon \ > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > --blockdev qcow2,file=file,node-name=qcow2 \ > > --object iothread,id=iothread0 \ > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > # vdpa dev add name vduse0 mgmtdev vduse > > > > A virtio-blk device should appear and xfstests can be run on it > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > Afterwards you can destroy the device using: > > > > # vdpa dev del vduse0 > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > order to get an apples-to-apples comparison with qemu-* a ublk export > > type is needed in qemu-storage-daemon. That way only the difference is > > the ublk interface and the rest of the code path is identical, making it > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > Maybe not true. > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > command. qemu-nbd doesn't use io_uring to handle the backend IO, so we don't know whether the benchmark demonstrates that ublk is faster than NBD, that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, whether there are miscellaneous implementation differences between ublk-qcow2 and qemu-nbd (like using the same io_uring context for both ublk and backend IO), or something else. I'm suggesting measuring changes to just 1 variable at a time. Otherwise it's hard to reach a conclusion about the root cause of the performance difference. Let's learn why ublk-qcow2 performs well. Stefan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-04 13:53 ` Stefan Hajnoczi @ 2022-10-05 4:18 ` Ming Lei 2022-10-05 12:21 ` Stefan Hajnoczi 2022-10-06 10:14 ` Richard W.M. Jones 1 sibling, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-05 4:18 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > ublk-qcow2 is available now. > > > > > > Cool, thanks for sharing! > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > aren't supported yet. The target/backend implementation is completely > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > handler, just like what ublk-loop does. > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > might useful be for covering requirement in this field > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > is started > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > target/backend > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > soft update approach is applied in meta flushing, and meta data > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > test, and only cluster leak is reported during this test. > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > image(8GB): > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > Single queue? > > > > Yeah. > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > randread(4k): jobs 1, iops 30938 > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. 
The > > > command-line should be similar to this: > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > Not found virtio_vdpa module even though I enabled all the following > > options: > > > > --- vDPA drivers > > <M> vDPA device simulator core > > <M> vDPA simulator for networking device > > <M> vDPA simulator for block device > > <M> VDUSE (vDPA Device in Userspace) support > > <M> Intel IFC VF vDPA driver > > <M> Virtio PCI bridge vDPA driver > > <M> vDPA driver for Alibaba ENI > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > can virtio_vdpa be used inside VM? > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > virtio_vdpa is available inside guests too. Please check that > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > drivers" menu. > > > > > > # modprobe vduse > > > # qemu-storage-daemon \ > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > --object iothread,id=iothread0 \ > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > Afterwards you can destroy the device using: > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > type is needed in qemu-storage-daemon. That way only the difference is > > > the ublk interface and the rest of the code path is identical, making it > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > Maybe not true. > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > command. > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > know whether the benchmark demonstrates that ublk is faster than NBD, > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > whether there are miscellaneous implementation differences between > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > ublk and backend IO), or something else. The theory shouldn't be too complicated: 1) io uring passthough(pt) communication is fast than socket, and io command is carried over io_uring pt commands, and should be fast than virio communication too. 2) io uring io handling is fast than libaio which is taken in the test on qemu-nbd, and all qcow2 backend io(include meta io) is handled by io_uring. https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common 3) ublk uses one single io_uring to handle all io commands and qcow2 backend IOs, so batching handling is common, and it is easy to see dozens of IOs/io commands handled in single syscall, or even more. > > I'm suggesting measuring changes to just 1 variable at a time. > Otherwise it's hard to reach a conclusion about the root cause of the > performance difference. Let's learn why ublk-qcow2 performs well. 
Turns out the latest Fedora 37-beta doesn't support vdpa yet, so I built qemu from the latest github tree, and finally it starts to work. And test kernel is v6.0 release. Follows the test result, and all three devices are setup as single queue, and all tests are run in single job, still done in one VM, and the test images are stored on XFS/virito-scsi backed SSD. The 1st group tests all three block device which is backed by empty qcow2 image. The 2nd group tests all the three block devices backed by pre-allocated qcow2 image. Except for big sequential IO(512K), there is still not small gap between vdpa-virtio-blk and ublk. 1. run fio on block device over empty qcow2 image 1) qemu-nbd running qcow2/001 run perf test on empty qcow2 image via nbd fio (nbd(/mnt/data/ublk_null_8G_nYbgF.qcow2), libaio, bs 4k, dio, hw queues:1)... randwrite: jobs 1, iops 8549 randread: jobs 1, iops 34829 randrw: jobs 1, iops read 11363 write 11333 rw(512k): jobs 1, iops read 590 write 597 2) ublk-qcow2 running qcow2/021 run perf test on empty qcow2 image via ublk fio (ublk/qcow2( -f /mnt/data/ublk_null_8G_s761j.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). randwrite: jobs 1, iops 16086 randread: jobs 1, iops 172720 randrw: jobs 1, iops read 35760 write 35702 rw(512k): jobs 1, iops read 1140 write 1149 3) vdpa-virtio-blk running debug/test_dev run io test on specified device fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... randwrite: jobs 1, iops 8626 randread: jobs 1, iops 126118 randrw: jobs 1, iops read 17698 write 17665 rw(512k): jobs 1, iops read 1023 write 1031 2. run fio on block device over pre-allocated qcow2 image 1) qemu-nbd running qcow2/002 run perf test on pre-allocated qcow2 image via nbd fio (nbd(/mnt/data/ublk_data_8G_sc0SB.qcow2), libaio, bs 4k, dio, hw queues:1)... randwrite: jobs 1, iops 21439 randread: jobs 1, iops 30336 randrw: jobs 1, iops read 11476 write 11449 rw(512k): jobs 1, iops read 718 write 722 2) ublk-qcow2 running qcow2/022 run perf test on pre-allocated qcow2 image via ublk fio (ublk/qcow2( -f /mnt/data/ublk_data_8G_yZiaJ.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). randwrite: jobs 1, iops 98757 randread: jobs 1, iops 110246 randrw: jobs 1, iops read 47229 write 47161 rw(512k): jobs 1, iops read 1416 write 1427 3) vdpa-virtio-blk running debug/test_dev run io test on specified device fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... randwrite: jobs 1, iops 47317 randread: jobs 1, iops 74092 randrw: jobs 1, iops read 27196 write 27234 rw(512k): jobs 1, iops read 1447 write 1458 thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-05 4:18 ` Ming Lei @ 2022-10-05 12:21 ` Stefan Hajnoczi 2022-10-05 12:38 ` Denis V. Lunev 2022-10-06 11:24 ` Ming Lei 0 siblings, 2 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-10-05 12:21 UTC (permalink / raw) To: Ming Lei Cc: Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > ublk-qcow2 is available now. > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > aren't supported yet. The target/backend implementation is completely > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > handler, just like what ublk-loop does. > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > might useful be for covering requirement in this field > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > is started > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > target/backend > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > soft update approach is applied in meta flushing, and meta data > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > image(8GB): > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > Single queue? > > > > > > Yeah. > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > randread(4k): jobs 1, iops 30938 > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. 
The > > > > command-line should be similar to this: > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > options: > > > > > > --- vDPA drivers > > > <M> vDPA device simulator core > > > <M> vDPA simulator for networking device > > > <M> vDPA simulator for block device > > > <M> VDUSE (vDPA Device in Userspace) support > > > <M> Intel IFC VF vDPA driver > > > <M> Virtio PCI bridge vDPA driver > > > <M> vDPA driver for Alibaba ENI > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > can virtio_vdpa be used inside VM? > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > virtio_vdpa is available inside guests too. Please check that > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > drivers" menu. > > > > > > > > > # modprobe vduse > > > > # qemu-storage-daemon \ > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > --object iothread,id=iothread0 \ > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > the ublk interface and the rest of the code path is identical, making it > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > Maybe not true. > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > command. > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > whether there are miscellaneous implementation differences between > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > ublk and backend IO), or something else. > > The theory shouldn't be too complicated: > > 1) io uring passthough(pt) communication is fast than socket, and io command > is carried over io_uring pt commands, and should be fast than virio > communication too. > > 2) io uring io handling is fast than libaio which is taken in the > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > by io_uring. > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > backend IOs, so batching handling is common, and it is easy to see > dozens of IOs/io commands handled in single syscall, or even more. I agree with the theory but theory has to be tested through experiments in order to validate it. 
We can all learn from systematic performance analysis - there might even be bottlenecks in ublk that can be solved to improve performance further. > > > > I'm suggesting measuring changes to just 1 variable at a time. > > Otherwise it's hard to reach a conclusion about the root cause of the > > performance difference. Let's learn why ublk-qcow2 performs well. > > Turns out the latest Fedora 37-beta doesn't support vdpa yet, so I built > qemu from the latest github tree, and finally it starts to work. And test kernel > is v6.0 release. > > Follows the test result, and all three devices are setup as single > queue, and all tests are run in single job, still done in one VM, and > the test images are stored on XFS/virito-scsi backed SSD. > > The 1st group tests all three block device which is backed by empty > qcow2 image. > > The 2nd group tests all the three block devices backed by pre-allocated > qcow2 image. > > Except for big sequential IO(512K), there is still not small gap between > vdpa-virtio-blk and ublk. > > 1. run fio on block device over empty qcow2 image > 1) qemu-nbd > running qcow2/001 > run perf test on empty qcow2 image via nbd > fio (nbd(/mnt/data/ublk_null_8G_nYbgF.qcow2), libaio, bs 4k, dio, hw queues:1)... > randwrite: jobs 1, iops 8549 > randread: jobs 1, iops 34829 > randrw: jobs 1, iops read 11363 write 11333 > rw(512k): jobs 1, iops read 590 write 597 > > > 2) ublk-qcow2 > running qcow2/021 > run perf test on empty qcow2 image via ublk > fio (ublk/qcow2( -f /mnt/data/ublk_null_8G_s761j.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). > randwrite: jobs 1, iops 16086 > randread: jobs 1, iops 172720 > randrw: jobs 1, iops read 35760 write 35702 > rw(512k): jobs 1, iops read 1140 write 1149 > > 3) vdpa-virtio-blk > running debug/test_dev > run io test on specified device > fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... > randwrite: jobs 1, iops 8626 > randread: jobs 1, iops 126118 > randrw: jobs 1, iops read 17698 write 17665 > rw(512k): jobs 1, iops read 1023 write 1031 > > > 2. run fio on block device over pre-allocated qcow2 image > 1) qemu-nbd > running qcow2/002 > run perf test on pre-allocated qcow2 image via nbd > fio (nbd(/mnt/data/ublk_data_8G_sc0SB.qcow2), libaio, bs 4k, dio, hw queues:1)... > randwrite: jobs 1, iops 21439 > randread: jobs 1, iops 30336 > randrw: jobs 1, iops read 11476 write 11449 > rw(512k): jobs 1, iops read 718 write 722 > > 2) ublk-qcow2 > running qcow2/022 > run perf test on pre-allocated qcow2 image via ublk > fio (ublk/qcow2( -f /mnt/data/ublk_data_8G_yZiaJ.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). > randwrite: jobs 1, iops 98757 > randread: jobs 1, iops 110246 > randrw: jobs 1, iops read 47229 write 47161 > rw(512k): jobs 1, iops read 1416 write 1427 > > 3) vdpa-virtio-blk > running debug/test_dev > run io test on specified device > fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... > randwrite: jobs 1, iops 47317 > randread: jobs 1, iops 74092 > randrw: jobs 1, iops read 27196 write 27234 > rw(512k): jobs 1, iops read 1447 write 1458 Thanks for including VDUSE results! ublk looks great here and worth considering even in cases where NBD or VDUSE is already being used. Stefan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-05 12:21 ` Stefan Hajnoczi @ 2022-10-05 12:38 ` Denis V. Lunev 2022-10-06 11:24 ` Ming Lei 1 sibling, 0 replies; 22+ messages in thread From: Denis V. Lunev @ 2022-10-05 12:38 UTC (permalink / raw) To: Stefan Hajnoczi, Ming Lei Cc: Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Stefano Garzarella, Andrey Zhadchenko On 10/5/22 14:21, Stefan Hajnoczi wrote: > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: >> On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: >>> On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: >>>> On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: >>>>> On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: >>>>>> ublk-qcow2 is available now. >>>>> Cool, thanks for sharing! >>>>> >>>>>> So far it provides basic read/write function, and compression and snapshot >>>>>> aren't supported yet. The target/backend implementation is completely >>>>>> based on io_uring, and share the same io_uring with ublk IO command >>>>>> handler, just like what ublk-loop does. >>>>>> >>>>>> Follows the main motivations of ublk-qcow2: >>>>>> >>>>>> - building one complicated target from scratch helps libublksrv APIs/functions >>>>>> become mature/stable more quickly, since qcow2 is complicated and needs more >>>>>> requirement from libublksrv compared with other simple ones(loop, null) >>>>>> >>>>>> - there are several attempts of implementing qcow2 driver in kernel, such as >>>>>> ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 >>>>>> might useful be for covering requirement in this field >>>>>> >>>>>> - performance comparison with qemu-nbd, and it was my 1st thought to evaluate >>>>>> performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv >>>>>> is started >>>>>> >>>>>> - help to abstract common building block or design pattern for writing new ublk >>>>>> target/backend >>>>>> >>>>>> So far it basically passes xfstest(XFS) test by using ublk-qcow2 block >>>>>> device as TEST_DEV, and kernel building workload is verified too. Also >>>>>> soft update approach is applied in meta flushing, and meta data >>>>>> integrity is guaranteed, 'make test T=qcow2/040' covers this kind of >>>>>> test, and only cluster leak is reported during this test. >>>>>> >>>>>> The performance data looks much better compared with qemu-nbd, see >>>>>> details in commit log[1], README[5] and STATUS[6]. And the test covers both >>>>>> empty image and pre-allocated image, for example of pre-allocated qcow2 >>>>>> image(8GB): >>>>>> >>>>>> - qemu-nbd (make test T=qcow2/002) >>>>> Single queue? >>>> Yeah. >>>> >>>>>> randwrite(4k): jobs 1, iops 24605 >>>>>> randread(4k): jobs 1, iops 30938 >>>>>> randrw(4k): jobs 1, iops read 13981 write 14001 >>>>>> rw(512k): jobs 1, iops read 724 write 728 >>>>> Please try qemu-storage-daemon's VDUSE export type as well. 
The >>>>> command-line should be similar to this: >>>>> >>>>> # modprobe virtio_vdpa # attaches vDPA devices to host kernel >>>> Not found virtio_vdpa module even though I enabled all the following >>>> options: >>>> >>>> --- vDPA drivers >>>> <M> vDPA device simulator core >>>> <M> vDPA simulator for networking device >>>> <M> vDPA simulator for block device >>>> <M> VDUSE (vDPA Device in Userspace) support >>>> <M> Intel IFC VF vDPA driver >>>> <M> Virtio PCI bridge vDPA driver >>>> <M> vDPA driver for Alibaba ENI >>>> >>>> BTW, my test environment is VM and the shared data is done in VM too, and >>>> can virtio_vdpa be used inside VM? >>> I hope Xie Yongji can help explain how to benchmark VDUSE. >>> >>> virtio_vdpa is available inside guests too. Please check that >>> VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio >>> drivers" menu. >>> >>>>> # modprobe vduse >>>>> # qemu-storage-daemon \ >>>>> --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ >>>>> --blockdev qcow2,file=file,node-name=qcow2 \ >>>>> --object iothread,id=iothread0 \ >>>>> --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 >>>>> # vdpa dev add name vduse0 mgmtdev vduse >>>>> >>>>> A virtio-blk device should appear and xfstests can be run on it >>>>> (typically /dev/vda unless you already have other virtio-blk devices). >>>>> >>>>> Afterwards you can destroy the device using: >>>>> >>>>> # vdpa dev del vduse0 >>>>> >>>>>> - ublk-qcow2 (make test T=qcow2/022) >>>>> There are a lot of other factors not directly related to NBD vs ublk. In >>>>> order to get an apples-to-apples comparison with qemu-* a ublk export >>>>> type is needed in qemu-storage-daemon. That way only the difference is >>>>> the ublk interface and the rest of the code path is identical, making it >>>>> possible to compare NBD, VDUSE, ublk, etc more precisely. >>>> Maybe not true. >>>> >>>> ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, >>>> and so far single io_uring/pthread is for handling all qcow2 IOs and IO >>>> command. >>> qemu-nbd doesn't use io_uring to handle the backend IO, so we don't >> I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. >> >>> know whether the benchmark demonstrates that ublk is faster than NBD, >>> that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, >>> whether there are miscellaneous implementation differences between >>> ublk-qcow2 and qemu-nbd (like using the same io_uring context for both >>> ublk and backend IO), or something else. >> The theory shouldn't be too complicated: >> >> 1) io uring passthough(pt) communication is fast than socket, and io command >> is carried over io_uring pt commands, and should be fast than virio >> communication too. >> >> 2) io uring io handling is fast than libaio which is taken in the >> test on qemu-nbd, and all qcow2 backend io(include meta io) is handled >> by io_uring. >> >> https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common >> >> 3) ublk uses one single io_uring to handle all io commands and qcow2 >> backend IOs, so batching handling is common, and it is easy to see >> dozens of IOs/io commands handled in single syscall, or even more. > I agree with the theory but theory has to be tested through > experiments in order to validate it. 
We can all learn from systematic > performance analysis - there might even be bottlenecks in ublk that > can be solved to improve performance further. > >>> I'm suggesting measuring changes to just 1 variable at a time. >>> Otherwise it's hard to reach a conclusion about the root cause of the >>> performance difference. Let's learn why ublk-qcow2 performs well. >> Turns out the latest Fedora 37-beta doesn't support vdpa yet, so I built >> qemu from the latest github tree, and finally it starts to work. And test kernel >> is v6.0 release. >> >> Follows the test result, and all three devices are setup as single >> queue, and all tests are run in single job, still done in one VM, and >> the test images are stored on XFS/virito-scsi backed SSD. >> >> The 1st group tests all three block device which is backed by empty >> qcow2 image. >> >> The 2nd group tests all the three block devices backed by pre-allocated >> qcow2 image. >> >> Except for big sequential IO(512K), there is still not small gap between >> vdpa-virtio-blk and ublk. >> >> 1. run fio on block device over empty qcow2 image >> 1) qemu-nbd >> running qcow2/001 >> run perf test on empty qcow2 image via nbd >> fio (nbd(/mnt/data/ublk_null_8G_nYbgF.qcow2), libaio, bs 4k, dio, hw queues:1)... >> randwrite: jobs 1, iops 8549 >> randread: jobs 1, iops 34829 >> randrw: jobs 1, iops read 11363 write 11333 >> rw(512k): jobs 1, iops read 590 write 597 >> >> >> 2) ublk-qcow2 >> running qcow2/021 >> run perf test on empty qcow2 image via ublk >> fio (ublk/qcow2( -f /mnt/data/ublk_null_8G_s761j.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). >> randwrite: jobs 1, iops 16086 >> randread: jobs 1, iops 172720 >> randrw: jobs 1, iops read 35760 write 35702 >> rw(512k): jobs 1, iops read 1140 write 1149 >> >> 3) vdpa-virtio-blk >> running debug/test_dev >> run io test on specified device >> fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... >> randwrite: jobs 1, iops 8626 >> randread: jobs 1, iops 126118 >> randrw: jobs 1, iops read 17698 write 17665 >> rw(512k): jobs 1, iops read 1023 write 1031 >> >> >> 2. run fio on block device over pre-allocated qcow2 image >> 1) qemu-nbd >> running qcow2/002 >> run perf test on pre-allocated qcow2 image via nbd >> fio (nbd(/mnt/data/ublk_data_8G_sc0SB.qcow2), libaio, bs 4k, dio, hw queues:1)... >> randwrite: jobs 1, iops 21439 >> randread: jobs 1, iops 30336 >> randrw: jobs 1, iops read 11476 write 11449 >> rw(512k): jobs 1, iops read 718 write 722 >> >> 2) ublk-qcow2 >> running qcow2/022 >> run perf test on pre-allocated qcow2 image via ublk >> fio (ublk/qcow2( -f /mnt/data/ublk_data_8G_yZiaJ.qcow2), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0). >> randwrite: jobs 1, iops 98757 >> randread: jobs 1, iops 110246 >> randrw: jobs 1, iops read 47229 write 47161 >> rw(512k): jobs 1, iops read 1416 write 1427 >> >> 3) vdpa-virtio-blk >> running debug/test_dev >> run io test on specified device >> fio (vdpa(/dev/vdc), libaio, bs 4k, dio, hw queues:1)... >> randwrite: jobs 1, iops 47317 >> randread: jobs 1, iops 74092 >> randrw: jobs 1, iops read 27196 write 27234 >> rw(512k): jobs 1, iops read 1447 write 1458 > Thanks for including VDUSE results! ublk looks great here and worth > considering even in cases where NBD or VDUSE is already being used. > > Stefan + Andrey Zhadchenko ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-05 12:21 ` Stefan Hajnoczi 2022-10-05 12:38 ` Denis V. Lunev @ 2022-10-06 11:24 ` Ming Lei 2022-10-07 10:04 ` Yongji Xie 1 sibling, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-06 11:24 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Xie Yongji, Denis V. Lunev, Stefano Garzarella On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote: > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > ublk-qcow2 is available now. > > > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > might useful be for covering requirement in this field > > > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > > is started > > > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > > target/backend > > > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > > soft update approach is applied in meta flushing, and meta data > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > > image(8GB): > > > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > > > Single queue? > > > > > > > > Yeah. > > > > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > > randread(4k): jobs 1, iops 30938 > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. 
The > > > > > command-line should be similar to this: > > > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > > options: > > > > > > > > --- vDPA drivers > > > > <M> vDPA device simulator core > > > > <M> vDPA simulator for networking device > > > > <M> vDPA simulator for block device > > > > <M> VDUSE (vDPA Device in Userspace) support > > > > <M> Intel IFC VF vDPA driver > > > > <M> Virtio PCI bridge vDPA driver > > > > <M> vDPA driver for Alibaba ENI > > > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > > can virtio_vdpa be used inside VM? > > > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > > > virtio_vdpa is available inside guests too. Please check that > > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > > drivers" menu. > > > > > > > > > > > > # modprobe vduse > > > > > # qemu-storage-daemon \ > > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > > --object iothread,id=iothread0 \ > > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > > the ublk interface and the rest of the code path is identical, making it > > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > > > Maybe not true. > > > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > > command. > > > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > > > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > > whether there are miscellaneous implementation differences between > > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > > ublk and backend IO), or something else. > > > > The theory shouldn't be too complicated: > > > > 1) io uring passthough(pt) communication is fast than socket, and io command > > is carried over io_uring pt commands, and should be fast than virio > > communication too. > > > > 2) io uring io handling is fast than libaio which is taken in the > > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > > by io_uring. 
> > > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > > backend IOs, so batching handling is common, and it is easy to see > > dozens of IOs/io commands handled in single syscall, or even more. > > I agree with the theory but theory has to be tested through > experiments in order to validate it. We can all learn from systematic > performance analysis - there might even be bottlenecks in ublk that > can be solved to improve performance further. Indeed, one thing is that ublk uses get user pages to retrieve user pages for copying data, this way may add latency for big chunk IO, since latency of get user pages should be increased linearly by nr_pages. I looked into vduse code a bit too, and vduse still needs the page copy, but lots of bounce pages are allocated and cached in the whole device lifetime, this way can void the latency for retrieving & allocating pages runtime with cost of extra memory consumption. Correct me if it is wrong, Xie Yongji or anyone? ublk has code to deal with device idle, and it may apply the similar cache approach intelligently in future. But I think here the final solution could be applying zero copy for avoiding the big chunk copy, or use hardware engine. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-06 11:24 ` Ming Lei @ 2022-10-07 10:04 ` Yongji Xie 2022-10-07 10:51 ` Ming Lei 0 siblings, 1 reply; 22+ messages in thread From: Yongji Xie @ 2022-10-07 10:04 UTC (permalink / raw) To: Ming Lei Cc: Stefan Hajnoczi, Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Denis V. Lunev, Stefano Garzarella On Thu, Oct 6, 2022 at 7:24 PM Ming Lei <tom.leiming@gmail.com> wrote: > > On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote: > > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > ublk-qcow2 is available now. > > > > > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > might useful be for covering requirement in this field > > > > > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > > > is started > > > > > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > > > target/backend > > > > > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > > > soft update approach is applied in meta flushing, and meta data > > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > > > image(8GB): > > > > > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > > > > > Single queue? > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > > > randread(4k): jobs 1, iops 30938 > > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. 
The > > > > > > command-line should be similar to this: > > > > > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > > > options: > > > > > > > > > > --- vDPA drivers > > > > > <M> vDPA device simulator core > > > > > <M> vDPA simulator for networking device > > > > > <M> vDPA simulator for block device > > > > > <M> VDUSE (vDPA Device in Userspace) support > > > > > <M> Intel IFC VF vDPA driver > > > > > <M> Virtio PCI bridge vDPA driver > > > > > <M> vDPA driver for Alibaba ENI > > > > > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > > > can virtio_vdpa be used inside VM? > > > > > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > > > > > virtio_vdpa is available inside guests too. Please check that > > > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > > > drivers" menu. > > > > > > > > > > > > > > > # modprobe vduse > > > > > > # qemu-storage-daemon \ > > > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > > > --object iothread,id=iothread0 \ > > > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > > > the ublk interface and the rest of the code path is identical, making it > > > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > > > > > Maybe not true. > > > > > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > > > command. > > > > > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > > > > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > > > > > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > > > whether there are miscellaneous implementation differences between > > > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > > > ublk and backend IO), or something else. > > > > > > The theory shouldn't be too complicated: > > > > > > 1) io uring passthough(pt) communication is fast than socket, and io command > > > is carried over io_uring pt commands, and should be fast than virio > > > communication too. > > > > > > 2) io uring io handling is fast than libaio which is taken in the > > > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > > > by io_uring. 
> > > > > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > > > > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > > > backend IOs, so batching handling is common, and it is easy to see > > > dozens of IOs/io commands handled in single syscall, or even more. > > > > I agree with the theory but theory has to be tested through > > experiments in order to validate it. We can all learn from systematic > > performance analysis - there might even be bottlenecks in ublk that > > can be solved to improve performance further. > > Indeed, one thing is that ublk uses get user pages to retrieve user pages > for copying data, this way may add latency for big chunk IO, since > latency of get user pages should be increased linearly by nr_pages. > > I looked into vduse code a bit too, and vduse still needs the page copy, > but lots of bounce pages are allocated and cached in the whole device > lifetime, this way can void the latency for retrieving & allocating > pages runtime with cost of extra memory consumption. Correct me > if it is wrong, Xie Yongji or anyone? > Yes, you are right. Another way is registering the preallocated userspace memory as bounce buffer. Thanks, Yongji ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-07 10:04 ` Yongji Xie @ 2022-10-07 10:51 ` Ming Lei 2022-10-07 11:21 ` Yongji Xie 0 siblings, 1 reply; 22+ messages in thread From: Ming Lei @ 2022-10-07 10:51 UTC (permalink / raw) To: Yongji Xie Cc: Stefan Hajnoczi, Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Denis V. Lunev, Stefano Garzarella On Fri, Oct 07, 2022 at 06:04:29PM +0800, Yongji Xie wrote: > On Thu, Oct 6, 2022 at 7:24 PM Ming Lei <tom.leiming@gmail.com> wrote: > > > > On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote: > > > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > > ublk-qcow2 is available now. > > > > > > > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > > might useful be for covering requirement in this field > > > > > > > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > > > > is started > > > > > > > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > > > > target/backend > > > > > > > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > > > > soft update approach is applied in meta flushing, and meta data > > > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > > > > image(8GB): > > > > > > > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > > > > > > > Single queue? > > > > > > > > > > > > Yeah. 
> > > > > > > > > > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > > > > randread(4k): jobs 1, iops 30938 > > > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. The > > > > > > > command-line should be similar to this: > > > > > > > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > > > > options: > > > > > > > > > > > > --- vDPA drivers > > > > > > <M> vDPA device simulator core > > > > > > <M> vDPA simulator for networking device > > > > > > <M> vDPA simulator for block device > > > > > > <M> VDUSE (vDPA Device in Userspace) support > > > > > > <M> Intel IFC VF vDPA driver > > > > > > <M> Virtio PCI bridge vDPA driver > > > > > > <M> vDPA driver for Alibaba ENI > > > > > > > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > > > > can virtio_vdpa be used inside VM? > > > > > > > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > > > > > > > virtio_vdpa is available inside guests too. Please check that > > > > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > > > > drivers" menu. > > > > > > > > > > > > > > > > > > # modprobe vduse > > > > > > > # qemu-storage-daemon \ > > > > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > > > > --object iothread,id=iothread0 \ > > > > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > > > > the ublk interface and the rest of the code path is identical, making it > > > > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > > > > > > > Maybe not true. > > > > > > > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > > > > command. > > > > > > > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > > > > > > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > > > > > > > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > > > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > > > > whether there are miscellaneous implementation differences between > > > > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > > > > ublk and backend IO), or something else. 
> > > > > > > > The theory shouldn't be too complicated: > > > > > > > > 1) io uring passthough(pt) communication is fast than socket, and io command > > > > is carried over io_uring pt commands, and should be fast than virio > > > > communication too. > > > > > > > > 2) io uring io handling is fast than libaio which is taken in the > > > > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > > > > by io_uring. > > > > > > > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > > > > > > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > > > > backend IOs, so batching handling is common, and it is easy to see > > > > dozens of IOs/io commands handled in single syscall, or even more. > > > > > > I agree with the theory but theory has to be tested through > > > experiments in order to validate it. We can all learn from systematic > > > performance analysis - there might even be bottlenecks in ublk that > > > can be solved to improve performance further. > > > > Indeed, one thing is that ublk uses get user pages to retrieve user pages > > for copying data, this way may add latency for big chunk IO, since > > latency of get user pages should be increased linearly by nr_pages. > > > > I looked into vduse code a bit too, and vduse still needs the page copy, > > but lots of bounce pages are allocated and cached in the whole device > > lifetime, this way can void the latency for retrieving & allocating > > pages runtime with cost of extra memory consumption. Correct me > > if it is wrong, Xie Yongji or anyone? > > > > Yes, you are right. Another way is registering the preallocated > userspace memory as bounce buffer. Thanks for the clarification. IMO, the pages consumption is too much for vduse, each vdpa device has one vduse_iova_domain which may allocate 64K bounce pages at most, and these pages won't be freed until freeing the device. But it is one solution for implementing generic userspace device(not limit to block device), and this idea seems great. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-07 10:51 ` Ming Lei @ 2022-10-07 11:21 ` Yongji Xie 2022-10-07 11:23 ` Ming Lei 0 siblings, 1 reply; 22+ messages in thread From: Yongji Xie @ 2022-10-07 11:21 UTC (permalink / raw) To: Ming Lei Cc: Stefan Hajnoczi, Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Denis V. Lunev, Stefano Garzarella On Fri, Oct 7, 2022 at 6:51 PM Ming Lei <tom.leiming@gmail.com> wrote: > > On Fri, Oct 07, 2022 at 06:04:29PM +0800, Yongji Xie wrote: > > On Thu, Oct 6, 2022 at 7:24 PM Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote: > > > > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > > > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > > > ublk-qcow2 is available now. > > > > > > > > > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > > > might useful be for covering requirement in this field > > > > > > > > > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > > > > > is started > > > > > > > > > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > > > > > target/backend > > > > > > > > > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > > > > > soft update approach is applied in meta flushing, and meta data > > > > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > > > > > details in commit log[1], README[5] and STATUS[6]. And the test covers both > > > > > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > > > > > image(8GB): > > > > > > > > > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > > > > > > > > > Single queue? > > > > > > > > > > > > > > Yeah. 
> > > > > > > > > > > > > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > > > > > randread(4k): jobs 1, iops 30938 > > > > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. The > > > > > > > > command-line should be similar to this: > > > > > > > > > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > > > > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > > > > > options: > > > > > > > > > > > > > > --- vDPA drivers > > > > > > > <M> vDPA device simulator core > > > > > > > <M> vDPA simulator for networking device > > > > > > > <M> vDPA simulator for block device > > > > > > > <M> VDUSE (vDPA Device in Userspace) support > > > > > > > <M> Intel IFC VF vDPA driver > > > > > > > <M> Virtio PCI bridge vDPA driver > > > > > > > <M> vDPA driver for Alibaba ENI > > > > > > > > > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > > > > > can virtio_vdpa be used inside VM? > > > > > > > > > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > > > > > > > > > virtio_vdpa is available inside guests too. Please check that > > > > > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > > > > > drivers" menu. > > > > > > > > > > > > > > > > > > > > > # modprobe vduse > > > > > > > > # qemu-storage-daemon \ > > > > > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > > > > > --object iothread,id=iothread0 \ > > > > > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > > > > > the ublk interface and the rest of the code path is identical, making it > > > > > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > > > > > > > > > Maybe not true. > > > > > > > > > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > > > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > > > > > command. > > > > > > > > > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > > > > > > > > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. 
> > > > > > > > > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > > > > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > > > > > whether there are miscellaneous implementation differences between > > > > > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > > > > > ublk and backend IO), or something else. > > > > > > > > > > The theory shouldn't be too complicated: > > > > > > > > > > 1) io uring passthough(pt) communication is fast than socket, and io command > > > > > is carried over io_uring pt commands, and should be fast than virio > > > > > communication too. > > > > > > > > > > 2) io uring io handling is fast than libaio which is taken in the > > > > > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > > > > > by io_uring. > > > > > > > > > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > > > > > > > > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > > > > > backend IOs, so batching handling is common, and it is easy to see > > > > > dozens of IOs/io commands handled in single syscall, or even more. > > > > > > > > I agree with the theory but theory has to be tested through > > > > experiments in order to validate it. We can all learn from systematic > > > > performance analysis - there might even be bottlenecks in ublk that > > > > can be solved to improve performance further. > > > > > > Indeed, one thing is that ublk uses get user pages to retrieve user pages > > > for copying data, this way may add latency for big chunk IO, since > > > latency of get user pages should be increased linearly by nr_pages. > > > > > > I looked into vduse code a bit too, and vduse still needs the page copy, > > > but lots of bounce pages are allocated and cached in the whole device > > > lifetime, this way can void the latency for retrieving & allocating > > > pages runtime with cost of extra memory consumption. Correct me > > > if it is wrong, Xie Yongji or anyone? > > > > > > > Yes, you are right. Another way is registering the preallocated > > userspace memory as bounce buffer. > > Thanks for the clarification. > > IMO, the pages consumption is too much for vduse, each vdpa device > has one vduse_iova_domain which may allocate 64K bounce pages at most, > and these pages won't be freed until freeing the device. > Yes, actually in our initial design, this can be mitigated by some memory reclaim mechanism and zero copy support. Even we can let multiple vdpa device share one iova domain. Thanks, Yongji > But it is one solution for implementing generic userspace device(not > limit to block device), and this idea seems great. > > > > > Thanks, > Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available 2022-10-07 11:21 ` Yongji Xie @ 2022-10-07 11:23 ` Ming Lei 0 siblings, 0 replies; 22+ messages in thread From: Ming Lei @ 2022-10-07 11:23 UTC (permalink / raw) To: Yongji Xie Cc: Stefan Hajnoczi, Stefan Hajnoczi, io-uring, linux-block, linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, rjones, Denis V. Lunev, Stefano Garzarella On Fri, Oct 07, 2022 at 07:21:51PM +0800, Yongji Xie wrote: > On Fri, Oct 7, 2022 at 6:51 PM Ming Lei <tom.leiming@gmail.com> wrote: > > > > On Fri, Oct 07, 2022 at 06:04:29PM +0800, Yongji Xie wrote: > > > On Thu, Oct 6, 2022 at 7:24 PM Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote: > > > > > On Wed, 5 Oct 2022 at 00:19, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote: > > > > > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei <tom.leiming@gmail.com> wrote: > > > > > > > > > > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote: > > > > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote: > > > > > > > > > > ublk-qcow2 is available now. > > > > > > > > > > > > > > > > > > Cool, thanks for sharing! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So far it provides basic read/write function, and compression and snapshot > > > > > > > > > > aren't supported yet. The target/backend implementation is completely > > > > > > > > > > based on io_uring, and share the same io_uring with ublk IO command > > > > > > > > > > handler, just like what ublk-loop does. > > > > > > > > > > > > > > > > > > > > Follows the main motivations of ublk-qcow2: > > > > > > > > > > > > > > > > > > > > - building one complicated target from scratch helps libublksrv APIs/functions > > > > > > > > > > become mature/stable more quickly, since qcow2 is complicated and needs more > > > > > > > > > > requirement from libublksrv compared with other simple ones(loop, null) > > > > > > > > > > > > > > > > > > > > - there are several attempts of implementing qcow2 driver in kernel, such as > > > > > > > > > > ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2 > > > > > > > > > > might useful be for covering requirement in this field > > > > > > > > > > > > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st thought to evaluate > > > > > > > > > > performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv > > > > > > > > > > is started > > > > > > > > > > > > > > > > > > > > - help to abstract common building block or design pattern for writing new ublk > > > > > > > > > > target/backend > > > > > > > > > > > > > > > > > > > > So far it basically passes xfstest(XFS) test by using ublk-qcow2 block > > > > > > > > > > device as TEST_DEV, and kernel building workload is verified too. Also > > > > > > > > > > soft update approach is applied in meta flushing, and meta data > > > > > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers this kind of > > > > > > > > > > test, and only cluster leak is reported during this test. > > > > > > > > > > > > > > > > > > > > The performance data looks much better compared with qemu-nbd, see > > > > > > > > > > details in commit log[1], README[5] and STATUS[6]. 
And the test covers both > > > > > > > > > > empty image and pre-allocated image, for example of pre-allocated qcow2 > > > > > > > > > > image(8GB): > > > > > > > > > > > > > > > > > > > > - qemu-nbd (make test T=qcow2/002) > > > > > > > > > > > > > > > > > > Single queue? > > > > > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > > > > > > > > > > > > > randwrite(4k): jobs 1, iops 24605 > > > > > > > > > > randread(4k): jobs 1, iops 30938 > > > > > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001 > > > > > > > > > > rw(512k): jobs 1, iops read 724 write 728 > > > > > > > > > > > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. The > > > > > > > > > command-line should be similar to this: > > > > > > > > > > > > > > > > > > # modprobe virtio_vdpa # attaches vDPA devices to host kernel > > > > > > > > > > > > > > > > Not found virtio_vdpa module even though I enabled all the following > > > > > > > > options: > > > > > > > > > > > > > > > > --- vDPA drivers > > > > > > > > <M> vDPA device simulator core > > > > > > > > <M> vDPA simulator for networking device > > > > > > > > <M> vDPA simulator for block device > > > > > > > > <M> VDUSE (vDPA Device in Userspace) support > > > > > > > > <M> Intel IFC VF vDPA driver > > > > > > > > <M> Virtio PCI bridge vDPA driver > > > > > > > > <M> vDPA driver for Alibaba ENI > > > > > > > > > > > > > > > > BTW, my test environment is VM and the shared data is done in VM too, and > > > > > > > > can virtio_vdpa be used inside VM? > > > > > > > > > > > > > > I hope Xie Yongji can help explain how to benchmark VDUSE. > > > > > > > > > > > > > > virtio_vdpa is available inside guests too. Please check that > > > > > > > VIRTIO_VDPA ("vDPA driver for virtio devices") is enabled in "Virtio > > > > > > > drivers" menu. > > > > > > > > > > > > > > > > > > > > > > > > # modprobe vduse > > > > > > > > > # qemu-storage-daemon \ > > > > > > > > > --blockdev file,filename=test.qcow2,cache.direct=of|off,aio=native,node-name=file \ > > > > > > > > > --blockdev qcow2,file=file,node-name=qcow2 \ > > > > > > > > > --object iothread,id=iothread0 \ > > > > > > > > > --export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0 > > > > > > > > > # vdpa dev add name vduse0 mgmtdev vduse > > > > > > > > > > > > > > > > > > A virtio-blk device should appear and xfstests can be run on it > > > > > > > > > (typically /dev/vda unless you already have other virtio-blk devices). > > > > > > > > > > > > > > > > > > Afterwards you can destroy the device using: > > > > > > > > > > > > > > > > > > # vdpa dev del vduse0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - ublk-qcow2 (make test T=qcow2/022) > > > > > > > > > > > > > > > > > > There are a lot of other factors not directly related to NBD vs ublk. In > > > > > > > > > order to get an apples-to-apples comparison with qemu-* a ublk export > > > > > > > > > type is needed in qemu-storage-daemon. That way only the difference is > > > > > > > > > the ublk interface and the rest of the code path is identical, making it > > > > > > > > > possible to compare NBD, VDUSE, ublk, etc more precisely. > > > > > > > > > > > > > > > > Maybe not true. > > > > > > > > > > > > > > > > ublk-qcow2 uses io_uring to handle all backend IO(include meta IO) completely, > > > > > > > > and so far single io_uring/pthread is for handling all qcow2 IOs and IO > > > > > > > > command. 
> > > > > > > > > > > > > > qemu-nbd doesn't use io_uring to handle the backend IO, so we don't > > > > > > > > > > > > I tried to use it via --aio=io_uring for setting up qemu-nbd, but not succeed. > > > > > > > > > > > > > know whether the benchmark demonstrates that ublk is faster than NBD, > > > > > > > that the ublk-qcow2 implementation is faster than qemu-nbd's qcow2, > > > > > > > whether there are miscellaneous implementation differences between > > > > > > > ublk-qcow2 and qemu-nbd (like using the same io_uring context for both > > > > > > > ublk and backend IO), or something else. > > > > > > > > > > > > The theory shouldn't be too complicated: > > > > > > > > > > > > 1) io uring passthough(pt) communication is fast than socket, and io command > > > > > > is carried over io_uring pt commands, and should be fast than virio > > > > > > communication too. > > > > > > > > > > > > 2) io uring io handling is fast than libaio which is taken in the > > > > > > test on qemu-nbd, and all qcow2 backend io(include meta io) is handled > > > > > > by io_uring. > > > > > > > > > > > > https://github.com/ming1/ubdsrv/blob/master/tests/common/qcow2_common > > > > > > > > > > > > 3) ublk uses one single io_uring to handle all io commands and qcow2 > > > > > > backend IOs, so batching handling is common, and it is easy to see > > > > > > dozens of IOs/io commands handled in single syscall, or even more. > > > > > > > > > > I agree with the theory but theory has to be tested through > > > > > experiments in order to validate it. We can all learn from systematic > > > > > performance analysis - there might even be bottlenecks in ublk that > > > > > can be solved to improve performance further. > > > > > > > > Indeed, one thing is that ublk uses get user pages to retrieve user pages > > > > for copying data, this way may add latency for big chunk IO, since > > > > latency of get user pages should be increased linearly by nr_pages. > > > > > > > > I looked into vduse code a bit too, and vduse still needs the page copy, > > > > but lots of bounce pages are allocated and cached in the whole device > > > > lifetime, this way can void the latency for retrieving & allocating > > > > pages runtime with cost of extra memory consumption. Correct me > > > > if it is wrong, Xie Yongji or anyone? > > > > > > > > > > Yes, you are right. Another way is registering the preallocated > > > userspace memory as bounce buffer. > > > > Thanks for the clarification. > > > > IMO, the pages consumption is too much for vduse, each vdpa device > > has one vduse_iova_domain which may allocate 64K bounce pages at most, > > and these pages won't be freed until freeing the device. > > > > Yes, actually in our initial design, this can be mitigated by some > memory reclaim mechanism and zero copy support. Even we can let > multiple vdpa device share one iova domain. I think zero copy is great, especially for big chunk IO request. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available
  2022-10-04 13:53       ` Stefan Hajnoczi
  2022-10-05  4:18         ` Ming Lei
@ 2022-10-06 10:14         ` Richard W.M. Jones
  2022-10-12 14:15           ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Richard W.M. Jones @ 2022-10-06 10:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Ming Lei, Stefan Hajnoczi, io-uring, linux-block, linux-kernel,
	Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, Xie Yongji,
	Denis V. Lunev, Stefano Garzarella

On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> qemu-nbd doesn't use io_uring to handle the backend IO,

Would this be fixed by your (not yet upstream) libblkio driver for
qemu?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: ublk-qcow2: ublk-qcow2 is available
  2022-10-06 10:14         ` Richard W.M. Jones
@ 2022-10-12 14:15           ` Stefan Hajnoczi
  2022-10-13  1:50             ` Ming Lei
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2022-10-12 14:15 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Ming Lei, Stefan Hajnoczi, io-uring, linux-block, linux-kernel,
	Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf, Xie Yongji,
	Denis V. Lunev, Stefano Garzarella

On Thu, 6 Oct 2022 at 06:14, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> > qemu-nbd doesn't use io_uring to handle the backend IO,
>
> Would this be fixed by your (not yet upstream) libblkio driver for
> qemu?

I was wrong, qemu-nbd has syntax to use io_uring:

  $ qemu-nbd ... --image-opts driver=file,filename=test.img,aio=io_uring

The new libblkio driver will also support io_uring, but QEMU's built-in
io_uring support is already available and can be used as shown above.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread
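For a qcow2 image rather than a raw file, the same --image-opts form can be layered with the qcow2 driver on top of the file driver. This is a sketch, not verified against every QEMU version: the nbd device node and image name are placeholders, and file.cache.direct=on is only added to request O_DIRECT for benchmarking.

  $ qemu-nbd -c /dev/nbd0 --image-opts \
      driver=qcow2,file.driver=file,file.filename=test.qcow2,file.aio=io_uring,file.cache.direct=on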
* Re: ublk-qcow2: ublk-qcow2 is available
  2022-10-12 14:15           ` Stefan Hajnoczi
@ 2022-10-13  1:50             ` Ming Lei
  2022-10-13 16:01               ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Ming Lei @ 2022-10-13 1:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Richard W.M. Jones, Stefan Hajnoczi, io-uring, linux-block,
	linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf,
	Xie Yongji, Denis V. Lunev, Stefano Garzarella

On Wed, Oct 12, 2022 at 10:15:28AM -0400, Stefan Hajnoczi wrote:
> On Thu, 6 Oct 2022 at 06:14, Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> > > qemu-nbd doesn't use io_uring to handle the backend IO,
> >
> > Would this be fixed by your (not yet upstream) libblkio driver for
> > qemu?
>
> I was wrong, qemu-nbd has syntax to use io_uring:
>
>   $ qemu-nbd ... --image-opts driver=file,filename=test.img,aio=io_uring

Yeah, I saw the option. Previously, when I tried io_uring via:

  qemu-nbd -c /dev/nbd11 -n --aio=io_uring $my_file

it complained 'qemu-nbd: Invalid aio mode 'io_uring'', even though
'qemu-nbd --help' does say that io_uring is supported.

Today I tried it again on Fedora 37 and it now works with
--aio=io_uring, but the IOPS is basically the same as with --aio=native,
and the IO trace shows that io_uring is used by qemu-nbd.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 22+ messages in thread
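One low-effort way to confirm which AIO engine qemu-nbd really ends up using is to watch the io_uring tracepoints while the benchmark runs. A sketch only; tracepoint names have moved around between kernel versions, so check perf list first:

  $ perf list 'io_uring:*'                     # which io_uring tracepoints this kernel exposes
  $ perf record -e 'io_uring:*' -p "$(pidof qemu-nbd)" -- sleep 10
  $ perf report --stdio | head                 # non-zero counts mean io_uring is really in use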
* Re: ublk-qcow2: ublk-qcow2 is available
  2022-10-13  1:50             ` Ming Lei
@ 2022-10-13 16:01               ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2022-10-13 16:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Stefan Hajnoczi, Richard W.M. Jones, io-uring, linux-block,
	linux-kernel, Kirill Tkhai, Manuel Bentele, qemu-devel, Kevin Wolf,
	Xie Yongji, Denis V. Lunev, Stefano Garzarella

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

On Thu, Oct 13, 2022 at 09:50:55AM +0800, Ming Lei wrote:
> On Wed, Oct 12, 2022 at 10:15:28AM -0400, Stefan Hajnoczi wrote:
> > On Thu, 6 Oct 2022 at 06:14, Richard W.M. Jones <rjones@redhat.com> wrote:
> > >
> > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> > > > qemu-nbd doesn't use io_uring to handle the backend IO,
> > >
> > > Would this be fixed by your (not yet upstream) libblkio driver for
> > > qemu?
> >
> > I was wrong, qemu-nbd has syntax to use io_uring:
> >
> >   $ qemu-nbd ... --image-opts driver=file,filename=test.img,aio=io_uring
>
> Yeah, I saw the option. Previously, when I tried io_uring via:
>
>   qemu-nbd -c /dev/nbd11 -n --aio=io_uring $my_file
>
> it complained 'qemu-nbd: Invalid aio mode 'io_uring'', even though
> 'qemu-nbd --help' does say that io_uring is supported.
>
> Today I tried it again on Fedora 37 and it now works with
> --aio=io_uring, but the IOPS is basically the same as with --aio=native,
> and the IO trace shows that io_uring is used by qemu-nbd.

Okay, similar performance to Linux AIO is expected. That's what we've
seen with io_uring in QEMU. QEMU doesn't use io_uring in polling mode,
so it's similar to what we get with Linux AIO.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread
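For reference, the kind of fio job behind the "jobs 1" 4k figures quoted earlier in the thread looks roughly like this. A sketch only: the device path, iodepth and runtime are assumptions, not the exact parameters of the ublksrv tests.

  $ fio --name=randwrite --filename=/dev/nbd0 --direct=1 --ioengine=libaio \
        --rw=randwrite --bs=4k --iodepth=64 --numjobs=1 --runtime=30 --time_based --group_reporting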
end of thread, other threads: [~2022-10-13 16:10 UTC | newest]

Thread overview: 22+ messages:
  [not found]      <Yza1u1KfKa7ycQm0@T590>
  2022-10-03 19:53 ` ublk-qcow2: ublk-qcow2 is available  Stefan Hajnoczi
  2022-10-03 23:57 ` Denis V. Lunev
  2022-10-05 15:11 ` Stefan Hajnoczi
  2022-10-06 10:26 ` Ming Lei
  2022-10-06 13:59 ` Stefan Hajnoczi
  2022-10-06 15:09 ` Ming Lei
  2022-10-06 18:29 ` Stefan Hajnoczi
  2022-10-07 11:21 ` Ming Lei
  2022-10-04  9:43 ` Ming Lei
  2022-10-04 13:53 ` Stefan Hajnoczi
  2022-10-05  4:18 ` Ming Lei
  2022-10-05 12:21 ` Stefan Hajnoczi
  2022-10-05 12:38 ` Denis V. Lunev
  2022-10-06 11:24 ` Ming Lei
  2022-10-07 10:04 ` Yongji Xie
  2022-10-07 10:51 ` Ming Lei
  2022-10-07 11:21 ` Yongji Xie
  2022-10-07 11:23 ` Ming Lei
  2022-10-06 10:14 ` Richard W.M. Jones
  2022-10-12 14:15 ` Stefan Hajnoczi
  2022-10-13  1:50 ` Ming Lei
  2022-10-13 16:01 ` Stefan Hajnoczi