* [RFC 0/5] virtiofs: map buffer out of virtqueue lock
@ 2025-01-22 16:31 Eugenio Pérez
2025-01-22 16:31 ` [RFC 1/5] vduse: add virtio_fs to allowed dev id Eugenio Pérez
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
This is useful for some setups like swiotlb or VDUSE where the DMA
operations are expensive and/or need to be performed with a write lock.
After applying this series, the fio read test goes from 1201MiB/s to
1211MiB/s.
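A rough sketch of the resulting flow in virtio_fs_enqueue_req() and the
completion path (helper names are the ones introduced by the patches below;
error handling omitted):

        /* illustrative only: map outside the lock, add premapped under it */
        ret = virtqueue_map_sgs(vq, sgs, out_sgs, in_sgs);
        if (ret)
                return ret;

        spin_lock(&fsvq->lock);
        ret = virtqueue_add_sgs_premapped(vq, sgs, out_sgs, in_sgs,
                                          req, GFP_ATOMIC);
        spin_unlock(&fsvq->lock);

        /* later, in the completion path, after virtqueue_get_buf(): */
        virtqueue_unmap_sgs(vq, sgs, out_sgs, in_sgs);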
---
Sending this series to get feedback on whether this is the right way to go.
Q: Do we need virtqueue_map_sgs? XDP calls dma_map directly. Need to
profile.
TODO: Profile multiqueue.
TODO: Handle errors.
TODO: Do the same for the hiprio queue.
TODO: Can we batch maps? virtiofs always sends many buffers.
Eugenio Pérez (4):
vduse: add virtio_fs to allowed dev id
virtiofs: Move stack sg to fuse_req
virtio_ring: add virtqueue premapped out/in buffers support
virtiofs: perform DMA operations out of the spinlock
Jason Wang (1):
virtio_ring: introduce virtqueue_map/unmap_sgs()
drivers/vdpa/vdpa_user/vduse_dev.c | 1 +
drivers/virtio/virtio_ring.c | 165 ++++++++++++++++++++++++++---
fs/fuse/fuse_i.h | 7 ++
fs/fuse/virtio_fs.c | 68 +++++++-----
include/linux/virtio.h | 17 +++
5 files changed, 220 insertions(+), 38 deletions(-)
--
2.48.1
* [RFC 1/5] vduse: add virtio_fs to allowed dev id
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
@ 2025-01-22 16:31 ` Eugenio Pérez
2025-01-23 1:48 ` Jason Wang
2025-01-22 16:31 ` [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs() Eugenio Pérez
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
A VDUSE device that implements a virtiofs device works fine just by
adding its device id to the allowed id list.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 7ae99691efdf..6a9a37351310 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -144,6 +144,7 @@ static struct workqueue_struct *vduse_irq_bound_wq;
static u32 allowed_device_id[] = {
VIRTIO_ID_BLOCK,
VIRTIO_ID_NET,
+ VIRTIO_ID_FS,
};
static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa)
--
2.48.1
* [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs()
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
2025-01-22 16:31 ` [RFC 1/5] vduse: add virtio_fs to allowed dev id Eugenio Pérez
@ 2025-01-22 16:31 ` Eugenio Pérez
2025-01-23 1:51 ` Jason Wang
2025-01-22 16:31 ` [RFC 3/5] virtiofs: Move stack sg to fuse_req Eugenio Pérez
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
From: Jason Wang <jasowang@redhat.com>
Introduce new virtqueue DMA operations which allow drivers that want to
make use of the premapping API to operate at the sg level.
Note that we still follow the assumptions of virtqueue_add(), so
dma_map_sg() is not used. This could be optimized in the future.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
--
Eugenio's changes: Remove blank
TODO: Should we call dma_map directly instead of this? XDP does the
direct call.
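For reference, a rough usage sketch of the new helpers as this series uses
them (the premapped add variant comes from a later patch in the series; on
success each sg entry carries its mapping in sg_dma_address()):

        if (virtqueue_map_sgs(vq, sgs, out_sgs, in_sgs))
                return -ENOMEM;

        err = virtqueue_add_sgs_premapped(vq, sgs, out_sgs, in_sgs,
                                          token, GFP_ATOMIC);
        if (err) {
                /* the helpers map/unmap each entry individually */
                virtqueue_unmap_sgs(vq, sgs, out_sgs, in_sgs);
                return err;
        }

        /* once the device returns the buffer (virtqueue_get_buf()),
         * undo the mapping with virtqueue_unmap_sgs() as well */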
---
drivers/virtio/virtio_ring.c | 128 +++++++++++++++++++++++++++++++----
include/linux/virtio.h | 10 +++
2 files changed, 125 insertions(+), 13 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fdd2d2b07b5a..05729bc5cbb1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -359,6 +359,26 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
return vq->dma_dev;
}
+static int __vring_map_one_sg(const struct vring_virtqueue *vq,
+ struct scatterlist *sg,
+ enum dma_data_direction direction,
+ dma_addr_t *addr)
+{
+ /*
+ * We can't use dma_map_sg, because we don't use scatterlists in
+ * the way it expects (we don't guarantee that the scatterlist
+ * will exist for the lifetime of the mapping).
+ */
+ *addr = dma_map_page(vring_dma_dev(vq),
+ sg_page(sg), sg->offset, sg->length,
+ direction);
+
+ if (dma_mapping_error(vring_dma_dev(vq), *addr))
+ return -ENOMEM;
+
+ return 0;
+}
+
/* Map one sg entry. */
static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
enum dma_data_direction direction, dma_addr_t *addr,
@@ -383,19 +403,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
return 0;
}
- /*
- * We can't use dma_map_sg, because we don't use scatterlists in
- * the way it expects (we don't guarantee that the scatterlist
- * will exist for the lifetime of the mapping).
- */
- *addr = dma_map_page(vring_dma_dev(vq),
- sg_page(sg), sg->offset, sg->length,
- direction);
-
- if (dma_mapping_error(vring_dma_dev(vq), *addr))
- return -ENOMEM;
-
- return 0;
+ return __vring_map_one_sg(vq, sg, direction, addr);
}
static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -526,6 +534,100 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
return next;
}
+void virtqueue_unmap_sgs(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct scatterlist *sg;
+ int n;
+
+ for (n = 0; n < out_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_unmap_page(vring_dma_dev(vq),
+ sg_dma_address(sg),
+ sg->length,
+ DMA_TO_DEVICE);
+ }
+ }
+
+ for (; n < (out_sgs + in_sgs); n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_unmap_page(vring_dma_dev(vq),
+ sg_dma_address(sg),
+ sg->length,
+ DMA_FROM_DEVICE);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(virtqueue_unmap_sgs);
+
+int virtqueue_map_sgs(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ int i, n, mapped_sg = 0;
+ struct scatterlist *sg;
+
+ for (n = 0; n < out_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr;
+
+ if (__vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+ goto unmap_release;
+
+ sg_dma_address(sg) = addr;
+ mapped_sg++;
+ }
+ }
+
+ for (; n < (out_sgs + in_sgs); n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr;
+
+ if (__vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
+ goto unmap_release;
+
+ sg_dma_address(sg) = addr;
+ mapped_sg++;
+ }
+ }
+
+ return 0;
+
+unmap_release:
+ i = 0;
+
+ for (n = 0; n < out_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ if (i++ == mapped_sg)
+ goto out;
+ dma_unmap_page(vring_dma_dev(vq),
+ sg_dma_address(sg),
+ sg->length,
+ DMA_TO_DEVICE);
+ }
+ }
+
+ for (; n < (out_sgs + in_sgs); n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+
+ if (i++ == mapped_sg)
+ goto out;
+ dma_unmap_page(vring_dma_dev(vq),
+ sg_dma_address(sg),
+ sg->length,
+ DMA_FROM_DEVICE);
+ }
+ }
+out:
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(virtqueue_map_sgs);
+
static inline int virtqueue_add_split(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index dd88682e27e3..28db998d691e 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -67,6 +67,16 @@ int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
void *data,
gfp_t gfp);
+int virtqueue_map_sgs(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs);
+
+void virtqueue_unmap_sgs(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs);
+
int virtqueue_add_sgs(struct virtqueue *vq,
struct scatterlist *sgs[],
unsigned int out_sgs,
--
2.48.1
* [RFC 3/5] virtiofs: Move stack sg to fuse_req
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
2025-01-22 16:31 ` [RFC 1/5] vduse: add virtio_fs to allowed dev id Eugenio Pérez
2025-01-22 16:31 ` [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs() Eugenio Pérez
@ 2025-01-22 16:31 ` Eugenio Pérez
2025-01-23 1:54 ` Jason Wang
2025-01-22 16:31 ` [RFC 4/5] virtio_ring: add virtqueue premapped out/in buffers support Eugenio Pérez
2025-01-22 16:31 ` [RFC 5/5] virtiofs: perform DMA operations out of the spinlock Eugenio Pérez
4 siblings, 1 reply; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
We need to move the map and unmap out of the virtiofs queue lock. The
sgs must be kept around so virtqueue_unmap_sgs() can be called at
completion time, so store them in the request itself.
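For clarity, a rough sketch of the intended lifetime (submission fills
req->sg, completion consumes it; illustration only):

        /* submission: build the sg list inside the request */
        req->sg = req->sg_inline_data;          /* or kmalloc_array() */
        sg_init_one(&req->sg[req->out_sgs++], &req->in.h, sizeof(req->in.h));
        ...

        /* completion: the request still carries what unmap needs
         * (sgs[] is rebuilt from req->sg) */
        virtqueue_unmap_sgs(vq, sgs, req->out_sgs, req->in_sgs);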
TODO: We need to store the forget sgs too. In my tests that path is not
exercised, so we're safe for now. Find a way to force it to be called.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
fs/fuse/fuse_i.h | 7 +++++++
fs/fuse/virtio_fs.c | 45 +++++++++++++++++++++------------------------
2 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 74744c6f2860..e57664daa761 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -18,6 +18,7 @@
#include <linux/mount.h>
#include <linux/wait.h>
#include <linux/list.h>
+#include <linux/scatterlist.h>
#include <linux/spinlock.h>
#include <linux/mm.h>
#include <linux/backing-dev.h>
@@ -434,6 +435,12 @@ struct fuse_req {
#if IS_ENABLED(CONFIG_VIRTIO_FS)
/** virtio-fs's physically contiguous buffer for in and out args */
void *argbuf;
+
+ /** virtio-fs's pre-mapped stuff */
+ struct scatterlist sg_inline_data[6]; /* optimization for short requests */
+ struct scatterlist *sg;
+ unsigned int out_sgs;
+ unsigned int in_sgs;
#endif
/** fuse_mount this request belongs to */
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 82afe78ec542..1344c5782a7c 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1377,14 +1377,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
{
/* requests need at least 4 elements */
struct scatterlist *stack_sgs[6];
- struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)];
struct scatterlist **sgs = stack_sgs;
- struct scatterlist *sg = stack_sg;
struct virtqueue *vq;
struct fuse_args *args = req->args;
unsigned int argbuf_used = 0;
- unsigned int out_sgs = 0;
- unsigned int in_sgs = 0;
unsigned int total_sgs;
unsigned int i;
int ret;
@@ -1392,11 +1388,13 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
struct fuse_pqueue *fpq;
/* Does the sglist fit on the stack? */
+ /* TODO replace magic 6 by a macro */
+ req->sg = req->sg_inline_data;
total_sgs = sg_count_fuse_req(req);
- if (total_sgs > ARRAY_SIZE(stack_sgs)) {
+ if (total_sgs > 6) {
sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), gfp);
- sg = kmalloc_array(total_sgs, sizeof(sg[0]), gfp);
- if (!sgs || !sg) {
+ req->sg = kmalloc_array(total_sgs, sizeof(req->sg[0]), gfp);
+ if (!sgs || !req->sg) {
ret = -ENOMEM;
goto out;
}
@@ -1408,26 +1406,25 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
goto out;
/* Request elements */
- sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
- out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
- (struct fuse_arg *)args->in_args,
- args->in_numargs, args->in_pages,
- req->argbuf, &argbuf_used);
+ sg_init_one(&req->sg[req->out_sgs++], &req->in.h, sizeof(req->in.h));
+ req->out_sgs += sg_init_fuse_args(&req->sg[req->out_sgs], req,
+ (struct fuse_arg *)args->in_args,
+ args->in_numargs, args->in_pages,
+ req->argbuf, &argbuf_used);
/* Reply elements */
if (test_bit(FR_ISREPLY, &req->flags)) {
- sg_init_one(&sg[out_sgs + in_sgs++],
+ sg_init_one(&req->sg[req->out_sgs + req->in_sgs++],
&req->out.h, sizeof(req->out.h));
- in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
- args->out_args, args->out_numargs,
- args->out_pages,
- req->argbuf + argbuf_used, NULL);
+ req->in_sgs += sg_init_fuse_args(&req->sg[req->out_sgs + req->in_sgs], req,
+ args->out_args, args->out_numargs,
+ args->out_pages,
+ req->argbuf + argbuf_used, NULL);
}
- WARN_ON(out_sgs + in_sgs != total_sgs);
-
for (i = 0; i < total_sgs; i++)
- sgs[i] = &sg[i];
+ sgs[i] = &req->sg[i];
+ WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
spin_lock(&fsvq->lock);
@@ -1438,7 +1435,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
}
vq = fsvq->vq;
- ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);
+ ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
if (ret < 0) {
spin_unlock(&fsvq->lock);
goto out;
@@ -1467,10 +1464,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
kfree(req->argbuf);
req->argbuf = NULL;
}
- if (sgs != stack_sgs) {
+ if (ret < 0 && req->sg != req->sg_inline_data)
+ kfree(req->sg);
+ if (sgs != stack_sgs)
kfree(sgs);
- kfree(sg);
- }
return ret;
}
--
2.48.1
* [RFC 4/5] virtio_ring: add virtqueue premapped out/in buffers support
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
` (2 preceding siblings ...)
2025-01-22 16:31 ` [RFC 3/5] virtiofs: Move stack sg to fuse_req Eugenio Pérez
@ 2025-01-22 16:31 ` Eugenio Pérez
2025-01-22 16:31 ` [RFC 5/5] virtiofs: perform DMA operations out of the spinlock Eugenio Pérez
4 siblings, 0 replies; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
Convenience function to add chains that mix out and in buffers.
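A hedged usage sketch (variable names are made up for illustration; the out
chains come first, then the in chains, and every entry must already be
DMA-mapped, e.g. via virtqueue_map_sgs()):

        struct scatterlist *sgs[] = { &out_hdr_sg, &in_status_sg };

        /* one readable chain followed by one writable chain */
        err = virtqueue_add_sgs_premapped(vq, sgs, 1, 1, token, GFP_ATOMIC);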
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/virtio/virtio_ring.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/virtio.h | 7 +++++++
2 files changed, 42 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 05729bc5cbb1..e49912fa77c5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2385,6 +2385,41 @@ static inline int virtqueue_add(struct virtqueue *_vq,
out_sgs, in_sgs, data, ctx, premapped, gfp);
}
+/**
+ * virtqueue_add_sgs_premapped - expose buffers to other end
+ * @_vq: the struct virtqueue we're talking about.
+ * @sgs: array of terminated scatterlists.
+ * @out_sgs: the number of scatterlists readable by other side
+ * @in_sgs: the number of scatterlists which are writable (after readable ones)
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_sgs_premapped(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ gfp_t gfp)
+{
+ unsigned int i, total_sg = 0;
+
+ /* Count them first. */
+ for (i = 0; i < out_sgs + in_sgs; i++) {
+ struct scatterlist *sg;
+
+ for (sg = sgs[i]; sg; sg = sg_next(sg))
+ total_sg++;
+ }
+ return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
+ data, NULL, true, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sgs_premapped);
+
/**
* virtqueue_add_sgs - expose buffers to other end
* @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 28db998d691e..de52a0f6f734 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -84,6 +84,13 @@ int virtqueue_add_sgs(struct virtqueue *vq,
void *data,
gfp_t gfp);
+int virtqueue_add_sgs_premapped(struct virtqueue *vq,
+ struct scatterlist *sgs[],
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ gfp_t gfp);
+
struct device *virtqueue_dma_dev(struct virtqueue *vq);
bool virtqueue_kick(struct virtqueue *vq);
--
2.48.1
* [RFC 5/5] virtiofs: perform DMA operations out of the spinlock
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
` (3 preceding siblings ...)
2025-01-22 16:31 ` [RFC 4/5] virtio_ring: add virtqueue premapped out/in buffers support Eugenio Pérez
@ 2025-01-22 16:31 ` Eugenio Pérez
2025-01-23 1:56 ` Jason Wang
4 siblings, 1 reply; 14+ messages in thread
From: Eugenio Pérez @ 2025-01-22 16:31 UTC (permalink / raw)
To: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Jason Wang, Vivek Goyal, Miklos Szeredi, German Maglione,
Stefan Hajnoczi, virtualization
This is useful for some setups like swiotlb or VDUSE where the DMA
operations are expensive and/or need to be performed with a write lock.
After applying this patch, the fio read test goes from 1201MiB/s to 1211MiB/s.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/virtio/virtio_ring.c | 2 ++
fs/fuse/virtio_fs.c | 25 +++++++++++++++++++++++--
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e49912fa77c5..eb22bfcb9100 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -580,6 +580,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
goto unmap_release;
sg_dma_address(sg) = addr;
+ sg_dma_len(sg) = sg->length;
mapped_sg++;
}
}
@@ -592,6 +593,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
goto unmap_release;
sg_dma_address(sg) = addr;
+ sg_dma_len(sg) = sg->length;
mapped_sg++;
}
}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 1344c5782a7c..2b558b05d0f8 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -836,8 +836,21 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
/* End requests */
list_for_each_entry_safe(req, next, &reqs, list) {
+ struct scatterlist *stack_sgs[6];
+ struct scatterlist **sgs = stack_sgs;
+ unsigned int total_sgs = req->out_sgs + req->in_sgs;
+
list_del_init(&req->list);
+ /* TODO replace magic 6 by a macro */
+ if (total_sgs > 6)
+ sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
+
+ for (unsigned int i = 0; i < total_sgs; ++i)
+ sgs[i] = &req->sg[i];
+
+ virtqueue_unmap_sgs(vq, sgs, req->out_sgs, req->in_sgs);
+
/* blocking async request completes in a worker context */
if (req->args->may_block) {
struct virtio_fs_req_work *w;
@@ -850,6 +863,9 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
} else {
virtio_fs_request_complete(req, fsvq);
}
+
+ if (sgs != stack_sgs)
+ kfree(sgs);
}
/* Try to push previously queued requests, as the queue might no longer be full */
@@ -1426,6 +1442,11 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
sgs[i] = &req->sg[i];
WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
+ // TODO can we change this ptr out of the lock?
+ vq = fsvq->vq;
+ // TODO handle this and following errors
+ ret = virtqueue_map_sgs(vq, sgs, req->out_sgs, req->in_sgs);
+ BUG_ON(ret < 0);
spin_lock(&fsvq->lock);
if (!fsvq->connected) {
@@ -1434,8 +1455,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
goto out;
}
- vq = fsvq->vq;
- ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
+ ret = virtqueue_add_sgs_premapped(vq, sgs, req->out_sgs,
+ req->in_sgs, req, GFP_ATOMIC);
if (ret < 0) {
spin_unlock(&fsvq->lock);
goto out;
--
2.48.1
* Re: [RFC 1/5] vduse: add virtio_fs to allowed dev id
2025-01-22 16:31 ` [RFC 1/5] vduse: add virtio_fs to allowed dev id Eugenio Pérez
@ 2025-01-23 1:48 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2025-01-23 1:48 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> A VDUSE device that implements virtiofs device works fine just by
> adding the device id to the whitelist.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs()
2025-01-22 16:31 ` [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs() Eugenio Pérez
@ 2025-01-23 1:51 ` Jason Wang
2025-01-23 7:31 ` Eugenio Perez Martin
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2025-01-23 1:51 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> From: Jason Wang <jasowang@redhat.com>
>
> Introduce new virtqueue DMA operations which allows the drivers that
> want to make use of the premapping API but operate at the sg level.
>
> Note that we still follow the assumptions of virtqueue_add(), so
> dma_map_sg() is not used. This could be optimized in the future.
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> --
> Eugenio's changes: Remove blank
> TODO: Should we call directly dma_map instead of this? XDP do the direct
> call.
Note that we should have an indirection layer as the device is not
necessarily DMA capable.
And we probably need to rename the virtqueue_dma_map_XXX() to
virtqueue_map_XXX()
Thanks
> ---
> drivers/virtio/virtio_ring.c | 128 +++++++++++++++++++++++++++++++----
> include/linux/virtio.h | 10 +++
> 2 files changed, 125 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index fdd2d2b07b5a..05729bc5cbb1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -359,6 +359,26 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> return vq->dma_dev;
> }
>
> +static int __vring_map_one_sg(const struct vring_virtqueue *vq,
> + struct scatterlist *sg,
> + enum dma_data_direction direction,
> + dma_addr_t *addr)
> +{
> + /*
> + * We can't use dma_map_sg, because we don't use scatterlists in
> + * the way it expects (we don't guarantee that the scatterlist
> + * will exist for the lifetime of the mapping).
> + */
> + *addr = dma_map_page(vring_dma_dev(vq),
> + sg_page(sg), sg->offset, sg->length,
> + direction);
> +
> + if (dma_mapping_error(vring_dma_dev(vq), *addr))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> /* Map one sg entry. */
> static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> enum dma_data_direction direction, dma_addr_t *addr,
> @@ -383,19 +403,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
> return 0;
> }
>
> - /*
> - * We can't use dma_map_sg, because we don't use scatterlists in
> - * the way it expects (we don't guarantee that the scatterlist
> - * will exist for the lifetime of the mapping).
> - */
> - *addr = dma_map_page(vring_dma_dev(vq),
> - sg_page(sg), sg->offset, sg->length,
> - direction);
> -
> - if (dma_mapping_error(vring_dma_dev(vq), *addr))
> - return -ENOMEM;
> -
> - return 0;
> + return __vring_map_one_sg(vq, sg, direction, addr);
> }
>
> static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> @@ -526,6 +534,100 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
> return next;
> }
>
> +void virtqueue_unmap_sgs(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int out_sgs,
> + unsigned int in_sgs)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + struct scatterlist *sg;
> + int n;
> +
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_unmap_page(vring_dma_dev(vq),
> + sg_dma_address(sg),
> + sg->length,
> + DMA_TO_DEVICE);
> + }
> + }
> +
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_unmap_page(vring_dma_dev(vq),
> + sg_dma_address(sg),
> + sg->length,
> + DMA_FROM_DEVICE);
> + }
> + }
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_unmap_sgs);
> +
> +int virtqueue_map_sgs(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int out_sgs,
> + unsigned int in_sgs)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + int i, n, mapped_sg = 0;
> + struct scatterlist *sg;
> +
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_addr_t addr;
> +
> + if (__vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> + goto unmap_release;
> +
> + sg_dma_address(sg) = addr;
> + mapped_sg++;
> + }
> + }
> +
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_addr_t addr;
> +
> + if (__vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> + goto unmap_release;
> +
> + sg_dma_address(sg) = addr;
> + mapped_sg++;
> + }
> + }
> +
> + return 0;
> +
> +unmap_release:
> + i = 0;
> +
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + if (i++ == mapped_sg)
> + goto out;
> + dma_unmap_page(vring_dma_dev(vq),
> + sg_dma_address(sg),
> + sg->length,
> + DMA_TO_DEVICE);
> + }
> + }
> +
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +
> + if (i++ == mapped_sg)
> + goto out;
> + dma_unmap_page(vring_dma_dev(vq),
> + sg_dma_address(sg),
> + sg->length,
> + DMA_FROM_DEVICE);
> + }
> + }
> +out:
> + return -ENOMEM;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_map_sgs);
> +
> static inline int virtqueue_add_split(struct virtqueue *_vq,
> struct scatterlist *sgs[],
> unsigned int total_sg,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index dd88682e27e3..28db998d691e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -67,6 +67,16 @@ int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
> void *data,
> gfp_t gfp);
>
> +int virtqueue_map_sgs(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int out_sgs,
> + unsigned int in_sgs);
> +
> +void virtqueue_unmap_sgs(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int out_sgs,
> + unsigned int in_sgs);
> +
> int virtqueue_add_sgs(struct virtqueue *vq,
> struct scatterlist *sgs[],
> unsigned int out_sgs,
> --
> 2.48.1
>
* Re: [RFC 3/5] virtiofs: Move stack sg to fuse_req
2025-01-22 16:31 ` [RFC 3/5] virtiofs: Move stack sg to fuse_req Eugenio Pérez
@ 2025-01-23 1:54 ` Jason Wang
2025-01-23 7:33 ` Eugenio Perez Martin
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2025-01-23 1:54 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> We need to move the map and unmap out of virtiofs queue lock. We need to
> store the unmap sgs to call virtqueue_unmap_sgs, so use the request
> itself.
>
> TODO: We need to store the forget sgs too. In my tests it is not called
> so we're safe. Find a way to force-call it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> fs/fuse/fuse_i.h | 7 +++++++
> fs/fuse/virtio_fs.c | 45 +++++++++++++++++++++------------------------
> 2 files changed, 28 insertions(+), 24 deletions(-)
>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 74744c6f2860..e57664daa761 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -18,6 +18,7 @@
> #include <linux/mount.h>
> #include <linux/wait.h>
> #include <linux/list.h>
> +#include <linux/scatterlist.h>
> #include <linux/spinlock.h>
> #include <linux/mm.h>
> #include <linux/backing-dev.h>
> @@ -434,6 +435,12 @@ struct fuse_req {
> #if IS_ENABLED(CONFIG_VIRTIO_FS)
> /** virtio-fs's physically contiguous buffer for in and out args */
> void *argbuf;
> +
> + /** virtio-fs's pre-mapped stuff */
> + struct scatterlist sg_inline_data[6]; /* optimization for short requests */
> + struct scatterlist *sg;
> + unsigned int out_sgs;
> + unsigned int in_sgs;
> #endif
As this touches FUSE, I think we need to do a careful benchmark just
for this patch to make sure no regression is introduced.
A dumb question (I know little about FUSE): if this is only used by
virtio-fs, could we find a way to embed fuse_req in a virtio-fs
dedicated req structure?
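(For illustration only, a hypothetical shape of such a wrapper; all names
below are made up:)

        /* virtio-fs private request, so the sg state stays out of the
         * generic struct fuse_req */
        struct virtio_fs_req {
                struct fuse_req req;
                struct scatterlist sg_inline[6];
                struct scatterlist *sg;
                unsigned int out_sgs;
                unsigned int in_sgs;
        };

        #define to_virtio_fs_req(r) container_of(r, struct virtio_fs_req, req)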
>
> /** fuse_mount this request belongs to */
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 82afe78ec542..1344c5782a7c 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1377,14 +1377,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> {
> /* requests need at least 4 elements */
> struct scatterlist *stack_sgs[6];
> - struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)];
> struct scatterlist **sgs = stack_sgs;
> - struct scatterlist *sg = stack_sg;
> struct virtqueue *vq;
> struct fuse_args *args = req->args;
> unsigned int argbuf_used = 0;
> - unsigned int out_sgs = 0;
> - unsigned int in_sgs = 0;
> unsigned int total_sgs;
> unsigned int i;
> int ret;
> @@ -1392,11 +1388,13 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> struct fuse_pqueue *fpq;
>
> /* Does the sglist fit on the stack? */
> + /* TODO replace magic 6 by a macro */
> + req->sg = req->sg_inline_data;
> total_sgs = sg_count_fuse_req(req);
> - if (total_sgs > ARRAY_SIZE(stack_sgs)) {
> + if (total_sgs > 6) {
> sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), gfp);
> - sg = kmalloc_array(total_sgs, sizeof(sg[0]), gfp);
> - if (!sgs || !sg) {
> + req->sg = kmalloc_array(total_sgs, sizeof(req->sg[0]), gfp);
> + if (!sgs || !req->sg) {
> ret = -ENOMEM;
> goto out;
> }
> @@ -1408,26 +1406,25 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> goto out;
>
> /* Request elements */
> - sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
> - out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
> - (struct fuse_arg *)args->in_args,
> - args->in_numargs, args->in_pages,
> - req->argbuf, &argbuf_used);
> + sg_init_one(&req->sg[req->out_sgs++], &req->in.h, sizeof(req->in.h));
> + req->out_sgs += sg_init_fuse_args(&req->sg[req->out_sgs], req,
> + (struct fuse_arg *)args->in_args,
> + args->in_numargs, args->in_pages,
> + req->argbuf, &argbuf_used);
>
> /* Reply elements */
> if (test_bit(FR_ISREPLY, &req->flags)) {
> - sg_init_one(&sg[out_sgs + in_sgs++],
> + sg_init_one(&req->sg[req->out_sgs + req->in_sgs++],
> &req->out.h, sizeof(req->out.h));
> - in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
> - args->out_args, args->out_numargs,
> - args->out_pages,
> - req->argbuf + argbuf_used, NULL);
> + req->in_sgs += sg_init_fuse_args(&req->sg[req->out_sgs + req->in_sgs], req,
> + args->out_args, args->out_numargs,
> + args->out_pages,
> + req->argbuf + argbuf_used, NULL);
> }
>
> - WARN_ON(out_sgs + in_sgs != total_sgs);
> -
> for (i = 0; i < total_sgs; i++)
> - sgs[i] = &sg[i];
> + sgs[i] = &req->sg[i];
> + WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
>
> spin_lock(&fsvq->lock);
>
> @@ -1438,7 +1435,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> }
>
> vq = fsvq->vq;
> - ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);
> + ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
> if (ret < 0) {
> spin_unlock(&fsvq->lock);
> goto out;
> @@ -1467,10 +1464,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> kfree(req->argbuf);
> req->argbuf = NULL;
> }
> - if (sgs != stack_sgs) {
> + if (ret < 0 && req->sg != req->sg_inline_data)
> + kfree(req->sg);
> + if (sgs != stack_sgs)
> kfree(sgs);
> - kfree(sg);
> - }
>
> return ret;
> }
> --
> 2.48.1
>
Thanks
* Re: [RFC 5/5] virtiofs: perform DMA operations out of the spinlock
2025-01-22 16:31 ` [RFC 5/5] virtiofs: perform DMA operations out of the spinlock Eugenio Pérez
@ 2025-01-23 1:56 ` Jason Wang
2025-01-23 2:26 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2025-01-23 1:56 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This is useful for some setups like swiotlb or VDUSE where the DMA
> operations are expensive and/or need to be performed with a write lock.
>
> After applying this patch, fio read test goes from 1201MiB/s to 1211MiB/s.
The difference is too small to differentiate it from the noise.
I would suggest testing with different setups.
1) SWIOTLB 2) VDUSE
Note that SWIOTLB will do bouncing even for DMA_FROM_DEVICE, so I
think we may see better performance there.
And we need to try different request sizes; I did a similar patch
for virtio-blk and saw better performance for large requests like 1M,
etc.
Thanks
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 2 ++
> fs/fuse/virtio_fs.c | 25 +++++++++++++++++++++++--
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e49912fa77c5..eb22bfcb9100 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -580,6 +580,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
> goto unmap_release;
>
> sg_dma_address(sg) = addr;
> + sg_dma_len(sg) = sg->length;
> mapped_sg++;
> }
> }
> @@ -592,6 +593,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
> goto unmap_release;
>
> sg_dma_address(sg) = addr;
> + sg_dma_len(sg) = sg->length;
> mapped_sg++;
> }
> }
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 1344c5782a7c..2b558b05d0f8 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -836,8 +836,21 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
>
> /* End requests */
> list_for_each_entry_safe(req, next, &reqs, list) {
> + struct scatterlist *stack_sgs[6];
> + struct scatterlist **sgs = stack_sgs;
> + unsigned int total_sgs = req->out_sgs + req->in_sgs;
> +
> list_del_init(&req->list);
>
> + /* TODO replace magic 6 by a macro */
> + if (total_sgs > 6)
> + sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
> +
> + for (unsigned int i = 0; i < total_sgs; ++i)
> + sgs[i] = &req->sg[i];
> +
> + virtqueue_unmap_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> +
> /* blocking async request completes in a worker context */
> if (req->args->may_block) {
> struct virtio_fs_req_work *w;
> @@ -850,6 +863,9 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
> } else {
> virtio_fs_request_complete(req, fsvq);
> }
> +
> + if (sgs != stack_sgs)
> + kfree(sgs);
> }
>
> /* Try to push previously queued requests, as the queue might no longer be full */
> @@ -1426,6 +1442,11 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> sgs[i] = &req->sg[i];
> WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
>
> + // TODO can we change this ptr out of the lock?
> + vq = fsvq->vq;
> + // TODO handle this and following errors
> + ret = virtqueue_map_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> + BUG_ON(ret < 0);
> spin_lock(&fsvq->lock);
>
> if (!fsvq->connected) {
> @@ -1434,8 +1455,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> goto out;
> }
>
> - vq = fsvq->vq;
> - ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
> + ret = virtqueue_add_sgs_premapped(vq, sgs, req->out_sgs,
> + req->in_sgs, req, GFP_ATOMIC);
> if (ret < 0) {
> spin_unlock(&fsvq->lock);
> goto out;
> --
> 2.48.1
>
* Re: [RFC 5/5] virtiofs: perform DMA operations out of the spinlock
2025-01-23 1:56 ` Jason Wang
@ 2025-01-23 2:26 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2025-01-23 2:26 UTC (permalink / raw)
To: Eugenio Pérez
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 9:56 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This is useful for some setups like swiotlb or VDUSE where the DMA
> > operations are expensive and/or need to be performed with a write lock.
> >
> > After applying this patch, fio read test goes from 1201MiB/s to 1211MiB/s.
>
> The difference is too small to differentiate it from the noise.
>
> I would suggest to test with different setups.
>
> 1) SWIOTLB 2) VDUSE
>
> Note that SWIOTLB will do bouncing even for DMA_FROM_DEVICE,
I meant dma map in this case actually.
Thanks
> so I
> think we may see better performance there.
>
> And we need to try with different request size, I did a similar patch
> for virtio-blk and I see better performance for large request like 1M
> etc.
>
> Thanks
>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/virtio/virtio_ring.c | 2 ++
> > fs/fuse/virtio_fs.c | 25 +++++++++++++++++++++++--
> > 2 files changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index e49912fa77c5..eb22bfcb9100 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -580,6 +580,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
> > goto unmap_release;
> >
> > sg_dma_address(sg) = addr;
> > + sg_dma_len(sg) = sg->length;
> > mapped_sg++;
> > }
> > }
> > @@ -592,6 +593,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
> > goto unmap_release;
> >
> > sg_dma_address(sg) = addr;
> > + sg_dma_len(sg) = sg->length;
> > mapped_sg++;
> > }
> > }
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 1344c5782a7c..2b558b05d0f8 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -836,8 +836,21 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
> >
> > /* End requests */
> > list_for_each_entry_safe(req, next, &reqs, list) {
> > + struct scatterlist *stack_sgs[6];
> > + struct scatterlist **sgs = stack_sgs;
> > + unsigned int total_sgs = req->out_sgs + req->in_sgs;
> > +
> > list_del_init(&req->list);
> >
> > + /* TODO replace magic 6 by a macro */
> > + if (total_sgs > 6)
> > + sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
> > +
> > + for (unsigned int i = 0; i < total_sgs; ++i)
> > + sgs[i] = &req->sg[i];
> > +
> > + virtqueue_unmap_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> > +
> > /* blocking async request completes in a worker context */
> > if (req->args->may_block) {
> > struct virtio_fs_req_work *w;
> > @@ -850,6 +863,9 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
> > } else {
> > virtio_fs_request_complete(req, fsvq);
> > }
> > +
> > + if (sgs != stack_sgs)
> > + kfree(sgs);
> > }
> >
> > /* Try to push previously queued requests, as the queue might no longer be full */
> > @@ -1426,6 +1442,11 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > sgs[i] = &req->sg[i];
> > WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
> >
> > + // TODO can we change this ptr out of the lock?
> > + vq = fsvq->vq;
> > + // TODO handle this and following errors
> > + ret = virtqueue_map_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> > + BUG_ON(ret < 0);
> > spin_lock(&fsvq->lock);
> >
> > if (!fsvq->connected) {
> > @@ -1434,8 +1455,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > goto out;
> > }
> >
> > - vq = fsvq->vq;
> > - ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
> > + ret = virtqueue_add_sgs_premapped(vq, sgs, req->out_sgs,
> > + req->in_sgs, req, GFP_ATOMIC);
> > if (ret < 0) {
> > spin_unlock(&fsvq->lock);
> > goto out;
> > --
> > 2.48.1
> >
* Re: [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs()
2025-01-23 1:51 ` Jason Wang
@ 2025-01-23 7:31 ` Eugenio Perez Martin
2025-01-24 1:03 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2025-01-23 7:31 UTC (permalink / raw)
To: Jason Wang
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 2:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > From: Jason Wang <jasowang@redhat.com>
> >
> > Introduce new virtqueue DMA operations which allows the drivers that
> > want to make use of the premapping API but operate at the sg level.
> >
> > Note that we still follow the assumptions of virtqueue_add(), so
> > dma_map_sg() is not used. This could be optimized in the future.
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > --
> > Eugenio's changes: Remove blank
> > TODO: Should we call directly dma_map instead of this? XDP do the direct
> > call.
>
> Note that we should have an indirection layer as the device is not
> necessarily DMA capable.
>
Can you expand on this? virtqueue_map_sgs calls dma_map_page directly
for each entry in the sg. This indirection layer should be in the
device's callback for DMA, shouldn't it?
> And we probably need to rename the virtqueue_dma_map_XXX() to
> virtqueue_map_XXX()
>
Ok, so maybe these functions are already enough without using
dma_map_sg(). I'm ok with renaming these, but that should be done in a
separate series, shouldn't it?
Thanks!
* Re: [RFC 3/5] virtiofs: Move stack sg to fuse_req
2025-01-23 1:54 ` Jason Wang
@ 2025-01-23 7:33 ` Eugenio Perez Martin
0 siblings, 0 replies; 14+ messages in thread
From: Eugenio Perez Martin @ 2025-01-23 7:33 UTC (permalink / raw)
To: Jason Wang
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 2:54 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > We need to move the map and unmap out of virtiofs queue lock. We need to
> > store the unmap sgs to call virtqueue_unmap_sgs, so use the request
> > itself.
> >
> > TODO: We need to store the forget sgs too. In my tests it is not called
> > so we're safe. Find a way to force-call it.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > fs/fuse/fuse_i.h | 7 +++++++
> > fs/fuse/virtio_fs.c | 45 +++++++++++++++++++++------------------------
> > 2 files changed, 28 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 74744c6f2860..e57664daa761 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -18,6 +18,7 @@
> > #include <linux/mount.h>
> > #include <linux/wait.h>
> > #include <linux/list.h>
> > +#include <linux/scatterlist.h>
> > #include <linux/spinlock.h>
> > #include <linux/mm.h>
> > #include <linux/backing-dev.h>
> > @@ -434,6 +435,12 @@ struct fuse_req {
> > #if IS_ENABLED(CONFIG_VIRTIO_FS)
> > /** virtio-fs's physically contiguous buffer for in and out args */
> > void *argbuf;
> > +
> > + /** virtio-fs's pre-mapped stuff */
> > + struct scatterlist sg_inline_data[6]; /* optimization for short requests */
> > + struct scatterlist *sg;
> > + unsigned int out_sgs;
> > + unsigned int in_sgs;
> > #endif
>
> As this touches FUSE, I think we need to do a careful benchmark just
> for this patch to make sure no regression is introduced.
>
Sure, I can do it. But the struct already has a virtiofs field (argbuf).
> A dumb question (know little about FUSE), if this is only used by
> virtio-fs, could we find a way to embed fuse_req in a virtio-fs
> dedicated req structure?
>
I guess FUSE can ask virtiofs to allocate the struct for sure.
> >
> > /** fuse_mount this request belongs to */
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 82afe78ec542..1344c5782a7c 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -1377,14 +1377,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > {
> > /* requests need at least 4 elements */
> > struct scatterlist *stack_sgs[6];
> > - struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)];
> > struct scatterlist **sgs = stack_sgs;
> > - struct scatterlist *sg = stack_sg;
> > struct virtqueue *vq;
> > struct fuse_args *args = req->args;
> > unsigned int argbuf_used = 0;
> > - unsigned int out_sgs = 0;
> > - unsigned int in_sgs = 0;
> > unsigned int total_sgs;
> > unsigned int i;
> > int ret;
> > @@ -1392,11 +1388,13 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > struct fuse_pqueue *fpq;
> >
> > /* Does the sglist fit on the stack? */
> > + /* TODO replace magic 6 by a macro */
> > + req->sg = req->sg_inline_data;
> > total_sgs = sg_count_fuse_req(req);
> > - if (total_sgs > ARRAY_SIZE(stack_sgs)) {
> > + if (total_sgs > 6) {
> > sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), gfp);
> > - sg = kmalloc_array(total_sgs, sizeof(sg[0]), gfp);
> > - if (!sgs || !sg) {
> > + req->sg = kmalloc_array(total_sgs, sizeof(req->sg[0]), gfp);
> > + if (!sgs || !req->sg) {
> > ret = -ENOMEM;
> > goto out;
> > }
> > @@ -1408,26 +1406,25 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > goto out;
> >
> > /* Request elements */
> > - sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
> > - out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
> > - (struct fuse_arg *)args->in_args,
> > - args->in_numargs, args->in_pages,
> > - req->argbuf, &argbuf_used);
> > + sg_init_one(&req->sg[req->out_sgs++], &req->in.h, sizeof(req->in.h));
> > + req->out_sgs += sg_init_fuse_args(&req->sg[req->out_sgs], req,
> > + (struct fuse_arg *)args->in_args,
> > + args->in_numargs, args->in_pages,
> > + req->argbuf, &argbuf_used);
> >
> > /* Reply elements */
> > if (test_bit(FR_ISREPLY, &req->flags)) {
> > - sg_init_one(&sg[out_sgs + in_sgs++],
> > + sg_init_one(&req->sg[req->out_sgs + req->in_sgs++],
> > &req->out.h, sizeof(req->out.h));
> > - in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
> > - args->out_args, args->out_numargs,
> > - args->out_pages,
> > - req->argbuf + argbuf_used, NULL);
> > + req->in_sgs += sg_init_fuse_args(&req->sg[req->out_sgs + req->in_sgs], req,
> > + args->out_args, args->out_numargs,
> > + args->out_pages,
> > + req->argbuf + argbuf_used, NULL);
> > }
> >
> > - WARN_ON(out_sgs + in_sgs != total_sgs);
> > -
> > for (i = 0; i < total_sgs; i++)
> > - sgs[i] = &sg[i];
> > + sgs[i] = &req->sg[i];
> > + WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
> >
> > spin_lock(&fsvq->lock);
> >
> > @@ -1438,7 +1435,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > }
> >
> > vq = fsvq->vq;
> > - ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);
> > + ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
> > if (ret < 0) {
> > spin_unlock(&fsvq->lock);
> > goto out;
> > @@ -1467,10 +1464,10 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> > kfree(req->argbuf);
> > req->argbuf = NULL;
> > }
> > - if (sgs != stack_sgs) {
> > + if (ret < 0 && req->sg != req->sg_inline_data)
> > + kfree(req->sg);
> > + if (sgs != stack_sgs)
> > kfree(sgs);
> > - kfree(sg);
> > - }
> >
> > return ret;
> > }
> > --
> > 2.48.1
> >
>
> Thanks
>
* Re: [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs()
2025-01-23 7:31 ` Eugenio Perez Martin
@ 2025-01-24 1:03 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2025-01-24 1:03 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: linux-kernel, mst, Hanna Reitz, Stefano Garzarella, Xuan Zhuo,
Vivek Goyal, Miklos Szeredi, German Maglione, Stefan Hajnoczi,
virtualization
On Thu, Jan 23, 2025 at 3:32 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Jan 23, 2025 at 2:51 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > From: Jason Wang <jasowang@redhat.com>
> > >
> > > Introduce new virtqueue DMA operations which allows the drivers that
> > > want to make use of the premapping API but operate at the sg level.
> > >
> > > Note that we still follow the assumptions of virtqueue_add(), so
> > > dma_map_sg() is not used. This could be optimized in the future.
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > --
> > > Eugenio's changes: Remove blank
> > > TODO: Should we call directly dma_map instead of this? XDP do the direct
> > > call.
> >
> > Note that we should have an indirection layer as the device is not
> > necessarily DMA capable.
> >
>
> Can you expand on this? virtqueue_map_sgs calls directly to
> dma_map_page for each entry in the sg. This indirection layer should
> be in the device's callback for DMA, isn't it?
Nope, there are devices that can't do DMA.
Basically, the virtio core should:
1) provide mapping helpers, which drivers should then use
2) use a transport-specific method to do the mapping, e.g. PCI knows it
can do DMA so it can use the DMA API
Only 1) is implemented now; in the future we need to implement 2) (we
still have dirty hacks for vdpa_sim etc. for now).
So it's better not to go back to calling dma_map_sg() directly.
>
> > And we probably need to rename the virtqueue_dma_map_XXX() to
> > virtqueue_map_XXX()
> >
>
> Ok so maybe these functions are already enough without using
> dma_map_sgs. I"m ok with renaming these, but that should be done in a
> separate series, isn't it?
Yes, this explains 1) above.
Thanks
>
> Thanks!
>
end of thread (newest message: 2025-01-24 1:03 UTC)
Thread overview: 14+ messages, summarized below:
2025-01-22 16:31 [RFC 0/5] virtiofs: map buffer out of virtqueue lock Eugenio Pérez
2025-01-22 16:31 ` [RFC 1/5] vduse: add virtio_fs to allowed dev id Eugenio Pérez
2025-01-23 1:48 ` Jason Wang
2025-01-22 16:31 ` [RFC 2/5] virtio_ring: introduce virtqueue_map/unmap_sgs() Eugenio Pérez
2025-01-23 1:51 ` Jason Wang
2025-01-23 7:31 ` Eugenio Perez Martin
2025-01-24 1:03 ` Jason Wang
2025-01-22 16:31 ` [RFC 3/5] virtiofs: Move stack sg to fuse_req Eugenio Pérez
2025-01-23 1:54 ` Jason Wang
2025-01-23 7:33 ` Eugenio Perez Martin
2025-01-22 16:31 ` [RFC 4/5] virtio_ring: add virtqueue premapped out/in buffers support Eugenio Pérez
2025-01-22 16:31 ` [RFC 5/5] virtiofs: perform DMA operations out of the spinlock Eugenio Pérez
2025-01-23 1:56 ` Jason Wang
2025-01-23 2:26 ` Jason Wang