* [PATCH vhost v9 00/12] virtio core prepares for AF_XDP
@ 2023-05-17 2:22 Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
` (11 more replies)
0 siblings, 12 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
## About DMA APIs
Currently, virtio cannot work with the DMA APIs when the device does not
offer VIRTIO_F_ACCESS_PLATFORM.
1. I tried to make the DMA APIs return the physical address for virtio
devices, but the DMA APIs only work with "real" devices.
2. I tried to add callbacks to xsk so it could get the physical address
from the virtio-net driver and use it as the DMA address, but the xsk
maintainers may prefer to replace the DMA APIs with dma-buf. I think that
would be a larger effort, and we would wait too long.
So, rethinking this, we can start by supporting premapped DMA only for
devices with VIRTIO_F_ACCESS_PLATFORM. In the AF_XDP case, users already
have to update the device to support VIRTIO_F_RING_RESET, so they can
enable the device's VIRTIO_F_ACCESS_PLATFORM feature at the same time.
Thanks for the help from Christoph.
=================
XDP socket (AF_XDP) is an excellent kernel-bypass networking framework.
The zero-copy feature of xsk (XDP socket) must be supported by the driver,
and the performance of zero copy is very good.
ENV: Qemu with vhost.
                 | vhost CPU | Guest APP CPU | Guest Softirq CPU | PPS
-----------------|-----------|---------------|-------------------|--------
xmit by sockperf |    90%    |     100%      |                   |  318967
xmit by xsk      |   100%    |      30%      |        33%        | 1192064
recv by sockperf |   100%    |      68%      |       100%        |  692288
recv by xsk      |   100%    |      33%      |        43%        |  771670
Before implementing this in virtio-net, we also have to make the virtio
core support these features:
1. virtio core supports premapped buffers
2. virtio core supports per-queue reset
3. introduce DMA APIs to the virtio core
Please review.
Thanks.
v9:
1. use a flag to distinguish premapped operations; no longer judge by the sg.
v8:
1. vring_sg_address: check sg_page(sg) instead of dma_address, because 0 is a valid DMA address
2. remove unused code from vring_map_one_sg()
v7:
1. virtqueue_dma_dev() returns NULL when virtio does not use the DMA API.
v6:
1. change the size of the flags to u32.
v5:
1. fix the error handling
2. add flags to record internal dma mapping
v4:
1. rename map_inter to dma_map_internal
2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'
v3:
1. add map_inter to struct desc state to record whether the virtio core did the DMA map
v2:
1. judge whether a buffer is premapped based on sgs[0]->dma_address
2. judge whether to unmap a non-indirect desc based on extra.addr
3. judge whether to unmap an indirect desc based on indir_desc
4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
v1:
1. expose the DMA device; do NOT introduce APIs for DMA map and sync
2. split some commit for review.
Xuan Zhuo (12):
virtio_ring: put mapping error check in vring_map_one_sg
virtio_ring: simplify the reference of desc state inside
detach_buf_split()
virtio_ring: check use_dma_api before unmap desc for indirect
virtio_ring: virtqueue_add() support premapped
virtio_ring: split: virtqueue_add_split() support premapped
virtio_ring: packed: virtqueue_add_packed() support premapped
virtio_ring: introduce virtqueue_add_outbuf_premapped()
virtio_ring: introduce virtqueue_add_inbuf_premapped()
virtio_ring: introduce virtqueue_dma_dev()
virtio_ring: correct the expression of the description of
virtqueue_resize()
virtio_ring: separate the logic of reset/enable from virtqueue_resize
virtio_ring: introduce virtqueue_reset()
drivers/virtio/virtio_ring.c | 296 +++++++++++++++++++++++++++--------
include/linux/virtio.h | 14 ++
2 files changed, 246 insertions(+), 64 deletions(-)
--
2.32.0.3.g01195cf9f
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-23 6:02 ` Christoph Hellwig
2023-05-17 2:22 ` [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split() Xuan Zhuo
` (10 subsequent siblings)
11 siblings, 2 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
This patch moves the DMA address error check into vring_map_one_sg().
The benefits of doing this:
1. makes the callers of vring_map_one_sg() simpler; they no longer call
vring_mapping_error() to check the return value.
2. avoids one check of vq->use_dma_api.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..c563215be6b9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
}
/* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
- struct scatterlist *sg,
- enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+ enum dma_data_direction direction, dma_addr_t *addr)
{
if (!vq->use_dma_api) {
/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
* depending on the direction.
*/
kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
- return (dma_addr_t)sg_phys(sg);
+ *addr = (dma_addr_t)sg_phys(sg);
+ return 0;
}
/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
* the way it expects (we don't guarantee that the scatterlist
* will exist for the lifetime of the mapping).
*/
- return dma_map_page(vring_dma_dev(vq),
+ *addr = dma_map_page(vring_dma_dev(vq),
sg_page(sg), sg->offset, sg->length,
direction);
+
+ if (dma_mapping_error(vring_dma_dev(vq), *addr))
+ return -ENOMEM;
+
+ return 0;
}
static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
goto unmap_release;
prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
goto unmap_release;
prev = i;
@@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- addr = vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ if (vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
flags = cpu_to_le16(vq->packed.avail_used_flags |
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-17 2:22 ` [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect Xuan Zhuo
` (9 subsequent siblings)
11 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
The purpose of this is to simplify references to the desc state. It is
convenient for subsequent commits.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c563215be6b9..479203346c36 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -744,11 +744,14 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
void **ctx)
{
+ struct vring_desc_state_split *state;
unsigned int i, j;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+ state = &vq->split.desc_state[head];
+
/* Clear data ptr. */
- vq->split.desc_state[head].data = NULL;
+ state->data = NULL;
/* Put back on free list: unmap first-level descriptors and find end */
i = head;
@@ -767,8 +770,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
vq->vq.num_free++;
if (vq->indirect) {
- struct vring_desc *indir_desc =
- vq->split.desc_state[head].indir_desc;
+ struct vring_desc *indir_desc = state->indir_desc;
u32 len;
/* Free the indirect table, if any, now that it's unmapped. */
@@ -785,9 +787,9 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
vring_unmap_one_split_indirect(vq, &indir_desc[j]);
kfree(indir_desc);
- vq->split.desc_state[head].indir_desc = NULL;
+ state->indir_desc = NULL;
} else if (ctx) {
- *ctx = vq->split.desc_state[head].indir_desc;
+ *ctx = state->indir_desc;
}
}
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split() Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-17 2:22 ` [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped Xuan Zhuo
` (8 subsequent siblings)
11 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect() is called many times but actually does
nothing. So this patch checks use_dma_api first.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 479203346c36..1ffab1eb40c0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -783,8 +783,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
- for (j = 0; j < len / sizeof(struct vring_desc); j++)
- vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+ if (vq->use_dma_api) {
+ for (j = 0; j < len / sizeof(struct vring_desc); j++)
+ vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+ }
kfree(indir_desc);
state->indir_desc = NULL;
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (2 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-23 6:03 ` Christoph Hellwig
2023-05-17 2:22 ` [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() " Xuan Zhuo
` (7 subsequent siblings)
11 siblings, 2 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
virtqueue_add() gains a new parameter: premapped.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1ffab1eb40c0..e2fc50c05bec 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2135,6 +2135,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
unsigned int in_sgs,
void *data,
void *ctx,
+ bool premapped,
gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -2176,7 +2177,7 @@ int virtqueue_add_sgs(struct virtqueue *_vq,
total_sg++;
}
return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
- data, NULL, gfp);
+ data, NULL, false, gfp);
}
EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
@@ -2198,7 +2199,7 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
void *data,
gfp_t gfp)
{
- return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
+ return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, false, gfp);
}
EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
@@ -2220,7 +2221,7 @@ int virtqueue_add_inbuf(struct virtqueue *vq,
void *data,
gfp_t gfp)
{
- return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp);
+ return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp);
}
EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
@@ -2244,7 +2245,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
void *ctx,
gfp_t gfp)
{
- return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp);
+ return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, false, gfp);
}
EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (3 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-17 2:22 ` [PATCH vhost v9 06/12] virtio_ring: packed: virtqueue_add_packed() " Xuan Zhuo
` (6 subsequent siblings)
11 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
virtqueue_add_split() only supports virtual addresses; the DMA mapping is
done inside virtqueue_add_split().
In some scenarios (such as the AF_XDP scenario), the memory is allocated
and the DMA mapping is completed in advance, so it is necessary for us to
support passing a DMA address to virtqueue_add_split().
Record this information in desc_state so that we can skip the unmap based
on it when executing the DMA unmap.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e2fc50c05bec..bd5e84afab37 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -70,6 +70,7 @@
struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
+ bool premapped; /* DMA mapping is done by driver. */
};
struct vring_desc_state_packed {
@@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
/* Map one sg entry. */
static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
- enum dma_data_direction direction, dma_addr_t *addr)
+ enum dma_data_direction direction,
+ bool premapped, dma_addr_t *addr)
{
+ if (premapped) {
+ *addr = sg_dma_address(sg);
+ return 0;
+ }
+
if (!vq->use_dma_api) {
/*
* If DMA is not used, KMSAN doesn't know that the scatterlist
@@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
}
static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
- unsigned int i)
+ unsigned int i, bool premapped)
{
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
@@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
(flags & VRING_DESC_F_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+ if (premapped)
+ goto out;
+
dma_unmap_page(vring_dma_dev(vq),
extra[i].addr,
extra[i].len,
@@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
unsigned int in_sgs,
void *data,
void *ctx,
+ bool premapped,
gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
- if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+ if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
goto unmap_release;
prev = i;
@@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
- if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
+ if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
goto unmap_release;
prev = i;
@@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* Store token and indirect buffer state. */
vq->split.desc_state[head].data = data;
+ vq->split.desc_state[head].premapped = premapped;
if (indirect)
vq->split.desc_state[head].indir_desc = desc;
else
@@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
unmap_release:
+ if (premapped) {
+ if (indirect)
+ kfree(desc);
+
+ END_USE(vq);
+ return -ENOMEM;
+ }
+
err_idx = i;
if (indirect)
@@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
vring_unmap_one_split_indirect(vq, &desc[i]);
i = virtio16_to_cpu(_vq->vdev, desc[i].next);
} else
- i = vring_unmap_one_split(vq, i);
+ i = vring_unmap_one_split(vq, i, false);
}
if (indirect)
@@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
i = head;
while (vq->split.vring.desc[i].flags & nextflag) {
- vring_unmap_one_split(vq, i);
+ vring_unmap_one_split(vq, i, state->premapped);
i = vq->split.desc_extra[i].next;
vq->vq.num_free++;
}
- vring_unmap_one_split(vq, i);
+ vring_unmap_one_split(vq, i, state->premapped);
vq->split.desc_extra[i].next = vq->free_head;
vq->free_head = head;
@@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
- if (vq->use_dma_api) {
+ if (vq->use_dma_api && !state->premapped) {
for (j = 0; j < len / sizeof(struct vring_desc); j++)
vring_unmap_one_split_indirect(vq, &indir_desc[j]);
}
@@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
out_sgs, in_sgs, data, ctx, gfp) :
virtqueue_add_split(_vq, sgs, total_sg,
- out_sgs, in_sgs, data, ctx, gfp);
+ out_sgs, in_sgs, data, ctx, premapped, gfp);
}
/**
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 06/12] virtio_ring: packed: virtqueue_add_packed() support premapped
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (4 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() " Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 07/12] virtio_ring: introduce virtqueue_add_outbuf_premapped() Xuan Zhuo
` (5 subsequent siblings)
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
virtqueue_add_packed() only supports virtual addresses; the DMA mapping is
done inside virtqueue_add_packed().
In some scenarios (such as the AF_XDP scenario), the memory is allocated
and the DMA mapping is completed in advance, so it is necessary for us to
support passing a DMA address to virtqueue_add_packed().
Record this information in desc_state so that we can skip the unmap based
on it when executing the DMA unmap.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 48 ++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 16 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bd5e84afab37..e169c7653b32 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,7 @@ struct vring_desc_state_packed {
struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
u16 num; /* Descriptor list length. */
u16 last; /* The last desc state in a list. */
+ bool premapped; /* DMA mapping is done by driver. */
};
struct vring_desc_extra {
@@ -1222,7 +1223,8 @@ static u16 packed_last_used(u16 last_used_idx)
}
static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
- const struct vring_desc_extra *extra)
+ const struct vring_desc_extra *extra,
+ bool premapped)
{
u16 flags;
@@ -1237,6 +1239,9 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
(flags & VRING_DESC_F_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+ if (premapped)
+ return;
+
dma_unmap_page(vring_dma_dev(vq),
extra->addr, extra->len,
(flags & VRING_DESC_F_WRITE) ?
@@ -1284,7 +1289,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
unsigned int out_sgs,
unsigned int in_sgs,
void *data,
- gfp_t gfp)
+ gfp_t gfp,
+ bool premapped)
{
struct vring_packed_desc *desc;
struct scatterlist *sg;
@@ -1311,7 +1317,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
if (vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+ DMA_TO_DEVICE : DMA_FROM_DEVICE, premapped, &addr))
goto unmap_release;
desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1371,6 +1377,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = desc;
vq->packed.desc_state[id].last = id;
+ vq->packed.desc_state[id].premapped = premapped;
vq->num_added += 1;
@@ -1380,10 +1387,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
return 0;
unmap_release:
- err_idx = i;
-
- for (i = 0; i < err_idx; i++)
- vring_unmap_desc_packed(vq, &desc[i]);
+ if (!premapped) {
+ err_idx = i;
+ for (i = 0; i < err_idx; i++)
+ vring_unmap_desc_packed(vq, &desc[i]);
+ }
kfree(desc);
@@ -1398,6 +1406,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
unsigned int in_sgs,
void *data,
void *ctx,
+ bool premapped,
gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1424,7 +1433,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
if (virtqueue_use_indirect(vq, total_sg)) {
err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
- in_sgs, data, gfp);
+ in_sgs, data, gfp, premapped);
if (err != -ENOMEM) {
END_USE(vq);
return err;
@@ -1458,8 +1467,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
- if (vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+ if (vring_map_one_sg(vq, sg, n < out_sgs ? DMA_TO_DEVICE : DMA_FROM_DEVICE,
+ premapped, &addr))
goto unmap_release;
flags = cpu_to_le16(vq->packed.avail_used_flags |
@@ -1507,6 +1516,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = ctx;
vq->packed.desc_state[id].last = prev;
+ vq->packed.desc_state[id].premapped = premapped;
/*
* A driver MUST NOT make the first descriptor in the list
@@ -1523,16 +1533,21 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
return 0;
unmap_release:
+ vq->packed.avail_used_flags = avail_used_flags;
+
+ if (premapped) {
+ END_USE(vq);
+ return -EIO;
+ }
+
err_idx = i;
i = head;
curr = vq->free_head;
- vq->packed.avail_used_flags = avail_used_flags;
-
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
- vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
+ vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr], false);
curr = vq->packed.desc_extra[curr].next;
i++;
if (i >= vq->packed.vring.num)
@@ -1612,7 +1627,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
- &vq->packed.desc_extra[curr]);
+ &vq->packed.desc_extra[curr],
+ state->premapped);
curr = vq->packed.desc_extra[curr].next;
}
}
@@ -1625,7 +1641,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (!desc)
return;
- if (vq->use_dma_api) {
+ if (vq->use_dma_api && !state->premapped) {
len = vq->packed.desc_extra[id].len;
for (i = 0; i < len / sizeof(struct vring_packed_desc);
i++)
@@ -2161,7 +2177,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
- out_sgs, in_sgs, data, ctx, gfp) :
+ out_sgs, in_sgs, data, ctx, premapped, gfp) :
virtqueue_add_split(_vq, sgs, total_sg,
out_sgs, in_sgs, data, ctx, premapped, gfp);
}
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 07/12] virtio_ring: introduce virtqueue_add_outbuf_premapped()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (5 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 06/12] virtio_ring: packed: virtqueue_add_packed() " Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 08/12] virtio_ring: introduce virtqueue_add_inbuf_premapped() Xuan Zhuo
` (4 subsequent siblings)
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Introduce virtqueue_add_outbuf_premapped() to submit premapped sgs.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++++++
include/linux/virtio.h | 5 +++++
2 files changed, 30 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e169c7653b32..3d3e602fd261 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2239,6 +2239,31 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
}
EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
+/**
+ * virtqueue_add_outbuf_premapped - expose output buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg readable by other side
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * It is required that the caller has already completed the DMA operations,
+ * and passes the address and length via sg->dma_address and sg->length.
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
+ struct scatterlist *sg, unsigned int num,
+ void *data,
+ gfp_t gfp)
+{
+ return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, true, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_outbuf_premapped);
+
/**
* virtqueue_add_inbuf - expose input buffers to other end
* @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e3..a533253fa9e8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -43,6 +43,11 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
void *data,
gfp_t gfp);
+int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
+ struct scatterlist *sg, unsigned int num,
+ void *data,
+ gfp_t gfp);
+
int virtqueue_add_inbuf(struct virtqueue *vq,
struct scatterlist sg[], unsigned int num,
void *data,
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 08/12] virtio_ring: introduce virtqueue_add_inbuf_premapped()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (6 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 07/12] virtio_ring: introduce virtqueue_add_outbuf_premapped() Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 09/12] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
` (3 subsequent siblings)
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Introduce virtqueue_add_inbuf_premapped() to submit premapped sgs.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 25 +++++++++++++++++++++++++
include/linux/virtio.h | 5 +++++
2 files changed, 30 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3d3e602fd261..cbeac2f516c7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2310,6 +2310,31 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
}
EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
+/**
+ * virtqueue_add_inbuf_premapped - expose input buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg writable by other side
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * It is required that all addrs have completed their DMA operations, and
+ * that sg->dma_address and sg->length are used to pass the addr and length.
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
+ struct scatterlist *sg, unsigned int num,
+ void *data,
+ gfp_t gfp)
+{
+ return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, true, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_premapped);
+
/**
* virtqueue_kick_prepare - first half of split virtqueue_kick call.
* @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a533253fa9e8..0f787cdcfd5a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -59,6 +59,11 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
void *ctx,
gfp_t gfp);
+int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
+ struct scatterlist *sg, unsigned int num,
+ void *data,
+ gfp_t gfp);
+
int virtqueue_add_sgs(struct virtqueue *vq,
struct scatterlist *sgs[],
unsigned int out_sgs,
--
2.32.0.3.g01195cf9f
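The new entry point changes the calling convention: the driver, not the virtio core, owns the mapping and passes the DMA address and length through the scatterlist. The contract can be sketched in userspace as below; all names and types here are simplified stand-ins for illustration, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long dma_addr_t;

/* Simplified stand-in for struct scatterlist: for a premapped add, the
 * driver fills dma_address/length itself (e.g. after a DMA mapping call). */
struct sg_mock {
	dma_addr_t dma_address;
	unsigned int length;
};

/* Mock of the premapped contract: the core consumes sg->dma_address and
 * sg->length as-is and never calls the DMA API for these entries. */
static int add_inbuf_premapped_mock(const struct sg_mock *sg, unsigned int num)
{
	for (unsigned int i = 0; i < num; i++)
		if (!sg[i].dma_address || !sg[i].length)
			return -22; /* caller forgot to premap: treat as -EINVAL */
	return 0;
}
```

The point of the sketch is only the ownership split: an entry without a completed mapping is a driver bug, not something the core can repair.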
* [PATCH vhost v9 09/12] virtio_ring: introduce virtqueue_dma_dev()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (7 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 08/12] virtio_ring: introduce virtqueue_add_inbuf_premapped() Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
` (2 subsequent siblings)
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Add virtqueue_dma_dev() to get the DMA device of a virtqueue, so that the
caller can do DMA operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
include/linux/virtio.h | 2 ++
2 files changed, 19 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cbeac2f516c7..42730c4ecdc5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2335,6 +2335,23 @@ int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
}
EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_premapped);
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can be used for the dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (vq->use_dma_api)
+ return vring_dma_dev(vq);
+ else
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
/**
* virtqueue_kick_prepare - first half of split virtqueue_kick call.
* @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0f787cdcfd5a..41ff92b6184e 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -71,6 +71,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
void *data,
gfp_t gfp);
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
bool virtqueue_kick(struct virtqueue *vq);
bool virtqueue_kick_prepare(struct virtqueue *vq);
--
2.32.0.3.g01195cf9f
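Since virtqueue_dma_dev() returns NULL when the transport does not use the DMA API, a driver that wants to premap has to treat NULL as "premapping unavailable" and fall back to the normal add path. A minimal userspace sketch of that caller-side check follows; the types are mocks, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct device;                               /* opaque stand-in */
struct vq_mock { struct device *dma_dev; };  /* stand-in for the virtqueue */

/* Mock of virtqueue_dma_dev(): NULL when use_dma_api is false. */
static struct device *dma_dev_mock(const struct vq_mock *vq)
{
	return vq->dma_dev;
}

/* Driver-side policy: only premap when a DMA device is available. */
static int can_premap(const struct vq_mock *vq)
{
	return dma_dev_mock(vq) != NULL;
}
```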
* [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (8 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 09/12] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-18 12:12 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 11/12] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 12/12] virtio_ring: introduce virtqueue_reset() Xuan Zhuo
11 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Change the word "useless" to the more accurate "unused".
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 42730c4ecdc5..c90160d2d280 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2734,7 +2734,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
* virtqueue_resize - resize the vring of vq
* @_vq: the struct virtqueue we're talking about.
* @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
*
* When it is really necessary to create a new vring, it will set the current vq
* into the reset state. Then call the passed callback to recycle the buffer
--
2.32.0.3.g01195cf9f
* [PATCH vhost v9 11/12] virtio_ring: separate the logic of reset/enable from virtqueue_resize
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (9 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 12/12] virtio_ring: introduce virtqueue_reset() Xuan Zhuo
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
The subsequent reset function will reuse this logic.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 58 ++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 19 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c90160d2d280..7c1313706057 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2159,6 +2159,43 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
return -ENOMEM;
}
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+ void (*recycle)(struct virtqueue *vq, void *buf))
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct virtio_device *vdev = vq->vq.vdev;
+ void *buf;
+ int err;
+
+ if (!vq->we_own_ring)
+ return -EPERM;
+
+ if (!vdev->config->disable_vq_and_reset)
+ return -ENOENT;
+
+ if (!vdev->config->enable_vq_after_reset)
+ return -ENOENT;
+
+ err = vdev->config->disable_vq_and_reset(_vq);
+ if (err)
+ return err;
+
+ while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+ recycle(_vq, buf);
+
+ return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct virtio_device *vdev = vq->vq.vdev;
+
+ if (vdev->config->enable_vq_after_reset(_vq))
+ return -EBUSY;
+
+ return 0;
+}
/*
* Generic functions and exported symbols.
@@ -2758,13 +2795,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
void (*recycle)(struct virtqueue *vq, void *buf))
{
struct vring_virtqueue *vq = to_vvq(_vq);
- struct virtio_device *vdev = vq->vq.vdev;
- void *buf;
int err;
- if (!vq->we_own_ring)
- return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
@@ -2774,28 +2806,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num)
return 0;
- if (!vdev->config->disable_vq_and_reset)
- return -ENOENT;
-
- if (!vdev->config->enable_vq_after_reset)
- return -ENOENT;
-
- err = vdev->config->disable_vq_and_reset(_vq);
+ err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
- while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
- recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
- if (vdev->config->enable_vq_after_reset(_vq))
- return -EBUSY;
-
- return err;
+ return virtqueue_enable_after_reset(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_resize);
--
2.32.0.3.g01195cf9f
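With the helpers split out, the resize path reduces to a three-step sequence: disable and recycle, do the ring work, then re-enable. Its error semantics can be sketched in userspace as follows; the numeric values mirror -ENOENT and -EBUSY, and everything else is a stand-in for illustration.

```c
#include <assert.h>

/* Stand-in for virtqueue_disable_and_recycle(): fails with -ENOENT when
 * the transport lacks the disable/enable reset hooks. */
static int disable_and_recycle_mock(int has_reset_hooks)
{
	return has_reset_hooks ? 0 : -2;
}

/* Stand-in for virtqueue_enable_after_reset(): -EBUSY on re-enable failure. */
static int enable_after_reset_mock(int enable_ok)
{
	return enable_ok ? 0 : -16;
}

/* The refactored virtqueue_resize() control flow. */
static int resize_mock(int has_reset_hooks, int enable_ok)
{
	int err = disable_and_recycle_mock(has_reset_hooks);
	if (err)
		return err;
	/* ...resize the split or packed ring here... */
	return enable_after_reset_mock(enable_ok);
}
```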
* [PATCH vhost v9 12/12] virtio_ring: introduce virtqueue_reset()
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
` (10 preceding siblings ...)
2023-05-17 2:22 ` [PATCH vhost v9 11/12] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
@ 2023-05-17 2:22 ` Xuan Zhuo
11 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-17 2:22 UTC (permalink / raw)
To: virtualization; +Cc: Christoph Hellwig, Xuan Zhuo, Michael S. Tsirkin
Introduce virtqueue_reset() to release all buffers inside the vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 33 +++++++++++++++++++++++++++++++++
include/linux/virtio.h | 2 ++
2 files changed, 35 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7c1313706057..143f380baa1c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2819,6 +2819,39 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
}
EXPORT_SYMBOL_GPL(virtqueue_resize);
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+ void (*recycle)(struct virtqueue *vq, void *buf))
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ int err;
+
+ err = virtqueue_disable_and_recycle(_vq, recycle);
+ if (err)
+ return err;
+
+ if (vq->packed_ring)
+ virtqueue_reinit_packed(vq);
+ else
+ virtqueue_reinit_split(vq);
+
+ return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
/* Only available for split ring */
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41ff92b6184e..134c6c9a445d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -107,6 +107,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
int virtqueue_resize(struct virtqueue *vq, u32 num,
void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+ void (*recycle)(struct virtqueue *vq, void *buf));
/**
* struct virtio_device - representation of a device using virtio
--
2.32.0.3.g01195cf9f
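The recycle callback is the driver's hook for reclaiming every still-queued token before the ring is reinitialized. A userspace sketch of that detach-and-recycle loop is below; the queue here is a mock array, not the kernel ring.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_DEPTH 8

static void *mock_ring[MOCK_DEPTH];
static int mock_count;

/* Stand-in for virtqueue_detach_unused_buf(): pop one queued token. */
static void *detach_unused_mock(void)
{
	return mock_count ? mock_ring[--mock_count] : NULL;
}

static int recycled_mock;

/* Example recycle callback: a real driver would free or re-pool the buffer. */
static void recycle_count_mock(void *buf)
{
	(void)buf;
	recycled_mock++;
}

/* The core of virtqueue_reset(): hand every unused buffer back to the
 * driver, then (in the real code) reinit the ring and re-enable the queue. */
static int reset_mock(void (*recycle)(void *buf))
{
	void *buf;

	while ((buf = detach_unused_mock()) != NULL)
		recycle(buf);
	return 0;
}
```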
* Re: [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
@ 2023-05-18 6:51 ` Jason Wang
2023-05-23 6:02 ` Christoph Hellwig
1 sibling, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-18 6:51 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This patch puts the dma addr error check in vring_map_one_sg().
>
> The benefits of doing this:
>
> 1. make vring_map_one_sg more simple, without calling
> vring_mapping_error to check the return value.
> 2. reduce one judgment of vq->use_dma_api.
Code looks fine, but it's better to explain how it relates to or
simplifies anything in this series.
Thanks
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
> 1 file changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c5310eaf8b46..c563215be6b9 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> }
>
> /* Map one sg entry. */
> -static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> - struct scatterlist *sg,
> - enum dma_data_direction direction)
> +static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> + enum dma_data_direction direction, static dma_addr_t *addr)
> {
> if (!vq->use_dma_api) {
> /*
> @@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> * depending on the direction.
> */
> kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
> - return (dma_addr_t)sg_phys(sg);
> + *addr = (dma_addr_t)sg_phys(sg);
> + return 0;
> }
>
> /*
> @@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> * the way it expects (we don't guarantee that the scatterlist
> * will exist for the lifetime of the mapping).
> */
> - return dma_map_page(vring_dma_dev(vq),
> + *addr = dma_map_page(vring_dma_dev(vq),
> sg_page(sg), sg->offset, sg->length,
> direction);
> +
> + if (dma_mapping_error(vring_dma_dev(vq), *addr))
> + return -ENOMEM;
> +
> + return 0;
> }
>
> static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> @@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>
> for (n = 0; n < out_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> - dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> - if (vring_mapping_error(vq, addr))
> + dma_addr_t addr;
> +
> + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> goto unmap_release;
>
> prev = i;
> @@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> }
> for (; n < (out_sgs + in_sgs); n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> - dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> - if (vring_mapping_error(vq, addr))
> + dma_addr_t addr;
> +
> + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> goto unmap_release;
>
> prev = i;
> @@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>
> for (n = 0; n < out_sgs + in_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> - addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> - DMA_TO_DEVICE : DMA_FROM_DEVICE);
> - if (vring_mapping_error(vq, addr))
> + if (vring_map_one_sg(vq, sg, n < out_sgs ?
> + DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> goto unmap_release;
>
> desc[i].flags = cpu_to_le16(n < out_sgs ?
> @@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> c = 0;
> for (n = 0; n < out_sgs + in_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> - dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> - DMA_TO_DEVICE : DMA_FROM_DEVICE);
> - if (vring_mapping_error(vq, addr))
> + dma_addr_t addr;
> +
> + if (vring_map_one_sg(vq, sg, n < out_sgs ?
> + DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> goto unmap_release;
>
> flags = cpu_to_le16(vq->packed.avail_used_flags |
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split()
2023-05-17 2:22 ` [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split() Xuan Zhuo
@ 2023-05-18 6:51 ` Jason Wang
0 siblings, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-18 6:51 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> The purpose of this is to simplify the reference to state. It is
> convenient for subsequent commit.
It's better to be verbose, e.g. how it can simplify the following patches.
Thanks
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/virtio/virtio_ring.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c563215be6b9..479203346c36 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -744,11 +744,14 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> void **ctx)
> {
> + struct vring_desc_state_split *state;
> unsigned int i, j;
> __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>
> + state = &vq->split.desc_state[head];
> +
> /* Clear data ptr. */
> - vq->split.desc_state[head].data = NULL;
> + state->data = NULL;
>
> /* Put back on free list: unmap first-level descriptors and find end */
> i = head;
> @@ -767,8 +770,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> vq->vq.num_free++;
>
> if (vq->indirect) {
> - struct vring_desc *indir_desc =
> - vq->split.desc_state[head].indir_desc;
> + struct vring_desc *indir_desc = state->indir_desc;
> u32 len;
>
> /* Free the indirect table, if any, now that it's unmapped. */
> @@ -785,9 +787,9 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>
> kfree(indir_desc);
> - vq->split.desc_state[head].indir_desc = NULL;
> + state->indir_desc = NULL;
> } else if (ctx) {
> - *ctx = vq->split.desc_state[head].indir_desc;
> + *ctx = state->indir_desc;
> }
> }
>
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect
2023-05-17 2:22 ` [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect Xuan Zhuo
@ 2023-05-18 6:51 ` Jason Wang
0 siblings, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-18 6:51 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Inside detach_buf_split(), if use_dma_api is false,
> vring_unmap_one_split_indirect will be called many times, but actually
> nothing is done. So this patch check use_dma_api firstly.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
> ---
> drivers/virtio/virtio_ring.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 479203346c36..1ffab1eb40c0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -783,8 +783,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> VRING_DESC_F_INDIRECT));
> BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> - for (j = 0; j < len / sizeof(struct vring_desc); j++)
> - vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> + if (vq->use_dma_api) {
> + for (j = 0; j < len / sizeof(struct vring_desc); j++)
> + vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> + }
>
> kfree(indir_desc);
> state->indir_desc = NULL;
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped
2023-05-17 2:22 ` [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped Xuan Zhuo
@ 2023-05-18 6:51 ` Jason Wang
2023-05-23 6:03 ` Christoph Hellwig
1 sibling, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-18 6:51 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtqueue_add() adds parameter premapped.
I wonder if this patch is oversimplified. Maybe it can be squashed
with the patch that implements the premapped logic.
Thanks
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/virtio/virtio_ring.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1ffab1eb40c0..e2fc50c05bec 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2135,6 +2135,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> unsigned int in_sgs,
> void *data,
> void *ctx,
> + bool premapped,
> gfp_t gfp)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -2176,7 +2177,7 @@ int virtqueue_add_sgs(struct virtqueue *_vq,
> total_sg++;
> }
> return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
> - data, NULL, gfp);
> + data, NULL, false, gfp);
> }
> EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
>
> @@ -2198,7 +2199,7 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
> void *data,
> gfp_t gfp)
> {
> - return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
> + return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, false, gfp);
> }
> EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
>
> @@ -2220,7 +2221,7 @@ int virtqueue_add_inbuf(struct virtqueue *vq,
> void *data,
> gfp_t gfp)
> {
> - return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp);
> + return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp);
> }
> EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
>
> @@ -2244,7 +2245,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> void *ctx,
> gfp_t gfp)
> {
> - return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp);
> + return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, false, gfp);
> }
> EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
>
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-17 2:22 ` [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() " Xuan Zhuo
@ 2023-05-18 6:51 ` Jason Wang
2023-05-18 7:11 ` Michael S. Tsirkin
0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2023-05-18 6:51 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtqueue_add_split() only supports virtual addresses, dma is completed
> in virtqueue_add_split().
>
> In some scenarios (such as the AF_XDP scenario), the memory is allocated
> and DMA is completed in advance, so it is necessary for us to support
> passing the DMA address to virtqueue_add_split().
>
> Record this information in desc_state, we can skip unmap based on this
> when executing dma unmap.
I would also suggest documenting why per-descriptor metadata is
needed instead of a per-virtqueue one.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> 1 file changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e2fc50c05bec..bd5e84afab37 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -70,6 +70,7 @@
> struct vring_desc_state_split {
> void *data; /* Data for callback. */
> struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> + bool premapped; /* DMA mapping is done by driver. */
Going back to the original discussion around where this should be
placed. I wonder if we can find a common place to store this since it
has nothing related to virtqueue layout. Maybe desc_extra? And it
would be even better if we can avoid stressing the cache like above.
> };
>
> struct vring_desc_state_packed {
> @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>
> /* Map one sg entry. */
> static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> - enum dma_data_direction direction, static dma_addr_t *addr)
> + enum dma_data_direction direction,
> + bool premapped, dma_addr_t *addr)
having things like:
void func(bool flag)
{
if (!flag)
return;
}
is a hint that the check needs to be done by the caller?
And this change should work for both packed and split. I think we need
to squash the packed changes here.
Looking at how packed virtqueue uses this in this patch, I don't think
this patch can even be built. I will wait for a new version and
continue the review from there.
Thanks
> {
> + if (premapped) {
> + *addr = sg_dma_address(sg);
> + return 0;
> + }
> +
> if (!vq->use_dma_api) {
> /*
> * If DMA is not used, KMSAN doesn't know that the scatterlist
> @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> }
>
> static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> - unsigned int i)
> + unsigned int i, bool premapped)
> {
> struct vring_desc_extra *extra = vq->split.desc_extra;
> u16 flags;
> @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> (flags & VRING_DESC_F_WRITE) ?
> DMA_FROM_DEVICE : DMA_TO_DEVICE);
> } else {
> + if (premapped)
> + goto out;
> +
> dma_unmap_page(vring_dma_dev(vq),
> extra[i].addr,
> extra[i].len,
> @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> unsigned int in_sgs,
> void *data,
> void *ctx,
> + bool premapped,
> gfp_t gfp)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> dma_addr_t addr;
>
> - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> goto unmap_release;
>
> prev = i;
> @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> dma_addr_t addr;
>
> - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> goto unmap_release;
>
> prev = i;
> @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>
> /* Store token and indirect buffer state. */
> vq->split.desc_state[head].data = data;
> + vq->split.desc_state[head].premapped = premapped;
> if (indirect)
> vq->split.desc_state[head].indir_desc = desc;
> else
> @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> return 0;
>
> unmap_release:
> + if (premapped) {
> + if (indirect)
> + kfree(desc);
> +
> + END_USE(vq);
> + return -ENOMEM;
> + }
> +
> err_idx = i;
>
> if (indirect)
> @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> vring_unmap_one_split_indirect(vq, &desc[i]);
> i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> } else
> - i = vring_unmap_one_split(vq, i);
> + i = vring_unmap_one_split(vq, i, false);
> }
>
> if (indirect)
> @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> i = head;
>
> while (vq->split.vring.desc[i].flags & nextflag) {
> - vring_unmap_one_split(vq, i);
> + vring_unmap_one_split(vq, i, state->premapped);
> i = vq->split.desc_extra[i].next;
> vq->vq.num_free++;
> }
>
> - vring_unmap_one_split(vq, i);
> + vring_unmap_one_split(vq, i, state->premapped);
> vq->split.desc_extra[i].next = vq->free_head;
> vq->free_head = head;
>
> @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> VRING_DESC_F_INDIRECT));
> BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> - if (vq->use_dma_api) {
> + if (vq->use_dma_api && !state->premapped) {
> for (j = 0; j < len / sizeof(struct vring_desc); j++)
> vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> }
> @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> out_sgs, in_sgs, data, ctx, gfp) :
> virtqueue_add_split(_vq, sgs, total_sg,
> - out_sgs, in_sgs, data, ctx, gfp);
> + out_sgs, in_sgs, data, ctx, premapped, gfp);
> }
>
> /**
> --
> 2.32.0.3.g01195cf9f
>
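The per-descriptor flag quoted above boils down to one decision in the unmap path: only undo mappings that the core created itself. A userspace sketch of that predicate, assuming the flag semantics described in the patch (stand-in functions only):

```c
#include <assert.h>

static int unmap_calls_mock;

/* Stand-in for dma_unmap_page(): just count the calls. */
static void dma_unmap_mock(void)
{
	unmap_calls_mock++;
}

/* Mirror of the vring_unmap_one_split() decision: skip the DMA API both
 * when the core never used it (!use_dma_api) and when the driver did the
 * mapping itself (premapped). */
static void unmap_one_mock(int use_dma_api, int premapped)
{
	if (!use_dma_api || premapped)
		return;
	dma_unmap_mock();
}
```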
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 6:51 ` Jason Wang
@ 2023-05-18 7:11 ` Michael S. Tsirkin
2023-05-18 7:33 ` Xuan Zhuo
2023-05-18 9:24 ` Xuan Zhuo
0 siblings, 2 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 7:11 UTC (permalink / raw)
To: Jason Wang; +Cc: Christoph Hellwig, Xuan Zhuo, virtualization
On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > virtqueue_add_split() only supports virtual addresses, dma is completed
> > in virtqueue_add_split().
> >
> > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > and DMA is completed in advance, so it is necessary for us to support
> > passing the DMA address to virtqueue_add_split().
> >
> > Record this information in desc_state, we can skip unmap based on this
> > when executing dma unmap.
>
> I would also suggest documenting why a per descriptor metadata is
> needed instead of a per virtqueue one.
I think we could make it per virtqueue. That would mean all code in
virtio-net would have to change to do the dma mapping itself instead of
relying on the virtio core. Which is maybe a good idea? It is definitely
a very intrusive change, though, and will need a lot of performance
testing to make sure we don't break anything.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > 1 file changed, 29 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index e2fc50c05bec..bd5e84afab37 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -70,6 +70,7 @@
> > struct vring_desc_state_split {
> > void *data; /* Data for callback. */
> > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > + bool premapped; /* DMA mapping is done by driver. */
>
> Going back to the original discussion around where this should be
> placed. I wonder if we can find a common place to store this since it
> has nothing related to virtqueue layout. Maybe desc_extra? And it
> would be even better if we can avoid stressing the cache like above.
>
> > };
> >
> > struct vring_desc_state_packed {
> > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> >
> > /* Map one sg entry. */
> > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > - enum dma_data_direction direction, static dma_addr_t *addr)
> > + enum dma_data_direction direction,
> > + bool premapped, dma_addr_t *addr)
>
> having things like:
>
> int func(bool do)
> {
> if (!do)
> return;
> }
>
> is a hint that the check needs to be done by the caller?
>
> And this change should work for both packed and split. I think we need
> to squash the packed changes here.
>
> Looking at how packed virtqueue uses this in this patch, I don't think
> this patch can even be built. I will wait for a new version and
> continue the review from there.
>
> Thanks
>
>
>
> > {
> > + if (premapped) {
> > + *addr = sg_dma_address(sg);
> > + return 0;
> > + }
> > +
> > if (!vq->use_dma_api) {
> > /*
> > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > }
> >
> > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > - unsigned int i)
> > + unsigned int i, bool premapped)
> > {
> > struct vring_desc_extra *extra = vq->split.desc_extra;
> > u16 flags;
> > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > (flags & VRING_DESC_F_WRITE) ?
> > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > } else {
> > + if (premapped)
> > + goto out;
> > +
> > dma_unmap_page(vring_dma_dev(vq),
> > extra[i].addr,
> > extra[i].len,
> > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > unsigned int in_sgs,
> > void *data,
> > void *ctx,
> > + bool premapped,
> > gfp_t gfp)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > dma_addr_t addr;
> >
> > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > goto unmap_release;
> >
> > prev = i;
> > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > dma_addr_t addr;
> >
> > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > goto unmap_release;
> >
> > prev = i;
> > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >
> > /* Store token and indirect buffer state. */
> > vq->split.desc_state[head].data = data;
> > + vq->split.desc_state[head].premapped = premapped;
> > if (indirect)
> > vq->split.desc_state[head].indir_desc = desc;
> > else
> > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > return 0;
> >
> > unmap_release:
> > + if (premapped) {
> > + if (indirect)
> > + kfree(desc);
> > +
> > + END_USE(vq);
> > + return -ENOMEM;
> > + }
> > +
> > err_idx = i;
> >
> > if (indirect)
> > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > vring_unmap_one_split_indirect(vq, &desc[i]);
> > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > } else
> > - i = vring_unmap_one_split(vq, i);
> > + i = vring_unmap_one_split(vq, i, false);
> > }
> >
> > if (indirect)
> > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > i = head;
> >
> > while (vq->split.vring.desc[i].flags & nextflag) {
> > - vring_unmap_one_split(vq, i);
> > + vring_unmap_one_split(vq, i, state->premapped);
> > i = vq->split.desc_extra[i].next;
> > vq->vq.num_free++;
> > }
> >
> > - vring_unmap_one_split(vq, i);
> > + vring_unmap_one_split(vq, i, state->premapped);
> > vq->split.desc_extra[i].next = vq->free_head;
> > vq->free_head = head;
> >
> > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > VRING_DESC_F_INDIRECT));
> > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > - if (vq->use_dma_api) {
> > + if (vq->use_dma_api && !state->premapped) {
> > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > }
> > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > out_sgs, in_sgs, data, ctx, gfp) :
> > virtqueue_add_split(_vq, sgs, total_sg,
> > - out_sgs, in_sgs, data, ctx, gfp);
> > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > }
> >
> > /**
> > --
> > 2.32.0.3.g01195cf9f
> >
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:11 ` Michael S. Tsirkin
@ 2023-05-18 7:33 ` Xuan Zhuo
2023-05-18 7:54 ` Jason Wang
2023-05-18 8:29 ` Michael S. Tsirkin
2023-05-18 9:24 ` Xuan Zhuo
1 sibling, 2 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 7:33 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > in virtqueue_add_split().
> > >
> > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > and DMA is completed in advance, so it is necessary for us to support
> > > passing the DMA address to virtqueue_add_split().
> > >
> > > Record this information in desc_state, we can skip unmap based on this
> > > when executing dma unmap.
> >
> > I would also suggest documenting why a per descriptor metadata is
> > needed instead of a per virtqueue one.
>
> I think we could make it per virtqueue. That would mean all code in
> virtio net would have to change to do dma mapping itself instead of
> relying on virtio core though. Which is maybe a good idea? Definitely a
> very intrusive change though, will need a lot of performance testing
> to make sure we don't break anything.
In fact, we have tried this idea.
The problem is the detach and unmap paths.
To unmap, we need to get all the DMA addresses back from the virtio ring. Currently there is no way to return them, and for an SKB we need to get multiple DMA addresses at once.
This would require changing the virtio-ring detach logic. Apart from that, I agree with the idea.
Thanks.
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:33 ` Xuan Zhuo
@ 2023-05-18 7:54 ` Jason Wang
2023-05-18 7:56 ` Xuan Zhuo
2023-05-18 8:29 ` Michael S. Tsirkin
1 sibling, 1 reply; 44+ messages in thread
From: Jason Wang @ 2023-05-18 7:54 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Thu, May 18, 2023 at 3:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > in virtqueue_add_split().
> > > >
> > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > and DMA is completed in advance, so it is necessary for us to support
> > > > passing the DMA address to virtqueue_add_split().
> > > >
> > > > Record this information in desc_state, we can skip unmap based on this
> > > > when executing dma unmap.
> > >
> > > I would also suggest documenting why a per descriptor metadata is
> > > needed instead of a per virtqueue one.
> >
> > I think we could make it per virtqueue. That would mean all code in
> > virtio net would have to change to do dma mapping itself instead of
> > relying on virtio core though. Which is maybe a good idea? Definitely a
> > very intrusive change though, will need a lot of performance testing
> > to make sure we don't break anything.
>
> In fact, we have tried this idea.
>
> The problem is the detach and unmap.
>
> We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> not support to return the DMA Address,
I'm not sure I get this part, but we've already stored the DMA address in desc_extra?
> and for SKB, we need to get multiple DMA
> Addresses at one time.
Could you elaborate on this?
Thanks
>
> This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> with this idea.
>
> Thanks.
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:54 ` Jason Wang
@ 2023-05-18 7:56 ` Xuan Zhuo
2023-05-18 8:57 ` Jason Wang
0 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 7:56 UTC (permalink / raw)
To: Jason Wang; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Thu, 18 May 2023 15:54:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, May 18, 2023 at 3:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > in virtqueue_add_split().
> > > > >
> > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > passing the DMA address to virtqueue_add_split().
> > > > >
> > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > when executing dma unmap.
> > > >
> > > > I would also suggest documenting why a per descriptor metadata is
> > > > needed instead of a per virtqueue one.
> > >
> > > I think we could make it per virtqueue. That would mean all code in
> > > virtio net would have to change to do dma mapping itself instead of
> > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > very intrusive change though, will need a lot of performance testing
> > > to make sure we don't break anything.
> >
> > In fact, we have tried this idea.
> >
> > The problem is the detach and unmap.
> >
> > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > not support to return the DMA Address,
>
> I'm not sure I got here, but we've already stored the DMA address in desc_extra?
I mean we need a way to get the DMA addresses from the virtio core back into virtio-net.
Thanks.
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:33 ` Xuan Zhuo
2023-05-18 7:54 ` Jason Wang
@ 2023-05-18 8:29 ` Michael S. Tsirkin
2023-05-18 8:50 ` Xuan Zhuo
2023-05-18 8:57 ` Jason Wang
1 sibling, 2 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 8:29 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > virtqueue_add_split() only supports virtual addresses; the DMA mapping
> > > > is done inside virtqueue_add_split().
> > > >
> > > > In some scenarios (such as AF_XDP), the memory is allocated and the DMA
> > > > mapping is done in advance, so virtqueue_add_split() needs to support
> > > > being passed a premapped DMA address.
> > > >
> > > > Record this information in desc_state so that we can skip the unmap
> > > > based on it when executing the DMA unmap.
> > >
> > > I would also suggest documenting why a per descriptor metadata is
> > > needed instead of a per virtqueue one.
> >
> > I think we could make it per virtqueue. That would mean all code in
> > virtio net would have to change to do dma mapping itself instead of
> > relying on virtio core though. Which is maybe a good idea? Definitely a
> > very intrusive change though, will need a lot of performance testing
> > to make sure we don't break anything.
>
> In fact, we have tried this idea.
>
> The problem is the detach and unmap.
>
> We need to get all the DMA addresses from the virtio ring to unmap. Currently,
> it does not support returning the DMA addresses, and for an SKB we need to get
> multiple DMA addresses at one time.
>
> This requires modifying the detach logic of the virtio ring. Apart from that,
> I also agree with this idea.
>
> Thanks.
Well you can have a version of get_buf that returns them ... but
it is not clear to me all this is worth it unless you want
to do unsafe tricks like leaving them mapped. I'd leave that
for another day maybe.
For marking desc as premapped I think we can use a bit from
desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
another one.
> [...]
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:29 ` Michael S. Tsirkin
@ 2023-05-18 8:50 ` Xuan Zhuo
2023-05-18 9:41 ` Michael S. Tsirkin
2023-05-18 8:57 ` Jason Wang
1 sibling, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 8:50 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 04:29:01 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > virtqueue_add_split() only supports virtual addresses; the DMA mapping
> > > > > is done inside virtqueue_add_split().
> > > > >
> > > > > In some scenarios (such as AF_XDP), the memory is allocated and the DMA
> > > > > mapping is done in advance, so virtqueue_add_split() needs to support
> > > > > being passed a premapped DMA address.
> > > > >
> > > > > Record this information in desc_state so that we can skip the unmap
> > > > > based on it when executing the DMA unmap.
> > > >
> > > > I would also suggest documenting why a per descriptor metadata is
> > > > needed instead of a per virtqueue one.
> > >
> > > I think we could make it per virtqueue. That would mean all code in
> > > virtio net would have to change to do dma mapping itself instead of
> > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > very intrusive change though, will need a lot of performance testing
> > > to make sure we don't break anything.
> >
> > In fact, we have tried this idea.
> >
> > The problem is the detach and unmap.
> >
> > We need to get all the DMA addresses from the virtio ring to unmap. Currently,
> > it does not support returning the DMA addresses, and for an SKB we need to get
> > multiple DMA addresses at one time.
> >
> > This requires modifying the detach logic of the virtio ring. Apart from that,
> > I also agree with this idea.
> >
> > Thanks.
>
> Well you can have a version of get_buf that returns them ... but
> it is not clear to me all this is worth it unless you want
> to do unsafe tricks like leaving them mapped. I'd leave that
> for another day maybe.
>
> For marking desc as premapped I think we can use a bit from
> desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> another one.
>
Do you mean this https://lore.kernel.org/all/20220210085124.15466-6-xuanzhuo@linux.alibaba.com/
Thanks.
> [...]
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:56 ` Xuan Zhuo
@ 2023-05-18 8:57 ` Jason Wang
2023-05-18 9:18 ` Xuan Zhuo
0 siblings, 1 reply; 44+ messages in thread
From: Jason Wang @ 2023-05-18 8:57 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Thu, May 18, 2023 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 18 May 2023 15:54:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, May 18, 2023 at 3:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > virtqueue_add_split() only supports virtual addresses; the DMA mapping
> > > > > > is done inside virtqueue_add_split().
> > > > > >
> > > > > > In some scenarios (such as AF_XDP), the memory is allocated and the DMA
> > > > > > mapping is done in advance, so virtqueue_add_split() needs to support
> > > > > > being passed a premapped DMA address.
> > > > > >
> > > > > > Record this information in desc_state so that we can skip the unmap
> > > > > > based on it when executing the DMA unmap.
> > > > >
> > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > needed instead of a per virtqueue one.
> > > >
> > > > I think we could make it per virtqueue. That would mean all code in
> > > > virtio net would have to change to do dma mapping itself instead of
> > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > very intrusive change though, will need a lot of performance testing
> > > > to make sure we don't break anything.
> > >
> > > In fact, we have tried this idea.
> > >
> > > The problem is the detach and unmap.
> > >
> > > We need to get all the DMA addresses from the virtio ring to unmap. Currently,
> > > it does not support returning the DMA addresses,
> >
> > I'm not sure I got here, but we've already stored the DMA address in desc_extra?
>
>
> I mean we need to get the dma address from the virtio-core to virtio-net.
>
It probably just requires a new helper.
Thanks
> [...]
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:29 ` Michael S. Tsirkin
2023-05-18 8:50 ` Xuan Zhuo
@ 2023-05-18 8:57 ` Jason Wang
2023-05-18 9:14 ` Xuan Zhuo
2023-05-18 9:44 ` Michael S. Tsirkin
1 sibling, 2 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-18 8:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, Xuan Zhuo, virtualization
On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > virtqueue_add_split() only supports virtual addresses; the DMA mapping
> > > > > is done inside virtqueue_add_split().
> > > > >
> > > > > In some scenarios (such as AF_XDP), the memory is allocated and the DMA
> > > > > mapping is done in advance, so virtqueue_add_split() needs to support
> > > > > being passed a premapped DMA address.
> > > > >
> > > > > Record this information in desc_state so that we can skip the unmap
> > > > > based on it when executing the DMA unmap.
> > > >
> > > > I would also suggest documenting why a per descriptor metadata is
> > > > needed instead of a per virtqueue one.
> > >
> > > I think we could make it per virtqueue. That would mean all code in
> > > virtio net would have to change to do dma mapping itself instead of
> > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > very intrusive change though, will need a lot of performance testing
> > > to make sure we don't break anything.
> >
> > In fact, we have tried this idea.
> >
> > The problem is the detach and unmap.
> >
> > We need to get all the DMA addresses from the virtio ring to unmap. Currently,
> > it does not support returning the DMA addresses, and for an SKB we need to get
> > multiple DMA addresses at one time.
> >
> > This requires modifying the detach logic of the virtio ring. Apart from that,
> > I also agree with this idea.
> >
> > Thanks.
>
> Well you can have a version of get_buf that returns them ... but
> it is not clear to me all this is worth it unless you want
> to do unsafe tricks like leaving them mapped.
Some high-speed NIC drivers use this trick for better performance.
> I'd leave that
> for another day maybe.
>
> For marking desc as premapped I think we can use a bit from
> desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> another one.
Probably.
Thanks
>
>
>
> >
> > >
> > >
> > >
> > >
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -70,6 +70,7 @@
> > > > > struct vring_desc_state_split {
> > > > > void *data; /* Data for callback. */
> > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > >
> > > > Going back to the original discussion around where this should be
> > > > placed. I wonder if we can find a common place to store this since it
> > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > would be even better if we can avoid stressing the cache like above.
> > > >
> > > > > };
> > > > >
> > > > > struct vring_desc_state_packed {
> > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > >
> > > > > /* Map one sg entry. */
> > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > - enum dma_data_direction direction, dma_addr_t *addr)
> > > > > + enum dma_data_direction direction,
> > > > > + bool premapped, dma_addr_t *addr)
> > > >
> > > > having things like:
> > > >
> > > > int func(bool do)
> > > > {
> > > > if (!do)
> > > > return;
> > > > }
> > > >
> > > > is a hint that the check needs to be done by the caller?
> > > >
> > > > And this change should work for both packed and split. I think we need
> > > > to squash the packed changes here.
> > > >
> > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > this patch can even be built. I will wait for a new version and
> > > > continue the review from there.
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > > {
> > > > > + if (premapped) {
> > > > > + *addr = sg_dma_address(sg);
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > if (!vq->use_dma_api) {
> > > > > /*
> > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > }
> > > > >
> > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > - unsigned int i)
> > > > > + unsigned int i, bool premapped)
> > > > > {
> > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > u16 flags;
> > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > } else {
> > > > > + if (premapped)
> > > > > + goto out;
> > > > > +
> > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > extra[i].addr,
> > > > > extra[i].len,
> > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > unsigned int in_sgs,
> > > > > void *data,
> > > > > void *ctx,
> > > > > + bool premapped,
> > > > > gfp_t gfp)
> > > > > {
> > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > dma_addr_t addr;
> > > > >
> > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > goto unmap_release;
> > > > >
> > > > > prev = i;
> > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > dma_addr_t addr;
> > > > >
> > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > goto unmap_release;
> > > > >
> > > > > prev = i;
> > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > >
> > > > > /* Store token and indirect buffer state. */
> > > > > vq->split.desc_state[head].data = data;
> > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > if (indirect)
> > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > else
> > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > return 0;
> > > > >
> > > > > unmap_release:
> > > > > + if (premapped) {
> > > > > + if (indirect)
> > > > > + kfree(desc);
> > > > > +
> > > > > + END_USE(vq);
> > > > > + return -ENOMEM;
> > > > > + }
> > > > > +
> > > > > err_idx = i;
> > > > >
> > > > > if (indirect)
> > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > } else
> > > > > - i = vring_unmap_one_split(vq, i);
> > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > }
> > > > >
> > > > > if (indirect)
> > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > i = head;
> > > > >
> > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > - vring_unmap_one_split(vq, i);
> > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > i = vq->split.desc_extra[i].next;
> > > > > vq->vq.num_free++;
> > > > > }
> > > > >
> > > > > - vring_unmap_one_split(vq, i);
> > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > vq->free_head = head;
> > > > >
> > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > VRING_DESC_F_INDIRECT));
> > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > >
> > > > > - if (vq->use_dma_api) {
> > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > }
> > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > }
> > > > >
> > > > > /**
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:57 ` Jason Wang
@ 2023-05-18 9:14 ` Xuan Zhuo
2023-05-18 9:49 ` Michael S. Tsirkin
2023-05-18 9:44 ` Michael S. Tsirkin
1 sibling, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 9:14 UTC (permalink / raw)
To: Jason Wang; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > in virtqueue_add_split().
> > > > > >
> > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > passing the DMA address to virtqueue_add_split().
> > > > > >
> > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > when executing dma unmap.
> > > > >
> > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > needed instead of a per virtqueue one.
> > > >
> > > > I think we could make it per virtqueue. That would mean all code in
> > > > virtio net would have to change to do dma mapping itself instead of
> > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > very intrusive change though, will need a lot of performance testing
> > > > to make sure we don't break anything.
> > >
> > > In fact, we have tried this idea.
> > >
> > > The problem is the detach and unmap.
> > >
> > > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > > not support to return the DMA Address, and for SKB, we need to get multiple DMA
> > > Addresses at one time.
> > >
> > > This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> > > with this idea.
> > >
> > > Thanks.
> >
> > Well you can have a version of get_buf that returns them ... but
> > it is not clear to me all this is worth it unless you want
> > to do unsafe tricks like leaving them mapped.
>
> Some high speed NIC drivers use this trick for better performance.
Interesting, this is the first time I have heard of this. Does it cause any problems?
So, is having virtio-net manage the DMA operations by itself the right way?
Thanks
>
> > I'd leave that
> > for another day maybe.
> >
> > For marking desc as premapped I think we can use a bit from
> > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > another one.
>
> Probably.
>
> Thanks
>
> >
> >
> >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -70,6 +70,7 @@
> > > > > > struct vring_desc_state_split {
> > > > > > void *data; /* Data for callback. */
> > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > >
> > > > > Going back to the original discussion around where this should be
> > > > > placed. I wonder if we can find a common place to store this since it
> > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > would be even better if we can avoid stressing the cache like above.
> > > > >
> > > > > > };
> > > > > >
> > > > > > struct vring_desc_state_packed {
> > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > >
> > > > > > /* Map one sg entry. */
> > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > - enum dma_data_direction direction, dma_addr_t *addr)
> > > > > > + enum dma_data_direction direction,
> > > > > > + bool premapped, dma_addr_t *addr)
> > > > >
> > > > > having things like:
> > > > >
> > > > > int func(bool do)
> > > > > {
> > > > > if (!do)
> > > > > return;
> > > > > }
> > > > >
> > > > > is a hint that the check needs to be done by the caller?
> > > > >
> > > > > And this change should work for both packed and split. I think we need
> > > > > to squash the packed changes here.
> > > > >
> > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > this patch can even be built. I will wait for a new version and
> > > > > continue the review from there.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > > {
> > > > > > + if (premapped) {
> > > > > > + *addr = sg_dma_address(sg);
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > if (!vq->use_dma_api) {
> > > > > > /*
> > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > }
> > > > > >
> > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > - unsigned int i)
> > > > > > + unsigned int i, bool premapped)
> > > > > > {
> > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > u16 flags;
> > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > } else {
> > > > > > + if (premapped)
> > > > > > + goto out;
> > > > > > +
> > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > extra[i].addr,
> > > > > > extra[i].len,
> > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > unsigned int in_sgs,
> > > > > > void *data,
> > > > > > void *ctx,
> > > > > > + bool premapped,
> > > > > > gfp_t gfp)
> > > > > > {
> > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > >
> > > > > > /* Store token and indirect buffer state. */
> > > > > > vq->split.desc_state[head].data = data;
> > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > if (indirect)
> > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > else
> > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > return 0;
> > > > > >
> > > > > > unmap_release:
> > > > > > + if (premapped) {
> > > > > > + if (indirect)
> > > > > > + kfree(desc);
> > > > > > +
> > > > > > + END_USE(vq);
> > > > > > + return -ENOMEM;
> > > > > > + }
> > > > > > +
> > > > > > err_idx = i;
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > } else
> > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > }
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > i = head;
> > > > > >
> > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > i = vq->split.desc_extra[i].next;
> > > > > > vq->vq.num_free++;
> > > > > > }
> > > > > >
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > vq->free_head = head;
> > > > > >
> > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > VRING_DESC_F_INDIRECT));
> > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > >
> > > > > > - if (vq->use_dma_api) {
> > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > }
> > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > >
> >
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:57 ` Jason Wang
@ 2023-05-18 9:18 ` Xuan Zhuo
0 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 9:18 UTC (permalink / raw)
To: Jason Wang; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Thu, 18 May 2023 16:57:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, May 18, 2023 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 18 May 2023 15:54:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 3:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > in virtqueue_add_split().
> > > > > > >
> > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > >
> > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > when executing dma unmap.
> > > > > >
> > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > needed instead of a per virtqueue one.
> > > > >
> > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > very intrusive change though, will need a lot of performance testing
> > > > > to make sure we don't break anything.
> > > >
> > > > In fact, we have tried this idea.
> > > >
> > > > The problem is the detach and unmap.
> > > >
> > > > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > > > not support to return the DMA Address,
> > >
> > > I'm not sure I got here, but we've already stored the DMA address in desc_extra?
> >
> >
> > I mean we need a way to pass the DMA addresses from the virtio core back to virtio-net.
> >
>
> It probably just requires a new helper.
Yes
Thanks
>
> Thanks
>
> > Thanks.
> >
> >
> > >
> > > > and for SKB, we need to get multiple DMA
> > > > Addresses at one time.
> > >
> > > Could you elaborate on this?
> > >
> > > Thanks
> > >
> > > >
> > > > This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> > > > with this idea.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -70,6 +70,7 @@
> > > > > > > struct vring_desc_state_split {
> > > > > > > void *data; /* Data for callback. */
> > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > >
> > > > > > Going back to the original discussion around where this should be
> > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > >
> > > > > > > };
> > > > > > >
> > > > > > > struct vring_desc_state_packed {
> > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > >
> > > > > > > /* Map one sg entry. */
> > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > - enum dma_data_direction direction, dma_addr_t *addr)
> > > > > > > + enum dma_data_direction direction,
> > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > >
> > > > > > having things like:
> > > > > >
> > > > > > int func(bool do)
> > > > > > {
> > > > > > if (!do)
> > > > > > return;
> > > > > > }
> > > > > >
> > > > > > is a hint that the check needs to be done by the caller?
> > > > > >
> > > > > > And this change should work for both packed and split. I think we need
> > > > > > to squash the packed changes here.
> > > > > >
> > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > this patch can even be built. I will wait for a new version and
> > > > > > continue the review from there.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > >
> > > > > > > {
> > > > > > > + if (premapped) {
> > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > + return 0;
> > > > > > > + }
> > > > > > > +
> > > > > > > if (!vq->use_dma_api) {
> > > > > > > /*
> > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > }
> > > > > > >
> > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > - unsigned int i)
> > > > > > > + unsigned int i, bool premapped)
> > > > > > > {
> > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > u16 flags;
> > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > } else {
> > > > > > > + if (premapped)
> > > > > > > + goto out;
> > > > > > > +
> > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > extra[i].addr,
> > > > > > > extra[i].len,
> > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > unsigned int in_sgs,
> > > > > > > void *data,
> > > > > > > void *ctx,
> > > > > > > + bool premapped,
> > > > > > > gfp_t gfp)
> > > > > > > {
> > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > dma_addr_t addr;
> > > > > > >
> > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > goto unmap_release;
> > > > > > >
> > > > > > > prev = i;
> > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > dma_addr_t addr;
> > > > > > >
> > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > goto unmap_release;
> > > > > > >
> > > > > > > prev = i;
> > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > >
> > > > > > > /* Store token and indirect buffer state. */
> > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > if (indirect)
> > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > else
> > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > return 0;
> > > > > > >
> > > > > > > unmap_release:
> > > > > > > + if (premapped) {
> > > > > > > + if (indirect)
> > > > > > > + kfree(desc);
> > > > > > > +
> > > > > > > + END_USE(vq);
> > > > > > > + return -ENOMEM;
> > > > > > > + }
> > > > > > > +
> > > > > > > err_idx = i;
> > > > > > >
> > > > > > > if (indirect)
> > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > } else
> > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > }
> > > > > > >
> > > > > > > if (indirect)
> > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > i = head;
> > > > > > >
> > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > vq->vq.num_free++;
> > > > > > > }
> > > > > > >
> > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > vq->free_head = head;
> > > > > > >
> > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > >
> > > > > > > - if (vq->use_dma_api) {
> > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > }
> > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > > >
> > > > >
> > > >
> > >
> >
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 7:11 ` Michael S. Tsirkin
2023-05-18 7:33 ` Xuan Zhuo
@ 2023-05-18 9:24 ` Xuan Zhuo
1 sibling, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 9:24 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > in virtqueue_add_split().
> > >
> > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > and DMA is completed in advance, so it is necessary for us to support
> > > passing the DMA address to virtqueue_add_split().
> > >
> > > Record this information in desc_state, we can skip unmap based on this
> > > when executing dma unmap.
> >
> > I would also suggest documenting why a per descriptor metadata is
> > needed instead of a per virtqueue one.
>
> I think we could make it per virtqueue. That would mean all code in
> virtio net would have to change to do dma mapping itself instead of
> relying on virtio core though. Which is maybe a good idea? Definitely a
> very intrusive change though, will need a lot of performance testing
> to make sure we don't break anything.
We can do this. virtio-net does not have to use premapped mode by default; we can
switch when the vq is reset. That works for AF_XDP.
Thanks.
>
>
>
>
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index e2fc50c05bec..bd5e84afab37 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -70,6 +70,7 @@
> > > struct vring_desc_state_split {
> > > void *data; /* Data for callback. */
> > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > + bool premapped; /* DMA mapping is done by driver. */
> >
> > Going back to the original discussion around where this should be
> > placed. I wonder if we can find a common place to store this since it
> > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > would be even better if we can avoid stressing the cache like above.
> >
> > > };
> > >
> > > struct vring_desc_state_packed {
> > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > >
> > > /* Map one sg entry. */
> > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > - enum dma_data_direction direction, dma_addr_t *addr)
> > > + enum dma_data_direction direction,
> > > + bool premapped, dma_addr_t *addr)
> >
> > having things like:
> >
> > int func(bool do)
> > {
> > if (!do)
> > return;
> > }
> >
> > is a hint that the check needs to be done by the caller?
> >
> > And this change should work for both packed and split. I think we need
> > to squash the packed changes here.
> >
> > Looking at how packed virtqueue uses this in this patch, I don't think
> > this patch can even be built. I will wait for a new version and
> > continue the review from there.
> >
> > Thanks
> >
> >
> >
> > > {
> > > + if (premapped) {
> > > + *addr = sg_dma_address(sg);
> > > + return 0;
> > > + }
> > > +
> > > if (!vq->use_dma_api) {
> > > /*
> > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > }
> > >
> > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > - unsigned int i)
> > > + unsigned int i, bool premapped)
> > > {
> > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > u16 flags;
> > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > (flags & VRING_DESC_F_WRITE) ?
> > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > } else {
> > > + if (premapped)
> > > + goto out;
> > > +
> > > dma_unmap_page(vring_dma_dev(vq),
> > > extra[i].addr,
> > > extra[i].len,
> > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > unsigned int in_sgs,
> > > void *data,
> > > void *ctx,
> > > + bool premapped,
> > > gfp_t gfp)
> > > {
> > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > dma_addr_t addr;
> > >
> > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > goto unmap_release;
> > >
> > > prev = i;
> > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > dma_addr_t addr;
> > >
> > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > goto unmap_release;
> > >
> > > prev = i;
> > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >
> > > /* Store token and indirect buffer state. */
> > > vq->split.desc_state[head].data = data;
> > > + vq->split.desc_state[head].premapped = premapped;
> > > if (indirect)
> > > vq->split.desc_state[head].indir_desc = desc;
> > > else
> > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > return 0;
> > >
> > > unmap_release:
> > > + if (premapped) {
> > > + if (indirect)
> > > + kfree(desc);
> > > +
> > > + END_USE(vq);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > err_idx = i;
> > >
> > > if (indirect)
> > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > } else
> > > - i = vring_unmap_one_split(vq, i);
> > > + i = vring_unmap_one_split(vq, i, false);
> > > }
> > >
> > > if (indirect)
> > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > i = head;
> > >
> > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > - vring_unmap_one_split(vq, i);
> > > + vring_unmap_one_split(vq, i, state->premapped);
> > > i = vq->split.desc_extra[i].next;
> > > vq->vq.num_free++;
> > > }
> > >
> > > - vring_unmap_one_split(vq, i);
> > > + vring_unmap_one_split(vq, i, state->premapped);
> > > vq->split.desc_extra[i].next = vq->free_head;
> > > vq->free_head = head;
> > >
> > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > VRING_DESC_F_INDIRECT));
> > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > >
> > > - if (vq->use_dma_api) {
> > > + if (vq->use_dma_api && !state->premapped) {
> > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > }
> > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > out_sgs, in_sgs, data, ctx, gfp) :
> > > virtqueue_add_split(_vq, sgs, total_sg,
> > > - out_sgs, in_sgs, data, ctx, gfp);
> > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > }
> > >
> > > /**
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:50 ` Xuan Zhuo
@ 2023-05-18 9:41 ` Michael S. Tsirkin
0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 9:41 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On Thu, May 18, 2023 at 04:50:35PM +0800, Xuan Zhuo wrote:
> On Thu, 18 May 2023 04:29:01 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > in virtqueue_add_split().
> > > > > >
> > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > passing the DMA address to virtqueue_add_split().
> > > > > >
> > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > when executing dma unmap.
> > > > >
> > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > needed instead of a per virtqueue one.
> > > >
> > > > I think we could make it per virtqueue. That would mean all code in
> > > > virtio net would have to change to do dma mapping itself instead of
> > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > very intrusive change though, will need a lot of performance testing
> > > > to make sure we don't break anything.
> > >
> > > In fact, we have tried this idea.
> > >
> > > The problem is the detach and unmap.
> > >
> > > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > > not support to return the DMA Address, and for SKB, we need to get multiple DMA
> > > Addresses at one time.
> > >
> > > This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> > > with this idea.
> > >
> > > Thanks.
> >
> > Well you can have a version of get_buf that returns them ... but
> > it is not clear to me all this is worth it unless you want
> > to do unsafe tricks like leaving them mapped. I'd leave that
> > for another day maybe.
> >
> > For marking desc as premapped I think we can use a bit from
> > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > another one.
> >
>
> Do you mean this https://lore.kernel.org/all/20220210085124.15466-6-xuanzhuo@linux.alibaba.com/
>
> Thanks.
That reused VRING_PACKED_DESC_F_USED; I would make that explicit rather
than hard-coding 15.
But besides that, yes ... I see Jason objected
to that. Jason, is it worth burning extra memory now? We can always
do it later when we have to extend flags ... no?
> >
> >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -70,6 +70,7 @@
> > > > > > struct vring_desc_state_split {
> > > > > > void *data; /* Data for callback. */
> > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > >
> > > > > Going back to the original discussion around where this should be
> > > > > placed. I wonder if we can find a common place to store this since it
> > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > would be even better if we can avoid stressing the cache like above.
> > > > >
> > > > > > };
> > > > > >
> > > > > > struct vring_desc_state_packed {
> > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > >
> > > > > > /* Map one sg entry. */
> > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > + enum dma_data_direction direction,
> > > > > > + bool premapped, dma_addr_t *addr)
> > > > >
> > > > > having things like:
> > > > >
> > > > > int func(bool do)
> > > > > {
> > > > > if (!do)
> > > > > return;
> > > > > }
> > > > >
> > > > > is a hint that the check needs to be done by the caller?
> > > > >
> > > > > And this change should work for both packed and split. I think we need
> > > > > to squash the packed changes here.
> > > > >
> > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > this patch can even be built. I will wait for a new version and
> > > > > continue the review from there.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > > {
> > > > > > + if (premapped) {
> > > > > > + *addr = sg_dma_address(sg);
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > if (!vq->use_dma_api) {
> > > > > > /*
> > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > }
> > > > > >
> > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > - unsigned int i)
> > > > > > + unsigned int i, bool premapped)
> > > > > > {
> > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > u16 flags;
> > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > } else {
> > > > > > + if (premapped)
> > > > > > + goto out;
> > > > > > +
> > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > extra[i].addr,
> > > > > > extra[i].len,
> > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > unsigned int in_sgs,
> > > > > > void *data,
> > > > > > void *ctx,
> > > > > > + bool premapped,
> > > > > > gfp_t gfp)
> > > > > > {
> > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > >
> > > > > > /* Store token and indirect buffer state. */
> > > > > > vq->split.desc_state[head].data = data;
> > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > if (indirect)
> > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > else
> > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > return 0;
> > > > > >
> > > > > > unmap_release:
> > > > > > + if (premapped) {
> > > > > > + if (indirect)
> > > > > > + kfree(desc);
> > > > > > +
> > > > > > + END_USE(vq);
> > > > > > + return -ENOMEM;
> > > > > > + }
> > > > > > +
> > > > > > err_idx = i;
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > } else
> > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > }
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > i = head;
> > > > > >
> > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > i = vq->split.desc_extra[i].next;
> > > > > > vq->vq.num_free++;
> > > > > > }
> > > > > >
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > vq->free_head = head;
> > > > > >
> > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > VRING_DESC_F_INDIRECT));
> > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > >
> > > > > > - if (vq->use_dma_api) {
> > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > }
> > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > >
> >
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 8:57 ` Jason Wang
2023-05-18 9:14 ` Xuan Zhuo
@ 2023-05-18 9:44 ` Michael S. Tsirkin
1 sibling, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 9:44 UTC (permalink / raw)
To: Jason Wang; +Cc: Christoph Hellwig, Xuan Zhuo, virtualization
On Thu, May 18, 2023 at 04:57:37PM +0800, Jason Wang wrote:
> On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > in virtqueue_add_split().
> > > > > >
> > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > passing the DMA address to virtqueue_add_split().
> > > > > >
> > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > when executing dma unmap.
> > > > >
> > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > needed instead of a per virtqueue one.
> > > >
> > > > I think we could make it per virtqueue. That would mean all code in
> > > > virtio net would have to change to do dma mapping itself instead of
> > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > very intrusive change though, will need a lot of performance testing
> > > > to make sure we don't break anything.
> > >
> > > In fact, we have tried this idea.
> > >
> > > The problem is the detach and unmap.
> > >
> > > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > > not support to return the DMA Address, and for SKB, we need to get multiple DMA
> > > Addresses at one time.
> > >
> > > This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> > > with this idea.
> > >
> > > Thanks.
> >
> > Well you can have a version of get_buf that returns them ... but
> > it is not clear to me all this is worth it unless you want
> > to do unsafe tricks like leaving them mapped.
>
> Some high speed NIC drivers use this trick for better performance.
I know. I think it's better left as a patch on top though, right?
> > I'd leave that
> > for another day maybe.
> >
> > For marking desc as premapped I think we can use a bit from
> > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > another one.
>
> Probably.
>
> Thanks
>
> >
> >
> >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -70,6 +70,7 @@
> > > > > > struct vring_desc_state_split {
> > > > > > void *data; /* Data for callback. */
> > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > >
> > > > > Going back to the original discussion around where this should be
> > > > > placed. I wonder if we can find a common place to store this since it
> > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > would be even better if we can avoid stressing the cache like above.
> > > > >
> > > > > > };
> > > > > >
> > > > > > struct vring_desc_state_packed {
> > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > >
> > > > > > /* Map one sg entry. */
> > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > + enum dma_data_direction direction,
> > > > > > + bool premapped, dma_addr_t *addr)
> > > > >
> > > > > having things like:
> > > > >
> > > > > int func(bool do)
> > > > > {
> > > > > if (!do)
> > > > > return;
> > > > > }
> > > > >
> > > > > is a hint that the check needs to be done by the caller?
> > > > >
> > > > > And this change should work for both packed and split. I think we need
> > > > > to squash the packed changes here.
> > > > >
> > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > this patch can even be built. I will wait for a new version and
> > > > > continue the review from there.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > > {
> > > > > > + if (premapped) {
> > > > > > + *addr = sg_dma_address(sg);
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > if (!vq->use_dma_api) {
> > > > > > /*
> > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > }
> > > > > >
> > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > - unsigned int i)
> > > > > > + unsigned int i, bool premapped)
> > > > > > {
> > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > u16 flags;
> > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > } else {
> > > > > > + if (premapped)
> > > > > > + goto out;
> > > > > > +
> > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > extra[i].addr,
> > > > > > extra[i].len,
> > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > unsigned int in_sgs,
> > > > > > void *data,
> > > > > > void *ctx,
> > > > > > + bool premapped,
> > > > > > gfp_t gfp)
> > > > > > {
> > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > dma_addr_t addr;
> > > > > >
> > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > goto unmap_release;
> > > > > >
> > > > > > prev = i;
> > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > >
> > > > > > /* Store token and indirect buffer state. */
> > > > > > vq->split.desc_state[head].data = data;
> > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > if (indirect)
> > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > else
> > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > return 0;
> > > > > >
> > > > > > unmap_release:
> > > > > > + if (premapped) {
> > > > > > + if (indirect)
> > > > > > + kfree(desc);
> > > > > > +
> > > > > > + END_USE(vq);
> > > > > > + return -ENOMEM;
> > > > > > + }
> > > > > > +
> > > > > > err_idx = i;
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > } else
> > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > }
> > > > > >
> > > > > > if (indirect)
> > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > i = head;
> > > > > >
> > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > i = vq->split.desc_extra[i].next;
> > > > > > vq->vq.num_free++;
> > > > > > }
> > > > > >
> > > > > > - vring_unmap_one_split(vq, i);
> > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > vq->free_head = head;
> > > > > >
> > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > VRING_DESC_F_INDIRECT));
> > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > >
> > > > > > - if (vq->use_dma_api) {
> > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > }
> > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > >
> >
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 9:14 ` Xuan Zhuo
@ 2023-05-18 9:49 ` Michael S. Tsirkin
2023-05-18 12:20 ` Xuan Zhuo
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 9:49 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > in virtqueue_add_split().
> > > > > > >
> > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > >
> > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > when executing dma unmap.
> > > > > >
> > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > needed instead of a per virtqueue one.
> > > > >
> > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > very intrusive change though, will need a lot of performance testing
> > > > > to make sure we don't break anything.
> > > >
> > > > In fact, we have tried this idea.
> > > >
> > > > The problem is the detach and unmap.
> > > >
> > > > We need to get all DMA Addresses from virtio-ring to unmap. Currently, it does
> > > > not support to return the DMA Address, and for SKB, we need to get multiple DMA
> > > > Addresses at one time.
> > > >
> > > > This need to modify the logic of Virtio-Ring detach. Besides this, I also agree
> > > > with this idea.
> > > >
> > > > Thanks.
> > >
> > > Well you can have a version of get_buf that returns them ... but
> > > it is not clear to me all this is worth it unless you want
> > > to do unsafe tricks like leaving them mapped.
> >
> > Some high speed NIC drivers use this trick for better performance.
>
>
> Interesting, this is the first time I know this. Is there any problem?
It depends. If you are relying on the IOMMU, then yes: malicious hardware
can steal guest secrets or corrupt memory, since it's a hack not properly
integrated with Linux, and there is no real control preventing Linux from
reusing this memory for something unrelated.
If instead you are using something like bounce buffers, then no; but OTOH
bounce buffers are already expensive, so you might not see a lot
of benefit.
> So, is that virtio-net master the operation of dma by itself the right way?
>
> Thanks
I am fine with the approach taken for now, and we can look at reducing
the cost of DMA map/unmap later.
>
>
> >
> > > I'd leave that
> > > for another day maybe.
> > >
> > > For marking desc as premapped I think we can use a bit from
> > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > another one.
> >
> > Probably.
> >
> > Thanks
> >
> > >
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -70,6 +70,7 @@
> > > > > > > struct vring_desc_state_split {
> > > > > > > void *data; /* Data for callback. */
> > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > >
> > > > > > Going back to the original discussion around where this should be
> > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > >
> > > > > > > };
> > > > > > >
> > > > > > > struct vring_desc_state_packed {
> > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > >
> > > > > > > /* Map one sg entry. */
> > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > + enum dma_data_direction direction,
> > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > >
> > > > > > having things like:
> > > > > >
> > > > > > int func(bool do)
> > > > > > {
> > > > > > if (!do)
> > > > > > return;
> > > > > > }
> > > > > >
> > > > > > is a hint that the check needs to be done by the caller?
> > > > > >
> > > > > > And this change should work for both packed and split. I think we need
> > > > > > to squash the packed changes here.
> > > > > >
> > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > this patch can even be built. I will wait for a new version and
> > > > > > continue the review from there.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > >
> > > > > > > {
> > > > > > > + if (premapped) {
> > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > + return 0;
> > > > > > > + }
> > > > > > > +
> > > > > > > if (!vq->use_dma_api) {
> > > > > > > /*
> > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > }
> > > > > > >
> > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > - unsigned int i)
> > > > > > > + unsigned int i, bool premapped)
> > > > > > > {
> > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > u16 flags;
> > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > } else {
> > > > > > > + if (premapped)
> > > > > > > + goto out;
> > > > > > > +
> > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > extra[i].addr,
> > > > > > > extra[i].len,
> > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > unsigned int in_sgs,
> > > > > > > void *data,
> > > > > > > void *ctx,
> > > > > > > + bool premapped,
> > > > > > > gfp_t gfp)
> > > > > > > {
> > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > dma_addr_t addr;
> > > > > > >
> > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > goto unmap_release;
> > > > > > >
> > > > > > > prev = i;
> > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > dma_addr_t addr;
> > > > > > >
> > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > goto unmap_release;
> > > > > > >
> > > > > > > prev = i;
> > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > >
> > > > > > > /* Store token and indirect buffer state. */
> > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > if (indirect)
> > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > else
> > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > return 0;
> > > > > > >
> > > > > > > unmap_release:
> > > > > > > + if (premapped) {
> > > > > > > + if (indirect)
> > > > > > > + kfree(desc);
> > > > > > > +
> > > > > > > + END_USE(vq);
> > > > > > > + return -ENOMEM;
> > > > > > > + }
> > > > > > > +
> > > > > > > err_idx = i;
> > > > > > >
> > > > > > > if (indirect)
> > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > } else
> > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > }
> > > > > > >
> > > > > > > if (indirect)
> > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > i = head;
> > > > > > >
> > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > vq->vq.num_free++;
> > > > > > > }
> > > > > > >
> > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > vq->free_head = head;
> > > > > > >
> > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > >
> > > > > > > - if (vq->use_dma_api) {
> > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > }
> > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > > >
> > > > >
> > >
> >
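For reference, the contract the premapped path in the diff above relies on (the caller fills in sg_dma_address() before calling virtqueue_add_split()) can be mocked in plain userspace C like this; the mini_* names are hypothetical stand-ins for the kernel structures, not real kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical stand-in for struct scatterlist. */
struct mini_sg {
    void *virt;
    dma_addr_t dma_address;   /* what sg_dma_address(sg) would return */
};

/* Pretend mapping; in the kernel this would be dma_map_page(). */
static dma_addr_t mini_dma_map(const struct mini_sg *sg)
{
    return (dma_addr_t)(uintptr_t)sg->virt ^ 0x8000000000000000ull;
}

/* Mirrors the premapped branch added to vring_map_one_sg() above:
 * when the driver mapped the buffer itself, trust sg->dma_address. */
static int mini_map_one_sg(struct mini_sg *sg, bool premapped,
                           dma_addr_t *addr)
{
    if (premapped) {
        *addr = sg->dma_address;
        return 0;
    }
    *addr = mini_dma_map(sg);
    return 0;
}
```

A driver such as an AF_XDP zero-copy path would map its buffer pool once, store the addresses in the sg entries, and then add buffers with premapped set so the core skips both the map and the later unmap.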
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize()
2023-05-17 2:22 ` [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
@ 2023-05-18 12:12 ` Xuan Zhuo
2023-05-18 14:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 12:12 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Wed, 17 May 2023 10:22:47 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> Change "useless" to the more accurate "unused".
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
Hi Michael,
We still need to discuss a few cases regarding dma-premapped. Could you merge
the last three patches (including this one)? They have nothing to do with
dma-premapped.
Should I post a new patch set separately?
Thanks.
> ---
> drivers/virtio/virtio_ring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 42730c4ecdc5..c90160d2d280 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2734,7 +2734,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
> * virtqueue_resize - resize the vring of vq
> * @_vq: the struct virtqueue we're talking about.
> * @num: new ring num
> - * @recycle: callback for recycle the useless buffer
> + * @recycle: callback to recycle unused buffers
> *
> * When it is really necessary to create a new vring, it will set the current vq
> * into the reset state. Then call the passed callback to recycle the buffer
> --
> 2.32.0.3.g01195cf9f
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 9:49 ` Michael S. Tsirkin
@ 2023-05-18 12:20 ` Xuan Zhuo
2023-05-18 12:22 ` Xuan Zhuo
2023-05-19 3:38 ` Jason Wang
2 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 12:20 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > > in virtqueue_add_split().
> > > > > > > >
> > > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > > >
> > > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > > when executing dma unmap.
> > > > > > >
> > > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > > needed instead of a per virtqueue one.
> > > > > >
> > > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > > very intrusive change though, will need a lot of performance testing
> > > > > > to make sure we don't break anything.
> > > > >
> > > > > In fact, we have tried this idea.
> > > > >
> > > > > The problem is the detach and unmap.
> > > > >
> > > > > We need to get all the DMA addresses from the virtio-ring to unmap.
> > > > > Currently, it does not support returning the DMA address, and for an SKB we
> > > > > need to get multiple DMA addresses at one time.
> > > > >
> > > > > This requires modifying the logic of the virtio-ring detach. Other than that,
> > > > > I also agree with this idea.
> > > > >
> > > > > Thanks.
> > > >
> > > > Well you can have a version of get_buf that returns them ... but
> > > > it is not clear to me all this is worth it unless you want
> > > > to do unsafe tricks like leaving them mapped.
> > >
> > > Some high speed NIC drivers use this trick for better performance.
> >
> >
> > Interesting, this is the first time I have heard of this. Is there any problem?
>
> depends - if you are relying on the IOMMU then yes - malicious hardware
> can steal guest secrets or corrupt memory since it's a hack not properly
> integrated with linux and there's no real control preventing linux from
> reusing this memory for something unrelated.
> If instead you are using something like bounce buffers then no, but OTOH
> bounce buffers are already expensive so you might not see a lot
> of benefit.
Thanks for the explanation.
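As an aside, the get_buf variant mentioned earlier (one that also hands the chain's DMA addresses back to the driver so it can unmap them itself) might be mocked in userspace like this; all mini_* names are hypothetical, this is only a sketch of the proposed interface, not an existing kernel API:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical mock of a descriptor chain whose DMA addresses must all
 * be handed back on detach (e.g. one SKB spanning several descriptors). */
struct mini_chain {
    dma_addr_t addrs[4];
    unsigned int num;
};

/* Sketch of a get_buf variant that reports the chain's DMA addresses,
 * so a driver doing its own mapping could unmap them afterwards. */
static unsigned int mini_get_buf_dma(const struct mini_chain *c,
                                     dma_addr_t *out, unsigned int max)
{
    unsigned int i, n = c->num < max ? c->num : max;

    for (i = 0; i < n; i++)
        out[i] = c->addrs[i];
    return n;
}
```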
>
> > So, is having virtio-net handle the DMA mapping by itself the right way?
> >
> > Thanks
>
> I am fine with the approach taken for now. And look at reducing
> cost of dma map/unmap later.
>
> >
> >
> > >
> > > > I'd leave that
> > > > for another day maybe.
> > > >
> > > > For marking desc as premapped I think we can use a bit from
> > > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > > another one.
> > >
> > > Probably.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > struct vring_desc_state_split {
> > > > > > > > void *data; /* Data for callback. */
> > > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > > >
> > > > > > > Going back to the original discussion around where this should be
> > > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > > >
> > > > > > > > };
> > > > > > > >
> > > > > > > > struct vring_desc_state_packed {
> > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > >
> > > > > > > > /* Map one sg entry. */
> > > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > > + enum dma_data_direction direction,
> > > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > > >
> > > > > > > having things like:
> > > > > > >
> > > > > > > void func(bool doit)
> > > > > > > {
> > > > > > >         if (!doit)
> > > > > > >                 return;
> > > > > > > }
> > > > > > >
> > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > >
> > > > > > > And this change should work for both packed and split. I think we need
> > > > > > > to squash the packed changes here.
> > > > > > >
> > > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > > this patch can even be built. I will wait for a new version and
> > > > > > > continue the review from there.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > {
> > > > > > > > + if (premapped) {
> > > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > > + return 0;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > if (!vq->use_dma_api) {
> > > > > > > > /*
> > > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > }
> > > > > > > >
> > > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > - unsigned int i)
> > > > > > > > + unsigned int i, bool premapped)
> > > > > > > > {
> > > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > > u16 flags;
> > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > } else {
> > > > > > > > + if (premapped)
> > > > > > > > + goto out;
> > > > > > > > +
> > > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > extra[i].addr,
> > > > > > > > extra[i].len,
> > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > unsigned int in_sgs,
> > > > > > > > void *data,
> > > > > > > > void *ctx,
> > > > > > > > + bool premapped,
> > > > > > > > gfp_t gfp)
> > > > > > > > {
> > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > dma_addr_t addr;
> > > > > > > >
> > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > goto unmap_release;
> > > > > > > >
> > > > > > > > prev = i;
> > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > dma_addr_t addr;
> > > > > > > >
> > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > goto unmap_release;
> > > > > > > >
> > > > > > > > prev = i;
> > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > >
> > > > > > > > /* Store token and indirect buffer state. */
> > > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > > if (indirect)
> > > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > else
> > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > unmap_release:
> > > > > > > > + if (premapped) {
> > > > > > > > + if (indirect)
> > > > > > > > + kfree(desc);
> > > > > > > > +
> > > > > > > > + END_USE(vq);
> > > > > > > > + return -ENOMEM;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > err_idx = i;
> > > > > > > >
> > > > > > > > if (indirect)
> > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > } else
> > > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > > }
> > > > > > > >
> > > > > > > > if (indirect)
> > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > i = head;
> > > > > > > >
> > > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > > vq->vq.num_free++;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > vq->free_head = head;
> > > > > > > >
> > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > >
> > > > > > > > - if (vq->use_dma_api) {
> > > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > }
> > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > }
> > > > > > > >
> > > > > > > > /**
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > >
> > > > > >
> > > >
> > >
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 9:49 ` Michael S. Tsirkin
2023-05-18 12:20 ` Xuan Zhuo
@ 2023-05-18 12:22 ` Xuan Zhuo
2023-05-18 17:12 ` Michael S. Tsirkin
2023-05-19 3:38 ` Jason Wang
2 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-18 12:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > > in virtqueue_add_split().
> > > > > > > >
> > > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > > >
> > > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > > when executing dma unmap.
> > > > > > >
> > > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > > needed instead of a per virtqueue one.
> > > > > >
> > > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > > very intrusive change though, will need a lot of performance testing
> > > > > > to make sure we don't break anything.
> > > > >
> > > > > In fact, we have tried this idea.
> > > > >
> > > > > The problem is the detach and unmap.
> > > > >
> > > > > > We need to get all the DMA addresses from the virtio-ring to unmap.
> > > > > > Currently, it does not support returning the DMA address, and for an SKB we
> > > > > > need to get multiple DMA addresses at one time.
> > > > > >
> > > > > > This requires modifying the logic of the virtio-ring detach. Other than that,
> > > > > > I also agree with this idea.
> > > > >
> > > > > Thanks.
> > > >
> > > > Well you can have a version of get_buf that returns them ... but
> > > > it is not clear to me all this is worth it unless you want
> > > > to do unsafe tricks like leaving them mapped.
> > >
> > > Some high speed NIC drivers use this trick for better performance.
> >
> >
> > Interesting, this is the first time I have heard of this. Is there any problem?
>
> depends - if you are relying on the IOMMU then yes - malicious hardware
> can steal guest secrets or corrupt memory since it's a hack not properly
> integrated with linux and there's no real control preventing linux from
> reusing this memory for something unrelated.
> If instead you are using something like bounce buffers then no, but OTOH
> bounce buffers are already expensive so you might not see a lot
> of benefit.
>
> > So, is having virtio-net handle the DMA mapping by itself the right way?
> >
> > Thanks
>
> I am fine with the approach taken for now. And look at reducing
> cost of dma map/unmap later.
Well, so far we have discussed several options; to get on the same page, let me
summarize.

1. premapped per-virtqueue

We do not always need to check premapped; we can try to merge the premapped
flag with use_dma_api so that we do not have to check an extra flag.

We can switch premapped during a vq reset, so I don't think it has to be
enabled by default. And supporting AF_XDP requires a vq reset anyway.

2. premapped per-desc(state)

* save the flag inside extra->flags

OK, let me know which one you want.
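A rough userspace mock of option 2, with the bit kept in extra->flags, could look like this; the MINI_* names and the exact bit chosen are hypothetical, this only illustrates the bookkeeping, not the real vring code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one more flag bit next to the NEXT/AVAIL/USED ones. */
#define MINI_DESC_F_PREMAPPED (1u << 7)

/* Stand-in for struct vring_desc_extra. */
struct mini_desc_extra {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
};

/* Record at add time whether the driver did the mapping itself. */
static void mini_desc_fill(struct mini_desc_extra *e, uint64_t addr,
                           uint32_t len, bool premapped)
{
    e->addr = addr;
    e->len = len;
    e->flags = premapped ? MINI_DESC_F_PREMAPPED : 0;
}

/* At detach time, skip the unmap for driver-mapped descriptors. */
static int mini_unmap_one(const struct mini_desc_extra *e)
{
    if (e->flags & MINI_DESC_F_PREMAPPED)
        return 0;   /* driver owns the mapping, nothing to do */
    /* dma_unmap_page(...) would go here in the kernel */
    return 1;       /* unmapped by the core */
}
```

With the bit stored per descriptor, the detach path can test extra->flags directly instead of carrying a separate bool in desc_state.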
If I miss something, please point out.
Thanks
>
> >
> >
> > >
> > > > I'd leave that
> > > > for another day maybe.
> > > >
> > > > For marking desc as premapped I think we can use a bit from
> > > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > > another one.
> > >
> > > Probably.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > ---
> > > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > struct vring_desc_state_split {
> > > > > > > > void *data; /* Data for callback. */
> > > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > > >
> > > > > > > Going back to the original discussion around where this should be
> > > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > > >
> > > > > > > > };
> > > > > > > >
> > > > > > > > struct vring_desc_state_packed {
> > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > >
> > > > > > > > /* Map one sg entry. */
> > > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > > + enum dma_data_direction direction,
> > > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > > >
> > > > > > > having things like:
> > > > > > >
> > > > > > > void func(bool doit)
> > > > > > > {
> > > > > > >         if (!doit)
> > > > > > >                 return;
> > > > > > > }
> > > > > > >
> > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > >
> > > > > > > And this change should work for both packed and split. I think we need
> > > > > > > to squash the packed changes here.
> > > > > > >
> > > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > > this patch can even be built. I will wait for a new version and
> > > > > > > continue the review from there.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > {
> > > > > > > > + if (premapped) {
> > > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > > + return 0;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > if (!vq->use_dma_api) {
> > > > > > > > /*
> > > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > }
> > > > > > > >
> > > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > - unsigned int i)
> > > > > > > > + unsigned int i, bool premapped)
> > > > > > > > {
> > > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > > u16 flags;
> > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > } else {
> > > > > > > > + if (premapped)
> > > > > > > > + goto out;
> > > > > > > > +
> > > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > extra[i].addr,
> > > > > > > > extra[i].len,
> > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > unsigned int in_sgs,
> > > > > > > > void *data,
> > > > > > > > void *ctx,
> > > > > > > > + bool premapped,
> > > > > > > > gfp_t gfp)
> > > > > > > > {
> > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > dma_addr_t addr;
> > > > > > > >
> > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > goto unmap_release;
> > > > > > > >
> > > > > > > > prev = i;
> > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > dma_addr_t addr;
> > > > > > > >
> > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > goto unmap_release;
> > > > > > > >
> > > > > > > > prev = i;
> > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > >
> > > > > > > > /* Store token and indirect buffer state. */
> > > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > > if (indirect)
> > > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > else
> > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > unmap_release:
> > > > > > > > + if (premapped) {
> > > > > > > > + if (indirect)
> > > > > > > > + kfree(desc);
> > > > > > > > +
> > > > > > > > + END_USE(vq);
> > > > > > > > + return -ENOMEM;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > err_idx = i;
> > > > > > > >
> > > > > > > > if (indirect)
> > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > } else
> > > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > > }
> > > > > > > >
> > > > > > > > if (indirect)
> > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > i = head;
> > > > > > > >
> > > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > > vq->vq.num_free++;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > vq->free_head = head;
> > > > > > > >
> > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > >
> > > > > > > > - if (vq->use_dma_api) {
> > > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > }
> > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > }
> > > > > > > >
> > > > > > > > /**
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > >
> > > > > >
> > > >
> > >
>
* Re: [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize()
2023-05-18 12:12 ` Xuan Zhuo
@ 2023-05-18 14:00 ` Michael S. Tsirkin
0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 14:00 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On Thu, May 18, 2023 at 08:12:50PM +0800, Xuan Zhuo wrote:
> On Wed, 17 May 2023 10:22:47 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > Change "useless" to the more accurate "unused".
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
>
> Hi Michael,
>
> We still need to discuss a few cases regarding dma-premapped. Could you merge
> the last three patches (including this one)? They have nothing to do with
> dma-premapped.
>
> Should I post a new patch set separately?
>
> Thanks.
Please do.
>
>
>
> > ---
> > drivers/virtio/virtio_ring.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 42730c4ecdc5..c90160d2d280 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2734,7 +2734,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
> > * virtqueue_resize - resize the vring of vq
> > * @_vq: the struct virtqueue we're talking about.
> > * @num: new ring num
> > - * @recycle: callback for recycle the useless buffer
> > + * @recycle: callback to recycle unused buffers
> > *
> > * When it is really necessary to create a new vring, it will set the current vq
> > * into the reset state. Then call the passed callback to recycle the buffer
> > --
> > 2.32.0.3.g01195cf9f
> >
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 12:22 ` Xuan Zhuo
@ 2023-05-18 17:12 ` Michael S. Tsirkin
2023-05-19 3:27 ` Xuan Zhuo
0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 17:12 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On Thu, May 18, 2023 at 08:22:14PM +0800, Xuan Zhuo wrote:
> On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > > > in virtqueue_add_split().
> > > > > > > > >
> > > > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > > > >
> > > > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > > > when executing dma unmap.
> > > > > > > >
> > > > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > > > needed instead of a per virtqueue one.
> > > > > > >
> > > > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > > > very intrusive change though, will need a lot of performance testing
> > > > > > > to make sure we don't break anything.
> > > > > >
> > > > > > In fact, we have tried this idea.
> > > > > >
> > > > > > The problem is the detach and unmap.
> > > > > >
> > > > > > We need to get all the DMA addresses from the virtio-ring to unmap. Currently,
> > > > > > it does not support returning the DMA addresses, and for an SKB, we need to get
> > > > > > multiple DMA addresses at one time.
> > > > > >
> > > > > > This needs changes to the virtio-ring detach logic. Apart from that, I also
> > > > > > agree with this idea.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > Well you can have a version of get_buf that returns them ... but
> > > > > it is not clear to me all this is worth it unless you want
> > > > > to do unsafe tricks like leaving them mapped.
> > > >
> > > > Some high speed NIC drivers use this trick for better performance.
> > >
> > >
> > > Interesting, this is the first time I have heard of this. Is there any problem?
> >
> > depends - if you are relying on the IOMMU then yes - malicious hardware
> > can steal guest secrets or corrupt memory since it's a hack not properly
> > integrated with linux and there's no real control preventing linux from
> > reusing this memory for something unrelated.
> > If instead you are using something like bounce buffers then no, but OTOH
> > bounce buffers are already expensive so you might not see a lot
> > of benefit.
> >
> > > So, is having virtio-net handle the DMA operations by itself the right way?
> > >
> > > Thanks
> >
> > I am fine with the approach taken for now. And look at reducing
> > cost of dma map/unmap later.
>
> Well, so far we have discussed various situations; to make sure we are on the
> same page, let's summarize.
>
> 1. premapped per-virtqueue
>
> We don't always need to check premapped, and we can try to merge the
> premapped flag with use_dma_api so that we do not need to check more flags.
>
> We can switch premapped during a vq reset, so I don't think we have to enable
> it by default. And when supporting AF_XDP, a vq reset must be done anyway.
Sounds attractive but I didn't realize AF_XDP blocks regular xdp.
Is that true?
> 2. premapped per-desc(state)
> * save the flag inside the extra->flags
>
> OK, let me know which one you want.
>
> If I miss something, please point out.
>
> Thanks
>
>
>
>
>
> >
> > >
> > >
> > > >
> > > > > I'd leave that
> > > > > for another day maybe.
> > > > >
> > > > > For marking desc as premapped I think we can use a bit from
> > > > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > > > another one.
> > > >
> > > > Probably.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > ---
> > > > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > > struct vring_desc_state_split {
> > > > > > > > > void *data; /* Data for callback. */
> > > > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > > > >
> > > > > > > > Going back to the original discussion around where this should be
> > > > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > > > >
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > struct vring_desc_state_packed {
> > > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > > >
> > > > > > > > > /* Map one sg entry. */
> > > > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > > > + enum dma_data_direction direction,
> > > > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > > > >
> > > > > > > > having things like:
> > > > > > > >
> > > > > > > > int func(bool do)
> > > > > > > > {
> > > > > > > > if (!do)
> > > > > > > > return;
> > > > > > > > }
> > > > > > > >
> > > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > > >
> > > > > > > > And this change should work for both packed and split. I think we need
> > > > > > > > to squash the packed changes here.
> > > > > > > >
> > > > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > > > this patch can even be built. I will wait for a new version and
> > > > > > > > continue the review from there.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > {
> > > > > > > > > + if (premapped) {
> > > > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > > > + return 0;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > if (!vq->use_dma_api) {
> > > > > > > > > /*
> > > > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > - unsigned int i)
> > > > > > > > > + unsigned int i, bool premapped)
> > > > > > > > > {
> > > > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > > > u16 flags;
> > > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > > } else {
> > > > > > > > > + if (premapped)
> > > > > > > > > + goto out;
> > > > > > > > > +
> > > > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > > extra[i].addr,
> > > > > > > > > extra[i].len,
> > > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > unsigned int in_sgs,
> > > > > > > > > void *data,
> > > > > > > > > void *ctx,
> > > > > > > > > + bool premapped,
> > > > > > > > > gfp_t gfp)
> > > > > > > > > {
> > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > dma_addr_t addr;
> > > > > > > > >
> > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > > goto unmap_release;
> > > > > > > > >
> > > > > > > > > prev = i;
> > > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > dma_addr_t addr;
> > > > > > > > >
> > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > > goto unmap_release;
> > > > > > > > >
> > > > > > > > > prev = i;
> > > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > >
> > > > > > > > > /* Store token and indirect buffer state. */
> > > > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > > > if (indirect)
> > > > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > > else
> > > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > return 0;
> > > > > > > > >
> > > > > > > > > unmap_release:
> > > > > > > > > + if (premapped) {
> > > > > > > > > + if (indirect)
> > > > > > > > > + kfree(desc);
> > > > > > > > > +
> > > > > > > > > + END_USE(vq);
> > > > > > > > > + return -ENOMEM;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > err_idx = i;
> > > > > > > > >
> > > > > > > > > if (indirect)
> > > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > > } else
> > > > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > if (indirect)
> > > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > i = head;
> > > > > > > > >
> > > > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > > > vq->vq.num_free++;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > > vq->free_head = head;
> > > > > > > > >
> > > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > > >
> > > > > > > > > - if (vq->use_dma_api) {
> > > > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > > }
> > > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > /**
> > > > > > > > > --
> > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> >
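[Editor's sketch] Option 2 in the summary above, combined with Michael's suggestion of stealing a bit from desc_extra->flags, can be modeled in isolation. This is a minimal userspace sketch, not the kernel code: DESC_F_PREMAPPED and its value are hypothetical (only the NEXT/WRITE values mirror the real VRING_DESC_F_* constants), and the struct is a stand-in for struct vring_desc_extra.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real split-ring descriptor flag values are 1 (NEXT), 2 (WRITE), 4
 * (INDIRECT); a stolen "premapped" bit would have to avoid them.
 * DESC_F_PREMAPPED below is hypothetical, for illustration only. */
#define DESC_F_NEXT      1u
#define DESC_F_WRITE     2u
#define DESC_F_PREMAPPED 8u   /* not in the virtio spec */

/* Stand-in for struct vring_desc_extra. */
struct desc_extra {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;           /* the premapped bit lives here */
};

/* Mark one descriptor as driver-mapped. */
static inline void desc_set_premapped(struct desc_extra *e)
{
    e->flags |= DESC_F_PREMAPPED;
}

/* The unmap path would then skip driver-mapped descriptors. */
static inline bool desc_needs_unmap(const struct desc_extra *e)
{
    return !(e->flags & DESC_F_PREMAPPED);
}
```

Keeping the bit next to the other descriptor flags means the unmap path touches only the cache line it already reads, which is the cache-stress concern Jason raises about the separate bool in desc_state.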
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 17:12 ` Michael S. Tsirkin
@ 2023-05-19 3:27 ` Xuan Zhuo
2023-05-19 3:39 ` Jason Wang
0 siblings, 1 reply; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-19 3:27 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Thu, 18 May 2023 13:12:49 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, May 18, 2023 at 08:22:14PM +0800, Xuan Zhuo wrote:
> > On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > > > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > virtqueue_add_split() only supports virtual addresses, dma is completed
> > > > > > > > > > in virtqueue_add_split().
> > > > > > > > > >
> > > > > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > > > > >
> > > > > > > > > > Record this information in desc_state, we can skip unmap based on this
> > > > > > > > > > when executing dma unmap.
> > > > > > > > >
> > > > > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > > > > needed instead of a per virtqueue one.
> > > > > > > >
> > > > > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > > > > very intrusive change though, will need a lot of performance testing
> > > > > > > > to make sure we don't break anything.
> > > > > > >
> > > > > > > In fact, we have tried this idea.
> > > > > > >
> > > > > > > The problem is the detach and unmap.
> > > > > > >
> > > > > > > We need to get all the DMA addresses from the virtio-ring to unmap. Currently,
> > > > > > > it does not support returning the DMA addresses, and for an SKB, we need to get
> > > > > > > multiple DMA addresses at one time.
> > > > > > >
> > > > > > > This needs changes to the virtio-ring detach logic. Apart from that, I also
> > > > > > > agree with this idea.
> > > > > > >
> > > > > > > Thanks.
> > > > > >
> > > > > > Well you can have a version of get_buf that returns them ... but
> > > > > > it is not clear to me all this is worth it unless you want
> > > > > > to do unsafe tricks like leaving them mapped.
> > > > >
> > > > > Some high speed NIC drivers use this trick for better performance.
> > > >
> > > >
> > > > Interesting, this is the first time I have heard of this. Is there any problem?
> > >
> > > depends - if you are relying on the IOMMU then yes - malicious hardware
> > > can steal guest secrets or corrupt memory since it's a hack not properly
> > > integrated with linux and there's no real control preventing linux from
> > > reusing this memory for something unrelated.
> > > If instead you are using something like bounce buffers then no, but OTOH
> > > bounce buffers are already expensive so you might not see a lot
> > > of benefit.
> > >
> > > > So, is having virtio-net handle the DMA operations by itself the right way?
> > > >
> > > > Thanks
> > >
> > > I am fine with the approach taken for now. And look at reducing
> > > cost of dma map/unmap later.
> >
> > Well, so far we have discussed various situations; to make sure we are on the
> > same page, let's summarize.
> >
> > 1. premapped per-virtqueue
> >
> > We don't always need to check premapped, and we can try to merge the
> > premapped flag with use_dma_api so that we do not need to check more flags.
> >
> > We can switch premapped during a vq reset, so I don't think we have to enable
> > it by default. And when supporting AF_XDP, a vq reset must be done anyway.
>
> Sounds attractive but I didn't realize AF_XDP blocks regular xdp.
Sorry, I do not understand what you mean.
AF_XDP depends on XDP at runtime, and both must be bound to the NIC at the
same time to work normally. Packets are redirected by XDP to AF_XDP.
Since AF_XDP needs to refill the vq, we need to reset the vq to release the
buffers that have already been filled into it.
My idea is that we can turn on the premapped function while executing the vq reset.
Thanks.
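[Editor's sketch] The control flow that the premapped flag adds to the mapping path can be modeled in isolation. This is a minimal userspace sketch: the types and the fake_dma_map() helper are invented for illustration; only the branch structure mirrors vring_map_one_sg() from the patch, where a premapped entry's DMA address is taken as-is and no mapping is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a scatterlist entry. */
struct sg_entry {
    uint64_t dma_addr;   /* filled in by the driver when premapped */
    void    *cpu_addr;
};

/* Toy "DMA map": in the real code this is dma_map_page(). */
static uint64_t fake_dma_map(void *cpu_addr)
{
    /* Tag the address so mapped and unmapped values are distinguishable. */
    return (uint64_t)(uintptr_t)cpu_addr | 0x8000000000000000ull;
}

/* Mirrors the shape of vring_map_one_sg() in the patch: with premapped
 * set, the already-established DMA address is used and no map happens. */
static int map_one_sg(struct sg_entry *sg, bool premapped, uint64_t *addr)
{
    if (premapped) {
        *addr = sg->dma_addr;
        return 0;
    }
    *addr = fake_dma_map(sg->cpu_addr);
    return 0;
}
```

The symmetric point on the unmap side is the same test in reverse: when premapped is set, the descriptor is skipped instead of being passed to dma_unmap_page().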
> Is that true?
>
> > 2. premapped per-desc(state)
> > * save the flag inside the extra->flags
> >
> > OK, let me know which one you want.
> >
> > If I miss something, please point out.
> >
> > Thanks
> >
> >
> >
> >
> >
> > >
> > > >
> > > >
> > > > >
> > > > > > I'd leave that
> > > > > > for another day maybe.
> > > > > >
> > > > > > For marking desc as premapped I think we can use a bit from
> > > > > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > > > > another one.
> > > > >
> > > > > Probably.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > > > struct vring_desc_state_split {
> > > > > > > > > > void *data; /* Data for callback. */
> > > > > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > > > > >
> > > > > > > > > Going back to the original discussion around where this should be
> > > > > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > > > > >
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > struct vring_desc_state_packed {
> > > > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > > > >
> > > > > > > > > > /* Map one sg entry. */
> > > > > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > > > > + enum dma_data_direction direction,
> > > > > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > > > > >
> > > > > > > > > having things like:
> > > > > > > > >
> > > > > > > > > int func(bool do)
> > > > > > > > > {
> > > > > > > > > if (!do)
> > > > > > > > > return;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > > > >
> > > > > > > > > And this change should work for both packed and split. I think we need
> > > > > > > > > to squash the packed changes here.
> > > > > > > > >
> > > > > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > > > > this patch can even be built. I will wait for a new version and
> > > > > > > > > continue the review from there.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > {
> > > > > > > > > > + if (premapped) {
> > > > > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > > > > + return 0;
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > if (!vq->use_dma_api) {
> > > > > > > > > > /*
> > > > > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > - unsigned int i)
> > > > > > > > > > + unsigned int i, bool premapped)
> > > > > > > > > > {
> > > > > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > > > > u16 flags;
> > > > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > > > } else {
> > > > > > > > > > + if (premapped)
> > > > > > > > > > + goto out;
> > > > > > > > > > +
> > > > > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > > > extra[i].addr,
> > > > > > > > > > extra[i].len,
> > > > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > unsigned int in_sgs,
> > > > > > > > > > void *data,
> > > > > > > > > > void *ctx,
> > > > > > > > > > + bool premapped,
> > > > > > > > > > gfp_t gfp)
> > > > > > > > > > {
> > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > dma_addr_t addr;
> > > > > > > > > >
> > > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > > > goto unmap_release;
> > > > > > > > > >
> > > > > > > > > > prev = i;
> > > > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > dma_addr_t addr;
> > > > > > > > > >
> > > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > > > goto unmap_release;
> > > > > > > > > >
> > > > > > > > > > prev = i;
> > > > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > >
> > > > > > > > > > /* Store token and indirect buffer state. */
> > > > > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > > > > if (indirect)
> > > > > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > > > else
> > > > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > return 0;
> > > > > > > > > >
> > > > > > > > > > unmap_release:
> > > > > > > > > > + if (premapped) {
> > > > > > > > > > + if (indirect)
> > > > > > > > > > + kfree(desc);
> > > > > > > > > > +
> > > > > > > > > > + END_USE(vq);
> > > > > > > > > > + return -ENOMEM;
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > err_idx = i;
> > > > > > > > > >
> > > > > > > > > > if (indirect)
> > > > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > > > } else
> > > > > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > if (indirect)
> > > > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > i = head;
> > > > > > > > > >
> > > > > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > > > > vq->vq.num_free++;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > > > vq->free_head = head;
> > > > > > > > > >
> > > > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > > > >
> > > > > > > > > > - if (vq->use_dma_api) {
> > > > > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > > > }
> > > > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > /**
> > > > > > > > > > --
> > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > >
>
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-18 9:49 ` Michael S. Tsirkin
2023-05-18 12:20 ` Xuan Zhuo
2023-05-18 12:22 ` Xuan Zhuo
@ 2023-05-19 3:38 ` Jason Wang
2 siblings, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-19 3:38 UTC (permalink / raw)
To: Michael S. Tsirkin, Xuan Zhuo; +Cc: Christoph Hellwig, virtualization
On 2023/5/18 17:49, Michael S. Tsirkin wrote:
> On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
>> On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
>>>>> On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>> On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
>>>>>>> On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>> virtqueue_add_split() only supports virtual addresses, dma is completed
>>>>>>>> in virtqueue_add_split().
>>>>>>>>
>>>>>>>> In some scenarios (such as the AF_XDP scenario), the memory is allocated
>>>>>>>> and DMA is completed in advance, so it is necessary for us to support
>>>>>>>> passing the DMA address to virtqueue_add_split().
>>>>>>>>
>>>>>>>> Record this information in desc_state, we can skip unmap based on this
>>>>>>>> when executing dma unmap.
>>>>>>> I would also suggest documenting why a per descriptor metadata is
>>>>>>> needed instead of a per virtqueue one.
>>>>>> I think we could make it per virtqueue. That would mean all code in
>>>>>> virtio net would have to change to do dma mapping itself instead of
>>>>>> relying on virtio core though. Which is maybe a good idea? Definitely a
>>>>>> very intrusive change though, will need a lot of performance testing
>>>>>> to make sure we don't break anything.
>>>>> In fact, we have tried this idea.
>>>>>
>>>>> The problem is the detach and unmap.
>>>>>
>>>>> We need to get all the DMA addresses from the virtio-ring to unmap. Currently,
>>>>> it does not support returning the DMA addresses, and for an SKB, we need to get
>>>>> multiple DMA addresses at one time.
>>>>>
>>>>> This needs changes to the virtio-ring detach logic. Apart from that, I also
>>>>> agree with this idea.
>>>>>
>>>>> Thanks.
>>>> Well you can have a version of get_buf that returns them ... but
>>>> it is not clear to me all this is worth it unless you want
>>>> to do unsafe tricks like leaving them mapped.
>>> Some high speed NIC drivers use this trick for better performance.
>>
>> Interesting, this is the first time I have heard of this. Is there any problem?
> depends - if you are relying on the IOMMU then yes - malicious hardware
> can steal guest secrets or corrupt memory since it's a hack not properly
> integrated with linux and there's no real control preventing linux from
> reusing this memory for something unrelated.
The pages are pre-allocated/mapped buffers for RX. So it should be fine.
Thanks
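[Editor's sketch] The "leaving them mapped" trick discussed here — mapping RX buffers once up front and recycling them with no per-packet unmap — can be sketched as a toy pool. Everything below is hypothetical model code, not kernel code; the point is that the map count stays constant while buffers cycle between the driver and the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4

/* A buffer is mapped once at init and its DMA address is reused for the
 * lifetime of the pool, so the per-packet map/unmap cost disappears. */
struct rx_buf {
    uint64_t dma_addr;   /* established once, never unmapped per packet */
    bool     in_flight;
};

struct rx_pool {
    struct rx_buf bufs[POOL_SIZE];
    int mapped;          /* how many map operations ever happened */
};

static void pool_init(struct rx_pool *p)
{
    p->mapped = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        /* Pretend dma_map_page() ran here, exactly once per buffer. */
        p->bufs[i].dma_addr = 0x1000u * (i + 1);
        p->bufs[i].in_flight = false;
        p->mapped++;
    }
}

/* Refill: hand an idle, already-mapped buffer to the device. */
static struct rx_buf *pool_get(struct rx_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->bufs[i].in_flight) {
            p->bufs[i].in_flight = true;
            return &p->bufs[i];
        }
    }
    return NULL;
}

/* Completion: recycle the buffer with no dma_unmap call. */
static void pool_put(struct rx_buf *b)
{
    b->in_flight = false;
}
```

This also makes Michael's caveat concrete: because the pages stay mapped, the protection normally regained at unmap time (e.g. via the IOMMU) is given up for the pool's lifetime, which is acceptable only because the pool never lends the pages to unrelated users.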
> If instead you are using something like bounce buffers then no, but OTOH
> bounce buffers are already expensive so you might not see a lot
> of benefit.
>
>> So, is that virtio-net master the operation of dma by itself the right way?
>>
>> Thanks
> I am fine with the approach taken for now. And look at reducing
> cost of dma map/unmap later.
>
>>
>>>> I'd leave that
>>>> for another day maybe.
>>>>
>>>> For marking desc as premapped I think we can use a bit from
>>>> desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
>>>> another one.
>>> Probably.
>>>
>>> Thanks
>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>> drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
>>>>>>>> 1 file changed, 29 insertions(+), 9 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>>>>> index e2fc50c05bec..bd5e84afab37 100644
>>>>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>>>>> @@ -70,6 +70,7 @@
>>>>>>>> struct vring_desc_state_split {
>>>>>>>> void *data; /* Data for callback. */
>>>>>>>> struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
>>>>>>>> + bool premapped; /* DMA mapping is done by driver. */
>>>>>>> Going back to the original discussion around where this should be
>>>>>>> placed. I wonder if we can find a common place to store this since it
>>>>>>> has nothing related to virtqueue layout. Maybe desc_extra? And it
>>>>>>> would be even better if we can avoid stressing the cache like above.
>>>>>>>
>>>>>>>> };
>>>>>>>>
>>>>>>>> struct vring_desc_state_packed {
>>>>>>>> @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>>>>>>>>
>>>>>>>> /* Map one sg entry. */
>>>>>>>> static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
>>>>>>>> - enum dma_data_direction direction, static dma_addr_t *addr)
>>>>>>>> + enum dma_data_direction direction,
>>>>>>>> + bool premapped, dma_addr_t *addr)
>>>>>>> having things like:
>>>>>>>
>>>>>>> int func(bool do)
>>>>>>> {
>>>>>>> if (!do)
>>>>>>> return;
>>>>>>> }
>>>>>>>
>>>>>>> is a hint that the check needs to be done by the caller?
>>>>>>>
>>>>>>> And this change should work for both packed and split. I think we need
>>>>>>> to squash the packed changes here.
>>>>>>>
>>>>>>> Looking at how packed virtqueue uses this in this patch, I don't think
>>>>>>> this patch can even be built. I will wait for a new version and
>>>>>>> continue the review from there.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> {
>>>>>>>> + if (premapped) {
>>>>>>>> + *addr = sg_dma_address(sg);
>>>>>>>> + return 0;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> if (!vq->use_dma_api) {
>>>>>>>> /*
>>>>>>>> * If DMA is not used, KMSAN doesn't know that the scatterlist
>>>>>>>> @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
>>>>>>>> }
>>>>>>>>
>>>>>>>> static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>>>>>>>> - unsigned int i)
>>>>>>>> + unsigned int i, bool premapped)
>>>>>>>> {
>>>>>>>> struct vring_desc_extra *extra = vq->split.desc_extra;
>>>>>>>> u16 flags;
>>>>>>>> @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>>>>>>>> (flags & VRING_DESC_F_WRITE) ?
>>>>>>>> DMA_FROM_DEVICE : DMA_TO_DEVICE);
>>>>>>>> } else {
>>>>>>>> + if (premapped)
>>>>>>>> + goto out;
>>>>>>>> +
>>>>>>>> dma_unmap_page(vring_dma_dev(vq),
>>>>>>>> extra[i].addr,
>>>>>>>> extra[i].len,
>>>>>>>> @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>> unsigned int in_sgs,
>>>>>>>> void *data,
>>>>>>>> void *ctx,
>>>>>>>> + bool premapped,
>>>>>>>> gfp_t gfp)
>>>>>>>> {
>>>>>>>> struct vring_virtqueue *vq = to_vvq(_vq);
>>>>>>>> @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>>> dma_addr_t addr;
>>>>>>>>
>>>>>>>> - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
>>>>>>>> + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
>>>>>>>> goto unmap_release;
>>>>>>>>
>>>>>>>> prev = i;
>>>>>>>> @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>>> dma_addr_t addr;
>>>>>>>>
>>>>>>>> - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
>>>>>>>> + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
>>>>>>>> goto unmap_release;
>>>>>>>>
>>>>>>>> prev = i;
>>>>>>>> @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>>
>>>>>>>> /* Store token and indirect buffer state. */
>>>>>>>> vq->split.desc_state[head].data = data;
>>>>>>>> + vq->split.desc_state[head].premapped = premapped;
>>>>>>>> if (indirect)
>>>>>>>> vq->split.desc_state[head].indir_desc = desc;
>>>>>>>> else
>>>>>>>> @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>> return 0;
>>>>>>>>
>>>>>>>> unmap_release:
>>>>>>>> + if (premapped) {
>>>>>>>> + if (indirect)
>>>>>>>> + kfree(desc);
>>>>>>>> +
>>>>>>>> + END_USE(vq);
>>>>>>>> + return -ENOMEM;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> err_idx = i;
>>>>>>>>
>>>>>>>> if (indirect)
>>>>>>>> @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>>>>> vring_unmap_one_split_indirect(vq, &desc[i]);
>>>>>>>> i = virtio16_to_cpu(_vq->vdev, desc[i].next);
>>>>>>>> } else
>>>>>>>> - i = vring_unmap_one_split(vq, i);
>>>>>>>> + i = vring_unmap_one_split(vq, i, false);
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (indirect)
>>>>>>>> @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>>>>>>>> i = head;
>>>>>>>>
>>>>>>>> while (vq->split.vring.desc[i].flags & nextflag) {
>>>>>>>> - vring_unmap_one_split(vq, i);
>>>>>>>> + vring_unmap_one_split(vq, i, state->premapped);
>>>>>>>> i = vq->split.desc_extra[i].next;
>>>>>>>> vq->vq.num_free++;
>>>>>>>> }
>>>>>>>>
>>>>>>>> - vring_unmap_one_split(vq, i);
>>>>>>>> + vring_unmap_one_split(vq, i, state->premapped);
>>>>>>>> vq->split.desc_extra[i].next = vq->free_head;
>>>>>>>> vq->free_head = head;
>>>>>>>>
>>>>>>>> @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>>>>>>>> VRING_DESC_F_INDIRECT));
>>>>>>>> BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>>>>>>>>
>>>>>>>> - if (vq->use_dma_api) {
>>>>>>>> + if (vq->use_dma_api && !state->premapped) {
>>>>>>>> for (j = 0; j < len / sizeof(struct vring_desc); j++)
>>>>>>>> vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>>>>>>>> }
>>>>>>>> @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>>>>>>>> return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
>>>>>>>> out_sgs, in_sgs, data, ctx, gfp) :
>>>>>>>> virtqueue_add_split(_vq, sgs, total_sg,
>>>>>>>> - out_sgs, in_sgs, data, ctx, gfp);
>>>>>>>> + out_sgs, in_sgs, data, ctx, premapped, gfp);
>>>>>>>> }
>>>>>>>>
>>>>>>>> /**
>>>>>>>> --
>>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() support premapped
2023-05-19 3:27 ` Xuan Zhuo
@ 2023-05-19 3:39 ` Jason Wang
0 siblings, 0 replies; 44+ messages in thread
From: Jason Wang @ 2023-05-19 3:39 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, virtualization, Michael S. Tsirkin
On Fri, May 19, 2023 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 18 May 2023 13:12:49 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, May 18, 2023 at 08:22:14PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > > > > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > virtqueue_add_split() only supports virtual addresses; the DMA mapping
> > > > > > > > > > > is done inside virtqueue_add_split() itself.
> > > > > > > > > > >
> > > > > > > > > > > In some scenarios (such as the AF_XDP scenario), the memory is allocated
> > > > > > > > > > > and DMA is completed in advance, so it is necessary for us to support
> > > > > > > > > > > passing the DMA address to virtqueue_add_split().
> > > > > > > > > > >
> > > > > > > > > > > Record this information in desc_state so that we can skip the unmap for
> > > > > > > > > > > such buffers when executing the DMA unmap.
> > > > > > > > > >
> > > > > > > > > > I would also suggest documenting why a per descriptor metadata is
> > > > > > > > > > needed instead of a per virtqueue one.
> > > > > > > > >
> > > > > > > > > I think we could make it per virtqueue. That would mean all code in
> > > > > > > > > virtio net would have to change to do dma mapping itself instead of
> > > > > > > > > relying on virtio core though. Which is maybe a good idea? Definitely a
> > > > > > > > > very intrusive change though, will need a lot of performance testing
> > > > > > > > > to make sure we don't break anything.
> > > > > > > >
> > > > > > > > In fact, we have tried this idea.
> > > > > > > >
> > > > > > > > The problem is the detach and unmap.
> > > > > > > >
> > > > > > > > We need to get all the DMA addresses back from the virtio ring in order to
> > > > > > > > unmap. Currently, it does not support returning the DMA address, and for an
> > > > > > > > SKB we need to get multiple DMA addresses at one time.
> > > > > > > >
> > > > > > > > This requires modifying the detach logic of the virtio ring. Apart from
> > > > > > > > that, I also agree with this idea.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > Well you can have a version of get_buf that returns them ... but
> > > > > > > it is not clear to me all this is worth it unless you want
> > > > > > > to do unsafe tricks like leaving them mapped.
> > > > > >
> > > > > > Some high speed NIC drivers use this trick for better performance.
> > > > >
> > > > >
> > > > > Interesting, this is the first time I have heard of this. Are there any problems with it?
> > > >
> > > > depends - if you are relying on the IOMMU then yes - malicious hardware
> > > > can steal guest secrets or corrupt memory since it's a hack not properly
> > > > integrated with linux and there's no real control preventing linux from
> > > > reusing this memory for something unrelated.
> > > > If instead you are using something like bounce buffers then no, but OTOH
> > > > bounce buffers are already expensive so you might not see a lot
> > > > of benefit.
> > > >
> > > > > So, is having virtio-net manage the DMA operations by itself the right way?
> > > > >
> > > > > Thanks
> > > >
> > > > I am fine with the approach taken for now. And look at reducing
> > > > cost of dma map/unmap later.
> > >
> > > Well, so far, we have discussed various situations in order to get on the same
> > > page. Let's summarize.
> > >
> > > 1. premapped per-virtqueue
> > >
> > > We don't always need to check premapped, and we can try to merge the
> > > premapped flag with use_dma_api so that we do not need to check more flags.
> > >
> > > We can switch premapped during a vq reset, so I don't think we have to enable
> > > it by default. And when supporting AF_XDP, a vq reset must be done anyway.
> >
> > Sounds attractive but I didn't realize AF_XDP blocks regular xdp.
>
> Sorry, I do not understand what you mean.
>
> AF_XDP depends on XDP at runtime, and both must be bound to the NIC at the
> same time to work normally. Packets are redirected by XDP to AF_XDP.
>
> Since AF_XDP needs to refill the vq, we need to reset the vq to release the
> buffers that were previously filled into it.
>
> My idea is that we can turn on the premapped feature while executing the vq reset.
That should be fine, and it helps to avoid per descriptor metadata.
Thanks
>
> Thanks.
>
>
> > Is that true?
> >
> > > 2. premapped per-desc(state)
> > > * save the flag inside the extra->flags
> > >
> > > OK, let me know which one you want.
> > >
> > > If I miss something, please point out.
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > > I'd leave that
> > > > > > > for another day maybe.
> > > > > > >
> > > > > > > For marking desc as premapped I think we can use a bit from
> > > > > > > desc_extra->flags, either reusing one of NEXT,AVAIL,USED, or stealing
> > > > > > > another one.
> > > > > >
> > > > > > Probably.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > > > > 1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > > > > struct vring_desc_state_split {
> > > > > > > > > > > void *data; /* Data for callback. */
> > > > > > > > > > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > > > > > > > + bool premapped; /* DMA mapping is done by driver. */
> > > > > > > > > >
> > > > > > > > > > Going back to the original discussion around where this should be
> > > > > > > > > > placed. I wonder if we can find a common place to store this since it
> > > > > > > > > > has nothing related to virtqueue layout. Maybe desc_extra? And it
> > > > > > > > > > would be even better if we can avoid stressing the cache like above.
> > > > > > > > > >
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > struct vring_desc_state_packed {
> > > > > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > > > > >
> > > > > > > > > > > /* Map one sg entry. */
> > > > > > > > > > > static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > > > > - enum dma_data_direction direction, static dma_addr_t *addr)
> > > > > > > > > > > + enum dma_data_direction direction,
> > > > > > > > > > > + bool premapped, dma_addr_t *addr)
> > > > > > > > > >
> > > > > > > > > > having things like:
> > > > > > > > > >
> > > > > > > > > > int func(bool do)
> > > > > > > > > > {
> > > > > > > > > > if (!do)
> > > > > > > > > > return;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > > > > >
> > > > > > > > > > And this change should work for both packed and split. I think we need
> > > > > > > > > > to squash the packed changes here.
> > > > > > > > > >
> > > > > > > > > > Looking at how packed virtqueue uses this in this patch, I don't think
> > > > > > > > > > this patch can even be built. I will wait for a new version and
> > > > > > > > > > continue the review from there.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > {
> > > > > > > > > > > + if (premapped) {
> > > > > > > > > > > + *addr = sg_dma_address(sg);
> > > > > > > > > > > + return 0;
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > if (!vq->use_dma_api) {
> > > > > > > > > > > /*
> > > > > > > > > > > * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > > - unsigned int i)
> > > > > > > > > > > + unsigned int i, bool premapped)
> > > > > > > > > > > {
> > > > > > > > > > > struct vring_desc_extra *extra = vq->split.desc_extra;
> > > > > > > > > > > u16 flags;
> > > > > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > > (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > > > > DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > > > > } else {
> > > > > > > > > > > + if (premapped)
> > > > > > > > > > > + goto out;
> > > > > > > > > > > +
> > > > > > > > > > > dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > > > > extra[i].addr,
> > > > > > > > > > > extra[i].len,
> > > > > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > > unsigned int in_sgs,
> > > > > > > > > > > void *data,
> > > > > > > > > > > void *ctx,
> > > > > > > > > > > + bool premapped,
> > > > > > > > > > > gfp_t gfp)
> > > > > > > > > > > {
> > > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > > dma_addr_t addr;
> > > > > > > > > > >
> > > > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > > > > goto unmap_release;
> > > > > > > > > > >
> > > > > > > > > > > prev = i;
> > > > > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > > dma_addr_t addr;
> > > > > > > > > > >
> > > > > > > > > > > - if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > > > > + if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > > > > goto unmap_release;
> > > > > > > > > > >
> > > > > > > > > > > prev = i;
> > > > > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >
> > > > > > > > > > > /* Store token and indirect buffer state. */
> > > > > > > > > > > vq->split.desc_state[head].data = data;
> > > > > > > > > > > + vq->split.desc_state[head].premapped = premapped;
> > > > > > > > > > > if (indirect)
> > > > > > > > > > > vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > > > > else
> > > > > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > > return 0;
> > > > > > > > > > >
> > > > > > > > > > > unmap_release:
> > > > > > > > > > > + if (premapped) {
> > > > > > > > > > > + if (indirect)
> > > > > > > > > > > + kfree(desc);
> > > > > > > > > > > +
> > > > > > > > > > > + END_USE(vq);
> > > > > > > > > > > + return -ENOMEM;
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > err_idx = i;
> > > > > > > > > > >
> > > > > > > > > > > if (indirect)
> > > > > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > > vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > > > > i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > > > > } else
> > > > > > > > > > > - i = vring_unmap_one_split(vq, i);
> > > > > > > > > > > + i = vring_unmap_one_split(vq, i, false);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > if (indirect)
> > > > > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > > i = head;
> > > > > > > > > > >
> > > > > > > > > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > > i = vq->split.desc_extra[i].next;
> > > > > > > > > > > vq->vq.num_free++;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > - vring_unmap_one_split(vq, i);
> > > > > > > > > > > + vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > > vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > > > > vq->free_head = head;
> > > > > > > > > > >
> > > > > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > > VRING_DESC_F_INDIRECT));
> > > > > > > > > > > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > > > > >
> > > > > > > > > > > - if (vq->use_dma_api) {
> > > > > > > > > > > + if (vq->use_dma_api && !state->premapped) {
> > > > > > > > > > > for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > > > > vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > > > > }
> > > > > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > > > > return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > > > > out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > > > > virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > > > > - out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > > > > + out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > /**
> > > > > > > > > > > --
> > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>
* Re: [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
@ 2023-05-23 6:02 ` Christoph Hellwig
1 sibling, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2023-05-23 6:02 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:22:38AM +0800, Xuan Zhuo wrote:
> -static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> - struct scatterlist *sg,
> - enum dma_data_direction direction)
> +static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> + enum dma_data_direction direction, static dma_addr_t *addr)
Please avoid making this unreadable by adding overly long lines.
* Re: [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped
2023-05-17 2:22 ` [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
@ 2023-05-23 6:03 ` Christoph Hellwig
2023-05-23 7:19 ` Michael S. Tsirkin
1 sibling, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2023-05-23 6:03 UTC (permalink / raw)
To: Xuan Zhuo; +Cc: Christoph Hellwig, Michael S. Tsirkin, virtualization
On Wed, May 17, 2023 at 10:22:41AM +0800, Xuan Zhuo wrote:
> virtqueue_add() adds parameter premapped.
Well, I can see that. But why?
* Re: [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped
2023-05-23 6:03 ` Christoph Hellwig
@ 2023-05-23 7:19 ` Michael S. Tsirkin
2023-05-24 3:22 ` Xuan Zhuo
0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2023-05-23 7:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Xuan Zhuo, virtualization
On Mon, May 22, 2023 at 11:03:26PM -0700, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 10:22:41AM +0800, Xuan Zhuo wrote:
> > virtqueue_add() adds parameter premapped.
>
> Well, I can see that. But why?
Assuming it's intentional, it should say something along the lines of
"The parameter is unused for now, and all callers just pass false.
It will be used by a follow-up patch".
It's not a bad way to split patches, this way actual logic in
the next patch stands out as opposed to being masked by
the code reshuffling following the extra parameter.
--
MST
* Re: [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped
2023-05-23 7:19 ` Michael S. Tsirkin
@ 2023-05-24 3:22 ` Xuan Zhuo
0 siblings, 0 replies; 44+ messages in thread
From: Xuan Zhuo @ 2023-05-24 3:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Christoph Hellwig, virtualization
On Tue, 23 May 2023 03:19:42 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, May 22, 2023 at 11:03:26PM -0700, Christoph Hellwig wrote:
> > On Wed, May 17, 2023 at 10:22:41AM +0800, Xuan Zhuo wrote:
> > > virtqueue_add() adds parameter premapped.
> >
> > Well, I can see that. But why?
>
> Assuming it's intentional, it should say something along the lines of
> "The parameter is unused for now, and all callers just pass false.
> It will be used by a follow-up patch".
I agree.
Will fix.
>
> It's not a bad way to split patches, this way actual logic in
> the next patch stands out as opposed to being masked by
> the code reshuffling following the extra parameter.
I think so.
Thanks.
>
> --
> MST
>
end of thread, other threads:[~2023-05-24 3:24 UTC | newest]
Thread overview: 44+ messages
-- links below jump to the message on this page --
2023-05-17 2:22 [PATCH vhost v9 00/12] virtio core prepares for AF_XDP Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-23 6:02 ` Christoph Hellwig
2023-05-17 2:22 ` [PATCH vhost v9 02/12] virtio_ring: simplify the reference of desc state inside detach_buf_split() Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-17 2:22 ` [PATCH vhost v9 03/12] virtio_ring: check use_dma_api before unmap desc for indirect Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-17 2:22 ` [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-23 6:03 ` Christoph Hellwig
2023-05-23 7:19 ` Michael S. Tsirkin
2023-05-24 3:22 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 05/12] virtio_ring: split: virtqueue_add_split() " Xuan Zhuo
2023-05-18 6:51 ` Jason Wang
2023-05-18 7:11 ` Michael S. Tsirkin
2023-05-18 7:33 ` Xuan Zhuo
2023-05-18 7:54 ` Jason Wang
2023-05-18 7:56 ` Xuan Zhuo
2023-05-18 8:57 ` Jason Wang
2023-05-18 9:18 ` Xuan Zhuo
2023-05-18 8:29 ` Michael S. Tsirkin
2023-05-18 8:50 ` Xuan Zhuo
2023-05-18 9:41 ` Michael S. Tsirkin
2023-05-18 8:57 ` Jason Wang
2023-05-18 9:14 ` Xuan Zhuo
2023-05-18 9:49 ` Michael S. Tsirkin
2023-05-18 12:20 ` Xuan Zhuo
2023-05-18 12:22 ` Xuan Zhuo
2023-05-18 17:12 ` Michael S. Tsirkin
2023-05-19 3:27 ` Xuan Zhuo
2023-05-19 3:39 ` Jason Wang
2023-05-19 3:38 ` Jason Wang
2023-05-18 9:44 ` Michael S. Tsirkin
2023-05-18 9:24 ` Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 06/12] virtio_ring: packed: virtqueue_add_packed() " Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 07/12] virtio_ring: introduce virtqueue_add_outbuf_premapped() Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 08/12] virtio_ring: introduce virtqueue_add_inbuf_premapped() Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 09/12] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 10/12] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
2023-05-18 12:12 ` Xuan Zhuo
2023-05-18 14:00 ` Michael S. Tsirkin
2023-05-17 2:22 ` [PATCH vhost v9 11/12] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
2023-05-17 2:22 ` [PATCH vhost v9 12/12] virtio_ring: introduce virtqueue_reset() Xuan Zhuo