Message-ID: <1726126586.5406406-2-xuanzhuo@linux.alibaba.com>
Subject: Re: [PATCH net-next 04/13] virtio_ring: perform premapped operations based on per-buffer
Date: Thu, 12 Sep 2024 15:36:26 +0800
From: Xuan Zhuo
To: Jason Wang
Cc: netdev@vger.kernel.org, "Michael S. Tsirkin" , Eugenio Pérez ,
 "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni ,
 Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer ,
 John Fastabend , virtualization@lists.linux.dev, bpf@vger.kernel.org
References: <20240820073330.9161-1-xuanzhuo@linux.alibaba.com>
 <20240820073330.9161-5-xuanzhuo@linux.alibaba.com>
In-Reply-To:

On Wed, 11 Sep 2024 11:54:25 +0800, Jason Wang wrote:
> On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo wrote:
> >
> > The current configuration sets the virtqueue (vq) to premapped mode,
> > implying that all buffers submitted to this queue must be mapped ahead
> > of time.
> > This presents a challenge for the virtnet send queue (sq): the virtnet
> > driver would be required to keep track of dma information for vq size
> > * 17, which can be substantial. However, if the premapped mode were
> > applied on a per-buffer basis, the complexity would be greatly reduced.
> > With AF_XDP enabled, AF_XDP buffers would become premapped, while
> > kernel skb buffers could remain unmapped.
>
> Is this only applied to TX, or to both TX and RX?

For rx, if you mean the per-buffer dma scheme, then yes, rx can reuse
this.

If you mean whether we should use premapped mode for the normal rx
buffers, I think we should; that can reduce the number of dma map
operations.

> >
> > We can distinguish them by sg_page(sg). When sg_page(sg) is NULL, this
> > indicates that the driver has performed DMA mapping in advance,
> > allowing the virtio core to directly utilize sg_dma_address(sg)
> > without conducting any internal DMA mapping.
>
> This seems to conflict with the code below?
>
> #define sg_is_premapped(sg) (!sg_page(sg))

Sorry, I don't follow you.

The key point is that sg->page is set by the driver. So if the driver
sets sg->page = NULL, then for this sg the virtio core can skip the dma
mapping. If the driver sets sg->page to the page of the buffer, then
the virtio core should do the dma mapping for this sg.
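To make the convention concrete, here is a minimal caller-side sketch
(illustrative only, not part of this patch; the helper name and its
arguments are made up, and the buffer behind "dma" is assumed to have
been mapped by the driver already, e.g. from an AF_XDP pool). One sg is
premapped, the other is a normal page-backed sg that the virtio core
maps itself:

#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int example_add_bufs(struct virtqueue *vq, dma_addr_t dma,
			    unsigned int dma_len, struct page *page,
			    unsigned int page_len, void *data)
{
	struct scatterlist sgs[2];

	sg_init_table(sgs, 2);

	/* Premapped: sg_init_table() left sg_page() == NULL, so the
	 * virtio core uses sg_dma_address() as-is and skips both the
	 * dma map and the dma unmap for this entry.
	 */
	sgs[0].dma_address = dma;
	sgs[0].length = dma_len;

	/* Normal: a page is attached, so the core does the dma mapping. */
	sg_set_page(&sgs[1], page, page_len, 0);

	return virtqueue_add_outbuf(vq, sgs, 2, data, GFP_ATOMIC);
}

Whether a real driver would mix both kinds in one request is a separate
question; the point is only that the premapped decision is made per sg
entry, not per virtqueue.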
>
> And rethinking the design, a question is whether there is a chance
> that a VM_PFNMAP area could be used for virtio-net? If so, the trick
> for sg_page() cannot work?
>
> A quick glance told me AF_XDP seems to be safe as it uses
> pin_user_pages(), but we need to check other possibilities.
>
> Or we need to fall back to our previous idea, having new APIs.
>
> > Additionally, DMA unmap operations
> > for this buffer will be bypassed.
> >
> > Suggested-by: Jason Wang
> > Signed-off-by: Xuan Zhuo
> > ---
> >  drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
> >  1 file changed, 41 insertions(+), 29 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index b43eca93015c..7efddc71af67 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -235,6 +235,7 @@ static void vring_free(struct virtqueue *_vq);
> >   */
> >
> >  #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> > +#define sg_is_premapped(sg) (!sg_page(sg))
> >
> >  static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> >  				   unsigned int total_sg)
> > @@ -292,9 +293,10 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
> >  	return false;
> >  }
> >
> > -static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
> > +static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
> > +				    const struct vring_desc_extra *extra)
> >  {
> > -	return vring->use_dma_api && !vring->premapped;
> > +	return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
> >  }
> >
> >  size_t virtio_max_dma_size(const struct virtio_device *vdev)
> > @@ -366,7 +368,7 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> >  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> >  			    enum dma_data_direction direction, dma_addr_t *addr)
> >  {
> > -	if (vq->premapped) {
> > +	if (sg_is_premapped(sg)) {
> >  		*addr = sg_dma_address(sg);
> >  		return 0;
> >  	}
> > @@ -457,7 +459,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >  			       (flags & VRING_DESC_F_WRITE) ?
> >  			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >  	} else {
> > -		if (!vring_need_unmap_buffer(vq))
> > +		if (!vring_need_unmap_buffer(vq, extra))
> >  			goto out;
> >
> >  		dma_unmap_page(vring_dma_dev(vq),
> > @@ -510,7 +512,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
> >  						    dma_addr_t addr,
> >  						    unsigned int len,
> >  						    u16 flags,
> > -						    bool indirect)
> > +						    bool indirect, bool premapped)
> >  {
> >  	u16 next;
> >
> > @@ -518,7 +520,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
> >  	desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
> >  	desc[i].len = cpu_to_virtio32(vq->vdev, len);
> >
> > -	extra[i].addr = addr;
> > +	extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
> >  	extra[i].len = len;
> >  	extra[i].flags = flags;
> >
> > @@ -611,7 +613,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  			 */
> >  			i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
> >  						     VRING_DESC_F_NEXT,
> > -						     indirect);
> > +						     indirect, sg_is_premapped(sg));
> >  		}
> >  	}
> >  	for (; n < (out_sgs + in_sgs); n++) {
> > @@ -629,12 +631,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  						     sg->length,
> >  						     VRING_DESC_F_NEXT |
> >  						     VRING_DESC_F_WRITE,
> > -						     indirect);
> > +						     indirect, sg_is_premapped(sg));
> >  		}
> >  	}
> >  	/* Last one doesn't continue. */
> >  	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > -	if (!indirect && vring_need_unmap_buffer(vq))
> > +	if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
> >  		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> >  			~VRING_DESC_F_NEXT;
> >
> > @@ -643,19 +645,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  		dma_addr_t addr = vring_map_single(
> >  			vq, desc, total_sg * sizeof(struct vring_desc),
> >  			DMA_TO_DEVICE);
> > -		if (vring_mapping_error(vq, addr)) {
> > -			if (vq->premapped)
> > -				goto free_indirect;
> > -
> > +		if (vring_mapping_error(vq, addr))
> >  			goto unmap_release;
> > -		}
> >
> >  		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
> >  					 vq->split.desc_extra,
> >  					 head, addr,
> >  					 total_sg * sizeof(struct vring_desc),
> >  					 VRING_DESC_F_INDIRECT,
> > -					 false);
> > +					 false, false);
> >  	}
> >
> >  	/* We're using some buffers from the free list. */
> > @@ -712,7 +710,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  		i = vring_unmap_one_split(vq, &extra[i]);
> >  	}
> >
> > -free_indirect:
> >  	if (indirect)
> >  		kfree(desc);
> >
> > @@ -794,7 +791,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  					VRING_DESC_F_INDIRECT));
> >  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -		if (vring_need_unmap_buffer(vq)) {
> > +		if (vq->use_dma_api) {
> >  			for (j = 0; j < len / sizeof(struct vring_desc); j++)
> >  				vring_unmap_one_split(vq, &extra[j]);
> >  		}
> > @@ -1228,7 +1225,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >  			       (flags & VRING_DESC_F_WRITE) ?
> >  			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >  	} else {
> > -		if (!vring_need_unmap_buffer(vq))
> > +		if (!vring_need_unmap_buffer(vq, extra))
> >  			return;
> >
> >  		dma_unmap_page(vring_dma_dev(vq),
> > @@ -1309,7 +1306,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >  			i++;
> >
> >  			if (unlikely(vq->use_dma_api)) {
> > -				extra[i].addr = addr;
> > +				extra[i].addr = sg_is_premapped(sg) ? DMA_MAPPING_ERROR : addr;
> >  				extra[i].len = sg->length;
> >  				extra[i].flags = n < out_sgs ?
> >  					0 : VRING_DESC_F_WRITE;
> >  			}
> > @@ -1320,12 +1317,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >  	addr = vring_map_single(vq, desc,
> >  			total_sg * sizeof(struct vring_packed_desc),
> >  			DMA_TO_DEVICE);
> > -	if (vring_mapping_error(vq, addr)) {
> > -		if (vq->premapped)
> > -			goto free_desc;
> > -
> > +	if (vring_mapping_error(vq, addr))
> >  		goto unmap_release;
> > -	}
> >
> >  	vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
> >  	vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
> > @@ -1383,7 +1376,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >  	for (i = 0; i < err_idx; i++)
> >  		vring_unmap_extra_packed(vq, &extra[i]);
> >
> > -free_desc:
> >  	kfree(desc);
> >
> >  	END_USE(vq);
> > @@ -1474,7 +1466,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >  			desc[i].id = cpu_to_le16(id);
> >
> >  			if (unlikely(vq->use_dma_api)) {
> > -				vq->packed.desc_extra[curr].addr = addr;
> > +				vq->packed.desc_extra[curr].addr = sg_is_premapped(sg) ?
> > +					DMA_MAPPING_ERROR : addr;
> >  				vq->packed.desc_extra[curr].len = sg->length;
> >  				vq->packed.desc_extra[curr].flags =
> >  					le16_to_cpu(flags);
> > @@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >  		if (!extra)
> >  			return;
> >
> > -		if (vring_need_unmap_buffer(vq)) {
> > +		if (vq->use_dma_api) {
> >  			len = vq->packed.desc_extra[id].len;
> > -			for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > -					i++)
> > +			for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
>
> Unnecessary changes?

Will fix. Thanks.

>
> >  				vring_unmap_extra_packed(vq, &extra[i]);
> >  		}
> >  		kfree(desc);
> > @@ -2212,6 +2204,11 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2246,6 +2243,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2268,6 +2270,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2291,6 +2298,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
> >   * @ctx: extra context for the token
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
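A note on the detach_buf_split()/detach_buf_packed() hunks above, since
the reasoning is easy to miss: premapped-ness is now recorded per
descriptor (extra->addr == DMA_MAPPING_ERROR), so the detach loop must
run whenever the DMA API is in use at all, and each entry then decides
for itself whether it needs an unmap. A standalone model of that logic
(a simplified sketch, not the patch itself; the types are reduced to
the relevant fields):

#include <stdbool.h>
#include <stdint.h>

#define DMA_MAPPING_ERROR ((uint64_t)-1)	/* sentinel, mirroring the patch */

struct vring_desc_extra {
	uint64_t addr;	/* DMA_MAPPING_ERROR marks a premapped buffer */
	uint32_t len;
};

/* Mirrors the reworked vring_need_unmap_buffer(): the question changed
 * from "is the whole vq premapped?" to "was this buffer premapped?".
 */
static bool need_unmap(bool use_dma_api, const struct vring_desc_extra *extra)
{
	return use_dma_api && extra->addr != DMA_MAPPING_ERROR;
}

/* Mirrors why the detach paths now loop under vq->use_dma_api: any
 * single entry may still need unmapping even if its neighbours were
 * premapped by the driver.
 */
static void detach_all(bool use_dma_api, const struct vring_desc_extra *extra,
		       unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (need_unmap(use_dma_api, &extra[i])) {
			/* dma_unmap_page(...) would be called here */
		}
	}
}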