From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: netdev <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
virtualization <virtualization@lists.linux-foundation.org>
Subject: Re: [PATCH 1/3] virtio: cache indirect desc for split
Date: Sun, 31 Oct 2021 10:47:34 -0400
Message-ID: <20211031104700-mutt-send-email-mst@kernel.org>
In-Reply-To: <1635401763.7680635-3-xuanzhuo@linux.alibaba.com>
On Thu, Oct 28, 2021 at 02:16:03PM +0800, Xuan Zhuo wrote:
> On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > > In the case of using indirect, indirect desc must be allocated and
> > > > released each time, which increases a lot of cpu overhead.
> > > >
> > > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > > drivers/virtio/virtio.c | 6 ++++
> > > > drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> > > > include/linux/virtio.h | 10 ++++++
> > > > 3 files changed, 70 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > > > }
> > > > EXPORT_SYMBOL_GPL(is_virtio_device);
> > > >
> > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > > +{
> > > > + dev->desc_cache = val;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > > +
> > > > void unregister_virtio_device(struct virtio_device *dev)
> > > > {
> > > > int index = dev->index; /* save for after device release */
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > > > /* Hint for event idx: already triggered no need to disable. */
> > > > bool event_triggered;
> > > >
> > > > + /* Is indirect cache used? */
> > > > + bool use_desc_cache;
> > > > + void *desc_cache_chain;
> > > > +
> > > > union {
> > > > /* Available for split ring */
> > > > struct {
> > > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > return extra[i].next;
> > > > }
> > > >
> > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > > +
> > > > +static void desc_cache_chain_free_split(void *chain)
> > > > +{
> > > > + struct vring_desc *desc;
> > > > +
> > > > + while (chain) {
> > > > + desc = chain;
> > > > + chain = (void *)desc->addr;
> > > > + kfree(desc);
> > > > + }
> > > > +}
> > > > +
> > > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > > + struct vring_desc *desc, int n)
> > > > +{
> > > > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > > + desc->addr = (u64)vq->desc_cache_chain;
> > > > + vq->desc_cache_chain = desc;
> > > > + } else {
> > > > + kfree(desc);
> > > > + }
> > > > +}
> > > > +
> > >
> > >
> > > So I have a question here. What happens if we just do:
> > >
> > > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> > > } else {
> > > > return kmalloc_array(n, sizeof desc, gfp)
> > > }
> > >
> > > A small change and won't we reap most performance benefits?
> >
> > Yes, I think we need a benchmark to use private cache to see how much
> > it can help.
>
> I did a test, the code is as follows:
>
> +static void desc_cache_put_packed(struct vring_virtqueue *vq,
> + struct vring_packed_desc *desc, int n)
> + {
> + if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + kmem_cache_free(vq->desc_cache, desc);
> + } else {
> + kfree(desc);
> + }
>
>
> @@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> */
> gfp &= ~__GFP_HIGHMEM;
>
> - desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> + if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
> + desc = kmem_cache_alloc(vq->desc_cache, gfp);
> + else
> + desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +
>
> .......
>
> + vq->desc_cache = kmem_cache_create("virio_desc",
> + 4 * sizeof(struct vring_desc),
> + 0, 0, NULL);
>
> The result is not good: there is basically no improvement. perf top shows
> that the overhead of kmem_cache_alloc/kmem_cache_free is also relatively
> large:
>
> 26.91% [kernel] [k] virtqueue_add
> 15.35% [kernel] [k] detach_buf_split
> 14.15% [kernel] [k] virtnet_xsk_xmit
> 13.24% [kernel] [k] virtqueue_add_outbuf
> > 9.30% [kernel] [k] __slab_free
> > 3.91% [kernel] [k] kmem_cache_alloc
> 2.85% [kernel] [k] virtqueue_get_buf_ctx
> > 2.82% [kernel] [k] kmem_cache_free
> 2.54% [kernel] [k] memset_erms
> 2.37% [kernel] [k] xsk_tx_peek_desc
> 1.20% [kernel] [k] virtnet_xsk_run
> 0.81% [kernel] [k] vring_map_one_sg
> 0.69% [kernel] [k] __free_old_xmit_ptr
> 0.69% [kernel] [k] virtqueue_kick_prepare
> 0.43% [kernel] [k] sg_init_table
> 0.41% [kernel] [k] sg_next
> 0.28% [kernel] [k] vring_unmap_one_split
> 0.25% [kernel] [k] vring_map_single.constprop.34
> 0.24% [kernel] [k] net_rx_action
>
> Thanks.
How about batching these? E.g. kmem_cache_alloc_bulk/kmem_cache_free_bulk?
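
A rough userspace sketch of what such batching could look like, for
illustration only. It is not kernel code: bulk_alloc() and bulk_free() are
mock stand-ins for kmem_cache_alloc_bulk()/kmem_cache_free_bulk(), and
struct desc_stash, BATCH and the stash_*() helpers are hypothetical names
that do not appear in the patch. The idea is to keep a small per-queue array
of preallocated descriptor tables and to go to the allocator only once per
BATCH gets or puts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_DESC_NUM 4        /* mirrors VIRT_QUEUE_CACHE_DESC_NUM in the patch */
#define BATCH 16                /* hypothetical batch size, not from the thread */
#define STASH_MAX (2 * BATCH)

/* Simplified stand-in for struct vring_desc. */
struct desc {
	unsigned long long addr;
	unsigned int len;
	unsigned short flags;
	unsigned short next;
};

/* Counts trips into the mock allocator, to make the amortization visible. */
static unsigned long allocator_calls;

/* Mock bulk allocation: all-or-nothing, returns nr on success and 0 on failure. */
static size_t bulk_alloc(size_t nr, void **p)
{
	size_t i;

	allocator_calls++;
	for (i = 0; i < nr; i++) {
		p[i] = malloc(CACHE_DESC_NUM * sizeof(struct desc));
		if (!p[i]) {
			while (i--)
				free(p[i]);
			return 0;
		}
	}
	return nr;
}

/* Mock bulk free: releases nr objects in a single allocator trip. */
static void bulk_free(size_t nr, void **p)
{
	size_t i;

	allocator_calls++;
	for (i = 0; i < nr; i++)
		free(p[i]);
}

/* Per-queue stash of ready-to-use indirect descriptor tables. */
struct desc_stash {
	void *slot[STASH_MAX];
	size_t count;
};

/* Take one table; refill BATCH tables at once when the stash is empty. */
static struct desc *stash_get(struct desc_stash *s)
{
	if (s->count == 0) {
		s->count = bulk_alloc(BATCH, s->slot);
		if (s->count == 0)
			return NULL;
	}
	return s->slot[--s->count];
}

/* Return one table; flush BATCH tables at once when the stash is full. */
static void stash_put(struct desc_stash *s, struct desc *d)
{
	assert(s->count <= STASH_MAX);
	if (s->count == STASH_MAX) {
		s->count -= BATCH;
		bulk_free(BATCH, &s->slot[s->count]);
	}
	s->slot[s->count++] = d;
}

/* Release everything still held, e.g. when the queue is torn down. */
static void stash_destroy(struct desc_stash *s)
{
	if (s->count)
		bulk_free(s->count, s->slot);
	s->count = 0;
}
```

In this sketch a steady get/put cycle touches the mock allocator only on the
first refill. Whether the real bulk API would recover the cost seen in the
perf output above would still need to be measured.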
> >
> > Thanks
> >
> > >
> > > --
> > > MST
> > >
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization