From: "Michael S. Tsirkin" <mst@redhat.com>
To: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
Jens Axboe <axboe@kernel.dk>,
hch@infradead.org, virtualization@lists.linux-foundation.org,
kvm@vger.kernel.org, israelr@nvidia.com, nitzanc@nvidia.com,
oren@nvidia.com, linux-block@vger.kernel.org
Subject: Re: [PATCH v3 1/1] virtio-blk: avoid preallocating big SGL for data
Date: Thu, 23 Sep 2021 11:37:42 -0400
Message-ID: <20210923113644-mutt-send-email-mst@kernel.org>
In-Reply-To: <56cf84e2-fec0-08e8-0a47-24bb1df71883@nvidia.com>
OK by me.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
I will queue it for the next kernel.
Thanks!
On Thu, Sep 23, 2021 at 04:40:56PM +0300, Max Gurtovoy wrote:
> Hi MST/Jens,
>
> Do we need more review here, or are we OK with the code and the test matrix?
>
> If we're OK, we need to decide whether this goes through the virtio PR or the block PR.
>
> Cheers,
>
> -Max.
>
> On 9/14/2021 3:22 PM, Stefan Hajnoczi wrote:
> > On Mon, Sep 13, 2021 at 05:50:21PM +0300, Max Gurtovoy wrote:
> > > On 9/6/2021 6:09 PM, Stefan Hajnoczi wrote:
> > > > On Wed, Sep 01, 2021 at 04:14:34PM +0300, Max Gurtovoy wrote:
> > > > > No need to pre-allocate a big buffer for the IO SGL anymore. If a device
> > > > > has lots of deep queues, preallocation for the sg list can consume
> > > > > substantial amounts of memory. For a HW virtio-blk device, nr_hw_queues
> > > > > can be 64 or 128 and each queue's depth might be 128. Since an SGL is
> > > > > preallocated for every request, the total grows with the number of
> > > > > queues times the queue depth times the SGL size, so the resulting
> > > > > preallocation for the data SGLs is big.
> > > > >
> > > > > Switch to runtime SGL allocation for lists longer than 2 entries.
> > > > > This is the approach used by the NVMe drivers, so it should be
> > > > > reasonable for virtio-blk as well. The legacy I/O path has always
> > > > > allocated SGLs at runtime, so this is nothing new.
> > > > >
> > > > > The small preallocated SGL depends on SG_CHAIN, so if the architecture
> > > > > doesn't support SG_CHAIN, use only runtime allocation for the SGL.
> > > > >
> > > > > Reorganize the IO request setup to fit the new SG chaining
> > > > > mechanism.
> > > > >
> > > > > No significant performance degradation was seen (fio, libaio engine,
> > > > > 16 jobs, iodepth 128):
> > > > >
> > > > > IO size   IOPS Rand Read (before/after)   IOPS Rand Write (before/after)
> > > > > -------   -----------------------------   ------------------------------
> > > > > 512B      318K/316K                       329K/325K
> > > > > 4KB       323K/321K                       353K/349K
> > > > > 16KB      199K/208K                       250K/275K
> > > > > 128KB     36K/36.1K                       39.2K/41.7K
> > > > I ran fio randread benchmarks with 4k, 16k, 64k, and 128k at iodepth 1,
> > > > 8, and 64 on two vCPUs. The results look fine; there is no significant
> > > > regression.
> > > >
> > > > iodepth=1 and iodepth=64 are very consistent. For some reason,
> > > > iodepth=8 shows significant variance, but I don't think this patch is
> > > > at fault.
> > > >
> > > > Fio results and the Jupyter notebook export are available here (check
> > > > out benchmark.html to see the graphs):
> > > >
> > > > https://gitlab.com/stefanha/virt-playbooks/-/tree/virtio-blk-sgl-allocation-benchmark/notebook
> > > >
> > > > Guest:
> > > > - Fedora 34
> > > > - Linux v5.14
> > > > - 2 vCPUs (pinned), 4 GB RAM (single host NUMA node)
> > > > - 1 IOThread (pinned)
> > > > - virtio-blk aio=native,cache=none,format=raw
> > > > - QEMU 6.1.0
> > > >
> > > > Host:
> > > > - RHEL 8.3
> > > > - Linux 4.18.0-240.22.1.el8_3.x86_64
> > > > - Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
> > > > - Intel Optane DC P4800X
> > > >
> > > > Stefan
> > > Thanks, Stefan.
> > >
> > > Would you like me to add some of the results to my commit message, or
> > > would you rather provide a Tested-by tag?
> > Thanks, there's no need to change the commit description.
> >
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization