From: Kevin Wolf <kwolf@redhat.com>
To: Xiang Zheng <zhengxiang9@huawei.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
qemu-block@nongnu.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
QEMU Developers <qemu-devel@nongnu.org>,
Markus Armbruster <armbru@redhat.com>,
qemu-arm <qemu-arm@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Heyi Guo <guoheyi@huawei.com>,
wanghaibin.wang@huawei.com, Max Reitz <mreitz@redhat.com>,
Laszlo Ersek <lersek@redhat.com>
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Fri, 12 Apr 2019 12:57:37 +0200
Message-ID: <20190412105737.GC4522@linux.fritz.box>
In-Reply-To: <355cd139-2084-e282-9a60-3574fe0746bd@huawei.com>
On 12.04.2019 at 11:50, Xiang Zheng wrote:
>
> On 2019/4/12 9:52, Xiang Zheng wrote:
> > On 2019/4/11 20:22, Kevin Wolf wrote:
> >> Okay, so your problem is that blk_pread() writes to the whole buffer,
> >> writing explicit zeroes for unallocated parts of the image, while you
> >> would like to leave those parts of the buffer untouched so that we don't
> >> actually allocate the memory, but can just use the shared zero page.
> >>
> >> If you just want to read the non-zero parts of the image, that can be
> >> done by using a loop that calls bdrv_block_status() and only reads from
> >> the image if the BDRV_BLOCK_ZERO bit is clear.
> >>
> >> Would this solve your problem?
> >
> > Sounds good! What if the guest tries to read/write the zero parts?
> >
>
> I wrote the patch below (modeled on bdrv_make_zero()) for testing. Everything
> seems to be OK, and the memory is now allocated exactly on demand.
>
> This requires the pflash devices to be backed by sparse files, so I have to
> create the images like this:
>
> dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M seek=64 count=0
> dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
>
> dd of="empty_VARS.fd" if="/dev/zero" bs=1M seek=64 count=0
>
>
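If the dd idiom above is unfamiliar, here is a rough POSIX C equivalent. It is only a sketch, and `make_sparse_image()` is a hypothetical helper that is not part of QEMU. The first dd merely sets the file length to 64 MiB (bs=1M, seek=64, count=0) and leaves a hole. The second dd overwrites the start of the file without truncating it. Whether the hole really stays unallocated depends on the filesystem supporting sparse files.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: give the file at `path` a length of `size` bytes
 * without writing data, then write `len` bytes of `data` at offset 0
 * without truncating. Returns 0 on success, -1 on error. */
int make_sparse_image(const char *path, off_t size,
                      const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t done = 0;
    int fd;

    assert(path != NULL && (data != NULL || len == 0));

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        return -1;
    }
    /* Like "dd ... seek=64 count=0": set the length only; nothing is
     * written, so the range beyond the data below remains a hole. */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    /* Like "dd ... conv=notrunc": overwrite the start, keep the length. */
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, (off_t)done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

Called as `make_sparse_image("QEMU_EFI-pflash.raw", (off_t)64 << 20, fw, fw_len)`, it produces a 64 MiB file whose tail reads back as zeroes.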
> ---8>---
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f78e82a..ed8ca87 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1379,6 +1379,12 @@ BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
> }
>
> +int blk_pread_nonzeroes(BlockBackend *blk, void *buf)
> +{
> +    int ret = bdrv_pread_nonzeroes(blk->root, buf);
> +    return ret;
> +}

I don't think this deserves a place in the public block layer interface,
as it's only a single device that makes use of it.

Maybe you wrote things this way because there is no blk_block_status(),
but you can get the BlockDriverState with blk_bs(blk) and then implement
everything inside hw/block/block.c.

> int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
> {
> int ret = blk_prw(blk, offset, buf, count, blk_read_entry, 0);
> diff --git a/block/io.c b/block/io.c
> index dfc153b..83e5ea7 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -882,6 +882,38 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
> BDRV_REQ_ZERO_WRITE | flags);
> }
>
> +int bdrv_pread_nonzeroes(BdrvChild *child, void *buf)
> +{
> +    int ret;
> +    int64_t target_size, bytes, offset = 0;
> +    BlockDriverState *bs = child->bs;
> +
> +    target_size = bdrv_getlength(bs);
> +    if (target_size < 0) {
> +        return target_size;
> +    }
> +
> +    for (;;) {
> +        bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
> +        if (bytes <= 0) {
> +            return 0;
> +        }
> +        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL, NULL);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        if (ret & BDRV_BLOCK_ZERO) {
> +            offset += bytes;
> +            continue;
> +        }
> +        ret = bdrv_pread(child, offset, buf, bytes);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        offset += bytes;

I think the code becomes simpler the other way round:

        if (!(ret & BDRV_BLOCK_ZERO)) {
            ret = bdrv_pread(child, offset, buf, bytes);
            if (ret < 0) {
                return ret;
            }
        }
        offset += bytes;

You don't increment buf, so if the file has a hole, the data that follows
it is written to the start of the buffer instead of its proper offset,
corrupting the buffer. You need to either increment buf as well, or use
(uint8_t *)buf + offset for the bdrv_pread() call.

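To illustrate the fix, here is a minimal standalone sketch of the loop with the destination advanced by offset. It does not use the QEMU block layer. `MockImage`, `Extent`, `mock_block_status()` and the `MOCK_BLOCK_*` flags are hypothetical stand-ins for the image and for bdrv_block_status(), and the BDRV_REQUEST_MAX_BYTES cap is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_BLOCK_DATA 0x1   /* hypothetical: extent contains data */
#define MOCK_BLOCK_ZERO 0x2   /* hypothetical: extent reads as zeroes */

typedef struct {
    int64_t len;   /* extent length in bytes */
    bool zero;     /* true if the extent is a hole / reads as zeroes */
} Extent;

typedef struct {
    const uint8_t *data;    /* image contents */
    const Extent *extents;  /* extents covering the image, in order */
    size_t n_extents;
} MockImage;

/* Mimics the shape of bdrv_block_status(): report the status at `offset`
 * and store in *pnum how many bytes (at most `bytes`) share it. */
static int mock_block_status(const MockImage *img, int64_t offset,
                             int64_t bytes, int64_t *pnum)
{
    int64_t start = 0;

    for (size_t i = 0; i < img->n_extents; i++) {
        int64_t end = start + img->extents[i].len;
        if (offset < end) {
            int64_t n = end - offset;
            *pnum = n < bytes ? n : bytes;
            return img->extents[i].zero ? MOCK_BLOCK_ZERO : MOCK_BLOCK_DATA;
        }
        start = end;
    }
    return -1; /* offset lies beyond the last extent */
}

/* Copy only the non-zero extents of the first `size` bytes into buf and
 * leave the rest of buf untouched. */
int read_nonzeroes(const MockImage *img, int64_t size, void *buf)
{
    int64_t offset = 0;

    while (offset < size) {
        int64_t pnum;
        int ret = mock_block_status(img, offset, size - offset, &pnum);
        if (ret < 0) {
            return ret;
        }
        assert(pnum > 0);
        if (!(ret & MOCK_BLOCK_ZERO)) {
            /* The destination advances with offset: this is the fix. */
            memcpy((uint8_t *)buf + offset, img->data + offset,
                   (size_t)pnum);
        }
        offset += pnum;
    }
    return 0;
}
```

With extents of 4 data bytes, an 8-byte hole and 4 more data bytes, the hole's part of buf keeps whatever it held before. In the pflash case that is untouched memory backed by the shared zero page. Writing to plain buf instead would copy the last extent over the first one.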
> +    }
> +}
> +
> /*
> * Completely zero out a block device with the help of bdrv_pwrite_zeroes.
> * The operation is sped up by checking the block status and only writing
Kevin