Re: [Qemu-arm] [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Markus Armbruster <armbru@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	qemu-block@nongnu.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	Xiang Zheng <zhengxiang9@huawei.com>,
	qemu-arm <qemu-arm@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Heyi Guo <guoheyi@huawei.com>,
	wanghaibin.wang@huawei.com
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Tue, 09 Apr 2019 08:01:27 +0200	[thread overview]
Message-ID: <87bm1fiuo8.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <a8144a1f-364e-cea5-b0a7-05342bdab936@redhat.com> (Laszlo Ersek's message of "Mon, 8 Apr 2019 18:14:15 +0200")

László's last sentence below is "This really needs the attention of the
block people."  Cc'ing some.

Laszlo Ersek <lersek@redhat.com> writes:

> On 04/08/19 15:43, Xiang Zheng wrote:
>> 
>> On 2019/4/3 23:35, Laszlo Ersek wrote:
>>>> I thought about your comments and wrote the following patch (just for test)
>>>> which uses a file mapping to replace the anonymous mapping. UEFI seems to work
>>>> fine. So why not use a file mapping to read or write a pflash device?
>>> Honestly I can't answer "why not do this". Maybe we should.
>>>
>>> Regarding "why not do this *exactly as shown below*" -- probably because
>>> then we'd have updates to the same underlying regular file via both
>>> mmap() and write(), and the interactions between those are really tricky
>>> (= best avoided).
>>>
>>> One of my favorite questions over the years used to be posting a matrix
>>> of possible mmap() and file descriptor r/w/sync interactions, with the
>>> outcomes as they were "implied" by POSIX, and then asking at the bottom,
>>> "is my understanding correct?" I've never ever received an answer to
>>> that. :)
>> 
>> In my opinion, it's feasible to r/w/sync the memory devices which use a block
>> backend via mmap() and write().
>
> Maybe. I think that would be a first in QEMU, and you'd likely have to
> describe and follow a semi-formal model between fd actions and mmap()
> actions.
>
> Here's the (unconfirmed) table I referred to earlier.
>
> +-------------+-----------------------------------------------------+
> | change made | change visible via                                  |
> | through     +-----------------+-------------+---------------------+
> |             | MAP_SHARED      | MAP_PRIVATE | read()              |
> +-------------+-----------------+-------------+---------------------+
> | MAP_SHARED  | yes             | unspecified | depends on MS_SYNC, |
> |             |                 |             | MS_ASYNC, or normal |
> |             |                 |             | system activity     |
> +-------------+-----------------+-------------+---------------------+
> | MAP_PRIVATE | no              | no          | no                  |
> +-------------+-----------------+-------------+---------------------+
> | write()     | depends on      | unspecified | yes                 |
> |             | MS_INVALIDATE,  |             |                     |
> |             | or the system's |             |                     |
> |             | read/write      |             |                     |
> |             | consistency     |             |                     |
> +-------------+-----------------+-------------+---------------------+
>
> Presumably:
>
> - a write() to some offset range of a regular file should be visible in
> a MAP_SHARED mapping of that range after a matching msync(...,
> MS_INVALIDATE) call;
>
> - changes through a MAP_SHARED mapping to a regular file should be
> visible via read() after a matching msync(..., MS_SYNC) call returns.
>
> I sent this table first in 2010 to the Austin Group mailing list, and
> received no comments. Then another person (on the same list) asked
> basically the same questions in 2013, to which I responded with the
> above assumptions / interpretations -- and received no comments from
> third parties again.
>
> But, I'm really out of my depth here -- we're not even discussing
> generic read()/write()/mmap()/munmap()/msync() interactions, but how
> they would fit into the qemu block layer. And I have no idea.
>
>> 
>>>
>>> Also... although we don't really use them in practice, some QCOW2
>>> features for pflash block backends are -- or would be -- nice. (Not the
>>> internal snapshots or the live RAM dumps, of course.)
>> 
>> Regarding sparse files, can we read actual backend size data for the read-only
>> pflash memory as the following steps shown?
>> 
>> 1) Create a sparse file -- QEMU_EFI-test.raw:
>>    dd if=/dev/zero of=QEMU_EFI-test.raw bs=1M seek=64 count=0
>> 
>> 2) Read from QEMU_EFI.fd and write to QEMU_EFI-test.raw:
>>    dd of="QEMU_EFI-test.raw" if="QEMU_EFI.fd" conv=notrunc
>> 
>> 3) Use QEMU_EFI-test.raw as the read-only pflash and start qemu with the below
>> patch applied:
>> 
>> ---8>---
>> 
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index bf56c76..ad18d5e 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -30,7 +30,7 @@
>>  bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>                                   Error **errp)
>>  {
>> -    int64_t blk_len;
>> +    int64_t blk_len, actual_len;
>>      int ret;
>> 
>>      blk_len = blk_getlength(blk);
>> @@ -53,7 +53,9 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>       * block device and read only on demand.
>>       */
>>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>> -    ret = blk_pread(blk, 0, buf, size);
>> +    actual_len = bdrv_get_allocated_file_size(blk_bs(blk));
>> +    ret = blk_pread(blk, 0, buf,
>> +            (!blk_is_read_only(blk) || actual_len < 0) ? size : actual_len);
>>      if (ret < 0) {
>>          error_setg_errno(errp, -ret, "can't read block backend");
>>          return false;
>> 
>
> This hunk looks dubious for a general helper function. It seems to
> assume that the holes are punched at the end of the file.
>
> Sorry, this discussion is making me uncomfortable. I don't want to
> ignore your questions, but I have no idea about the right answers. This
> really needs the attention of the block people.

WARNING: multiple messages have this Message-ID (diff)

From: Markus Armbruster <armbru@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Xiang Zheng <zhengxiang9@huawei.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	qemu-arm <qemu-arm@nongnu.org>, Heyi Guo <guoheyi@huawei.com>,
	wanghaibin.wang@huawei.com, qemu-block@nongnu.org,
	Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Tue, 09 Apr 2019 08:01:27 +0200	[thread overview]
Message-ID: <87bm1fiuo8.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <a8144a1f-364e-cea5-b0a7-05342bdab936@redhat.com> (Laszlo Ersek's message of "Mon, 8 Apr 2019 18:14:15 +0200")

László's last sentence below is "This really needs the attention of the
block people."  Cc'ing some.

Laszlo Ersek <lersek@redhat.com> writes:

> On 04/08/19 15:43, Xiang Zheng wrote:
>> 
>> On 2019/4/3 23:35, Laszlo Ersek wrote:
>>>> I thought about your comments and wrote the following patch (just for test)
>>>> which uses a file mapping to replace the anonymous mapping. UEFI seems to work
>>>> fine. So why not use a file mapping to read or write a pflash device?
>>> Honestly I can't answer "why not do this". Maybe we should.
>>>
>>> Regarding "why not do this *exactly as shown below*" -- probably because
>>> then we'd have updates to the same underlying regular file via both
>>> mmap() and write(), and the interactions between those are really tricky
>>> (= best avoided).
>>>
>>> One of my favorite questions over the years used to be posting a matrix
>>> of possible mmap() and file descriptor r/w/sync interactions, with the
>>> outcomes as they were "implied" by POSIX, and then asking at the bottom,
>>> "is my understanding correct?" I've never ever received an answer to
>>> that. :)
>> 
>> In my opinion, it's feasible to r/w/sync the memory devices which use a block
>> backend via mmap() and write().
>
> Maybe. I think that would be a first in QEMU, and you'd likely have to
> describe and follow a semi-formal model between fd actions and mmap()
> actions.
>
> Here's the (unconfirmed) table I referred to earlier.
>
> +-------------+-----------------------------------------------------+
> | change made | change visible via                                  |
> | through     +-----------------+-------------+---------------------+
> |             | MAP_SHARED      | MAP_PRIVATE | read()              |
> +-------------+-----------------+-------------+---------------------+
> | MAP_SHARED  | yes             | unspecified | depends on MS_SYNC, |
> |             |                 |             | MS_ASYNC, or normal |
> |             |                 |             | system activity     |
> +-------------+-----------------+-------------+---------------------+
> | MAP_PRIVATE | no              | no          | no                  |
> +-------------+-----------------+-------------+---------------------+
> | write()     | depends on      | unspecified | yes                 |
> |             | MS_INVALIDATE,  |             |                     |
> |             | or the system's |             |                     |
> |             | read/write      |             |                     |
> |             | consistency     |             |                     |
> +-------------+-----------------+-------------+---------------------+
>
> Presumably:
>
> - a write() to some offset range of a regular file should be visible in
> a MAP_SHARED mapping of that range after a matching msync(...,
> MS_INVALIDATE) call;
>
> - changes through a MAP_SHARED mapping to a regular file should be
> visible via read() after a matching msync(..., MS_SYNC) call returns.
>
> I sent this table first in 2010 to the Austin Group mailing list, and
> received no comments. Then another person (on the same list) asked
> basically the same questions in 2013, to which I responded with the
> above assumptions / interpretations -- and received no comments from
> third parties again.
>
> But, I'm really out of my depth here -- we're not even discussing
> generic read()/write()/mmap()/munmap()/msync() interactions, but how
> they would fit into the qemu block layer. And I have no idea.
>
>> 
>>>
>>> Also... although we don't really use them in practice, some QCOW2
>>> features for pflash block backends are -- or would be -- nice. (Not the
>>> internal snapshots or the live RAM dumps, of course.)
>> 
>> Regarding sparse files, can we read actual backend size data for the read-only
>> pflash memory as the following steps shown?
>> 
>> 1) Create a sparse file -- QEMU_EFI-test.raw:
>>    dd if=/dev/zero of=QEMU_EFI-test.raw bs=1M seek=64 count=0
>> 
>> 2) Read from QEMU_EFI.fd and write to QEMU_EFI-test.raw:
>>    dd of="QEMU_EFI-test.raw" if="QEMU_EFI.fd" conv=notrunc
>> 
>> 3) Use QEMU_EFI-test.raw as the read-only pflash and start qemu with the below
>> patch applied:
>> 
>> ---8>---
>> 
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index bf56c76..ad18d5e 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -30,7 +30,7 @@
>>  bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>                                   Error **errp)
>>  {
>> -    int64_t blk_len;
>> +    int64_t blk_len, actual_len;
>>      int ret;
>> 
>>      blk_len = blk_getlength(blk);
>> @@ -53,7 +53,9 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>       * block device and read only on demand.
>>       */
>>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>> -    ret = blk_pread(blk, 0, buf, size);
>> +    actual_len = bdrv_get_allocated_file_size(blk_bs(blk));
>> +    ret = blk_pread(blk, 0, buf,
>> +            (!blk_is_read_only(blk) || actual_len < 0) ? size : actual_len);
>>      if (ret < 0) {
>>          error_setg_errno(errp, -ret, "can't read block backend");
>>          return false;
>> 
>
> This hunk looks dubious for a general helper function. It seems to
> assume that the holes are punched at the end of the file.
>
> Sorry, this discussion is making me uncomfortable. I don't want to
> ignore your questions, but I have no idea about the right answers. This
> really needs the attention of the block people.

WARNING: multiple messages have this Message-ID (diff)

From: Markus Armbruster <armbru@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	qemu-block@nongnu.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	Xiang Zheng <zhengxiang9@huawei.com>,
	qemu-arm <qemu-arm@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Heyi Guo <guoheyi@huawei.com>,
	wanghaibin.wang@huawei.com
Subject: Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory
Date: Tue, 09 Apr 2019 08:01:27 +0200	[thread overview]
Message-ID: <87bm1fiuo8.fsf@dusky.pond.sub.org> (raw)
Message-ID: <20190409060127.PszrfsQmK7gKx_B9J0rp9PciUvSswOkq-gTqSZvVD_4@z> (raw)
In-Reply-To: <a8144a1f-364e-cea5-b0a7-05342bdab936@redhat.com> (Laszlo Ersek's message of "Mon, 8 Apr 2019 18:14:15 +0200")

László's last sentence below is "This really needs the attention of the
block people."  Cc'ing some.

Laszlo Ersek <lersek@redhat.com> writes:

> On 04/08/19 15:43, Xiang Zheng wrote:
>> 
>> On 2019/4/3 23:35, Laszlo Ersek wrote:
>>>> I thought about your comments and wrote the following patch (just for test)
>>>> which uses a file mapping to replace the anonymous mapping. UEFI seems to work
>>>> fine. So why not use a file mapping to read or write a pflash device?
>>> Honestly I can't answer "why not do this". Maybe we should.
>>>
>>> Regarding "why not do this *exactly as shown below*" -- probably because
>>> then we'd have updates to the same underlying regular file via both
>>> mmap() and write(), and the interactions between those are really tricky
>>> (= best avoided).
>>>
>>> One of my favorite questions over the years used to be posting a matrix
>>> of possible mmap() and file descriptor r/w/sync interactions, with the
>>> outcomes as they were "implied" by POSIX, and then asking at the bottom,
>>> "is my understanding correct?" I've never ever received an answer to
>>> that. :)
>> 
>> In my opinion, it's feasible to r/w/sync the memory devices which use a block
>> backend via mmap() and write().
>
> Maybe. I think that would be a first in QEMU, and you'd likely have to
> describe and follow a semi-formal model between fd actions and mmap()
> actions.
>
> Here's the (unconfirmed) table I referred to earlier.
>
> +-------------+-----------------------------------------------------+
> | change made | change visible via                                  |
> | through     +-----------------+-------------+---------------------+
> |             | MAP_SHARED      | MAP_PRIVATE | read()              |
> +-------------+-----------------+-------------+---------------------+
> | MAP_SHARED  | yes             | unspecified | depends on MS_SYNC, |
> |             |                 |             | MS_ASYNC, or normal |
> |             |                 |             | system activity     |
> +-------------+-----------------+-------------+---------------------+
> | MAP_PRIVATE | no              | no          | no                  |
> +-------------+-----------------+-------------+---------------------+
> | write()     | depends on      | unspecified | yes                 |
> |             | MS_INVALIDATE,  |             |                     |
> |             | or the system's |             |                     |
> |             | read/write      |             |                     |
> |             | consistency     |             |                     |
> +-------------+-----------------+-------------+---------------------+
>
> Presumably:
>
> - a write() to some offset range of a regular file should be visible in
> a MAP_SHARED mapping of that range after a matching msync(...,
> MS_INVALIDATE) call;
>
> - changes through a MAP_SHARED mapping to a regular file should be
> visible via read() after a matching msync(..., MS_SYNC) call returns.
>
> I sent this table first in 2010 to the Austin Group mailing list, and
> received no comments. Then another person (on the same list) asked
> basically the same questions in 2013, to which I responded with the
> above assumptions / interpretations -- and received no comments from
> third parties again.
>
> But, I'm really out of my depth here -- we're not even discussing
> generic read()/write()/mmap()/munmap()/msync() interactions, but how
> they would fit into the qemu block layer. And I have no idea.
>
>> 
>>>
>>> Also... although we don't really use them in practice, some QCOW2
>>> features for pflash block backends are -- or would be -- nice. (Not the
>>> internal snapshots or the live RAM dumps, of course.)
>> 
>> Regarding sparse files, can we read actual backend size data for the read-only
>> pflash memory as the following steps shown?
>> 
>> 1) Create a sparse file -- QEMU_EFI-test.raw:
>>    dd if=/dev/zero of=QEMU_EFI-test.raw bs=1M seek=64 count=0
>> 
>> 2) Read from QEMU_EFI.fd and write to QEMU_EFI-test.raw:
>>    dd of="QEMU_EFI-test.raw" if="QEMU_EFI.fd" conv=notrunc
>> 
>> 3) Use QEMU_EFI-test.raw as the read-only pflash and start qemu with the below
>> patch applied:
>> 
>> ---8>---
>> 
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index bf56c76..ad18d5e 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -30,7 +30,7 @@
>>  bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>                                   Error **errp)
>>  {
>> -    int64_t blk_len;
>> +    int64_t blk_len, actual_len;
>>      int ret;
>> 
>>      blk_len = blk_getlength(blk);
>> @@ -53,7 +53,9 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>       * block device and read only on demand.
>>       */
>>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>> -    ret = blk_pread(blk, 0, buf, size);
>> +    actual_len = bdrv_get_allocated_file_size(blk_bs(blk));
>> +    ret = blk_pread(blk, 0, buf,
>> +            (!blk_is_read_only(blk) || actual_len < 0) ? size : actual_len);
>>      if (ret < 0) {
>>          error_setg_errno(errp, -ret, "can't read block backend");
>>          return false;
>> 
>
> This hunk looks dubious for a general helper function. It seems to
> assume that the holes are punched at the end of the file.
>
> Sorry, this discussion is making me uncomfortable. I don't want to
> ignore your questions, but I have no idea about the right answers. This
> really needs the attention of the block people.

next prev parent reply	other threads:[~2019-04-09  6:01 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-25 12:51 [Qemu-arm] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory Xiang Zheng
2019-03-25 13:11 ` Peter Maydell
2019-03-25 14:03   ` Zheng Xiang
2019-03-26  6:17     ` [Qemu-arm] [Qemu-devel] " Markus Armbruster
2019-03-26 11:03       ` Laszlo Ersek
2019-03-26 16:39         ` Markus Armbruster
2019-03-26 17:10           ` Laszlo Ersek
2019-03-26 18:36             ` Markus Armbruster
2019-04-03 14:12               ` [Qemu-arm] " Xiang Zheng
2019-04-03 14:12                 ` Xiang Zheng
2019-04-03 15:35                 ` [Qemu-arm] " Laszlo Ersek
2019-04-03 15:35                   ` Laszlo Ersek
2019-04-08 13:43                   ` [Qemu-arm] " Xiang Zheng
2019-04-08 13:43                     ` Xiang Zheng
2019-04-08 16:14                     ` [Qemu-arm] " Laszlo Ersek
2019-04-08 16:14                       ` Laszlo Ersek
2019-04-09  3:39                       ` [Qemu-arm] " Xiang Zheng
2019-04-09  3:39                         ` Xiang Zheng
2019-04-09  6:01                       ` Markus Armbruster [this message]
2019-04-09  6:01                         ` Markus Armbruster
2019-04-09  6:01                         ` Markus Armbruster
2019-04-09  8:28                         ` [Qemu-arm] " Kevin Wolf
2019-04-09  8:28                           ` Kevin Wolf
2019-04-09  8:28                           ` Kevin Wolf
2019-04-10  8:36                           ` [Qemu-arm] " Xiang Zheng
2019-04-10  8:36                             ` Xiang Zheng
2019-04-10  8:36                             ` Xiang Zheng
2019-04-11  7:15                             ` [Qemu-arm] " Markus Armbruster
2019-04-11  7:15                               ` Markus Armbruster
2019-04-12  9:26                               ` [Qemu-arm] " Xiang Zheng
2019-04-12  9:26                                 ` Xiang Zheng
2019-04-11 12:22                             ` [Qemu-arm] " Kevin Wolf
2019-04-11 12:22                               ` Kevin Wolf
2019-04-11 12:22                               ` Kevin Wolf
2019-04-12  1:52                               ` [Qemu-arm] " Xiang Zheng
2019-04-12  1:52                                 ` Xiang Zheng
2019-04-12  1:52                                 ` Xiang Zheng
2019-04-12  9:50                                 ` [Qemu-arm] " Xiang Zheng
2019-04-12  9:50                                   ` Xiang Zheng
2019-04-12  9:50                                   ` Xiang Zheng
2019-04-12 10:57                                   ` [Qemu-arm] " Kevin Wolf
2019-04-12 10:57                                     ` Kevin Wolf
2019-04-12 10:57                                     ` Kevin Wolf
2019-04-15  2:39                                     ` [Qemu-arm] " Xiang Zheng
2019-04-15  2:39                                       ` Xiang Zheng
2019-04-15  2:39                                       ` Xiang Zheng
2019-04-22  1:37                                       ` [Qemu-arm] " Xiang Zheng
2019-04-22  1:37                                         ` Xiang Zheng
2019-04-22  1:37                                         ` Xiang Zheng
2019-03-25 14:07   ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bm1fiuo8.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=guoheyi@huawei.com \
    --cc=kwolf@redhat.com \
    --cc=lersek@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=wanghaibin.wang@huawei.com \
    --cc=zhengxiang9@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.