qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Marcel Apfelbaum" <mapfelba@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Stefan Weil" <sw@weilnetz.de>,
	"Murilo Opsfelder Araujo" <muriloo@linux.ibm.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-devel@nongnu.org, "Halil Pasic" <pasic@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Greg Kurz" <groug@kaod.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Igor Kotrasinski" <i.kotrasinsk@partner.samsung.com>
Subject: Re: [PATCH v3 09/12] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()
Date: Wed, 10 Mar 2021 11:55:58 +0100	[thread overview]
Message-ID: <2a597019-4847-1af8-bceb-1664cee108b1@redhat.com> (raw)
In-Reply-To: <1831f4b8-9dd2-4fc1-c3cd-1e789a7bc37f@redhat.com>

On 10.03.21 11:11, David Hildenbrand wrote:
> On 10.03.21 09:41, David Hildenbrand wrote:
>> On 09.03.21 21:58, Peter Xu wrote:
>>> On Tue, Mar 09, 2021 at 09:27:10PM +0100, David Hildenbrand wrote:
>>>>
>>>>> Am 09.03.2021 um 21:04 schrieb Peter Xu <peterx@redhat.com>:
>>>>>
>>>>> On Mon, Mar 08, 2021 at 04:05:57PM +0100, David Hildenbrand wrote:
>>>>>> Let's introduce a new set of flags that abstract mmap logic and replace
>>>>>> our current set of bools, to prepare for another flag.
>>>>>>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>> include/qemu/mmap-alloc.h | 17 +++++++++++------
>>>>>> softmmu/physmem.c         |  8 +++++---
>>>>>> util/mmap-alloc.c         | 14 +++++++-------
>>>>>> util/oslib-posix.c        |  3 ++-
>>>>>> 4 files changed, 25 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
>>>>>> index 456ff87df1..55664ea9f3 100644
>>>>>> --- a/include/qemu/mmap-alloc.h
>>>>>> +++ b/include/qemu/mmap-alloc.h
>>>>>> @@ -6,6 +6,15 @@ size_t qemu_fd_getpagesize(int fd);
>>>>>>
>>>>>> size_t qemu_mempath_getpagesize(const char *mem_path);
>>>>>>
>>>>>> +/* Map PROT_READ instead of PROT_READ|PROT_WRITE. */
>>>>>> +#define QEMU_RAM_MMAP_READONLY      (1 << 0)
>>>>>> +
>>>>>> +/* Map MAP_SHARED instead of MAP_PRIVATE. */
>>>>>> +#define QEMU_RAM_MMAP_SHARED        (1 << 1)
>>>>>> +
>>>>>> +/* Map MAP_SYNC|MAP_SHARED_VALIDATE if possible, fallback and warn otherwise. */
>>>>>> +#define QEMU_RAM_MMAP_PMEM          (1 << 2)
>>>>>
>>>>> Sorry to speak late - I just noticed that is_pmem can actually be converted too
>>>>> with "MAP_SYNC | MAP_SHARED_VALIDATE".  We can even define MAP_PMEM_EXTRA for
>>>>> use within qemu if we want.  Then we can avoid one layer of QEMU_RAM_* by
>>>>> directly using MAP_*, I think?
>>>>>
>>>>
>>>> No problem :) I don‘t think passing in random MAP_ flags is a good interface (we would at least need an allow list).
>>>>
>>>>     I like the abstraction / explicit semenatics of QEMU_RAM_MMAP_PMEM as spelled out in the comment. Doing the fallback when passing in the mmap flags is a little ugly. We could do the fallback in the caller, I think I remember there is only a single call site.
>>>>
>>>> PROT_READ won‘t be covered as well, not sure if passing in protections improves the interface.
>>>>
>>>> Long story short, I like the abstraction provided by these flags, only exporting what we actually support/abstracting it, and setting some MAP_ flags automatically (MAP_PRIVATE, MAP_ANON) instead of having to spell that put in the caller.
>>>
>>> Yeh the READONLY flag would be special, it will need to be separated from the
>>> rest flags.  I'd keep my own preference, but if you really like the current
>>> way, maybe at least move it to qemu/osdep.h?  So at least when someone needs a
>>> cross-platform flag they'll show up - while mmap-alloc.h looks still only for
>>> the posix world, then it'll be odd to introduce these flags only for posix even
>>> if posix definied most of them.
>>
>> I'll give it another thought today. I certainly want to avoid moving all
>> that MAP_ flag and PROT_ logic to the callers. E.g., MAP_SHARED implies
>> !MAP_PRIVATE. MAP_SYNC implies that we want MAP_SHARED_VALIDATE. fd < 0
>> implies MAP_ANONYMOUS.
>>
>> Maybe something like
>>
>> /*
>>     * QEMU's MMAP abstraction to map guest RAM, taking care of alignment
>>     * requirements and guard pages.
>>     *
>>     * Supported flags: MAP_SHARED, MAP_SYNC
>>     *
>>     * Implicitly set flags:
>>     * - MAP PRIVATE: When !MAP_SHARED and !MAP_SYNC
>>     * - MAP_ANONYMOUS: When fd < 0
>>     * - MAP_SHARED_VALIDATE: When MAP_SYNC
>>     *
>>     * If mapping with MAP_SYNC|MAP_SHARED_VALIDATE fails, fallback to
>>     * !MAP_SYNC|MAP_SHARED and warn.
>>     */
>>     void *qemu_ram_mmap(int fd,
>>                         size_t size,
>>                         size_t align,
>>                         bool readonly,
>>                         uint32_t mmap_flags,
>>                         off_t map_offset);
> 
> What about this:
> 


The only ugly thing is that e.g., MAP_SYNC is only defined for Linux and 
MAP_NORESERVE is mostly only defined on Linux.

So we need something like we already have in mmap-alloc.c:

#ifdef CONFIG_LINUX
#include <linux/mman.h>
#else  /* !CONFIG_LINUX */
#define MAP_SYNC              0x0
#define MAP_SHARED_VALIDATE   0x0
#endif /* CONFIG_LINUX */


and for the noreserve part


#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0x0
#endif


But then, I can no longer bail out if someone specifies a flag although 
it is unsupported/not effective. hmmmm ...


> 
>   From 13a59d404bb3edaed9e42c94432be28fb9a65c26 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 5 Mar 2021 17:20:37 +0100
> Subject: [PATCH] util/mmap-alloc: Pass MAP_ flags instead of separate bools to
>    qemu_ram_mmap()
> 
> Let's pass MAP_ flags instead of bools to prepare for passing other MAP_
> flags and update the documentation of qemu_ram_mmap(). Only allow selected
> MAP_ flags (MAP_SHARED, MAP_SYNC) to be passed and keep setting other
> flags implicitly.
> 
> Keep the "readonly" flag, as it cannot be expressed via MAP_ flags.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>    include/qemu/mmap-alloc.h | 19 ++++++++++++++-----
>    softmmu/physmem.c         |  6 ++++--
>    util/mmap-alloc.c         | 13 ++++++++-----
>    util/oslib-posix.c        |  3 ++-
>    4 files changed, 28 insertions(+), 13 deletions(-)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 456ff87df1..27ef374810 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -7,7 +7,10 @@ size_t qemu_fd_getpagesize(int fd);
>    size_t qemu_mempath_getpagesize(const char *mem_path);
>    
>    /**
> - * qemu_ram_mmap: mmap the specified file or device.
> + * qemu_ram_mmap: mmap anonymous memory, the specified file or device.
> + *
> + * QEMU's MMAP abstraction to map guest RAM, simplifying flag handling,
> + * taking care of alignment requirements and installing guard pages.
>     *
>     * Parameters:
>     *  @fd: the file or the device to mmap
> @@ -15,10 +18,17 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
>     *  @align: if not zero, specify the alignment of the starting mapping address;
>     *          otherwise, the alignment in use will be determined by QEMU.
>     *  @readonly: true for a read-only mapping, false for read/write.
> - *  @shared: map has RAM_SHARED flag.
> - *  @is_pmem: map has RAM_PMEM flag.
> + *  @map_flags: supported MAP_* flags: MAP_SHARED, MAP_SYNC
>     *  @map_offset: map starts at offset of map_offset from the start of fd
>     *
> + * Implicitly handled map_flags:
> + * - MAP PRIVATE: With !MAP_SHARED
> + * - MAP_ANONYMOUS: With fd < 0
> + * - MAP_SHARED_VALIDATE: With MAP_SYNC && MAP_SHARED
> + *
> + * MAP_SYNC is ignored without MAP_SHARED. If mapping via MAP_SYNC fails,
> + * warn and fallback to mapping without MAP_SYNC.
> + *
>     * Return:
>     *  On success, return a pointer to the mapped area.
>     *  On failure, return MAP_FAILED.
> @@ -27,8 +37,7 @@ void *qemu_ram_mmap(int fd,
>                        size_t size,
>                        size_t align,
>                        bool readonly,
> -                    bool shared,
> -                    bool is_pmem,
> +                    uint32_t map_flags,
>                        off_t map_offset);
>    
>    void qemu_ram_munmap(int fd, void *ptr, size_t size);
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 8f3d286e12..1336884b51 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1533,6 +1533,7 @@ static void *file_ram_alloc(RAMBlock *block,
>                                off_t offset,
>                                Error **errp)
>    {
> +    uint32_t map_flags;
>        void *area;
>    
>        block->page_size = qemu_fd_getpagesize(fd);
> @@ -1580,9 +1581,10 @@ static void *file_ram_alloc(RAMBlock *block,
>            perror("ftruncate");
>        }
>    
> +    map_flags = (block->flags & RAM_SHARED) ? MAP_SHARED : 0;
> +    map_flags |= (block->flags & RAM_PMEM) ? MAP_SYNC : 0;
>        area = qemu_ram_mmap(fd, memory, block->mr->align, readonly,
> -                         block->flags & RAM_SHARED, block->flags & RAM_PMEM,
> -                         offset);
> +                         map_flags, offset);
>        if (area == MAP_FAILED) {
>            error_setg_errno(errp, errno,
>                             "unable to map backing store for guest RAM");
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 0e2bd7bc0e..b558f1675a 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -119,16 +119,20 @@ static void *mmap_reserve(size_t size, int fd)
>     * it accessible.
>     */
>    static void *mmap_activate(void *ptr, size_t size, int fd, bool readonly,
> -                           bool shared, bool is_pmem, off_t map_offset)
> +                           uint32_t map_flags, off_t map_offset)
>    {
> +    const bool shared = map_flags & MAP_SHARED;
> +    const bool sync = map_flags & MAP_SYNC;
>        const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
>        int map_sync_flags = 0;
>        int flags = MAP_FIXED;
>        void *activated_ptr;
>    
> +    g_assert(!(map_flags & ~(MAP_SHARED | MAP_SYNC)));
> +
>        flags |= fd == -1 ? MAP_ANONYMOUS : 0;
>        flags |= shared ? MAP_SHARED : MAP_PRIVATE;
> -    if (shared && is_pmem) {
> +    if (shared && sync) {
>            map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
>        }
>    
> @@ -174,8 +178,7 @@ void *qemu_ram_mmap(int fd,
>                        size_t size,
>                        size_t align,
>                        bool readonly,
> -                    bool shared,
> -                    bool is_pmem,
> +                    uint32_t map_flags,
>                        off_t map_offset)
>    {
>        const size_t guard_pagesize = mmap_guard_pagesize(fd);
> @@ -199,7 +202,7 @@ void *qemu_ram_mmap(int fd,
>    
>        offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
>    
> -    ptr = mmap_activate(guardptr + offset, size, fd, readonly, shared, is_pmem,
> +    ptr = mmap_activate(guardptr + offset, size, fd, readonly, map_flags,
>                            map_offset);
>        if (ptr == MAP_FAILED) {
>            munmap(guardptr, total);
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 36820fec16..95e2b85279 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -229,8 +229,9 @@ void *qemu_memalign(size_t alignment, size_t size)
>    /* alloc shared memory pages */
>    void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
>    {
> +    const uint32_t map_flags = shared ? MAP_SHARED : 0;
>        size_t align = QEMU_VMALLOC_ALIGN;
> -    void *ptr = qemu_ram_mmap(-1, size, align, false, shared, false, 0);
> +    void *ptr = qemu_ram_mmap(-1, size, align, false, map_flags, 0);
>    
>        if (ptr == MAP_FAILED) {
>            return NULL;
> 


-- 
Thanks,

David / dhildenb



  reply	other threads:[~2021-03-10 10:58 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-08 15:05 [PATCH v3 00/12] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 01/12] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory David Hildenbrand
2021-03-11 16:39   ` Dr. David Alan Gilbert
2021-03-11 16:45     ` David Hildenbrand
2021-03-11 17:11       ` Peter Xu
2021-03-11 17:15         ` David Hildenbrand
2021-03-11 17:18           ` David Hildenbrand
2021-03-11 17:22           ` Peter Xu
2021-03-11 17:41             ` David Hildenbrand
2021-03-11 21:25               ` Peter Xu
2021-03-11 21:37   ` Peter Xu
2021-03-11 21:49     ` David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 03/12] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 04/12] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 05/12] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 06/12] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 07/12] softmmu/memory: Pass ram_flags into qemu_ram_alloc_from_fd() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 08/12] softmmu/memory: Pass ram_flags into memory_region_init_ram_shared_nomigrate() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 09/12] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
2021-03-09 20:04   ` Peter Xu
2021-03-09 20:27     ` David Hildenbrand
2021-03-09 20:58       ` Peter Xu
2021-03-10  8:41         ` David Hildenbrand
2021-03-10 10:11           ` David Hildenbrand
2021-03-10 10:55             ` David Hildenbrand [this message]
2021-03-10 16:27               ` Peter Xu
2021-03-08 15:05 ` [PATCH v3 10/12] memory: introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 11/12] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE David Hildenbrand
2021-03-10 10:28   ` David Hildenbrand
2021-03-08 15:06 ` [PATCH v3 12/12] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a597019-4847-1af8-bceb-1664cee108b1@redhat.com \
    --to=david@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=groug@kaod.org \
    --cc=i.kotrasinsk@partner.samsung.com \
    --cc=imammedo@redhat.com \
    --cc=mapfelba@redhat.com \
    --cc=mst@redhat.com \
    --cc=muriloo@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=stefanha@redhat.com \
    --cc=sw@weilnetz.de \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).