qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, David Hildenbrand <david@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	qemu-s390x@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v1 5/5] s390: do not call memory_region_allocate_system_memory() multiple times
Date: Thu, 18 Apr 2019 11:38:59 +0200	[thread overview]
Message-ID: <20190418113859.00248d07@redhat.com> (raw)
In-Reply-To: <89ca3a70-066b-e40e-faaf-39a39ec976bf@de.ibm.com>

On Tue, 16 Apr 2019 13:09:08 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> This fails with more than 8TB, e.g.  "-m 9T "
> 
> [pid 231065] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=0, userspace_addr=0x3ffc8500000}) = 0
> [pid 231065] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=9895604649984, userspace_addr=0x3ffc8500000}) = -1 EINVAL (Invalid argument)
> 
> seems that the 2nd memslot gets the full size (and not 9TB-size of first slot).
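[As a quick sanity check of the numbers in the trace above — a hypothetical model, assuming the per-memslot ceiling is exactly 8 TiB; the real KVM_SLOT_MAX_BYTES may be aligned differently:]

```python
TiB = 1024 ** 4

# memory_size from the failing KVM_SET_USER_MEMORY_REGION call above
failing_size = 9895604649984

# Assumed per-memslot ceiling; this only models "memslots must stay
# below 8 TiB", not the exact KVM_SLOT_MAX_BYTES definition.
SLOT_LIMIT = 8 * TiB

assert failing_size == 9 * TiB    # the 2nd slot got the full 9 TiB...
assert failing_size > SLOT_LIMIT  # ...hence the EINVAL from KVM
```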

it turns out the MemoryRegion is rendered correctly into 2 parts (one per alias),
but a follow-up flatview_simplify() collapses the adjacent ranges back
into one big one.
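[To illustrate with a simplified model — not QEMU's actual structures — two flat ranges collapse when they are adjacent in guest address space and contiguous within the same backing region, which is exactly the situation two back-to-back aliases onto one RAM block create:]

```python
from dataclasses import dataclass

@dataclass
class FlatRange:
    mr: str         # identity of the backing MemoryRegion
    gpa: int        # start in guest physical address space
    mr_offset: int  # offset into the backing region
    size: int

def can_merge(a: FlatRange, b: FlatRange) -> bool:
    """Model of the adjacency test flatview_simplify() relies on."""
    return (a.mr == b.mr
            and a.gpa + a.size == b.gpa
            and a.mr_offset + a.size == b.mr_offset)

TiB = 1024 ** 4
# Two aliases covering one 9 TiB RAM region back to back:
first = FlatRange("s390.whole.ram", 0, 0, 8 * TiB)
second = FlatRange("s390.whole.ram", 8 * TiB, 8 * TiB, 1 * TiB)
assert can_merge(first, second)  # collapsed back into one 9 TiB range
```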

I see two ways to approach it:
 1. 'improve' the memory region API so we could disable merging for
    a specific memory region (i.e. the RAM-providing memory region)
    (I don't particularly like the idea of twisting this API to serve a KVM-specific purpose)

 2. hide the KVMism in KVM code. Move KVM_SLOT_MAX_BYTES out of s390 machine
    code and handle splitting a big chunk into several (up to KVM_SLOT_MAX_BYTES
    each) in kvm_set_phys_mem().
    We could add KVMState::max_slot_size, which is set only by s390,
    so it won't affect other targets.
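[A sketch of option 2's splitting logic in model form — the field name max_slot_size is the proposed one; the helper name and the rest are hypothetical, just showing how one big section becomes several memslots of at most max_slot_size bytes:]

```python
def split_into_slots(start, size, max_slot_size):
    """Yield (guest_addr, size) memslots covering [start, start + size)."""
    offset = 0
    while size:
        chunk = min(size, max_slot_size)
        yield (start + offset, chunk)
        size -= chunk
        offset += chunk

TiB = 1024 ** 4
# With an 8 TiB cap, a 9 TiB region becomes two slots: 8 TiB + 1 TiB.
slots = list(split_into_slots(0, 9 * TiB, 8 * TiB))
assert slots == [(0, 8 * TiB), (8 * TiB, 1 * TiB)]
```

Targets that never set max_slot_size would effectively use an unlimited cap, so the loop runs once and behavior is unchanged for them.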

Paolo,
 I'd like to get your opinion/suggestion on which direction I should look into.

> On 15.04.19 15:27, Igor Mammedov wrote:
> > s390 was trying to solve the limited memslot size issue by abusing
> > memory_region_allocate_system_memory(), which breaks the API contract
> > stating that the function may be called only once.
> > 
> > s390 should have used memory aliases to fragment initial memory into
> > smaller chunks to satisfy KVM's memslot limitation. But it's a bit
> > late now, since the allocated pieces are transferred in the migration
> > stream separately, so it's not possible to just replace the broken
> > layout with a correct one. The previous patch made MemoryRegion
> > aliases migratable, and this patch switches to using them to split
> > the big initial RAM chunk into smaller pieces of up to
> > KVM_SLOT_MAX_BYTES each and registers the aliases for migration.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > I don't have access to a suitable system to test it, so I've simulated
> > it with smaller chunks on an x86 host. Ping-pong migration between old
> > and new QEMU worked fine. The KVM part should be fine, as memslots use
> > mapped MemoryRegions (in this case they would be aliases) as far as I
> > know, but if someone could test it on a big enough host that would be
> > nice.
> > ---
> >  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index d11069b..12ca3a9 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -161,20 +161,30 @@ static void virtio_ccw_register_hcalls(void)
> >  static void s390_memory_init(ram_addr_t mem_size)
> >  {
> >      MemoryRegion *sysmem = get_system_memory();
> > +    MemoryRegion *ram = g_new(MemoryRegion, 1);
> >      ram_addr_t chunk, offset = 0;
> >      unsigned int number = 0;
> >      gchar *name;
> >  
> >      /* allocate RAM for core */
> > +    memory_region_allocate_system_memory(ram, NULL, "s390.whole.ram", mem_size);
> > +    /*
> > +     * memory_region_allocate_system_memory() registers allocated RAM for
> > +     * migration; however, for compat reasons the RAM should be passed over
> > +     * as RAMBlocks of size up to KVM_SLOT_MAX_BYTES. So unregister the just
> > +     * allocated RAM so it won't be migrated directly. Aliases will take care
> > +     * of segmenting RAM into legacy chunks.
> > +     */
> > +    vmstate_unregister_ram(ram, NULL);
> >      name = g_strdup_printf("s390.ram");
> >      while (mem_size) {
> > -        MemoryRegion *ram = g_new(MemoryRegion, 1);
> > -        uint64_t size = mem_size;
> > +        MemoryRegion *alias = g_new(MemoryRegion, 1);
> >  
> >          /* KVM does not allow memslots >= 8 TB */
> > -        chunk = MIN(size, KVM_SLOT_MAX_BYTES);
> > -        memory_region_allocate_system_memory(ram, NULL, name, chunk);
> > -        memory_region_add_subregion(sysmem, offset, ram);
> > +        chunk = MIN(mem_size, KVM_SLOT_MAX_BYTES);
> > +        memory_region_init_alias(alias, NULL, name, ram, offset, chunk);
> > +        vmstate_register_ram_global(alias);
> > +        memory_region_add_subregion(sysmem, offset, alias);
> >          mem_size -= chunk;
> >          offset += chunk;
> >          g_free(name);
> >   
> 
> 

  parent reply	other threads:[~2019-04-18  9:39 UTC|newest]

Thread overview: 38+ messages
2019-04-15 13:27 [Qemu-devel] [PATCH v1 0/5] Fix misuses of memory_region_allocate_system_memory() Igor Mammedov
2019-04-15 13:27 ` [Qemu-devel] [PATCH v1 1/5] sparc64: use memory_region_allocate_system_memory() only for '-m' specified RAM Igor Mammedov
2019-04-15 15:07   ` Philippe Mathieu-Daudé
2019-04-15 13:27 ` [Qemu-devel] [PATCH v1 2/5] ppc: rs6000_mc: drop usage of memory_region_allocate_system_memory() Igor Mammedov
2019-04-16  4:13   ` David Gibson
2019-04-15 13:27 ` [Qemu-devel] [PATCH v1 3/5] hppa: drop usage of memory_region_allocate_system_memory() for ROM Igor Mammedov
2019-04-15 15:16   ` Philippe Mathieu-Daudé
2019-04-15 13:27 ` [Qemu-devel] [PATCH v1 4/5] memory: make MemoryRegion alias migratable Igor Mammedov
2019-04-15 13:27 ` [Qemu-devel] [PATCH v1 5/5] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
2019-04-16 11:01   ` Christian Borntraeger
2019-04-16 11:02     ` Christian Borntraeger
2019-04-16 11:09   ` Christian Borntraeger
2019-04-17 14:30     ` Igor Mammedov
2019-04-18  9:38     ` Igor Mammedov [this message]
2019-04-18 11:24       ` David Hildenbrand
2019-04-18 12:01         ` Igor Mammedov
2019-04-18 12:06           ` David Hildenbrand
2019-04-18 14:56             ` Igor Mammedov
2019-04-18 15:01               ` David Hildenbrand
