From: "Michael S. Tsirkin" <mst@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm
Date: Mon, 26 Oct 2015 12:28:21 +0200
Message-ID: <20151026120943-mutt-send-email-mst@redhat.com>
In-Reply-To: <1445852815-85168-1-git-send-email-imammedo@redhat.com>

On Mon, Oct 26, 2015 at 10:46:55AM +0100, Igor Mammedov wrote:
> commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> regressed memory hot-unplug for Linux guests, triggering
> the following BUG_ON:
>  =====
>  kernel BUG at mm/memory_hotplug.c:703!
>  ...
>  [<ffffffff81385fa7>] acpi_memory_device_remove+0x79/0xa5
>  [<ffffffff81357818>] acpi_bus_trim+0x5a/0x8d
>  [<ffffffff81359026>] acpi_device_hotplug+0x1b7/0x418
>  ===
>     BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
>  ===
> 
> The reason is that x86-64 Linux guests support memory
> hotplug in chunks of 128Mb, and each memory section must
> also be 128Mb aligned.
> However, the gaps forced between 128Mb DIMMs whose backend
> has a natural alignment of 2Mb leave the 2nd and following
> DIMMs no longer aligned on a 128Mb boundary, as they were
> originally. To fix the regression, enforce a minimal 128Mb
> alignment, as was done for PPC.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

So our temporary workaround is creating more trouble.  I'm inclined to just
revert aa8580cd and df0acded19 along with it.
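
To make the regression concrete, here is a worked sketch of the
address arithmetic (the base address and gap size are illustrative,
not the values QEMU actually computes):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t MiB = 1ULL << 20;
    /* illustrative values, not QEMU's actual memory map */
    uint64_t hotplug_base  = 0x100000000ULL;    /* above 4G, 128Mb aligned */
    uint64_t dimm_size     = 128 * MiB;
    uint64_t backend_align = 2 * MiB;           /* hugepage-backed DIMM */
    uint64_t gap           = backend_align;     /* forced inter-DIMM gap */

    uint64_t dimm1 = dimm_size;                 /* placed at hotplug_base */
    uint64_t dimm2 = hotplug_base + dimm1 + gap;/* only 2Mb aligned now */

    uint64_t section = 128 * MiB;               /* x86-64 section size */
    printf("dimm2 offset into its section: %llu Mb\n",
           (unsigned long long)((dimm2 % section) >> 20));
    return 0;
}

The second DIMM lands 2Mb past a section boundary, and the guest's
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK) fires on unplug.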

> ---
> PS:
>   PAGE_SECTION_MASK is derived from SECTION_SIZE_BITS, which
>   is arch dependent, so this fix is for the x86-64 target only.
>   If anyone cares about 32-bit guests, it should also be fine
>   for x86-32, which has 64Mb memory sections/alignment.

As if 32-bit guests were unheard of?  This does not inspire confidence at all.


So I dug into the Linux guest code:

#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
#  define SECTION_SIZE_BITS     29
#  define MAX_PHYSADDR_BITS     36
#  define MAX_PHYSMEM_BITS      36
# else
#  define SECTION_SIZE_BITS     26
#  define MAX_PHYSADDR_BITS     32
#  define MAX_PHYSMEM_BITS      32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS      27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS      44
# define MAX_PHYSMEM_BITS       46
#endif

Looks like PAE needs more alignment.
And it looks like 128 is arbitrary here.

So we are tying ourselves to specific guest quirks.
All this just looks wrong to me.
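
For reference, here is what those SECTION_SIZE_BITS values translate
to as a section size, and hence as a required hotplug alignment; this
is just my reading of BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK),
sketched:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* SECTION_SIZE_BITS copied from the kernel excerpt above */
    static const struct { const char *config; unsigned bits; } cfgs[] = {
        { "x86-32 PAE", 29 },
        { "x86-32",     26 },
        { "x86-64",     27 },
    };
    for (unsigned i = 0; i < sizeof(cfgs) / sizeof(cfgs[0]); i++) {
        printf("%-10s section size = %3llu Mb\n", cfgs[i].config,
               (unsigned long long)((1ULL << cfgs[i].bits) >> 20));
    }
    return 0;
}

So a PAE guest would need 512Mb alignment, which a 128Mb minimum
doesn't cover.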


> ---
>  hw/i386/pc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3d958ba..0f7cf7c 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
>      }
>  }
>  
> +#define PC_MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> +

This kind of comment doesn't really help.

>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>                           DeviceState *dev, Error **errp)
>  {
> @@ -1624,6 +1626,16 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>  
>      if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
>          align = memory_region_get_alignment(mr);
> +        /*
> +         * Linux x64 guests expect 128Mb aligned DIMM,

This implies no other guest cares, which isn't true.

> +         * but this change

Which change?

> causes memory layout change

Change compared to what?

> so
> +         * for compatibility

Compatibility with what?

> apply 128Mb alignment only
> +         * when forced gaps are enabled since it is the cause
> +         * of misalignment.

Which makes no sense, sorry.

Can it be misaligned for some other reason?

If not, why limit to this case?

> +         */
> +        if (pcmc->inter_dimm_gap && align < PC_MIN_DIMM_ALIGNMENT) {
> +            align = PC_MIN_DIMM_ALIGNMENT;
> +        }
>      }
>  
>      if (!pcms->acpi_dev) {

All this sounds pretty fragile. How about we revert the inter-DIMM gap
thing for 2.4? It's just a workaround; this is piling workarounds on
top of workarounds.

> -- 
> 1.8.3.1
