Date: Mon, 26 Oct 2015 12:28:21 +0200
From: "Michael S. Tsirkin"
Message-ID: <20151026120943-mutt-send-email-mst@redhat.com>
References: <1445852815-85168-1-git-send-email-imammedo@redhat.com>
In-Reply-To: <1445852815-85168-1-git-send-email-imammedo@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm
To: Igor Mammedov
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, ehabkost@redhat.com

On Mon, Oct 26, 2015 at 10:46:55AM +0100, Igor Mammedov wrote:
> commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> regressed memory hot-unplug for linux guests triggering
> following BUGON
> =====
> kernel BUG at mm/memory_hotplug.c:703!
> ...
> [] acpi_memory_device_remove+0x79/0xa5
> [] acpi_bus_trim+0x5a/0x8d
> [] acpi_device_hotplug+0x1b7/0x418
> ===
> BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
> ===
>
> reason for it is that x86-64 linux guest supports memory
> hotplug in chunks of 128Mb and memory section also should
> be 128Mb aligned.
> However gaps forced between 128Mb DIMMs with backend's
> natural alignment of 2Mb make the 2nd and following
> DIMMs not being aligned on 128Mb boundary as it was
> originally. To fix regression enforce minimal 128Mb
> alignment like it was done for PPC.
>
> Signed-off-by: Igor Mammedov

So our temporary work around is creating more trouble.
I'm inclined to just revert aa8580cd and df0acded19 with it.

> ---
> PS:
> PAGE_SECTION_MASK is derived from SECTION_SIZE_BITS which
> is arch dependent so this is fix for x86-64 target only.
> If anyone cares about 32bit guests, it should also be fine
> for x86-32 which has 64Mb memory sections/alignment.

Like 32 bit guests are unheard of?
This does not inspire confidence at all.

So I dug in linux guest code:

#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
#  define SECTION_SIZE_BITS 29
#  define MAX_PHYSADDR_BITS 36
#  define MAX_PHYSMEM_BITS 36
# else
#  define SECTION_SIZE_BITS 26
#  define MAX_PHYSADDR_BITS 32
#  define MAX_PHYSMEM_BITS 32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS 44
# define MAX_PHYSMEM_BITS 46
#endif

Looks like PAE needs more alignment.
And it looks like 128 is arbitrary here.

So we are tying ourselves to specific guest quirks.
All this just looks wrong to me.

> ---
> hw/i386/pc.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3d958ba..0f7cf7c 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
>      }
>  }
>
> +#define PC_MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> +

This kind of comment doesn't really help.
>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>                           DeviceState *dev, Error **errp)
>  {
> @@ -1624,6 +1626,16 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>
>      if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
>          align = memory_region_get_alignment(mr);
> +        /*
> +         * Linux x64 guests expect 128Mb aligned DIMM,

this implies no other guest cares. which isn't true.

> +         * but this change

which change?

> causes memory layout change

change compared to what?

> so
> +         * for compatibility

compatibility with what?

> apply 128Mb alignment only
> +         * when forced gaps are enabled since it is the cause
> +         * of misalignment.

Which makes no sense, sorry.
Can it be misaligned for some other reason?
If not, why limit to this case?

> +         */
> +        if (pcmc->inter_dimm_gap && align < PC_MIN_DIMM_ALIGNMENT) {
> +            align = PC_MIN_DIMM_ALIGNMENT;
> +        }
>      }
>
>      if (!pcms->acpi_dev) {

All this sounds pretty fragile.

How about we revert the inter dimm gap thing for 2.4?
It's just a work around, this is piling work arounds on top of
work arounds.

> --
> 1.8.3.1