From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47562) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VdGu0-0002jc-2V for qemu-devel@nongnu.org; Mon, 04 Nov 2013 04:57:02 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VdGtu-0005Ea-2N for qemu-devel@nongnu.org; Mon, 04 Nov 2013 04:56:56 -0500
Received: from mx1.redhat.com ([209.132.183.28]:31735) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VdGtt-0005EO-Qt for qemu-devel@nongnu.org; Mon, 04 Nov 2013 04:56:49 -0500
Date: Mon, 4 Nov 2013 11:59:33 +0200
From: "Michael S. Tsirkin"
Message-ID: <20131104095933.GA30026@redhat.com>
References: <1383511723-11228-1-git-send-email-marcel.a@redhat.com> <20131104061814.GA3324@redhat.com> <1383557636.2264.8.camel@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1383557636.2264.8.camel@localhost.localdomain>
Subject: Re: [Qemu-devel] [PATCH] exec: fix regression by making system-memory region UINT64_MAX size
To: Marcel Apfelbaum
Cc: Peter Maydell , Jan Kiszka , QEMU Developers , Anthony Liguori , Paolo Bonzini , Andreas Färber , Richard Henderson

On Mon, Nov 04, 2013 at 11:33:56AM +0200, Marcel Apfelbaum wrote:
> On Mon, 2013-11-04 at 08:18 +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 03, 2013 at 09:26:06PM +0000, Peter Maydell wrote:
> > > On 3 November 2013 20:48, Marcel Apfelbaum wrote:
> > > > The problem appears when a root memory region within an
> > > > address space with size < UINT64_MAX has overlapping children
> > > > with the same size. If the size of the root memory region is UINT64_MAX
> > > > everything is ok.
> > > >
> > > > Solved the regression by making the system-memory region
> > > > of size UINT64_MAX instead of INT64_MAX.
> > > >
> > > > Signed-off-by: Marcel Apfelbaum
> > > > ---
> > > > In the meantime I am investigating why the
> > > > root memory region has to be UINT64_MAX size in order
> > > > to have overlapping children
> > >
> > > >      system_memory = g_malloc(sizeof(*system_memory));
> > > > -    memory_region_init(system_memory, NULL, "system", INT64_MAX);
> > > > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> > > >      address_space_init(&address_space_memory, system_memory, "memory");
> > >
> > > As you say above we should investigate why this caused a
> > > problem, but I was surprised the system memory space isn't
> > > already maximum size. It turns out that that change was
> > > introduced in commit 8417cebf in an attempt to avoid overflow
> > > issues by sticking to signed 64 bit arithmetic. This approach was
> > > subsequently ditched in favour of using proper 128 bit arithmetic
> > > in commit 08dafab4, but we never changed the init call for
> > > the system memory back to UINT64_MAX. So I think this is
> > > a good change in itself.
> > >
> > > -- PMM
> >
> > I think I debugged it.
> >
> > So this patch seems to help simply because we only have
> > sanity checking asserts in the subpage path. UINT64_MAX will make
> > the region a number of full pages and avoid
> > hitting the checks.
> >
> >
> > I think I see what the issue is: exec.c
> > assumes that TARGET_PHYS_ADDR_SPACE_BITS is enough
> > to render any section in system memory:
> > number of page table levels is calculated from that:
> >
> > #define P_L2_LEVELS \
> >     (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
> >
> > any other bits are simply ignored:
> >
> >     for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
> >         if (lp.ptr == PHYS_MAP_NODE_NIL) {
> >             return &sections[PHYS_SECTION_UNASSIGNED];
> >         }
> >         p = nodes[lp.ptr];
> >         lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
> >     }
> >
> > so mask by L2_SIZE - 1 means that each round looks at L2_BITS bits,
> > and there are at most P_L2_LEVELS.
> >
> > Any other bits are simply ignored.
>
> Michael, thanks for helping to debug this issue.
> Let me see if I got it right:
> If the system memory size is INT64_MAX (0x7fffffffffffffff), the address of the
> last page (0x7ffffffffffff) has more bits (55) than TARGET_PHYS_ADDR_SPACE_BITS (52)
> and cannot be correctly mapped into page levels?
>
> Thanks,
> Marcel

Yes, I think that's it.

> > This is very wrong and can break in a number of other ways,
> > for example I think we will also hit this assert
> > if we have a non aligned 64 bit BAR of a PCI device.
> >
> > I think the fastest solution is to just limit
> > system memory size of TARGET_PAGE_BITS.
> > I sent a patch like this.
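
To make the truncation concrete, here is a small standalone C sketch of the
index walk quoted above. It is not code from exec.c: TARGET_PAGE_BITS = 12 and
L2_BITS = 10 are assumed values chosen for illustration (52 is the
TARGET_PHYS_ADDR_SPACE_BITS figure mentioned in this thread), and
index_seen_by_lookup() is a hypothetical helper that reassembles only the bits
the lookup loop actually consumes.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real ones depend on the target. */
#define TARGET_PHYS_ADDR_SPACE_BITS 52
#define TARGET_PAGE_BITS 12
#define L2_BITS 10
#define L2_SIZE (1 << L2_BITS)

/* Same formula as quoted from exec.c in this thread. */
#define P_L2_LEVELS \
    (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)

/*
 * Hypothetical helper: rebuild the page index from the per-level slices
 * that a P_L2_LEVELS-deep walk looks at. Any index bit at or above
 * P_L2_LEVELS * L2_BITS never reaches the result.
 */
static uint64_t index_seen_by_lookup(uint64_t index)
{
    uint64_t seen = 0;

    for (int i = P_L2_LEVELS - 1; i >= 0; i--) {
        /* Each level consumes L2_BITS bits of the index. */
        uint64_t slot = (index >> (i * L2_BITS)) & (L2_SIZE - 1);
        seen |= slot << (i * L2_BITS);
    }
    return seen;
}
```

Under these assumptions P_L2_LEVELS is 4, so the walk covers 40 bits of page
index. The last page of an INT64_MAX-sized region has index 0x7ffffffffffff,
which the helper reduces to 0xffffffffff, so the lookup would land on a much
lower page than the one requested.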