From: Andrea Arcangeli <aarcange@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Tomasz Chmielewski <mangoo@wpkg.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: 2.6.38.1 general protection fault
Date: Mon, 28 Mar 2011 22:04:01 +0200
Message-ID: <20110328200401.GC12265@random.random>
In-Reply-To: <4D90CD47.7010107@redhat.com>

On Mon, Mar 28, 2011 at 08:02:47PM +0200, Avi Kivity wrote:
> On 03/28/2011 07:54 PM, Andrea Arcangeli wrote:
> > BTW, is it genuine that a protection fault is generated instead of a page
> > fault while dereferencing address 0x00008805d6b087f8? I would normally
> > expect a page fault from a memory dereference that doesn't alter
> > processor state/segments.
>
> Yes. Bits 48-63 of the address must be equal to bit 47, or a #GP is
> generated (non-canonical address).

Ok, when you said the 16 bits were reversed I didn't connect it to bit 48
and the 128TB maximum of user address space. I thought it was a good idea
to check because in the past I've seen #GPs that turned out to be hardware
issues triggering on a normal memory dereference, but this is probably not
the case here.
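
The canonical-address rule quoted above can be illustrated with a small C
helper. This is only a sketch, assuming 48-bit virtual addresses (4-level
paging); the names VA_BITS, USER_ADDR_LIMIT and is_canonical() are
hypothetical and do not come from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (x86-64 with 4-level paging). */
#define VA_BITS 48

/* Size of the lower (user) canonical half: 2^47 bytes = 128 TiB. */
#define USER_ADDR_LIMIT (UINT64_C(1) << (VA_BITS - 1))

/*
 * An address is canonical when bits 48-63 are copies of bit 47, i.e.
 * bits 47-63 (17 bits) are either all zero or all one. Dereferencing
 * a non-canonical address raises #GP instead of a page fault.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS - 1);   /* bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

With this helper, is_canonical(0x00008805d6b087f8) is false because bit 47
is set while bits 48-63 are clear. Setting the top 16 bits gives
0xffff8805d6b087f8, which is canonical; reading the faulting address as a
kernel pointer with those bits flipped is an interpretation, not something
the oops itself proves.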

Tomasz, how easily can you reproduce? Could you also upload to the site the
output of objdump -dr arch/x86/kvm/mmu.o? (My assembly is vastly
different from the one shown so far; I may find more info in the oops
if I also get the assembly of the caller and of the iteration of the
loop that runs in that function before the #GP.)

khugepaged is present in your second trace (and khugepaged must be mangling
some memslot range with guest gfns mapped, or kvm_unmap_rmapp wouldn't be
called in the first place; I hope the memslots are all ok), but you probably
didn't get the right alignment, so the THPs are likely mapped as 4k pages in
the guest, which must work fine too. I wonder if the bug might be related to
that (I keep my qemu-kvm patched with the patch below, which isn't yet
polished enough to be digestible for qemu: wrong alignments, x86 4M
alignment not handled yet, and I'm not sure if the DONTFORK fix to prevent
OOM with hotplug/migrate is acceptable in that position).

Can you try to "echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs"
and then run "cat /proc/`pgrep qemu`/smaps >/dev/null" once per minute (or find
the right pid by hand if you have more than one qemu process running)?
This debug trick will only work on 2.6.38.1, as 2.6.39 has native
THP handling in the smaps file, but on 2.6.38.1 it should flush all
sptes mapped on THP just like fork does (this might help to reproduce).

I'm also surprised this happened during the fork that initializes the tap
interface; shouldn't that fork run before any spte is established?
(We run the spte invalidate through the mmu notifier in the parent
before write-protecting the ptes during fork.)

I also wonder if it's a memslot race of some kind; I don't see
anything wrong in the rmapp handling at the moment.

This isn't a patch to try, I'm only showing it here for reference as I
suspect it might hide the bug. I'm now going to revert it and
see if I can reproduce, in case having large sptes (instead of 4k
sptes) always mapped on host THP changes something.

Thanks!

diff --git a/exec.c b/exec.c
index bb0c1be..f60e5fe 100644
--- a/exec.c
+++ b/exec.c
@@ -2856,6 +2856,18 @@ static ram_addr_t last_ram_offset(void)
     return last;
 }
 
+#if defined(__linux__) && defined(__x86_64__)
+/*
+ * Align on the max transparent hugepage size so that
+ * "(gfn ^ pfn) & (HPAGE_SIZE-1) == 0" to allow KVM to
+ * take advantage of hugepages with NPT/EPT or to
+ * ensure the first 2M of the guest physical ram will
+ * be mapped by the same hugetlb for QEMU (it is worth
+ * it even without NPT/EPT).
+ */
+#define PREFERRED_RAM_ALIGN (2*1024*1024)
+#endif
+
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    ram_addr_t size, void *host)
 {
@@ -2902,9 +2914,15 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    PROT_EXEC|PROT_READ|PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 #else
-            new_block->host = qemu_vmalloc(size);
+#ifdef PREFERRED_RAM_ALIGN
+            if (size >= PREFERRED_RAM_ALIGN)
+                new_block->host = qemu_memalign(PREFERRED_RAM_ALIGN, size);
+            else
+#endif
+                new_block->host = qemu_vmalloc(size);
 #endif
             qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
+            qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
         }
     }
 
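
The alignment condition in the patch comment, and the allocation strategy
the patch applies, can be sketched outside of QEMU roughly as follows. This
is a standalone illustration for Linux, not QEMU code: the helper names
same_hugepage_offset() and alloc_guest_ram() are hypothetical,
posix_memalign() stands in for qemu_memalign()/qemu_vmalloc(), and the check
is done on byte addresses rather than frame numbers.

```c
#define _DEFAULT_SOURCE         /* exposes MADV_DONTFORK on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2M transparent hugepages, as in PREFERRED_RAM_ALIGN. */
#define HPAGE_SIZE (UINT64_C(2) << 20)

/* True when both addresses have the same offset inside a 2M hugepage. */
static bool same_hugepage_offset(uint64_t guest_addr, uint64_t host_addr)
{
    return ((guest_addr ^ host_addr) & (HPAGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical stand-in for the patched allocation path: blocks of at
 * least 2M are aligned to 2M, smaller ones to the base page size.
 */
static void *alloc_guest_ram(size_t size)
{
    size_t align = size >= HPAGE_SIZE ? (size_t)HPAGE_SIZE
                                      : (size_t)sysconf(_SC_PAGESIZE);
    void *host = NULL;

    if (posix_memalign(&host, align, size) != 0)
        return NULL;

    /* A 2M-aligned block lines up with a guest range starting at 0. */
    if (size >= HPAGE_SIZE)
        assert(same_hugepage_offset(0, (uint64_t)(uintptr_t)host));

#ifdef MADV_DONTFORK
    /* Best effort: keep the range out of children created by fork(). */
    (void)madvise(host, size, MADV_DONTFORK);
#endif
    return host;
}
```

Note that in C `==` binds tighter than `&`, so the expression quoted in the
patch comment needs the extra parentheses used in same_hugepage_offset() to
be evaluated as intended.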

Thread overview: 13+ messages
2011-03-25 9:32 2.6.38.1 general protection fault Tomasz Chmielewski
2011-03-26 9:15 ` Avi Kivity
2011-03-26 10:42 ` Tomasz Chmielewski
2011-03-27 9:42 ` Avi Kivity
2011-03-28 6:24 ` Tomasz Chmielewski
2011-03-28 9:19 ` Avi Kivity
2011-03-28 17:54 ` Andrea Arcangeli
2011-03-28 18:02 ` Avi Kivity
2011-03-28 20:04 ` Andrea Arcangeli [this message]
2011-03-28 20:14 ` Tomasz Chmielewski
2011-04-20 9:28 ` Thomas Treutner
2011-04-20 10:54 ` Tomasz Chmielewski
2011-03-29 13:34 ` Marcelo Tosatti