From: Teck Choon Giam <giamteckchoon@gmail.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>
Cc: jeremy@goop.org, xen devel <xen-devel@lists.xensource.com>,
keir@xen.org, ian.campbell@citrix.com, konrad.wilk@oracle.com,
dave@ivt.com.au
Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
Date: Mon, 11 Apr 2011 04:14:45 +0800
Message-ID: <BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@mail.gmail.com>
In-Reply-To: <BLU157-w540B39FBA137B4D96278D2DAA90@phx.gbl>
[-- Attachment #1: Type: text/plain, Size: 4583 bytes --]
2011/4/10 MaoXiaoyun <tinnycloud@hotmail.com>:
> Hi Konrad & Jeremy:
>
> I think we finally located the missing patch for this commit.
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=c97f681f138039425c87f35ea46a92385d81e70e
> which works.
>
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=221c64dbf860d37f841f40893bddf8d804aa55bd
> with which the server crashed.
>
> Later I found this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
>
> So it looks like this fix has not been applied to 2.6.32.36. Could you
> take a look at this?
>
> Many thanks.
>
> =====================================================
>>Hi Konrad & Jeremy:
>>
>> I'd like to open this BUG in a new thread, since the old thread is too
>> long to read easily.
>>
>> We recently wanted to upgrade our kernel to 2.6.32, but unfortunately
>> we ran into a kernel crash bug.
>>Our test case is simple: start 24 Win2003 HVMs on our physical machine,
>> and reboot each HVM every 15 minutes. The kernel crashes within half an
>> hour (that is, when the VMs start for the second time).
>>
>>We took the testing further and tested different kernel versions:
>>2.6.32.10
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=d945b014ac5df9592c478bf9486d97e8914aab59
>>2.6.32.11
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27f948a3bf365a5bc3d56119637a177d41147815
>>2.6.32.12
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ba739f9abd3f659b907a824af1161926b420a2ce
>>2.6.32.13
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=f6fe6583b77a49b569eef1b66c3d761eec2e561b
>>2.6.32.15
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27ed1b0e0dae5f1d5da5c76451bc84cb529128bd
>>2.6.32.21
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=69e50db231723596ed8ef9275d0068d6697f466a
>>
>>We observed basically three different results.
>>
>>i1) grant table issue
>>The host still functions, but xm dmesg shows abnormal log messages.
>>Please refer to the attached grant table log.
>>
>>i2) kernel crash at a different place
>>The host dies during the test; after reboot, we see nothing abnormal in
>> /var/log/messages.
>>
>>i3) kernel BUG at arch/x86/xen/mmu.c:1872
>>The host dies during the test; after reboot, we see the crash log in
>> /var/log/messages (refer to the attached 2.6.32.36 log).
>>The test results fall into two groups:
>>
>>1) 2.6.32.10
>>30 machines were involved in the test: three had issue (i1), two had
>> issue (i2), and *none* had issue (i3).
>>The other machines have run the tests successfully so far, for more than 8 hours.
>>
>>2) 2.6.32.11 or later
>>Each version was tested on 10 machines, and all machines crashed in
>> less than half an hour.
>>
>>Conclusion:
>>1) The grant table issue exists in all kernel versions.
>>2) The kernel crash at a different place may exist in all kernel versions,
>> but it does not happen often (2 out of 30).
>>3) The major difference is issue (i3); from the tests, it looks like it
>> was introduced between
>>2.6.32.10 and 2.6.32.11.
>>
>>Hope this helps to locate the bug.
>>Many thanks.
>>
>>
>
Hi,
Sorry, this mmu-related BUG has been troubling me for a very long
time... I really want to "kill" this BUG, but my knowledge of kernel
hacking and/or Xen is very limited.
While waiting for Jeremy, Konrad or others...
Many thanks for spending time tracking down this mmu-related BUG. I
have backported the commit from
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
to the 2.6.32.36 PVOPS kernel; the patch is attached. I don't know
whether I backported it correctly, nor whether it affects anything
else. I am currently testing the 2.6.32.36 PVOPS kernel with this patch
applied and with CONFIG_DEBUG_PAGEALLOC unset. I am running testcrash.sh
with 1000 loops, as I was unable to reproduce this mmu BUG 1872 within
100 testcrash.sh loops. Please note that when CONFIG_DEBUG_PAGEALLOC is
unset, I can easily reproduce this mmu BUG 1872 in fewer than 50
testcrash.sh loop cycles with PVOPS kernels 2.6.32.24 through 2.6.32.36.
Now I am testing with this backport patch to see whether I can still
reproduce this mmu BUG...
Kindest regards,
Giam Teck Choon
[-- Attachment #2: vmalloc__eagerly_clear_ptes_on_vunmap.patch --]
[-- Type: text/x-patch, Size: 3393 bytes --]
Backport of commit http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
diff -urN a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c 2011-03-30 06:17:46.000000000 +0800
+++ b/arch/x86/xen/mmu.c 2011-04-11 02:17:54.000000000 +0800
@@ -2430,8 +2430,6 @@
x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;
-
- vmap_lazy_unmap = false;
}
/* Protected by xen_reservation_lock. */
diff -urN a/include/linux/vmalloc.h b/include/linux/vmalloc.h
--- a/include/linux/vmalloc.h 2011-03-30 06:17:46.000000000 +0800
+++ b/include/linux/vmalloc.h 2011-04-11 02:18:43.000000000 +0800
@@ -7,8 +7,6 @@
struct vm_area_struct; /* vma defining user mapping in mm_types.h */
-extern bool vmap_lazy_unmap;
-
/* bits in flags of vmalloc's vm_struct below */
#define VM_IOREMAP 0x00000001 /* ioremap() and friends */
#define VM_ALLOC 0x00000002 /* vmalloc() */
diff -urN a/mm/vmalloc.c b/mm/vmalloc.c
--- a/mm/vmalloc.c 2011-03-30 06:17:46.000000000 +0800
+++ b/mm/vmalloc.c 2011-04-11 02:25:38.000000000 +0800
@@ -31,8 +31,6 @@
#include <asm/tlbflush.h>
#include <asm/shmparam.h>
-bool vmap_lazy_unmap __read_mostly = true;
-
/*** Page table manipulation functions ***/
static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@
{
unsigned int log;
- if (!vmap_lazy_unmap)
- return 0;
-
log = fls(num_online_cpus());
return log * (32UL * 1024 * 1024 / PAGE_SIZE);
@@ -566,7 +561,6 @@
if (va->va_end > *end)
*end = va->va_end;
nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
- unmap_vmap_area(va);
list_add_tail(&va->purge_list, &valist);
va->flags |= VM_LAZY_FREEING;
va->flags &= ~VM_LAZY_FREE;
@@ -612,10 +606,11 @@
}
/*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
*/
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
{
va->flags |= VM_LAZY_FREE;
atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
@@ -624,6 +619,16 @@
}
/*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+ unmap_vmap_area(va);
+ free_vmap_area_noflush(va);
+}
+
+/*
* Free and unmap a vmap area
*/
static void free_unmap_vmap_area(struct vmap_area *va)
@@ -799,7 +804,7 @@
spin_unlock(&vmap_block_tree_lock);
BUG_ON(tmp != vb);
- free_unmap_vmap_area_noflush(vb->va);
+ free_vmap_area_noflush(vb->va);
call_rcu(&vb->rcu_head, rcu_free_vb);
}
@@ -936,6 +941,8 @@
rcu_read_unlock();
BUG_ON(!vb);
+ vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+
spin_lock(&vb->lock);
BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));
@@ -988,7 +995,6 @@
s = vb->va->va_start + (i << PAGE_SHIFT);
e = vb->va->va_start + (j << PAGE_SHIFT);
- vunmap_page_range(s, e);
flush = 1;
if (s < start)
[-- Attachment #3: testcrash.sh --]
[-- Type: application/x-sh, Size: 5573 bytes --]
[-- Attachment #4: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel