From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Christopher S. Aker" <caker@theshore.net>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]
Date: Tue, 6 Sep 2011 13:13:19 -0400
Message-ID: <20110906171319.GB29839@dumpdata.com>
In-Reply-To: <4E5E9CDB.3070706@theshore.net>
On Wed, Aug 31, 2011 at 04:43:07PM -0400, Christopher S. Aker wrote:
> On 8/30/11 7:45 AM, Ian Campbell wrote:
> >On Mon, 2011-08-29 at 16:07 +0100, Konrad Rzeszutek Wilk wrote:
> >>I just don't get how you are the only person seeing this - and you have
> >>been seeing this from 2.6.32... The dom0 you have - is it printing at least
> >>something when this happens (or before)? Or the Xen hypervisor:
> >>maybe a message about L1 pages not found?
So, just to confirm, since you have been seeing this for some time: did you
see this with a 2.6.32 DomU? I am asking because in 2.6.37 we removed some code:
commit ef691947d8a3d479e67652312783aedcf629320a
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Wed Dec 1 15:45:48 2010 -0800
vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
There's no need for it: it will get faulted into the current pagetable
as needed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d60302..fdf4b1e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2148,10 +2148,6 @@ struct vm_struct *alloc_vm_area(size_t size)
return NULL;
}
- /* Make sure the pagetables are constructed in process kernel
- mappings */
- vmalloc_sync_all();
-
return area;
}
EXPORT_SYMBOL_GPL(alloc_vm_area);
We later found that this led to a couple of bugs, fixed by this revert:
" Revert "vmalloc: remove vmalloc_sync_all() from alloc_vm_area()"
This reverts commit ef691947d8a3d479e67652312783aedcf629320a.
Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area(). The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.
(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm. The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.
This would work on some systems depending on what else was using
vmalloc.
"
It would be really neat if the issue you have been hitting were exactly this
and simply reverting ef691947d8a3d479e67652312783aedcf629320a
fixed it.
I am grasping at straws here, since without being able to reproduce this it is
hard to figure out what is going wrong.
BTW, the fix also affects the frontends (especially the Xen netfront),
even though the commit message only mentions backends.
> >
> >It'd be worth ensuring that the required guest_loglvl and loglvl
> >parameters to allow this are in place on the hypervisor command line.
>
> Nothing in Xen's output correlates with the time of the domUs
> crashing; however, we don't have guest log levels turned up.
>
> >Are these reports against totally unpatched kernel.org domU kernels?
>
> Yes - unpatched domUs.
>
> >>And the dom0 is 2.6.18, right? - Did you update it (I know that the Red Hat guys
> >>have been updating a couple of things on it).
>
> 2.6.18 from xenbits, all around changeset 931 vintage.
>
> >>Any chance I can get access to your setup and try to work with somebody
> >>to reproduce this?
>
> Konrad, that's a fantastic offer and much appreciated. To make this
> happen I'll need to find a volunteer customer or two whose activity
> reproduces this problem and who can deal with some downtime -- then
> quarantine them off to an environment you can access. I'll send out
> the word...
>
> >>>------------[ cut here ]------------
> >>>kernel BUG at mm/swapfile.c:2527!
> >
> >This is "BUG_ON(*map == 0);" which is subtly different from the error in
> >the original post from Peter which was a "unable to handle kernel paging
> >request" at EIP c01ab854, with a pagetable walk showing PTE==0.
> >
> >I'd bet the dereference corresponds to the "*map" in that same place, but
> >Peter, can you convert that address to a line of code please?
>
> root@build:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
> GNU gdb (GDB) 7.1-ubuntu (...snip...)
> Reading symbols from
> /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
> (gdb) list *0xc01ab854
> 0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
> 2488
> 2489 if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /* incrementing */
> 2490 /*
> 2491 * Think of how you add 1 to 999
> 2492 */
> 2493 while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
> 2494 kunmap_atomic(map, KM_USER0);
> 2495 page = list_entry(page->lru.next, struct page, lru);
> 2496 BUG_ON(page == head);
> 2497 map = kmap_atomic(page, KM_USER0) + offset;
> (gdb)
>
> >map came from a kmap_atomic() not far before this point so it appears
> >that it is mapping the wrong page (so *map != 0) and/or mapping a
> >non-existent page (leading to the fault).
> >
> >Warning, wild speculation follows...
> >
> >Is it possible that we are in lazy paravirt mode at this point such that
> >the mapping hasn't really occurred yet, leaving either nothing or the
> >previous mapping? (would the current paravirt lazy state make a useful
> >general addition to the panic message?)
> >
> >The definition of kmap_atomic is a bit confusing:
> > /*
> > * Make both: kmap_atomic(page, idx) and kmap_atomic(page) work.
> > */
> > #define kmap_atomic(page, args...) __kmap_atomic(page)
> >but it appears that the KM_USER0 at the callsite is ignored and instead
> >we end up using the __kmap_atomic_idx stuff (fine). I wondered if it is
> >possible we are overflowing the number of slots but there is an explicit
> >BUG_ON for that case in kmap_atomic_idx_push. Oh, wait, that's only with
> >CONFIG_DEBUG_HIGHMEM, which appears not to be enabled. I think it would
> >be worth trying; it doesn't look to have too much overhead.
>
> My next build will be sure to include CONFIG_DEBUG_HIGHMEM. Maybe
> that'll lead us to a discovery.
>
> >Another possibility which springs to mind is the pfn->mfn laundering
> >going wrong. Perhaps as a skanky debug hack remembering the last pte
> >val, address, mfn, pfn etc and dumping them on error would give a hint?
> >I wouldn't expect that to result in a non-present mapping though;
> >rather, I would expect either the wrong page to be mapped or the guest
> >to be killed by the hypervisor.
> >
> >Would it be worth doing a __get_user(map) (or some other "safe" pointer
> >dereference) right after the mapping is established, catching a fault if
> >one occurs so we can dump some additional debug in that case? I'm not
> >entirely sure what to suggest dumping though.
> >
> >Ian.
> >
> >>>invalid opcode: 0000 [#1] SMP
> >>>last sysfs file: /sys/devices/system/cpu/cpu3/topology/core_id
> >>>Modules linked in:
> >>>
> >>>Pid: 17680, comm: postgres Tainted: G B 2.6.39-linode33 #3
> >>>EIP: 0061:[<c01b4b26>] EFLAGS: 00210246 CPU: 0
> >>>EIP is at swap_count_continued+0x176/0x180
> >>>EAX: f57bac57 EBX: eba2c200 ECX: f57ba000 EDX: 00000000
> >>>ESI: ebfd7c20 EDI: 00000080 EBP: 00000c57 ESP: c670fe0c
> >>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> >>>Process postgres (pid: 17680, ti=c670e000 task=e93415d0 task.ti=c670e000)
> >>>Stack:
> >>> e9e3a340 00013c57 ee15fc57 00000000 c01b60b1 c0731000 c06982d5 401b4b73
> >>> ceebc988 e9e3a340 00013c57 00000000 c01b60f7 ceebc988 b7731000 c670ff04
> >>> c01a7183 4646e045 80000005 e62ce348 28999063 c0103fc5 7f662000 00278ae0
> >>>Call Trace:
> >>> [<c01b60b1>] ? swap_entry_free+0x121/0x140
> >>> [<c06982d5>] ? _raw_spin_lock+0x5/0x10
> >>> [<c01b60f7>] ? free_swap_and_cache+0x27/0xd0
> >>> [<c01a7183>] ? zap_pte_range+0x1b3/0x480
> >>> [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
> >>> [<c01a7568>] ? unmap_page_range+0x118/0x1a0
> >>> [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
> >>> [<c01a771b>] ? unmap_vmas+0x12b/0x1e0
> >>> [<c01aba01>] ? exit_mmap+0x91/0x140
> >>> [<c0134b2b>] ? mmput+0x2b/0xc0
> >>> [<c01386ba>] ? exit_mm+0xfa/0x130
> >>> [<c0698330>] ? _raw_spin_lock_irq+0x10/0x20
> >>> [<c013a2b5>] ? do_exit+0x125/0x360
> >>> [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
> >>> [<c013a52c>] ? do_group_exit+0x3c/0xa0
> >>> [<c013a5a1>] ? sys_exit_group+0x11/0x20
> >>> [<c0698631>] ? syscall_call+0x7/0xb
> >>>Code: ff 89 d8 e8 7d ec f6 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 53 31 db 83 ec 0c 85 c0 74 39 89
> >>>EIP: [<c01b4b26>] swap_count_continued+0x176/0x180 SS:ESP 0069:c670fe0c
> >>>---[ end trace c2dcb41c89b0a9f7 ]---
>
>
Thread overview: 15+ messages
2011-08-26 17:42 3.0.0 Xen pv guest - BUG: Unable to handle kernel paging request in swap_count_continued Peter Sandin
2011-08-29 14:39 ` kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle] Christopher S. Aker
2011-08-29 15:07 ` Konrad Rzeszutek Wilk
2011-08-30 11:45 ` Ian Campbell
2011-08-31 20:43 ` Christopher S. Aker
2011-09-06 17:13 ` Konrad Rzeszutek Wilk [this message]
2011-09-12 16:06 ` [Xen-devel] " Christopher S. Aker
2011-09-12 16:11 ` Konrad Rzeszutek Wilk
2011-09-15 18:58 ` Christopher S. Aker
2011-09-15 19:17 ` Christopher S. Aker
2011-09-18 15:05 ` Christopher S. Aker
2011-09-21 18:04 ` Konrad Rzeszutek Wilk
2011-09-21 22:09 ` Christopher S. Aker
2011-09-22 18:32 ` Konrad Rzeszutek Wilk
2011-09-22 20:02 ` [Xen-devel] " Christopher S. Aker