From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Robert Phillips <robert.phillips@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>,
Ben Guthro <ben@guthro.net>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: dom0 linux 3.6.0-rc4, crash due to ballooning although dom0_mem=X, max:X set
Date: Wed, 5 Sep 2012 10:06:01 -0400
Message-ID: <20120905140600.GA5844@phenom.dumpdata.com>
In-Reply-To: <048EAD622912254A9DEA24C1734613C18C864C3C5D@FTLPMAILBOX02.citrite.net>
On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
> Ben,
>
> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
> Here are my findings.
>
> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
>
> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
And HYPERVISOR_grant_table_op would populate map_ops[i].dev_bus_addr with the machine address.
> which is where I made my change.
>
> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
>
> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
Right, but blkback makes a distinction by passing NULL as kmap_op, which means it should
use the old mechanism. That means once the hypercall is done, map_ops[i].dev_bus_addr is not
used anymore.
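Here is a simplified C model of the two bookkeeping paths. It assumes two things. When kmap_op is NULL (blkback), the old mfn stays in page->index. When kmap_op is non-NULL (gntdev), the old mfn is moved into kmap_op->dev_bus_addr and page->index holds the kmap_op pointer instead. Every struct and function here (model_map_op, model_page, model_add_override, model_old_mfn) is a hypothetical stand-in, not the real p2m.c code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gnttab_map_grant_ref. */
struct model_map_op {
	uint64_t dev_bus_addr;   /* OUT field: written by the hypervisor */
};

/* Hypothetical stand-in for struct page: only the fields used here. */
struct model_page {
	unsigned long priv;      /* foreign (granted) mfn */
	unsigned long index;     /* old mfn, or a pointer to kmap_op */
};

/* Bookkeeping when a grant mapping is added (assumed behaviour):
 *   kmap_op == NULL -> blkback path, old mfn stays in page->index;
 *   kmap_op != NULL -> gntdev path, old mfn moves to
 *                      kmap_op->dev_bus_addr and page->index
 *                      points at kmap_op. */
static void model_add_override(struct model_page *page,
			       unsigned long foreign_mfn,
			       unsigned long old_mfn,
			       struct model_map_op *kmap_op)
{
	page->priv = foreign_mfn;
	page->index = old_mfn;
	if (kmap_op != NULL) {
		kmap_op->dev_bus_addr = page->index;
		page->index = (unsigned long)(uintptr_t)kmap_op;
	}
}

/* Recover the old mfn at unmap time, following the same two paths. */
static unsigned long model_old_mfn(const struct model_page *page,
				   int has_kmap_op)
{
	if (!has_kmap_op)
		return page->index;

	const struct model_map_op *op =
		(const struct model_map_op *)(uintptr_t)page->index;
	return (unsigned long)op->dev_bus_addr;
}
```

On the gntdev path, the old mfn survives only as long as nothing else writes dev_bus_addr between these two calls.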
>
> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c.
So the problem of saving the old mfn in dev_bus_addr has been there for a long, long time then?
Even before this patch set?
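To make the overwrite concrete: dev_bus_addr is an output field that the hypervisor fills in when it processes the map request. This sketch assumes the kmap_op map request is batched and issued only after the kernel has stashed the old mfn there. Under that assumption, the hypervisor's write clobbers the stashed value. model_map_op, model_hypervisor_map() and model_stash_then_map() are hypothetical names. model_hypervisor_map() only mimics the effect that __gnttab_map_grant_ref() has on that one field.

```c
#include <stdint.h>

/* Hypothetical stand-in: only the two fields that matter here. */
struct model_map_op {
	uint64_t host_addr;     /* IN: where the caller wants the mapping */
	uint64_t dev_bus_addr;  /* OUT: filled in by the hypervisor */
};

/* Mimics only the effect of the hypervisor's map handler on the
 * OUT field: it unconditionally stores the machine address. */
static void model_hypervisor_map(struct model_map_op *op, uint64_t maddr)
{
	op->dev_bus_addr = maddr;
}

/* Stash the old mfn in dev_bus_addr, then let the deferred map
 * request run. Returns whatever is left in dev_bus_addr afterwards. */
static uint64_t model_stash_then_map(uint64_t old_mfn, uint64_t maddr)
{
	struct model_map_op op = { 0, 0 };

	op.dev_bus_addr = old_mfn;          /* kernel-side bookkeeping */
	model_hypervisor_map(&op, maddr);   /* batched request issued later */
	return op.dev_bus_addr;             /* old_mfn has been overwritten */
}
```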
>
> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and hit the BUG at drivers/xen/balloon.c:359.
>
Hmm, I believe the storage for holding the old mfn was, and still is, page->index.
> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
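The sketch below shows the idea behind the patch. It uses a hypothetical model struct rather than the real gnttab_map_grant_ref. Giving the old mfn its own member means the hypervisor's write to dev_bus_addr no longer destroys it. The real structure is also read by the hypervisor, because it is defined in Xen's public grant_table.h interface. This model therefore says nothing about whether changing that layout is safe. All names here are illustrative only.

```c
#include <stdint.h>

/* Hypothetical model of the patched layout: the old mfn gets a member
 * of its own instead of borrowing the hypervisor-written field. */
struct model_map_op_patched {
	uint64_t host_addr;     /* IN */
	uint64_t dev_bus_addr;  /* OUT: filled in by the hypervisor */
	uint64_t old_mfn;       /* kernel-only bookkeeping */
};

/* Mimics the hypervisor: only the OUT field is written. */
static void model_hypervisor_map(struct model_map_op_patched *op, uint64_t maddr)
{
	op->dev_bus_addr = maddr;
}

/* Stash the old mfn in its own member, then run the map request. */
static uint64_t model_stash_then_map(uint64_t old_mfn, uint64_t maddr)
{
	struct model_map_op_patched op = { 0, 0, 0 };

	op.old_mfn = old_mfn;
	model_hypervisor_map(&op, maddr);
	return op.old_mfn;          /* still intact after the map */
}
```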
>
> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.
Right. Somehow he ends up with valid mappings where there should be none. And lots of them.
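Below is a minimal model of the invariant that the BUG_ON enforces, based only on Robert's description above. When the balloon driver repopulates a pfn, the p2m entry for that pfn must not already hold a valid mapping. The array-based p2m table, MODEL_INVALID_P2M_ENTRY and model_increase_reservation() are all hypothetical. The function returns false at the point where the real driver would BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PFNS 8
#define MODEL_INVALID_P2M_ENTRY (~0UL)

/* Hypothetical pfn -> mfn table standing in for the real p2m. */
static unsigned long model_p2m[MODEL_NR_PFNS];

/* Mark every pfn as having no backing mfn (fully ballooned out). */
static void model_p2m_init(void)
{
	for (size_t i = 0; i < MODEL_NR_PFNS; i++)
		model_p2m[i] = MODEL_INVALID_P2M_ENTRY;
}

static bool model_p2m_valid(unsigned long pfn)
{
	assert(pfn < MODEL_NR_PFNS);
	return model_p2m[pfn] != MODEL_INVALID_P2M_ENTRY;
}

/* Give pfn a new backing mfn. A pfn that still has a valid mapping
 * violates the invariant. The real driver would BUG() there; this
 * model reports failure and leaves the entry untouched. */
static bool model_increase_reservation(unsigned long pfn, unsigned long new_mfn)
{
	if (model_p2m_valid(pfn))
		return false;
	model_p2m[pfn] = new_mfn;
	return true;
}
```

Sander's crash corresponds to the false branch: a pfn that should have been empty still had a valid entry.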
>
> -- Robert Phillips
>
>
> -----Original Message-----
> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> Sent: Tuesday, September 04, 2012 3:35 PM
> To: Ben Guthro
> Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Robert Phillips
> Subject: Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning although dom0_mem=X, max:X set
>
>
> Tuesday, September 4, 2012, 8:07:11 PM, you wrote:
>
> > We ran into the same issue, in newer kernels - but had not yet
> > submitted this fix.
>
> > One of the developers here came up with a fix (attached, and CC'ed
> > here) that fixes an issue where the p2m code reuses a structure member
> > where it shouldn't.
> > The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
> > structure, instead of re-using dev_bus_addr.
>
>
> > If this also works for you, I can re-submit it with a Signed-off-by
> > line, if you prefer, Konrad.
>
> Hi Ben,
>
> This patch doesn't work for me:
>
> When starting the PV guest I get:
>
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (68b69070).
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
>
>
> and from the dom0 kernel:
>
> [ 374.425727] BUG: unable to handle kernel paging request at ffff8800fffd9078
> [ 374.428901] IP: [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270
> [ 374.428901] PGD 1e0c067 PUD 0
> [ 374.428901] Oops: 0000 [#1] PREEMPT SMP
> [ 374.428901] Modules linked in:
> [ 374.428901] CPU 0
> [ 374.428901] Pid: 4308, comm: qemu-system-i38 Not tainted 3.6.0-rc4-20120830+ #70 System manufacturer System Product Name/P5Q-EM DO
> [ 374.428901] RIP: e030:[<ffffffff81336e4e>] [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270
> [ 374.428901] RSP: e02b:ffff88002f185ca8 EFLAGS: 00010206
> [ 374.428901] RAX: ffff880000000000 RBX: ffff88001471cf00 RCX: 00000000fffd9078
> [ 374.428901] RDX: 0000000000000050 RSI: 40000000000fffd9 RDI: 00003ffffffff000
> [ 374.428901] RBP: ffff88002f185d08 R08: 0000000000000078 R09: 0000000000000000
> [ 374.428901] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
> [ 374.428901] R13: ffff88001471c480 R14: 0000000000000002 R15: 0000000000000002
> [ 374.428901] FS: 00007f6def9f2740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [ 374.428901] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 374.428901] CR2: ffff8800fffd9078 CR3: 000000002d30e000 CR4: 0000000000042660
> [ 374.428901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 374.428901] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 374.428901] Process qemu-system-i38 (pid: 4308, threadinfo ffff88002f184000, task ffff8800376f1040)
> [ 374.428901] Stack:
> [ 374.428901] ffffffffffffffff 0000000000000050 00000000fffd9078 00000000000fffd9
> [ 374.428901] 0000000001000000 ffff8800382135a0 ffff88002f185d08 ffff880038211960
> [ 374.428901] ffff88002f11d2c0 0000000000000004 0000000000000003 0000000000000001
> [ 374.428901] Call Trace:
> [ 374.428901] [<ffffffff8134212e>] gntdev_mmap+0x20e/0x520
> [ 374.428901] [<ffffffff8111c502>] ? mmap_region+0x312/0x5a0
> [ 374.428901] [<ffffffff810ae0a0>] ? lockdep_trace_alloc+0xa0/0x130
> [ 374.428901] [<ffffffff8111c5be>] mmap_region+0x3ce/0x5a0
> [ 374.428901] [<ffffffff8111c9e0>] do_mmap_pgoff+0x250/0x350
> [ 374.428901] [<ffffffff81109e88>] vm_mmap_pgoff+0x68/0x90
> [ 374.428901] [<ffffffff8111a5b2>] sys_mmap_pgoff+0x152/0x170
> [ 374.428901] [<ffffffff812b29be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 374.428901] [<ffffffff81011f29>] sys_mmap+0x29/0x30
> [ 374.428901] [<ffffffff8184b939>] system_call_fastpath+0x16/0x1b
> [ 374.428901] Code: 0f 84 e7 00 00 00 48 89 f1 48 c1 e1 0c 41 81 e0 ff 0f 00 00 48 b8 00 00 00 00 00 88 ff ff 48 bf 00 f0 ff ff ff 3f 00 00 4c 01 c1 <48> 23 3c 01 48 c1 ef 0c 49 8d 54 15 00 4d 85 ed b8 00 00 00 00
> [ 374.428901] RIP [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270
> [ 374.428901] RSP <ffff88002f185ca8>
> [ 374.428901] CR2: ffff8800fffd9078
> [ 374.428901] ---[ end trace 0e0a5a49f6503c0a ]---
>
>
>
> > Ben
>
>
> > On Tue, Sep 4, 2012 at 1:19 PM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> >>
> >> Tuesday, September 4, 2012, 6:33:47 PM, you wrote:
> >>
> >>> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
> >>>> Hi Konrad,
> >>>>
> >>>> This seems to happen only on an Intel machine I'm trying to set up as a development machine (I haven't seen it on my AMD).
> >>>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
> >>
> >>> Is this only with Xen 4.2? That is, does Xen 4.1 work?
> >>>>
> >>>> Dom0 and guest kernel are 3.6.0-rc4 with config:
> >>
> >>> If you back out:
> >>
> >>> f393387d160211f60398d58463a7e65
> >>> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >>> Date: Fri Aug 17 16:43:28 2012 -0400
> >>
> >>> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
> >>
> >>> Do you still see this bug (with either Xen 4.1 or Xen 4.2)?
> >>
> >> With c96aae1f7f393387d160211f60398d58463a7e65 reverted, I still see this bug (with Xen 4.2).
> >>
> >> I will use the debug patch you mailed and send back the results.
> >>
> >>
> >>>> [*] Xen memory balloon driver
> >>>> [*] Scrub pages before returning them to system
> >>>>
> >>>> From http://wiki.xen.org/wiki/Dom0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay.
> >>>>
> >>>> But when trying to start a PV guest with 512MB of memory, the machine (dom0) crashes with the stack trace below (complete serial-log.txt attached).
> >>>>
> >>>> From the:
> >>>> "mapping kernel into physical memory
> >>>> about to get started..."
> >>>>
> >>>> I would almost say it's trying to reload dom0?
> >>>>
> >>>>
> >>>> [ 897.161119] device vif1.0 entered promiscuous mode
> >>>> mapping kernel into physical memory
> >>>> about to get started...
> >>>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
> >>>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
> >>>> [ 898.129465] ------------[ cut here ]------------
> >>>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
> >>>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
> >>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xen.org
> >> http://lists.xen.org/xen-devel