From: Keir Fraser <keir.fraser@eu.citrix.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>,
"jbeulich@novell.com" <jbeulich@novell.com>
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: Re: Xen-unstable panic: FATAL PAGE FAULT
Date: Wed, 1 Sep 2010 11:25:22 +0100
Message-ID: <C8A3E8A2.21BB6%keir.fraser@eu.citrix.com>
In-Reply-To: <BAY121-W212818AFA30B1FA26F4F7CDA8B0@phx.gbl>
That doesn't imply anything. It is perfectly valid for a page's prev or next
index to be PAGE_LIST_NULL, if that page is not in a list, or if it is at
the head and/or tail of a list.
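
A minimal sketch of why the pointer in the quoted output looks the way it
does. It is not taken from the Xen source: the 32-bit all-ones sentinel, the
32-byte struct page_info size, and the frame-table base (inferred from the
pgb address in the log) are assumptions, and index_to_addr() and
link_is_page() are hypothetical helpers. Under those assumptions, blindly
converting a PAGE_LIST_NULL link into a pointer gives ffff8315ffffffe0,
the pg value printed below, so the link has to be compared against the
sentinel before it is converted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only (not copied from the Xen source):
 * - list links are 32-bit page indices, all-ones meaning "no page";
 * - each struct page_info occupies 32 bytes;
 * - the frame table starts at 0xffff82f600000000, inferred from the
 *   pgb address (ffff82f600002040) in the quoted log. */
#define PAGE_LIST_NULL   UINT32_MAX
#define PAGE_INFO_SIZE   32ULL
#define FRAME_TABLE_BASE 0xffff82f600000000ULL

/* Address a blind index-to-pointer conversion would produce. */
uint64_t index_to_addr(uint32_t idx)
{
    return FRAME_TABLE_BASE + (uint64_t)idx * PAGE_INFO_SIZE;
}

/* A link refers to a real page only if it is not the sentinel. */
int link_is_page(uint32_t idx)
{
    return idx != PAGE_LIST_NULL;
}
```

With these numbers, index_to_addr(PAGE_LIST_NULL) is 0xffff8315ffffffe0,
and the earlier faulting address ffff8315ffffffe4 lies 4 bytes past it,
which would be consistent with touching a field of that bogus structure.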
-- Keir
On 01/09/2010 11:21, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
> Thanks Keir.
>
> I ran the test below myself, in page_alloc.c.
> check_page will panic on any page whose address has '3' as its 6th hex
> digit; the last argument, i, indicates which call site panicked.
>
> The output below indicates the panic comes from line 558: the page address
> is ffff82f600002040, while its next page is ffff8315ffffffe0. That is very
> close to the faulting address in the previous panic (ffff8315ffffffe4).
>
> I think this should imply something.
>
> ---------------------------------------
> (XEN) -----------18
> (XEN) System RAM: 24542MB (25131224kB)
> (XEN) SRAT: No PXM for e820 range: 0000000000000000 - 000000000009a7ff
> (XEN) SRAT: SRAT not used.
> (XEN) ----------------pgb ffff82f600002040 pg ffff8315ffffffe0, mask 1, order 0, 0
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) xmao invalid page address assigned
> (XEN) ****************************************
> (XEN)
>
> ----------------------------------------------------
> 485 static int check_page(struct page_info* pgb, struct page_info* pg, unsigned long mask, unsigned int order, int i){
> 486
> 487     if((unsigned long)pg & 0x0000020000000000 &&
> 488        (unsigned long)pg & 0x0000010000000000
> 489     ){
> 490         printk("----------------pgb %p pg %p, mask %lx, order %d, %d\n", pgb, pg, mask, order, i);
> 491         panic("xmao invalid page address assigned \n");
> 492     }
> 493     return 0;
> 494 }
>
> 549     if ( (page_to_mfn(pg) & mask) )
> 550     {
> 551         /* Merge with predecessor block? */
> 552         if ( !mfn_valid(page_to_mfn(pg-mask)) ||
> 553              !page_state_is(pg-mask, free) ||
> 554              (PFN_ORDER(pg-mask) != order) )
> 555             break;
> 556         pg -= mask;
> 557
> 558         check_page(pg, pdx_to_page(pg->list.next), mask, order, 0);
> 559         check_page(pg, pdx_to_page(pg->list.prev), mask, order, 1);
> 560
> 561         page_list_del(pg, &heap(node, zone, order));
> 562     }
> 563     else
> 564     {
> 565         /* Merge with successor block? */
> 566         if ( !mfn_valid(page_to_mfn(pg+mask)) ||
> 567              !page_state_is(pg+mask, free) ||
> 568              (PFN_ORDER(pg+mask) != order) )
> 569             break;
> 570
> 571         pgt = pg + mask;
> 572         check_page(pg, pdx_to_page(pgt->list.next), mask, order, 2);
> 573         check_page(pg, pdx_to_page(pgt->list.prev), mask, order, 3);
> 574
>
>> Date: Wed, 1 Sep 2010 10:58:54 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@eu.citrix.com
>> To: tinnycloud@hotmail.com; jbeulich@novell.com
>> CC: xen-devel@lists.xensource.com
>>
>> Hm, well, it is a bit weird. The check in init_heap_pages() ought to prevent
>> merging across node boundaries. Nonetheless the code is simpler and more
>> obvious if we put a further merging constraint in free_heap_pages() instead.
>> It's also more correct, since I'm not sure that the
>> phys_to_nid(page_to_maddr(pg-1)) in init_heap_pages() won't possibly BUG out
>> if pg-1 is not a RAM page and is not in a known NUMA node range.
>>
>> Please give the attached patch a spin. (You should revert the previous
>> patch, of course).
>>
>> Thanks,
>> Keir
>>
>> On 01/09/2010 10:23, "MaoXiaoyun" <tinnycloud@hotmail.com> wrote:
>>
>>> Well. It did crash on every startup.
>>>
>>> below is what I got.
>>> ---------------------------------------------------
>>> root (hd0,0)
>>> Filesystem type is ext2fs, partition type 0x83
>>> kernel /xen-4.0.0.gz msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
>>> [Multiboot-elf, <0x100000:0x152000:0x148000>, shtab=0x39a078, entry=0x100000]
>>> module /vmlinuz-2.6.31.13-pvops-patch ro root=LABEL=/ hda=noprobe console=hvc0
>>> [Multiboot-module @ 0x39b000, 0x3214d0 bytes]
>>>
>>>
>>>  __  __            _  _    ___   ___
>>>  \ \/ /___ _ __   | || |  / _ \ / _ \
>>>   \  // _ \ '_ \  | || |_| | | | | | |
>>>   /  \  __/ | | | |__   _| |_| | |_| |
>>>  /_/\_\___|_| |_|    |_|(_)___(_)___/
>>> (XEN) Xen version 4.0.0 (root@dev.sd.aliyun.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Wed Sep 1 17:13:35 CST 2010
>>> (XEN) Latest ChangeSet: unavailable
>>> (XEN) Command line: msi=1 iommu=off x2apic=off hap=0 dom0_mem=10240M dom0_max_vcpus=4 dom0_vcpus_pin console=com1,vga com1=115200,8n1 conswitch=ax noreboot
>>> (XEN) Video information:
>>> (XEN) VGA is text mode 80x25, font 8x16
>>> (XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
>>> (XEN) EDID info not retrieved because no DDC retrieval method detected
>>> (XEN) Disc information:
>>> (XEN) Found 6 MBR signatures
>>> (XEN) Found 6 EDD information structures
>>> (XEN) Xen-e820 RAM map:
>>> (XEN) 0000000000000000 - 000000000009a800 (usable)
>>> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
>>> (XEN) 00000000000e4bb0 - 0000000000100000 (reserved)
>>> (XEN) 0000000000100000 - 00000000bf790000 (usable)
>>> (XEN) 00000000bf790000 - 00000000bf79e000 (ACPI data)
>>> (XEN) 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
>>> (XEN) 00000000bf7d0000 - 00000000bf7e0000 (reserved)
>>> (XEN) 00000000bf7ec000 - 00000000c0000000 (reserved)
>>> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
>>> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
>>> (XEN) 00000000fff00000 - 0000000100000000 (reserved)
>>> (XEN) 0000000100000000 - 0000000640000000 (usable)
>>> (XEN) --------------849
>>> (XEN) --------------849
>>> (XEN) --------------849
>>> (XEN) ACPI: RSDP 000F9DD0, 0024 (r2 ACPIAM)
>>> (XEN) ACPI: XSDT BF790100, 005C (r1 112309 XSDT1113 20091123 MSFT 97)
>>> (XEN) ACPI: FACP BF790290, 00F4 (r4 112309 FACP1113 20091123 MSFT 97)
>>> (XEN) ACPI: DSDT BF7904B0, 4D6A (r2 CTSAV CTSAV122 122 INTL 20051117)
>>> (XEN) ACPI: FACS BF79E000, 0040
>>> (XEN) ACPI: APIC BF790390, 00D8 (r2 112309 APIC1113 20091123 MSFT 97)
>>> (XEN) ACPI: MCFG BF790470, 003C (r1 112309 OEMMCFG 20091123 MSFT 97)
>>> (XEN) ACPI: OEMB BF79E040, 007A (r1 112309 OEMB1113 20091123 MSFT 97)
>>> (XEN) ACPI: SRAT BF79A4B0, 01D0 (r1 112309 OEMSRAT 1 INTL 1)
>>> (XEN) ACPI: HPET BF79A680, 0038 (r1 112309 OEMHPET 20091123 MSFT 97)
>>> (XEN) ACPI: SSDT BF7A1A00, 0363 (r1 DpgPmm CpuPm 12 INTL 20051117)
>>> (XEN) --------------847
>>> (XEN) ---------srat enter
>>> (XEN) ---------prepare enter into pfn
>>> (XEN) -------in pfn
>>> (XEN) -------hole shift returned
>>> (XEN) --------------849
>>> (XEN) System RAM: 24542MB (25131224kB)
>>> (XEN) Unknown interrupt (cr2=0000000000000000)
>>> (XEN) 00000000000000ab 0000000000000000 ffff82f600004020
>>> 00007d0a00000000 ffff82f600004000 0000000000000020 0000000000201000
>>> 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000008
>>> 0000000000000000 00000000000001ff 00000000000001ff 0000000000000000
>>> ffff82c480115787 000000000000e008 0000000000010002 ffff82c48035fd18
>>> 0000000000000000 ffff82c48011536a 0000000000000000 0000000000000000
>>> 0000000000000163 0000000900000000 00000000000000ab 0000000000000201
>>> 0000000000000000 0000000000000100 ffff82f600004020 0000000000000eff
>>> 0000000000000000 ffff82c480115e60 0000000000000000 ffff82f600002020
>>> 0000000000001000 0000000000000004 0000000000000080 0000000000000001
>>> ffff82c48020be8d ffff830000100000 0000000000000008 0000000000000000
>>> 0000000000000000 ffffffffffffffff 0000000000000101 ffff82c48022d8fc
>>> 0000000000540000 00000000005fde36 0000000000540000 0000000000100000
>>> 0000000100000000 0000000000000010 ffff82c48024deb4 ffff82c4802404f7
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 ffff8300bf568ff8 ffff8300bf569ff8 000000000022a630
>>> 000000000022a695 0000000000087f00 0000000000000000 ffff830000087fc0
>>> 00000000005fde36 000000000087b6d0 0000000000d44000 0000000001000000
>>> 0000000000000000 ffffffffffffffff ffff830000087f00 0000100000000000
>>> 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
>>> 0000000000000000 0000000000000000 0000000000067ebc 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 ffff82c4801000b5
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000 00000000fffff000
>>>
>>>> Date: Wed, 1 Sep 2010 09:49:18 +0100
>>>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>>>> From: keir.fraser@eu.citrix.com
>>>> To: JBeulich@novell.com
>>>> CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>>>>
>>>> On 01/09/2010 09:02, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>>
>>>>>> Well I agree with your logic anyway. So I don't see that this can be
>>>>>> the cause of MaoXiaoyun's bug. At least not directly. But then I'm
>>>>>> stumped as to why the page arithmetic and checks in free_heap_pages
>>>>>> are (apparently) resulting in a page pointer way outside the
>>>>>> frame-table region and actually in the directmap region.
>>>>>
>>>>> There must be some unchecked use of PAGE_LIST_NULL, i.e.
>>>>> running off a list end without taking notice (0xffff8315ffffffe4
>>>>> exactly corresponds with that).
>>>>
>>>> Okay, my next guess then is that we are deleting a chunk from the wrong
>>>> list head. I don't see any check that the adjacent chunks we are
>>>> considering to merge are from the same node and zone. I suppose the zone
>>>> logic does just work as we're dealing with 2**x aligned and sized regions.
>>>> But, shouldn't the merging logic in free_heap_pages be checking that the
>>>> merging candidate is from the same NUMA node? I see I have an ASSERTion
>>>> later in the same function, but it's too weak and wishful I suspect.
>>>>
>>>> MaoXiaoyun: can you please test with the attached patch? If I'm right,
>>>> you will crash on one of the BUG_ON checks that I added, rather than
>>>> crashing on a pointer dereference. You may even crash during boot.
>>>> Anyhow, what is interesting is whether this patch always makes you crash
>>>> on BUG_ON before you would normally crash on pointer dereference. If so
>>>> this is trivial to fix.
>>>>
>>>> Thanks,
>>>> Keir
>>>>
>>>
>>
>
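
The merging constraint discussed in the quoted messages (only merge a free
chunk with its buddy when both belong to the same NUMA node) might be
sketched as follows. This is not the patch attached to any message in the
thread and it does not use Xen's data structures: struct page, buddy_index()
and can_merge() are hypothetical names for a simplified model in which each
page records its free state, order and node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not Xen's struct page_info. */
struct page {
    bool free;           /* head of a free chunk */
    unsigned int order;  /* log2 of the chunk size in pages */
    unsigned int node;   /* NUMA node the page belongs to */
};

/* Index of the buddy of the 2^order chunk that starts at idx. */
size_t buddy_index(size_t idx, unsigned int order)
{
    return idx ^ ((size_t)1 << order);
}

/* The buddy is a merge candidate only if it exists, is free, has the
 * same order, and sits on the same node as the chunk being freed. */
bool can_merge(const struct page *pages, size_t npages,
               size_t idx, unsigned int order)
{
    size_t b = buddy_index(idx, order);

    assert(idx < npages);
    if (b >= npages)
        return false;
    return pages[b].free &&
           pages[b].order == order &&
           pages[b].node == pages[idx].node;
}
```

Without the node comparison, two chunks that are adjacent in the frame table
but live on different per-node free lists would be treated as mergeable, and
the buddy would then be unlinked from the wrong list head.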