Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 20, 2011 at 03:37:36PM +0000, Ian Campbell wrote:
>> On Thu, 2011-01-20 at 15:06 +0000, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 20, 2011 at 12:18:26PM +0100, castet.matthieu@free.fr wrote:
>>>> Quoting Konrad Rzeszutek Wilk :
>>>>
>>>>> On Wed, Jan 19, 2011 at 11:59:57PM +0100, matthieu castet wrote:
>>>>>> On Wed, 19 Jan 2011 16:14:32 -0500,
>>>>>> Konrad Rzeszutek Wilk wrote:
>>>>>>>>> I was just shown this[1] on Xen from an Ubuntu bug report[2].
>>>>>>>>>
>>>>>>>>> [ 1.230382] NX-protecting the kernel data: 3884k
>>>>>>>>> [ 1.231002] BUG: unable to handle kernel paging request at c1782ae0
>>>>>>>>> ...
>>>>>>>>> [ 1.231145] Call Trace:
>>>>>>>>> [ 1.231152] [] ? __change_page_attr+0x2c1/0x370
>>>>>>>>> [ 1.231161] [] ? __purge_vmap_area_lazy+0xc1/0x180
>>>>>>>>> [ 1.231169] [] ? __change_page_attr_set_clr+0x4c/0xb0
>>>>>>>>> [ 1.231176] [] ? change_page_attr_set_clr+0x128/0x300
>>>>>>>>> [ 1.231183] [] ? __raw_callee_save_xen_restore_fl+0x6/0x8
>>>>>>>>> [ 1.231192] [] ? vprintk+0x171/0x3f0
>>>>>>>>> [ 1.231198] [] ? set_memory_nx+0x5f/0x70
>>>>>>>> If you run it with Xen debugging enabled:
>>>>>>>>
>>>>>>>> [ 7.753329] NX-protecting the kernel data: 2400k
>>>>>>>> (XEN) mm.c:2389:d0 Bad type (saw 3c000003 != exp 70000000) for mfn
>>>>>> this happens if (x & (PGT_type_mask|PGT_pae_xen_l2)) != type)
>>>>>>
>>>>>> but
>>>>>> #define PGT_type_mask (7U<<29) /* Bits 29-31. */
>>>>>> #define _PGT_pae_xen_l2 26
>>>>>> #define PGT_pae_xen_l2 (1U<<_PGT_pae_xen_l2)
>>>>>>
>>>>>> but (exp type = 0x70000000) & (PGT_type_mask|PGT_pae_xen_l2) =
>>>>>> 0x60000000
>>>>>>
>>>>>> So the exp type looks strange.
>>>>>> #define _PGT_pinned 28
>>>>>> #define PGT_pinned (1U<<_PGT_pinned)
>>>>>>
>>>>>>>> 1355a5 (pfn 15a5)
>>>>>>>> (XEN) mm.c:889:d0 Error getting mfn 1355a5 (pfn 15a5) from L1 entry 80000001355a5063 for l1e_owner=0, pg_owner=0
>>>>>>>> (XEN) mm.c:4958:d0 ptwr_emulate: could not get_page_from_l1e()
>>>>>>>> [ 7.759087] BUG: unable to handle kernel paging request at c82a4d28
>>>>>>>> [ 7.759087] IP: [] xen_set_pte_atomic+0x21/0x2f
>>>>>>>> [ 7.759087] *pdpt = 0000000001663001 *pde = 00000000082db067 *pte = 80000000082a4061
>>>>>>>> .. and same stack trace.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does Xen have different size page table allocations or something
>>>>>>>>> weird?
>>>>>>>> The same page size. Not sure actually why it is being triggered.
>>>>>>>> Let me copy Keir on this. Keir, the region that is being marked as
>>>>>>>> _NX is .bss one and
>>>>>>> _past_ the __init_end it dies. Any ideas?
>>>>>>>
>>>>>> Does this happen if you add ". = ALIGN(HPAGE_SIZE);" before bss section
>>>>>> in arch/x86/kernel/vmlinux.lds.S ?
>>>>> Like this?
>>>> Yes
>>>>> yeeeey...That made it boot.
>>>>>
>>>>>> What's the output of kernel_page_tables debugfs ?
>>>>> Shees.. I get
>>>>>
>>>>> [ 73.723105] BUG: unable to handle kernel paging request at 15555000
>>>> [...]
>>>>> with the patch and if I revert 5bd5a452662bc37c54fb6828db1a3faf87e6511c..
>>>>>
>>>>> That looks to be another bug to hunt down.
>>>>>
>>>> No, that's the same bug: that's the root cause.
>>>>
>>>> For some reason with xen, accessing some page tables (bss and after) makes the
>>>> system crash.
>>> I think I know the failure in the first case - the swapper_pg_dir is marked as _RO
>>> and you are not supposed to make it _RW (unless you first do a bit of dance and switch
>>> over to another pagetable). The reason being that Xen has a symbiotic relationship
>>> with PV domains where pagetables are marked _RO so that any update to
>>> it will go through Xen so it can validate that we aren't doing anything stupid.
>>>
>>> But accessing the page table should be OK, not sure why it crashed - we
>>> aren't writing anything to it - just reading.
>>>
>>> Let me copy Ian on this - he might have better ideas.
>> It's pretty hard to follow the quoted context above but it certainly
>> seems plausible that set_memory_nx could inadvertently end up trying to
>> make a page which Xen made RO into a RW again.
>>
>> For example the callchain appears to pass through static_protections()
>> which explicitly makes .data and .bss writeable, I think these regions
>> can potentially contain page table pages -- e.g. allocated from BRK
>> perhaps?
>
> They definitely do - it has the level1_ident_pgt, which is definitely used
> during bootup.
>
Ok, that makes sense.

> Perhaps the fix is when marking NX, just do NX, don't try to set RW if they
> are RO.
>
What do you think of this patch ?

Matthieu