* 2.6.10-bkcurr: major slab corruption preventing booting on ARM
From: Russell King @ 2005-01-04 14:43 UTC
To: Linux Kernel List
Hi.
I've had a report from a fellow ARM hacker of their platform not
booting. After they turned on slab debugging, they saw (pieced
together from a report on IRC):
Freeing init memory: 104K
run_init_process(/bin/bash)
Slab corruption: start=c0010934, len=160
Last user: [<c00adc54>](d_alloc+0x28/0x2d8)
I've just run up 2.6.10-bkcurr on a different ARM platform, and
encountered the following output. It looks like there are serious
slab corruption issues in these kernels.
I'll dig a little further into the report below to see if there's
anything obvious.
Starting up networking
eth0: link down
eth0: link up, 10Mbps, half-duplex, lpa 0x0021
Starting network services
slab: Internal list corruption detected in cache 'buffer_head'(63), slabp c7912000(16). Hexdump:
000: 00 01 10 00 00 02 20 00 14 01 00 00 14 21 91 c7
010: 10 00 00 00 10 00 00 00 fe ff ff ff fe ff ff ff
020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
030: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
050: fe ff ff ff fe ff ff ff 11 00 00 00 12 00 00 00
060: 13 00 00 00 14 00 00 00 15 00 00 00 16 00 00 00
070: 17 00 00 00 0a 60 6b 6b 19 00 00 00 1a 00 00 00
080: 1b 00 00 00 1c 00 00 00 1d 00 00 00 1e 00 00 00
090: 1f 00 00 00 20 00 00 00 21 00 00 00 22 00 00 00
0a0: 23 00 00 00 24 00 00 00 25 00 00 00 26 00 00 00
0b0: 27 00 00 00 28 00 00 00 29 00 00 00 2a 00 00 00
0c0: 2b 00 00 00 2c 00 00 00 2d 00 00 00 2e 00 00 00
0d0: 2f 00 00 00 30 00 00 00 31 00 00 00 32 00 00 00
0e0: 33 00 00 00 34 00 00 00 35 00 00 00 36 00 00 00
0f0: 37 00 00 00 38 00 00 00 39 00 00 00 3a 00 00 00
kernel BUG at /home/rmk/build/linux-v2.6-local/mm/slab.c:1977!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in:
CPU: 0
PC is at __bug+0x40/0x54
LR is at 0x1
pc : [<c00263f8>] lr : [<00000001>] Not tainted
sp : c03c5ee4 ip : 60000093 fp : c03c5ef4
r10: 00000007 r9 : 00000000 r8 : c7912018
r7 : 00000000 r6 : c039e8e0 r5 : c7912000 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : 000012f3 r0 : 00000001
Flags: nZCv IRQs off FIQs on Mode SVC_32 Segment kernel
Control: 5717F Table: 07A44000 DAC: 00000017
Process events/0 (pid: 3, stack limit = 0xc03c4190)
Stack: (0xc03c5ee4 to 0xc03c6000)
5ee0: c7912114 c03c5f24 c03c5ef8 c005e66c c00263c8 c0399a28 00000007
5f00: c0399a18 c0399a28 c039e8e0 c023ee68 00000000 c023ee78 c03c5f44 c03c5f28
5f20: c005ef64 c005e5e0 c039e8e0 00000000 c039e950 00000001 c03c5f70 c03c5f48
5f40: c005f020 c005eee4 c038fea8 c023ee88 80000013 c038fea0 00000000 c038fe98
5f60: c005ef88 c03c5fc8 c03c5f74 c0048c14 c005ef98 ffffffff ffffffff 00000001
5f80: 00000000 c0035efc 00010000 00000000 00000000 c038d7c0 c0035efc 00100100
5fa0: 00200200 c03c4000 c03b3f34 c038fe98 c0048a4c fffffffc 00000000 c03c5ff4
5fc0: c03c5fcc c004d148 c0048a5c ffffffff ffffffff 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 c03c5ff8 c003b7b8 c004d0d4 00000000 00000000
Backtrace:
[<c00263b8>] (__bug+0x0/0x54) from [<c005e66c>] (free_block+0x9c/0x18c)
r4 = C7912114
[<c005e5d0>] (free_block+0x0/0x18c) from [<c005ef64>] (drain_array_locked+0x90/0xb4)
[<c005eed4>] (drain_array_locked+0x0/0xb4) from [<c005f020>] (cache_reap+0x98/0x208)
r7 = 00000001 r6 = C039E950 r5 = 00000000 r4 = C039E8E0
[<c005ef88>] (cache_reap+0x0/0x208) from [<c0048c14>] (worker_thread+0x1c8/0x258)
[<c0048a4c>] (worker_thread+0x0/0x258) from [<c004d148>] (kthread+0x84/0xb0)
[<c004d0c4>] (kthread+0x0/0xb0) from [<c003b7b8>] (do_exit+0x0/0x408)
r8 = 00000000 r7 = 00000000 r6 = 00000000 r5 = 00000000
r4 = 00000000
Code: 1b004cba e59f0014 eb004cb8 e3a03000 (e5833000)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

* Re: 2.6.10-bkcurr: major slab corruption preventing booting on ARM
From: Russell King @ 2005-01-04 16:10 UTC
To: Linux Kernel List

On Tue, Jan 04, 2005 at 02:43:50PM +0000, Russell King wrote:
> I've had a report from a fellow ARM hacker of their platform not
> booting. After they turned on slab debugging, they saw (pieced
> together from a report on IRC):
>
> Freeing init memory: 104K
> run_init_process(/bin/bash)
> Slab corruption: start=c0010934, len=160
> Last user: [<c00adc54>](d_alloc+0x28/0x2d8)
>
> I've just run up 2.6.10-bkcurr on a different ARM platform, and
> encountered the following output. It looks like there are serious
> slab corruption issues in these kernels.
>
> I'll dig a little further into the report below to see if there's
> anything obvious.

Ok, reverting the pud_t patch fixes both these problems (the exact
patch can be found at: http://www.home.arm.linux.org.uk/~rmk/misc/bk4-bk5
Note that this is not a plain bk4-bk5 patch, but just the pud_t
changes brought forward to bk6 or thereabouts.)

So, something in the 4 level page table patches is causing random
scribbling in kernel memory.

* Re: 2.6.10-bkcurr: major slab corruption preventing booting on ARM
From: Ben Dooks @ 2005-01-04 17:18 UTC
To: Linux Kernel List

On Tue, Jan 04, 2005 at 04:10:49PM +0000, Russell King wrote:
> On Tue, Jan 04, 2005 at 02:43:50PM +0000, Russell King wrote:
> > I've had a report from a fellow ARM hacker of their platform not
> > booting. After they turned on slab debugging, they saw (pieced
> > together from a report on IRC):
> >
> > Freeing init memory: 104K
> > run_init_process(/bin/bash)
> > Slab corruption: start=c0010934, len=160
> > Last user: [<c00adc54>](d_alloc+0x28/0x2d8)
> >
> > I've just run up 2.6.10-bkcurr on a different ARM platform, and
> > encountered the following output. It looks like there are serious
> > slab corruption issues in these kernels.
> >
> > I'll dig a little further into the report below to see if there's
> > anything obvious.
>
> Ok, reverting the pud_t patch fixes both these problems (the exact
> patch can be found at: http://www.home.arm.linux.org.uk/~rmk/misc/bk4-bk5
> Note that this is not a plain bk4-bk5 patch, but just the pud_t
> changes brought forward to bk6 or thereabouts.)
>
> So, something in the 4 level page table patches is causing random
> scribbling in kernel memory.

I've tried that, and it fixes the problems for me on the EB2410ITX
(ARM9 2410), as well as the corruption of the initial ramdisk.

--
Ben (ben@fluff.org, http://www.fluff.org/)
'a smiley only costs 4 bytes'

* Re: 2.6.10-bkcurr: major slab corruption preventing booting on ARM
From: Russell King @ 2005-01-04 17:21 UTC
To: Linux Kernel List; +Cc: Nick Piggin

On Tue, Jan 04, 2005 at 04:10:49PM +0000, Russell King wrote:
> On Tue, Jan 04, 2005 at 02:43:50PM +0000, Russell King wrote:
> > I've had a report from a fellow ARM hacker of their platform not
> > booting. After they turned on slab debugging, they saw (pieced
> > together from a report on IRC):
> >
> > Freeing init memory: 104K
> > run_init_process(/bin/bash)
> > Slab corruption: start=c0010934, len=160
> > Last user: [<c00adc54>](d_alloc+0x28/0x2d8)
> >
> > I've just run up 2.6.10-bkcurr on a different ARM platform, and
> > encountered the following output. It looks like there are serious
> > slab corruption issues in these kernels.
> >
> > I'll dig a little further into the report below to see if there's
> > anything obvious.
>
> Ok, reverting the pud_t patch fixes both these problems (the exact
> patch can be found at: http://www.home.arm.linux.org.uk/~rmk/misc/bk4-bk5
> Note that this is not a plain bk4-bk5 patch, but just the pud_t
> changes brought forward to bk6 or thereabouts.)
>
> So, something in the 4 level page table patches is causing random
> scribbling in kernel memory.

Ok, I've narrowed the problem down to something in the following patch.
Andi Kleen suggests that maybe the ARM FIRST_USER_PGD_NR was broken by
something in here. Nick, any ideas?

diff -urN linux-2.6.10-bk4/include/linux/mm.h linux-2.6.10-bk5/include/linux/mm.h
--- linux-2.6.10-bk4/include/linux/mm.h	2004-12-24 13:33:50.000000000 -0800
+++ linux-2.6.10-bk5/include/linux/mm.h	2005-01-02 04:55:30.285949371 -0800
@@ -566,7 +566,7 @@
 		struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
 		struct zap_details *);
-void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
+void clear_page_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 		struct vm_area_struct *vma);
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
diff -urN linux-2.6.10-bk4/mm/memory.c linux-2.6.10-bk5/mm/memory.c
--- linux-2.6.10-bk4/mm/memory.c	2004-12-24 13:34:44.000000000 -0800
+++ linux-2.6.10-bk5/mm/memory.c	2005-01-02 04:55:31.265995181 -0800
@@ -34,6 +34,8 @@
  *
  * 16.07.99 - Support of BIGMEM added by Gerhard Wichert, Siemens AG
  *		(Gerhard.Wichert@pdb.siemens.de)
+ *
+ * Aug/Sep 2004 Changed to four level page tables (Andi Kleen)
  */
 
 #include <linux/kernel_stat.h>
@@ -98,58 +100,107 @@
  * Note: this doesn't free the actual pages themselves. That
  * has been handled earlier when unmapping all the memory regions.
  */
-static inline void free_one_pmd(struct mmu_gather *tlb, pmd_t * dir)
+static inline void clear_pmd_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long start, unsigned long end)
 {
 	struct page *page;
 
-	if (pmd_none(*dir))
+	if (pmd_none(*pmd))
 		return;
-	if (unlikely(pmd_bad(*dir))) {
-		pmd_ERROR(*dir);
-		pmd_clear(dir);
+	if (unlikely(pmd_bad(*pmd))) {
+		pmd_ERROR(*pmd);
+		pmd_clear(pmd);
 		return;
 	}
-	page = pmd_page(*dir);
-	pmd_clear(dir);
-	dec_page_state(nr_page_table_pages);
-	tlb->mm->nr_ptes--;
-	pte_free_tlb(tlb, page);
+	if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK)) {
+		page = pmd_page(*pmd);
+		pmd_clear(pmd);
+		dec_page_state(nr_page_table_pages);
+		tlb->mm->nr_ptes--;
+		pte_free_tlb(tlb, page);
+	}
 }
 
-static inline void free_one_pgd(struct mmu_gather *tlb, pgd_t * dir)
+static inline void clear_pud_range(struct mmu_gather *tlb, pud_t *pud, unsigned long start, unsigned long end)
 {
-	int j;
-	pmd_t * pmd;
+	unsigned long addr = start, next;
+	pmd_t *pmd, *__pmd;
 
-	if (pgd_none(*dir))
+	if (pud_none(*pud))
 		return;
-	if (unlikely(pgd_bad(*dir))) {
-		pgd_ERROR(*dir);
-		pgd_clear(dir);
+	if (unlikely(pud_bad(*pud))) {
+		pud_ERROR(*pud);
+		pud_clear(pud);
 		return;
 	}
-	pmd = pmd_offset(dir, 0);
-	pgd_clear(dir);
-	for (j = 0; j < PTRS_PER_PMD ; j++)
-		free_one_pmd(tlb, pmd+j);
-	pmd_free_tlb(tlb, pmd);
+
+	pmd = __pmd = pmd_offset(pud, start);
+	do {
+		next = (addr + PMD_SIZE) & PMD_MASK;
+		if (next > end || next <= addr)
+			next = end;
+
+		clear_pmd_range(tlb, pmd, addr, next);
+		pmd++;
+		addr = next;
+	} while (addr && (addr < end));
+
+	if (!(start & ~PUD_MASK) && !(end & ~PUD_MASK)) {
+		pud_clear(pud);
+		pmd_free_tlb(tlb, __pmd);
+	}
+}
+
+
+static inline void clear_pgd_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long start, unsigned long end)
+{
+	unsigned long addr = start, next;
+	pud_t *pud, *__pud;
+
+	if (pgd_none(*pgd))
+		return;
+	if (unlikely(pgd_bad(*pgd))) {
+		pgd_ERROR(*pgd);
+		pgd_clear(pgd);
+		return;
+	}
+
+	pud = __pud = pud_offset(pgd, start);
+	do {
+		next = (addr + PUD_SIZE) & PUD_MASK;
+		if (next > end || next <= addr)
+			next = end;
+
+		clear_pud_range(tlb, pud, addr, next);
+		pud++;
+		addr = next;
+	} while (addr && (addr < end));
+
+	if (!(start & ~PGDIR_MASK) && !(end & ~PGDIR_MASK)) {
+		pgd_clear(pgd);
+		pud_free_tlb(tlb, __pud);
+	}
 }
 
 /*
- * This function clears all user-level page tables of a process - this
- * is needed by execve(), so that old pages aren't in the way.
+ * This function clears user-level page tables of a process.
  *
  * Must be called with pagetable lock held.
  */
-void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr)
+void clear_page_range(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
-	pgd_t * page_dir = tlb->mm->pgd;
-
-	page_dir += first;
-	do {
-		free_one_pgd(tlb, page_dir);
-		page_dir++;
-	} while (--nr);
+	unsigned long addr = start, next;
+	unsigned long i, nr = pgd_index(end + PGDIR_SIZE-1) - pgd_index(start);
+	pgd_t * pgd = pgd_offset(tlb->mm, start);
+
+	for (i = 0; i < nr; i++) {
+		next = (addr + PGDIR_SIZE) & PGDIR_MASK;
+		if (next > end || next <= addr)
+			next = end;
+
+		clear_pgd_range(tlb, pgd, addr, next);
+		pgd++;
+		addr = next;
+	}
 }
 
 pte_t fastcall * pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
diff -urN linux-2.6.10-bk4/mm/mmap.c linux-2.6.10-bk5/mm/mmap.c
--- linux-2.6.10-bk4/mm/mmap.c	2004-12-24 13:35:00.000000000 -0800
+++ linux-2.6.10-bk5/mm/mmap.c	2005-01-02 04:55:31.385000743 -0800
@@ -1474,7 +1474,6 @@
 {
 	unsigned long first = start & PGDIR_MASK;
 	unsigned long last = end + PGDIR_SIZE - 1;
-	unsigned long start_index, end_index;
 	struct mm_struct *mm = tlb->mm;
 
 	if (!prev) {
@@ -1499,23 +1498,18 @@
 				last = next->vm_start;
 		}
 		if (prev->vm_end > first)
-			first = prev->vm_end + PGDIR_SIZE - 1;
+			first = prev->vm_end;
 		break;
 	}
 no_mmaps:
 	if (last < first)	/* for arches with discontiguous pgd indices */
 		return;
-	/*
-	 * If the PGD bits are not consecutive in the virtual address, the
-	 * old method of shifting the VA >> by PGDIR_SHIFT doesn't work.
-	 */
-	start_index = pgd_index(first);
-	if (start_index < FIRST_USER_PGD_NR)
-		start_index = FIRST_USER_PGD_NR;
-	end_index = pgd_index(last);
-	if (end_index > start_index) {
-		clear_page_tables(tlb, start_index, end_index - start_index);
-		flush_tlb_pgtables(mm, first & PGDIR_MASK, last & PGDIR_MASK);
+	if (first < FIRST_USER_PGD_NR * PGDIR_SIZE)
+		first = FIRST_USER_PGD_NR * PGDIR_SIZE;
+	/* No point trying to free anything if we're in the same pte page */
+	if ((first & PMD_MASK) < (last & PMD_MASK)) {
+		clear_page_range(tlb, first, last);
+		flush_tlb_pgtables(mm, first, last);
 	}
 }
 
@@ -1844,7 +1838,9 @@
 				~0UL, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
-	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
+	clear_page_range(tlb, FIRST_USER_PGD_NR * PGDIR_SIZE,
+			(TASK_SIZE + PGDIR_SIZE - 1) & PGDIR_MASK);
+
 	tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
 
 	vma = mm->mmap;

* Re: 2.6.10-bkcurr: major slab corruption preventing booting on ARM
From: Nick Piggin @ 2005-01-05 2:00 UTC
To: Russell King; +Cc: Linux Kernel List

Russell King wrote:
> On Tue, Jan 04, 2005 at 04:10:49PM +0000, Russell King wrote:
>
>>On Tue, Jan 04, 2005 at 02:43:50PM +0000, Russell King wrote:
>>
>>>I've had a report from a fellow ARM hacker of their platform not
>>>booting. After they turned on slab debugging, they saw (pieced
>>>together from a report on IRC):
>>>
>>>Freeing init memory: 104K
>>>run_init_process(/bin/bash)
>>>Slab corruption: start=c0010934, len=160
>>>Last user: [<c00adc54>](d_alloc+0x28/0x2d8)
>>>
>>>I've just run up 2.6.10-bkcurr on a different ARM platform, and
>>>encountered the following output. It looks like there are serious
>>>slab corruption issues in these kernels.
>>>
>>>I'll dig a little further into the report below to see if there's
>>>anything obvious.
>>
>>Ok, reverting the pud_t patch fixes both these problems (the exact
>>patch can be found at: http://www.home.arm.linux.org.uk/~rmk/misc/bk4-bk5
>>Note that this is not a plain bk4-bk5 patch, but just the pud_t
>>changes brought forward to bk6 or thereabouts.)
>>
>>So, something in the 4 level page table patches is causing random
>>scribbling in kernel memory.
>
>
> Ok, I've narrowed the problem down to something in the following patch.
> Andi Kleen suggests that maybe the ARM FIRST_USER_PGD_NR was broken by
> something in here. Nick, any ideas?
>

I see you've had a fix committed to -bk? Yes, that looks like it would
cause the problems you are seeing.

Thanks,
Nick