* Re: Using GDB kills kernel
2004-06-30 5:09 Using GDB kills kernel Peter Chubb
@ 2004-06-30 16:14 ` Alex Williamson
2004-07-01 0:09 ` peterc
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2004-06-30 16:14 UTC (permalink / raw)
To: linux-ia64
On Wed, 2004-06-30 at 15:09 +1000, Peter Chubb wrote:
>
> With David's current patch against 2.6.7-BK, attempting to use GDB
> causes a firmware assertion failure on my ZX2000. Looks like it's
> causing a SAL call or something.
>
> gdb ./h
> (gdb) r
> Firmware assertion failed: (((DATA8) addr - salNvm) < NVM_SIZE) ||
> (((DATA8) addr - efiNvm) < EFI_NVM_SIZE), file bbsram_new.c line 570
> ....
>
According to our firmware guys, that message is trying to say that
something tried to access non-volatile memory outside of the SAL or EFI
address ranges (ie. bad address that happened to hit NVM). I can't
reproduce it on my box. Does it happen every time? Any dependency on
the binary gdb is debugging? Thanks,
Alex
--
Alex Williamson HP Linux & Open Source Lab
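
For readers following the thread: the firmware assertion quoted above is
an unsigned range check. The sketch below illustrates that idiom only.
DATA8, NVM_SIZE and EFI_NVM_SIZE are names taken from the assertion text;
the base addresses, sizes and the function name nvm_addr_ok() are
hypothetical values chosen for illustration, not the real firmware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DATA8;            /* assumption: unsigned 64-bit type */

/* Hypothetical bases and sizes, for illustration only. */
#define SAL_NVM_BASE  0x1000ULL    /* stands in for salNvm */
#define NVM_SIZE      0x0400ULL
#define EFI_NVM_BASE  0x2000ULL    /* stands in for efiNvm */
#define EFI_NVM_SIZE  0x0800ULL

/*
 * Same shape as the asserted expression. Because the subtraction is
 * unsigned, an address below a base wraps to a huge value and fails
 * the "< size" test, so one comparison covers both bounds.
 */
bool nvm_addr_ok(DATA8 addr)
{
    return ((addr - SAL_NVM_BASE) < NVM_SIZE) ||
           ((addr - EFI_NVM_BASE) < EFI_NVM_SIZE);
}
```

An address that falls in neither window makes the expression false, which
is the condition the firmware reports.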
* Re: Using GDB kills kernel
From: peterc @ 2004-07-01 0:09 UTC (permalink / raw)
To: linux-ia64
>>>>> "Alex" == Alex Williamson <alex.williamson@hp.com> writes:
Alex> On Wed, 2004-06-30 at 15:09 +1000, Peter Chubb wrote:
>> With David's current patch against 2.6.7-BK, attempting to use GDB
>> causes a firmware assertion failure on my ZX2000. Looks like it's
>> causing a SAL call or something.
>>
>> gdb ./h
>> (gdb) r
>> Firmware assertion failed: (((DATA8) addr - salNvm) < NVM_SIZE) ||
>> (((DATA8) addr - efiNvm) < EFI_NVM_SIZE), file bbsram_new.c line 570
>> ....
>>
Alex> According to our firmware guys, that message is trying to say
Alex> that something tried to access non-volatile memory outside of
Alex> the SAL or EFI address ranges (ie. bad address that happened to
Alex> hit NVM). I can't reproduce it on my box. Does it happen every
Alex> time? Any dependency on the binary gdb is debugging? Thanks,
I can reproduce the problem on any single-processor zx2000, when using
gdb on any process.
On a dual processor, I see an oops instead:
Unable to handle kernel paging request at virtual address a00080a40bc4c008
gdb[426]: Oops 8821862825984 [1]
Call Trace:
[<a000000100015b20>] show_stack+0x80/0xa0
    sp=e000004040bd79a0 bsp=e000004040bd12e8
[<a000000100024c90>] die+0x1f0/0x280
    sp=e000004040bd7b70 bsp=e000004040bd12b0
[<a00000010003fb40>] ia64_do_page_fault+0x360/0x960
    sp=e000004040bd7b70 bsp=e000004040bd1248
[<a00000010000e1a0>] ia64_leave_kernel+0x0/0x260
    sp=e000004040bd7c00 bsp=e000004040bd1248
[<a0000001000e0c40>] get_user_pages+0x980/0x9c0
    sp=e000004040bd7dd0 bsp=e000004040bd10e8
[<a000000100088f00>] access_process_vm+0x1a0/0x4e0
    sp=e000004040bd7de0 bsp=e000004040bd1038
[<a000000100018e50>] ia64_peek+0x230/0x260
    sp=e000004040bd7e00 bsp=e000004040bd0fe0
[<a00000010001c070>] sys_ptrace+0x390/0xb20
    sp=e000004040bd7e20 bsp=e000004040bd0f48
[<a00000010000e020>] ia64_ret_from_syscall+0x0/0x20
    sp=e000004040bd7e30 bsp=e000004040bd0f48
The fault occurred at 0xa0000001000e0c40 (get_user_pages+0x980), which is
include/linux/mm.h:307, the atomic increment of page->_count in
get_page().
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*
* Re: Using GDB kills kernel
From: David Mosberger @ 2004-07-07 0:21 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 30 Jun 2004 15:09:38 +1000, Peter Chubb <peter@chubb.wattle.id.au> said:
Peter> With David's current patch against 2.6.7-BK, attempting to use GDB
Peter> causes a firmware assertion failure on my ZX2000. Looks like it's
Peter> causing a SAL call or something.
Peter> gdb ./h
Peter> (gdb) r
Peter> Firmware assertion failed: (((DATA8) addr - salNvm) < NVM_SIZE) ||
Peter> (((DATA8) addr - efiNvm) < EFI_NVM_SIZE), file bbsram_new.c line 570
Peter> ....
I also cannot reproduce this. I tried with a UP kernel on zx2000
running gdb 6.1-debian (from Debian/unstable).
--david
* Re: Using GDB kills kernel
From: Peter Chubb @ 2004-07-07 8:32 UTC (permalink / raw)
To: linux-ia64
>>>>> "David" == David Mosberger <davidm@napali.hpl.hp.com> writes:
>>>>> On Wed, 7 Jul 2004 11:03:08 +1000, Peter Chubb <peterc@gelato.unsw.edu.au> said:
Peter> I'm currently trying binary chop to see what change(s) caused
Peter> the problem. But there were a lot of them in a very tangled
Peter> revision history, so progress is slow.
David> I'd guess that it may be another bug related to moving
David> "current" for init_task to region 5. Though it's not apparent
David> how or why this would only bite you (so far).
The problem is changeset
roland@redhat.com[torvalds]|ChangeSet|20040624165002|30880
which changes the way PTEs are looked up for the gate page.
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*
* Re: Using GDB kills kernel
From: Peter Chubb @ 2004-07-07 10:57 UTC (permalink / raw)
To: linux-ia64
>>>>> "Peter" == Peter Chubb <peterc@gelato.unsw.edu.au> writes:
>>>>> "David" == David Mosberger <davidm@napali.hpl.hp.com> writes:
>>>>> On Wed, 7 Jul 2004 11:03:08 +1000, Peter Chubb <peterc@gelato.unsw.edu.au> said:
Peter> I'm currently trying binary chop to see what change(s) caused
Peter> the problem. But there were a lot of them in a very tangled
Peter> revision history, so progress is slow.
David> I'd guess that it may be another bug related to moving
David> "current" for init_task to region 5. Though it's not apparent
David> how or why this would only bite you (so far).
Peter> The problem is changeset
Peter> roland@redhat.com[torvalds]|ChangeSet|20040624165002|30880
Peter> which changes the way PTEs are looked up for the gate page.
OK, the problem arises when GDB tries to step over or access something
in the gate page. The changeset mentioned above changes things so that
instead of doing
pgd = pgd_offset_k(addr)
it does
pgd = pgd_offset(mm, addr)
for a gate page address.
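
To illustrate why the two lookups disagree for a region-5 address, here
is a small user-space sketch of the two index calculations. It assumes
8KB pages (PAGE_SHIFT = 13), which matches the "43 bit" and "seven
level-1 bits (33-39)" comments in pgtable.h. The body of
user_pgd_index() is reconstructed from the return statement visible in
the patch context below and may not match the tree exactly; both
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 8KB pages, three-level page tables. */
#define PAGE_SHIFT    13
#define PGDIR_SHIFT   (PAGE_SHIFT + 2 * (PAGE_SHIFT - 3))   /* 33 */
#define PTRS_PER_PGD  (1ULL << (PAGE_SHIFT - 3))            /* 1024 */

/* Index as used by pgd_offset(): 3 region bits, then 7 level-1 bits. */
uint64_t user_pgd_index(uint64_t address)
{
    uint64_t region  = address >> 61;
    uint64_t l1index = (address >> PGDIR_SHIFT) & ((PTRS_PER_PGD >> 3) - 1);

    return (region << (PAGE_SHIFT - 6)) | l1index;
}

/* Index as used by pgd_offset_k(): region ignored, 10 index bits. */
uint64_t kernel_pgd_index(uint64_t address)
{
    return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}
```

For an address at the very start of region 5 (0xa000000000000000), the
first function yields 640 and the second yields 0. On top of that,
pgd_offset() indexes mm->pgd while pgd_offset_k() indexes init_mm.pgd, so
the gate page lookup walks the wrong table entirely.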
Here's a patch that makes pgd_offset work for gate page addresses.
I'm not entirely happy about adding work everywhere to satisfy one
fairly rare use of pgd_offset() to get a gate page address.
Is there a better solution? The problematic change was made to fix
x86_64 when ptracing 32-bit processes.
Index: linux-2.5-fixit/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.5-fixit.orig/include/asm-ia64/pgtable.h
+++ linux-2.5-fixit/include/asm-ia64/pgtable.h
@@ -308,18 +308,21 @@
return (region << (PAGE_SHIFT - 6)) | l1index;
}
+/* In the kernel's mapped region we have a full 43 bit space available and completely
+ ignore the region number (since we know its in region number 5). */
+#define pgd_offset_k(addr) \
+ (init_mm.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
+
/* The offset in the 1-level directory is given by the 3 region bits
(61..63) and the seven level-1 bits (33-39). */
static inline pgd_t*
pgd_offset (struct mm_struct *mm, unsigned long address)
{
+ if ((address >> 61) == 5)
+ return pgd_offset_k(address);
return mm->pgd + pgd_index(address);
}
-/* In the kernel's mapped region we have a full 43 bit space available and completely
- ignore the region number (since we know its in region number 5). */
-#define pgd_offset_k(addr) \
- (init_mm.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
/* Find an entry in the second-level page table.. */
#define pmd_offset(dir,addr) \
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*
* Re: Using GDB kills kernel
From: David Mosberger @ 2004-07-08 1:18 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 7 Jul 2004 20:57:27 +1000, Peter Chubb <peterc@gelato.unsw.edu.au> said:
Peter> OK, the problem arises when GDB tries to step over or access something
Peter> in the gate page. The changeset mentioned above changes things so that
Peter> instead of doing
Peter> pgd = pgd_offset_k(addr)
Peter> it does
Peter> pgd = pgd_offset(mm, addr)
Peter> for a gate page address.
Yeah, that'd do it.
Does the attached patch work for you?
--david
===== include/asm-generic/pgtable.h 1.6 vs edited =====
--- 1.6/include/asm-generic/pgtable.h Wed May 26 07:56:17 2004
+++ edited/include/asm-generic/pgtable.h Wed Jul 7 18:02:20 2004
@@ -122,4 +122,8 @@
#define page_test_and_clear_young(page) (0)
#endif
+#ifndef __HAVE_ARCH_PGD_OFFSET_GATE
+#define pgd_offset_gate(mm, addr) pgd_offset(mm, addr)
+#endif
+
#endif /* _ASM_GENERIC_PGTABLE_H */
===== include/asm-ia64/pgtable.h 1.43 vs edited =====
--- 1.43/include/asm-ia64/pgtable.h Sat Jun 19 07:48:59 2004
+++ edited/include/asm-ia64/pgtable.h Wed Jul 7 18:03:55 2004
@@ -321,6 +321,11 @@
#define pgd_offset_k(addr) \
(init_mm.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
+/* Look up a pgd entry in the gate area. On IA-64, the gate-area
+ resides in the kernel-mapped segment, hence we use pgd_offset_k()
+ here. */
+#define pgd_offset_gate(mm, addr) pgd_offset_k(addr)
+
/* Find an entry in the second-level page table.. */
#define pmd_offset(dir,addr) \
((pmd_t *) pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
@@ -552,6 +557,7 @@
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTEP_MKDIRTY
#define __HAVE_ARCH_PTE_SAME
+#define __HAVE_ARCH_PGD_OFFSET_GATE
#include <asm-generic/pgtable.h>
#endif /* _ASM_IA64_PGTABLE_H */
===== mm/memory.c 1.148 vs edited =====
--- 1.148/mm/memory.c Tue Jul 6 22:19:26 2004
+++ edited/mm/memory.c Wed Jul 7 18:02:29 2004
@@ -727,7 +727,7 @@
pte_t *pte;
if (write) /* user gate pages are read-only */
return i ? : -EFAULT;
- pgd = pgd_offset(mm, pg);
+ pgd = pgd_offset_gate(mm, pg);
if (!pgd)
return i ? : -EFAULT;
pmd = pmd_offset(pgd, pg);
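
The override mechanism in the patch above can be tried outside the
kernel with a stand-alone sketch along these lines. Only the macro names
pgd_offset, pgd_offset_k, pgd_offset_gate and
__HAVE_ARCH_PGD_OFFSET_GATE come from the patch; the two tables, the
2-bit index, demo_mm and lookup_gate() are hypothetical stand-ins, not
kernel code.

```c
#include <assert.h>

typedef struct { unsigned long val; } pgd_t;
struct mm_struct { pgd_t *pgd; };

pgd_t user_table[4];      /* stand-in for a process's mm->pgd */
pgd_t kernel_table[4];    /* stand-in for init_mm.pgd */
struct mm_struct demo_mm = { user_table };

/* Toy lookups: a 2-bit index is enough to show which table is used. */
#define pgd_offset(mm, addr)   ((mm)->pgd + ((addr) & 3))
#define pgd_offset_k(addr)     (kernel_table + ((addr) & 3))

/* "Arch header": the gate area lives in the kernel-mapped region. */
#define pgd_offset_gate(mm, addr)  pgd_offset_k(addr)
#define __HAVE_ARCH_PGD_OFFSET_GATE

/* "Generic header": fallback used only when the arch did not override. */
#ifndef __HAVE_ARCH_PGD_OFFSET_GATE
#define pgd_offset_gate(mm, addr)  pgd_offset(mm, addr)
#endif

/* Mirrors the one-line change in get_user_pages(). */
pgd_t *lookup_gate(struct mm_struct *mm, unsigned long addr)
{
    return pgd_offset_gate(mm, addr);
}
```

With the override defined, lookup_gate() resolves into kernel_table no
matter which mm is passed in. Removing the two "arch header" lines makes
the same call fall back to the per-process table, which is what every
other architecture would keep doing.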