public inbox for linux-kernel@vger.kernel.org
* ia64 hang/mca running gdb 'make check'
@ 2010-07-20 17:35 dann frazier
  2010-07-21  1:51 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: dann frazier @ 2010-07-20 17:35 UTC (permalink / raw)
  To: linux-ia64
  Cc: linux-kernel, Hugh Dickins, Rik van Riel, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

Debian's ia64 autobuilders have been experiencing system crashes while
trying to run the gdb test suite:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574

I was able to reproduce this w/ the latest git tree, and bisected it
down to this commit, introduced in 2.6.32:

  commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
  Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
  Date:   Mon Sep 21 17:03:34 2009 -0700

    mm: ZERO_PAGE without PTE_SPECIAL

    Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
    those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.

    Contrary to how I'd imagined it, there's nothing ugly about this, just a
    zero_pfn test built into one or another block of vm_normal_page().

    But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
    my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
    ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
    that: it would have to take vm_flags to weed out some cases.

fyi, I found this to not be reproducible on SLES11 SP1 (which is
2.6.32-based). I compared the .configs and found that the relevant
difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
reliably fails w/ 16KB pages.
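
For anyone comparing machines, here is a minimal sketch for confirming which page size a running kernel was built with. It assumes only POSIX sysconf(); the helper name page_size_kb() is made up for illustration and appears nowhere in the kernel or in gdb.

```c
#include <unistd.h>

/* Hypothetical helper: runtime page size in KB, or -1 if sysconf() fails. */
long page_size_kb(void)
{
	long bytes = sysconf(_SC_PAGESIZE);

	return bytes > 0 ? bytes / 1024 : -1;
}
```

Calling this from main() and printing the result should give 16 on the failing configuration and 64 on the SLES11 SP1 one; `getconf PAGESIZE` reports the same value in bytes from a shell.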

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-20 17:35 ia64 hang/mca running gdb 'make check' dann frazier
@ 2010-07-21  1:51 ` KAMEZAWA Hiroyuki
  2010-07-21  3:06   ` dann frazier
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-21  1:51 UTC (permalink / raw)
  To: dann frazier
  Cc: linux-ia64, linux-kernel, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, 20 Jul 2010 11:35:12 -0600
dann frazier <dannf@debian.org> wrote:

> Debian's ia64 autobuilders have been experiencing system crashes while
> trying to run the gdb test suite:
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> 
> I was able to reproduce this w/ the latest git tree, and bisected it
> down to this commit, introduced in 2.6.32:
> 
>   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
>   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
>   Date:   Mon Sep 21 17:03:34 2009 -0700
> 
>     mm: ZERO_PAGE without PTE_SPECIAL
> 
>     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
>     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> 
>     Contrary to how I'd imagined it, there's nothing ugly about this, just a
>     zero_pfn test built into one or another block of vm_normal_page().
> 
>     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
>     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
>     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
>     that: it would have to take vm_flags to weed out some cases.
> 
> fyi, I found this to not be reproducible on SLES11 SP1 (which is
> 2.6.32-based). I compared the .configs and found that the relevant
> difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> reliably fails w/ 16KB pages.
> 

Sorry, I have no idea...
Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?

Thanks,
-Kame

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-21  1:51 ` KAMEZAWA Hiroyuki
@ 2010-07-21  3:06   ` dann frazier
  2010-07-21  4:19     ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: dann frazier @ 2010-07-21  3:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-ia64, linux-kernel, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 20 Jul 2010 11:35:12 -0600
> dann frazier <dannf@debian.org> wrote:
> 
> > Debian's ia64 autobuilders have been experiencing system crashes while
> > trying to run the gdb test suite:
> >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > 
> > I was able to reproduce this w/ the latest git tree, and bisected it
> > down to this commit, introduced in 2.6.32:
> > 
> >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > 
> >     mm: ZERO_PAGE without PTE_SPECIAL
> > 
> >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > 
> >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> >     zero_pfn test built into one or another block of vm_normal_page().
> > 
> >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> >     that: it would have to take vm_flags to weed out some cases.
> > 
> > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > 2.6.32-based). I compared the .configs and found that the relevant
> > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > reliably fails w/ 16KB pages.
> > 
> 
> Sorry, I have no idea...
> Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?


dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
a0000001008784c0 d __ksymtab_empty_zero_page
a000000100882688 d __kcrctab_empty_zero_page
a000000100884ca4 r __kstrtab_empty_zero_page
a000000100974000 D empty_zero_page

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-21  3:06   ` dann frazier
@ 2010-07-21  4:19     ` Hugh Dickins
  2010-07-21 12:54       ` KOSAKI Motohiro
  2010-07-27  7:19       ` dann frazier
  0 siblings, 2 replies; 19+ messages in thread
From: Hugh Dickins @ 2010-07-21  4:19 UTC (permalink / raw)
  To: dann frazier
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, 20 Jul 2010, dann frazier wrote:
> On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 20 Jul 2010 11:35:12 -0600
> > dann frazier <dannf@debian.org> wrote:
> > 
> > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > trying to run the gdb test suite:
> > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > 
> > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > down to this commit, introduced in 2.6.32:
> > > 
> > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > 
> > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > 
> > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > 
> > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > >     zero_pfn test built into one or another block of vm_normal_page().
> > > 
> > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > >     that: it would have to take vm_flags to weed out some cases.
> > > 
> > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > 2.6.32-based). I compared the .configs and found that the relevant
> > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > reliably fails w/ 16KB pages.
> > > 
> > 
> > Sorry, I have no idea...
> > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> 
> 
> dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> a0000001008784c0 d __ksymtab_empty_zero_page
> a000000100882688 d __kcrctab_empty_zero_page
> a000000100884ca4 r __kstrtab_empty_zero_page
> a000000100974000 D empty_zero_page

Thanks a lot for reporting this, but I too have no idea yet.

It is likely that the bug is not to be found in that 62eede62, but
rather in one of the preceding patches to mm/memory.c which 62eede62
was extending to ia64 and other architectures without PTE_SPECIAL.

I wonder, from looking at that gdb testsuite log, is it plausible
that all these hangs/crashes occurred when writing out a coredump?
Is that something you could check for us? or rule out the possibility.

I was rather proud of the get_dump_page() simplification,
but perhaps there's something nasty lurking in there.

Hugh

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-21  4:19     ` Hugh Dickins
@ 2010-07-21 12:54       ` KOSAKI Motohiro
  2010-07-27  7:19       ` dann frazier
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2010-07-21 12:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, dann frazier, KAMEZAWA Hiroyuki, linux-ia64,
	linux-kernel, Rik van Riel, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

> On Tue, 20 Jul 2010, dann frazier wrote:
> > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > dann frazier <dannf@debian.org> wrote:
> > > 
> > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > trying to run the gdb test suite:
> > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > 
> > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > down to this commit, introduced in 2.6.32:
> > > > 
> > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > 
> > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > 
> > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > 
> > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > 
> > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > >     that: it would have to take vm_flags to weed out some cases.
> > > > 
> > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > reliably fails w/ 16KB pages.
> > > > 
> > > 
> > > Sorry, I have no idea...
> > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > 
> > 
> > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > a0000001008784c0 d __ksymtab_empty_zero_page
> > a000000100882688 d __kcrctab_empty_zero_page
> > a000000100884ca4 r __kstrtab_empty_zero_page
> > a000000100974000 D empty_zero_page
> 
> Thanks a lot for reporting this, but I too have no idea yet.
> 
> It is likely that the bug is not to be found in that 62eede62, but
> rather in one of the preceding patches to mm/memory.c which 62eede62
> was extending to ia64 and other architectures without PTE_SPECIAL.
> 
> I wonder, from looking at that gdb testsuite log, is it plausible
> that all these hangs/crashes occurred when writing out a coredump?
> Is that something you could check for us? or rule out the possibility.
> 
> I was rather proud of the get_dump_page() simplification,
> but perhaps there's something nasty lurking in there.

Ugh. I did test some zero page things on ia64 while 62eede62 was being
developed, but unfortunately I've lost my ia64 test environment to a
physical machine crash, and I don't remember which page size I tested with ;)

Umm... I also have no idea, sorry.

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-21  4:19     ` Hugh Dickins
  2010-07-21 12:54       ` KOSAKI Motohiro
@ 2010-07-27  7:19       ` dann frazier
  2010-07-27  9:03         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 19+ messages in thread
From: dann frazier @ 2010-07-27  7:19 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> On Tue, 20 Jul 2010, dann frazier wrote:
> > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > dann frazier <dannf@debian.org> wrote:
> > > 
> > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > trying to run the gdb test suite:
> > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > 
> > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > down to this commit, introduced in 2.6.32:
> > > > 
> > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > 
> > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > 
> > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > 
> > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > 
> > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > >     that: it would have to take vm_flags to weed out some cases.
> > > > 
> > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > reliably fails w/ 16KB pages.
> > > > 
> > > 
> > > Sorry, I have no idea...
> > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > 
> > 
> > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > a0000001008784c0 d __ksymtab_empty_zero_page
> > a000000100882688 d __kcrctab_empty_zero_page
> > a000000100884ca4 r __kstrtab_empty_zero_page
> > a000000100974000 D empty_zero_page
> 
> Thanks a lot for reporting this, but I too have no idea yet.
> 
> It is likely that the bug is not to be found in that 62eede62, but
> rather in one of the preceding patches to mm/memory.c which 62eede62
> was extending to ia64 and other architectures without PTE_SPECIAL.
> 
> I wonder, from looking at that gdb testsuite log, is it plausible
> that all these hangs/crashes occurred when writing out a coredump?
> Is that something you could check for us? or rule out the possibility.

Yep, seems so. I've reduced it down to this test case:

dannf@rx2600:~> cat > foo.c
int leaf(void) {
  return 0;
}

int main(void) {
  leaf();
}
dannf@rx2600:~> gcc -g foo.c -o foo
dannf@rx2600:~> gdb ./foo 
GNU gdb (GDB) SUSE (7.0-0.4.16)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ia64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dannf/foo...done.
(gdb) break leaf
Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
(gdb) run
Starting program: /home/dannf/foo 
Missing separate debuginfo for /lib/ld-linux-ia64.so.2
Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
Missing separate debuginfo for /lib/libc.so.6.1
Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"

Breakpoint 1, leaf () at foo.c:2
2	     return 0;
(gdb) gcore /tmp/save

[bang]

> I was rather proud of the get_dump_page() simplification,
> but perhaps there's something nasty lurking in there.
> 
> Hugh
> 

-- 
dann frazier


* Re: ia64 hang/mca running gdb 'make check'
  2010-07-27  7:19       ` dann frazier
@ 2010-07-27  9:03         ` KAMEZAWA Hiroyuki
  2010-07-27 14:43           ` dann frazier
  2010-07-29  7:38           ` ia64 hang/mca running gdb 'make check' Luming Yu
  0 siblings, 2 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-27  9:03 UTC (permalink / raw)
  To: dann frazier
  Cc: Hugh Dickins, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, 27 Jul 2010 01:19:15 -0600
dann frazier <dannf@debian.org> wrote:

> On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > On Tue, 20 Jul 2010, dann frazier wrote:
> > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > dann frazier <dannf@debian.org> wrote:
> > > > 
> > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > trying to run the gdb test suite:
> > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > 
> > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > down to this commit, introduced in 2.6.32:
> > > > > 
> > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > > 
> > > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > > 
> > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > 
> > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > > 
> > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > > >     that: it would have to take vm_flags to weed out some cases.
> > > > > 
> > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > reliably fails w/ 16KB pages.
> > > > > 
> > > > 
> > > > Sorry, I have no idea...
> > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > 
> > > 
> > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > a000000100882688 d __kcrctab_empty_zero_page
> > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > a000000100974000 D empty_zero_page
> > 
> > Thanks a lot for reporting this, but I too have no idea yet.
> > 
> > It is likely that the bug is not to be found in that 62eede62, but
> > rather in one of the preceding patches to mm/memory.c which 62eede62
> > was extending to ia64 and other architectures without PTE_SPECIAL.
> > 
> > I wonder, from looking at that gdb testsuite log, is it plausible
> > that all these hangs/crashes occurred when writing out a coredump?
> > Is that something you could check for us? or rule out the possibility.
> 
> Yep, seems so. I've reduced it down to this test case:
> 
> dannf@rx2600:~> cat > foo.c
> int leaf(void) {
>   return 0;
> }
> 
> int main(void) {
>   leaf();
> }
> dannf@rx2600:~> gcc -g foo.c -o foo
> dannf@rx2600:~> gdb ./foo 
> GNU gdb (GDB) SUSE (7.0-0.4.16)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ia64-suse-linux".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/dannf/foo...done.
> (gdb) break leaf
> Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> (gdb) run
> Starting program: /home/dannf/foo 
> Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> Missing separate debuginfo for /lib/libc.so.6.1
> Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> 
> Breakpoint 1, leaf () at foo.c:2
> 2	     return 0;
> (gdb) gcore /tmp/save
> 
> [bang]
> 

Does this happen on 2.6.34 or 2.6.35-rc kernel ?

Thanks,
-Kame


* Re: ia64 hang/mca running gdb 'make check'
  2010-07-27  9:03         ` KAMEZAWA Hiroyuki
@ 2010-07-27 14:43           ` dann frazier
  2010-07-29  3:50             ` Hugh Dickins
  2010-07-29  7:38           ` ia64 hang/mca running gdb 'make check' Luming Yu
  1 sibling, 1 reply; 19+ messages in thread
From: dann frazier @ 2010-07-27 14:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, Jul 27, 2010 at 06:03:30PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 27 Jul 2010 01:19:15 -0600
> dann frazier <dannf@debian.org> wrote:
> 
> > On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > > On Tue, 20 Jul 2010, dann frazier wrote:
> > > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > > dann frazier <dannf@debian.org> wrote:
> > > > > 
> > > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > > trying to run the gdb test suite:
> > > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > > 
> > > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > > down to this commit, introduced in 2.6.32:
> > > > > > 
> > > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > > > 
> > > > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > > > 
> > > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > > 
> > > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > > > 
> > > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > > > >     that: it would have to take vm_flags to weed out some cases.
> > > > > > 
> > > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > > reliably fails w/ 16KB pages.
> > > > > > 
> > > > > 
> > > > > Sorry, I have no idea...
> > > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > > 
> > > > 
> > > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > > a000000100882688 d __kcrctab_empty_zero_page
> > > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > > a000000100974000 D empty_zero_page
> > > 
> > > Thanks a lot for reporting this, but I too have no idea yet.
> > > 
> > > It is likely that the bug is not to be found in that 62eede62, but
> > > rather in one of the preceding patches to mm/memory.c which 62eede62
> > > was extending to ia64 and other architectures without PTE_SPECIAL.
> > > 
> > > I wonder, from looking at that gdb testsuite log, is it plausible
> > > that all these hangs/crashes occurred when writing out a coredump?
> > > Is that something you could check for us? or rule out the possibility.
> > 
> > Yep, seems so. I've reduced it down to this test case:
> > 
> > dannf@rx2600:~> cat > foo.c
> > int leaf(void) {
> >   return 0;
> > }
> > 
> > int main(void) {
> >   leaf();
> > }
> > dannf@rx2600:~> gcc -g foo.c -o foo
> > dannf@rx2600:~> gdb ./foo 
> > GNU gdb (GDB) SUSE (7.0-0.4.16)
> > Copyright (C) 2009 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "ia64-suse-linux".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /home/dannf/foo...done.
> > (gdb) break leaf
> > Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> > (gdb) run
> > Starting program: /home/dannf/foo 
> > Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> > Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> > Missing separate debuginfo for /lib/libc.so.6.1
> > Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> > 
> > Breakpoint 1, leaf () at foo.c:2
> > 2	     return 0;
> > (gdb) gcore /tmp/save
> > 
> > [bang]
> > 
> 
> Does this happen on 2.6.34 or 2.6.35-rc kernel ?

I've been testing w/ a 2.6.35-rc4+, though it was originally reported
on a 2.6.32.

-- 
dann frazier


* Re: ia64 hang/mca running gdb 'make check'
  2010-07-27 14:43           ` dann frazier
@ 2010-07-29  3:50             ` Hugh Dickins
  2010-07-29 19:22               ` dann frazier
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2010-07-29  3:50 UTC (permalink / raw)
  To: dann frazier
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Tue, 27 Jul 2010, dann frazier wrote:
> On Tue, Jul 27, 2010 at 06:03:30PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 27 Jul 2010 01:19:15 -0600
> > dann frazier <dannf@debian.org> wrote:
> > > On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > > > On Tue, 20 Jul 2010, dann frazier wrote:
> > > > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > > > dann frazier <dannf@debian.org> wrote:
> > > > > > 
> > > > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > > > trying to run the gdb test suite:
> > > > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > > > 
> > > > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > > > down to this commit, introduced in 2.6.32:
> > > > > > > 
> > > > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > > > > 
> > > > > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > > > > 
> > > > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > > > 
> > > > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > > > > 
> > > > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > > > > >     that: it would have to take vm_flags to weed out some cases.
> > > > > > > 
> > > > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > > > reliably fails w/ 16KB pages.
> > > > > > > 
> > > > > > 
> > > > > > Sorry, I have no idea...
> > > > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > > > 
> > > > > 
> > > > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > > > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > > > a000000100882688 d __kcrctab_empty_zero_page
> > > > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > > > a000000100974000 D empty_zero_page
> > > > 
> > > > Thanks a lot for reporting this, but I too have no idea yet.
> > > > 
> > > > It is likely that the bug is not to be found in that 62eede62, but
> > > > rather in one of the preceding patches to mm/memory.c which 62eede62
> > > > was extending to ia64 and other architectures without PTE_SPECIAL.
> > > > 
> > > > I wonder, from looking at that gdb testsuite log, is it plausible
> > > > that all these hangs/crashes occurred when writing out a coredump?
> > > > Is that something you could check for us? or rule out the possibility.
> > > 
> > > Yep, seems so. I've reduced it down to this test case:
> > > 
> > > dannf@rx2600:~> cat > foo.c
> > > int leaf(void) {
> > >   return 0;
> > > }
> > > 
> > > int main(void) {
> > >   leaf();
> > > }
> > > dannf@rx2600:~> gcc -g foo.c -o foo
> > > dannf@rx2600:~> gdb ./foo 
> > > GNU gdb (GDB) SUSE (7.0-0.4.16)
> > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "ia64-suse-linux".
> > > For bug reporting instructions, please see:
> > > <http://www.gnu.org/software/gdb/bugs/>...
> > > Reading symbols from /home/dannf/foo...done.
> > > (gdb) break leaf
> > > Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> > > (gdb) run
> > > Starting program: /home/dannf/foo 
> > > Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> > > Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> > > Missing separate debuginfo for /lib/libc.so.6.1
> > > Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> > > 
> > > Breakpoint 1, leaf () at foo.c:2
> > > 2	     return 0;
> > > (gdb) gcore /tmp/save
> > > 
> > > [bang]
> > > 
> > 
> > Does this happen on 2.6.34 or 2.6.35-rc kernel ?
> 
> I've been testing w/ a 2.6.35-rc4+, though it was originally reported
> on a 2.6.32.

Thanks a lot for narrowing down to that simple testcase, and
thanks a lot for checking it's just as bad on recent kernels.

I'm sorry to say that I'm still just as baffled.

Let's note that gdb's gcore is building up its own version of a
coredump, not going through the get_dump_page() code I was wondering
about.  If I read gcore correctly (possibly not!), it will be reading
selected areas from /proc/<pid>/mem i.e. using access_process_vm().
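
For readers unfamiliar with that path, here is a minimal sketch of a read through /proc/<pid>/mem, which the kernel services by copying out of the target's address space. It reads from the calling process itself to stay self-contained; read_own_memory() is a hypothetical helper written for illustration, not code taken from gdb.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gdb's code): copy len bytes found at addr in
 * this process by reading /proc/self/mem, so the kernel does the copy.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t read_own_memory(const void *addr, void *buf, size_t len)
{
    ssize_t got = -1;
    int fd;

    assert(buf != NULL && len > 0);

    fd = open("/proc/self/mem", O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is simply the virtual address to read from. */
    if (lseek(fd, (off_t)(uintptr_t)addr, SEEK_SET) != (off_t)-1)
        got = read(fd, buf, len);

    close(fd);
    return got;
}
```

A debugger would open the traced process's /proc/<pid>/mem instead and walk the regions listed in /proc/<pid>/maps, which is how a read can land in an unusual mapping.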

But why the (16kB but not 64kB!) zero page should make that freeze
or reboot, I have no idea.

What would I be doing if I had an Itanium?  I think I'd be trying to
narrow down exactly where it goes bad (tedious when the penalty is
a freeze or reboot).

As it is, I'm hoping that someone with an ia64 can investigate...

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-27  9:03         ` KAMEZAWA Hiroyuki
  2010-07-27 14:43           ` dann frazier
@ 2010-07-29  7:38           ` Luming Yu
  2010-07-29  7:58             ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 19+ messages in thread
From: Luming Yu @ 2010-07-29  7:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: dann frazier, Hugh Dickins, linux-ia64, linux-kernel,
	Rik van Riel, KOSAKI Motohiro, Nick Piggin, Mel Gorman,
	Minchan Kim, Ralf Baechle

On Tue, Jul 27, 2010 at 5:03 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 27 Jul 2010 01:19:15 -0600
> dann frazier <dannf@debian.org> wrote:
>
>> On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
>> > On Tue, 20 Jul 2010, dann frazier wrote:
>> > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
>> > > > On Tue, 20 Jul 2010 11:35:12 -0600
>> > > > dann frazier <dannf@debian.org> wrote:
>> > > >
>> > > > > Debian's ia64 autobuilders have been experiencing system crashes while
>> > > > > trying to run the gdb test suite:
>> > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
>> > > > >
>> > > > > I was able to reproduce this w/ the latest git tree, and bisected it
>> > > > > down to this commit, introduced in 2.6.32:
>> > > > >
>> > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
>> > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
>> > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
>> > > > >
>> > > > >     mm: ZERO_PAGE without PTE_SPECIAL
>> > > > >
>> > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
>> > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
>> > > > >
>> > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
>> > > > >     zero_pfn test built into one or another block of vm_normal_page().
>> > > > >
>> > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
>> > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
>> > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
>> > > > >     that: it would have to take vm_flags to weed out some cases.
>> > > > >
>> > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
>> > > > > 2.6.32-based). I compared the .configs and found that the relevant
>> > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
>> > > > > reliably fails w/ 16KB pages.
>> > > > >
>> > > >
>> > > > Sorry, I have no idea...
>> > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
>> > >
>> > >
>> > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley
>> > > a0000001008784c0 d __ksymtab_empty_zero_page
>> > > a000000100882688 d __kcrctab_empty_zero_page
>> > > a000000100884ca4 r __kstrtab_empty_zero_page
>> > > a000000100974000 D empty_zero_page
>> >
>> > Thanks a lot for reporting this, but I too have no idea yet.
>> >
>> > It is likely that the bug is not to be found in that 62eede62, but
>> > rather in one of the preceding patches to mm/memory.c which 62eede62
>> > was extending to ia64 and other architectures without PTE_SPECIAL.
>> >
>> > I wonder, from looking at that gdb testsuite log, is it plausible
>> > that all these hangs/crashes occurred when writing out a coredump?
>> > Is that something you could check for us? or rule out the possibility.
>>
>> Yep, seems so. I've reduced it down to this test case:
>>
>> dannf@rx2600:~> cat > foo.c
>> int leaf(void) {
>>   return 0;
>> }
>>
>> int main(void) {
>>   leaf();
>> }
>> dannf@rx2600:~> gcc -g foo.c -o foo
>> dannf@rx2600:~> gdb ./foo
>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "ia64-suse-linux".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /home/dannf/foo...done.
>> (gdb) break leaf
>> Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
>> (gdb) run
>> Starting program: /home/dannf/foo
>> Missing separate debuginfo for /lib/ld-linux-ia64.so.2
>> Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
>> Missing separate debuginfo for /lib/libc.so.6.1
>> Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
>>
>> Breakpoint 1, leaf () at foo.c:2
>> 2          return 0;
>> (gdb) gcore /tmp/save
>>
>> [bang]
>>
>
> Does this happen on 2.6.34 or 2.6.35-rc kernel ?


# gdb ./foo
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/foo...done.
(gdb) break leaf
Breakpoint 1 at 0x40000000000005a1: file foo.c, line 2.
(gdb) run
Starting program: /root/foo

Breakpoint 1, leaf () at foo.c:2
2       }
(gdb) gcore /tmp/save
Segmentation fault
# cat /proc/version
Linux version 2.6.35-rc3+ ...


Does this "Segmentation fault" count as reproducing the problem?

>
> Thanks,
> -Kame

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29  7:38           ` ia64 hang/mca running gdb 'make check' Luming Yu
@ 2010-07-29  7:58             ` KAMEZAWA Hiroyuki
  2010-07-29  8:40               ` Luming Yu
  0 siblings, 1 reply; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-29  7:58 UTC (permalink / raw)
  To: Luming Yu
  Cc: dann frazier, Hugh Dickins, linux-ia64, linux-kernel,
	Rik van Riel, KOSAKI Motohiro, Nick Piggin, Mel Gorman,
	Minchan Kim, Ralf Baechle

On Thu, 29 Jul 2010 15:38:06 +0800
Luming Yu <luming.yu@gmail.com> wrote:

> On Tue, Jul 27, 2010 at 5:03 PM, KAMEZAWA Hiroyuki

> # gdb ./foo
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ia64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/foo...done.
> (gdb) break leaf
> Breakpoint 1 at 0x40000000000005a1: file foo.c, line 2.
> (gdb) run
> Starting program: /root/foo
> 
> Breakpoint 1, leaf () at foo.c:2
> 2       }
> (gdb) gcore /tmp/save
> Segmentation fault
> # cat /proc/version
> Linux version 2.6.35-rc3+ ...
> 
> 

Hmm. What is the EXEC_PAGESIZE in your installed /usr/include/asm-generic/param.h ?
And what happens when you modify it to 16k, if it's 64k ?

Thanks
-Kame





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29  7:58             ` KAMEZAWA Hiroyuki
@ 2010-07-29  8:40               ` Luming Yu
  2010-07-29  8:44                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 19+ messages in thread
From: Luming Yu @ 2010-07-29  8:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: dann frazier, Hugh Dickins, linux-ia64, linux-kernel,
	Rik van Riel, KOSAKI Motohiro, Nick Piggin, Mel Gorman,
	Minchan Kim, Ralf Baechle

On Thu, Jul 29, 2010 at 3:58 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 29 Jul 2010 15:38:06 +0800
> Luming Yu <luming.yu@gmail.com> wrote:
>
>> On Tue, Jul 27, 2010 at 5:03 PM, KAMEZAWA Hiroyuki
>
>> # gdb ./foo
>> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "ia64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /root/foo...done.
>> (gdb) break leaf
>> Breakpoint 1 at 0x40000000000005a1: file foo.c, line 2.
>> (gdb) run
>> Starting program: /root/foo
>>
>> Breakpoint 1, leaf () at foo.c:2
>> 2       }
>> (gdb) gcore /tmp/save
>> Segmentation fault
>> # cat /proc/version
>> Linux version 2.6.35-rc3+ ...
>>
>>
>
> Hmm. What is the EXEC_PAGESIZE in your installed /usr/include/asm-generic/param.h ?

I use stock gdb shipped with RHEL 5.5.

> And what happens when you modify it to 16k, if it's 64k ?

Do you want me to rebuild gdb with this modification?

>
> Thanks
> -Kame
>
>
>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29  8:40               ` Luming Yu
@ 2010-07-29  8:44                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-29  8:44 UTC (permalink / raw)
  To: Luming Yu
  Cc: dann frazier, Hugh Dickins, linux-ia64, linux-kernel,
	Rik van Riel, KOSAKI Motohiro, Nick Piggin, Mel Gorman,
	Minchan Kim, Ralf Baechle

On Thu, 29 Jul 2010 16:40:50 +0800
Luming Yu <luming.yu@gmail.com> wrote:

> On Thu, Jul 29, 2010 at 3:58 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Thu, 29 Jul 2010 15:38:06 +0800
> > Luming Yu <luming.yu@gmail.com> wrote:
> >
> >> On Tue, Jul 27, 2010 at 5:03 PM, KAMEZAWA Hiroyuki
> >
> >> # gdb ./foo
> >> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
> >> Copyright (C) 2009 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> This is free software: you are free to change and redistribute it.
> >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >> and "show warranty" for details.
> >> This GDB was configured as "ia64-redhat-linux-gnu".
> >> For bug reporting instructions, please see:
> >> <http://www.gnu.org/software/gdb/bugs/>...
> >> Reading symbols from /root/foo...done.
> >> (gdb) break leaf
> >> Breakpoint 1 at 0x40000000000005a1: file foo.c, line 2.
> >> (gdb) run
> >> Starting program: /root/foo
> >>
> >> Breakpoint 1, leaf () at foo.c:2
> >> 2       }
> >> (gdb) gcore /tmp/save
> >> Segmentation fault
> >> # cat /proc/version
> >> Linux version 2.6.35-rc3+ ...
> >>
> >>
> >
> > Hmm. What is the EXEC_PAGESIZE in your installed /usr/include/asm-generic/param.h ?
> 
> I use stock gdb shipped with RHEL 5.5.
> 
Hmm. RHEL5.5's EXEC_PAGESIZE is 64k, right ?
(And your kernel is 16k.)

> > And what happens when you modify it to 16k, if it's 64k ?
> 
> Do you want me to rebuild gdb with this modification?
> 
Ahhh, yes. It will be required... but please do it only when you have free time.
I don't think the difference can cause MCA or hang...

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29  3:50             ` Hugh Dickins
@ 2010-07-29 19:22               ` dann frazier
  2010-07-30  0:41                 ` KAMEZAWA Hiroyuki
  2010-07-30  2:01                 ` Hugh Dickins
  0 siblings, 2 replies; 19+ messages in thread
From: dann frazier @ 2010-07-29 19:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> On Tue, 27 Jul 2010, dann frazier wrote:
> > On Tue, Jul 27, 2010 at 06:03:30PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 27 Jul 2010 01:19:15 -0600
> > > dann frazier <dannf@debian.org> wrote:
> > > > On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > > > > On Tue, 20 Jul 2010, dann frazier wrote:
> > > > > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > > > > dann frazier <dannf@debian.org> wrote:
> > > > > > > 
> > > > > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > > > > trying to run the gdb test suite:
> > > > > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > > > > 
> > > > > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > > > > down to this commit, introduced in 2.6.32:
> > > > > > > > 
> > > > > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > > > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > > > > > 
> > > > > > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > > > > > 
> > > > > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > > > > 
> > > > > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > > > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > > > > > 
> > > > > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > > > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > > > > > >     that: it would have to take vm_flags to weed out some cases.
> > > > > > > > 
> > > > > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > > > > reliably fails w/ 16KB pages.
> > > > > > > > 
> > > > > > > 
> > > > > > > Sorry, I have no idea...
> > > > > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > > > > 
> > > > > > 
> > > > > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > > > > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > > > > a000000100882688 d __kcrctab_empty_zero_page
> > > > > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > > > > a000000100974000 D empty_zero_page
> > > > > 
> > > > > Thanks a lot for reporting this, but I too have no idea yet.
> > > > > 
> > > > > It is likely that the bug is not to be found in that 62eede62, but
> > > > > rather in one of the preceding patches to mm/memory.c which 62eede62
> > > > > was extending to ia64 and other architectures without PTE_SPECIAL.
> > > > > 
> > > > > I wonder, from looking at that gdb testsuite log, is it plausible
> > > > > that all these hangs/crashes occurred when writing out a coredump?
> > > > > Is that something you could check for us? or rule out the possibility.
> > > > 
> > > > Yep, seems so. I've reduced it down to this test case:
> > > > 
> > > > dannf@rx2600:~> cat > foo.c
> > > > int leaf(void) {
> > > >   return 0;
> > > > }
> > > > 
> > > > int main(void) {
> > > >   leaf();
> > > > }
> > > > dannf@rx2600:~> gcc -g foo.c -o foo
> > > > dannf@rx2600:~> gdb ./foo 
> > > > GNU gdb (GDB) SUSE (7.0-0.4.16)
> > > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "ia64-suse-linux".
> > > > For bug reporting instructions, please see:
> > > > <http://www.gnu.org/software/gdb/bugs/>...
> > > > Reading symbols from /home/dannf/foo...done.
> > > > (gdb) break leaf
> > > > Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> > > > (gdb) run
> > > > Starting program: /home/dannf/foo 
> > > > Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> > > > Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> > > > Missing separate debuginfo for /lib/libc.so.6.1
> > > > Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> > > > 
> > > > Breakpoint 1, leaf () at foo.c:2
> > > > 2	     return 0;
> > > > (gdb) gcore /tmp/save
> > > > 
> > > > [bang]
> > > > 
> > > 
> > > Does this happen on 2.6.34 or 2.6.35-rc kernel ?
> > 
> > I've been testing w/ a 2.6.35-rc4+, though it was originally reported
> > on a 2.6.32.
> 
> Thanks a lot for narrowing down to that simple testcase, and
> thanks a lot for checking it's just as bad on recent kernels.
> 
> I'm sorry to say that I'm still just as baffled.
> 
> Let's note that gdb's gcore is building up its own version of a
> coredump, not going through the get_dump_page() code I was wondering
> about.  If I read gcore correctly (possibly not!), it will be reading
> selected areas from /proc/<pid>/mem i.e. using access_process_vm().

This appears to be correct. I was able to collect the following
stacktrace using INIT:

[ 2535.074197] Backtrace of pid 4605 (gdb)
[ 2535.074197] 
[ 2535.074197] Call Trace:
[ 2535.074197]  [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
[ 2535.074197]                                 sp=e000004081c77c40 bsp=e000004081c71018
[ 2535.074197]  [<a000000100334720>] __copy_user+0x160/0x960
[ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c71018
[ 2535.074197]  [<a000000100176b00>] access_process_vm+0x2c0/0x380
[ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c70f60

> But why the (16kB but not 64kB!) zero page should make that freeze
> or reboot, I have no idea.
> 
> What would I be doing if I had an Itanium?  I think I'd be trying to
> narrow down exactly where it goes bad (tedious when the penalty is
> a freeze or reboot).
> 
> As it is, I'm hoping that someone with an ia64 can investigate...
> 
> Hugh
> 

-- 
dann frazier


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29 19:22               ` dann frazier
@ 2010-07-30  0:41                 ` KAMEZAWA Hiroyuki
  2010-07-30  2:01                 ` Hugh Dickins
  1 sibling, 0 replies; 19+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-30  0:41 UTC (permalink / raw)
  To: dann frazier
  Cc: Hugh Dickins, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Thu, 29 Jul 2010 13:22:16 -0600
dann frazier <dannf@debian.org> wrote:

> On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> > On Tue, 27 Jul 2010, dann frazier wrote:
> > > On Tue, Jul 27, 2010 at 06:03:30PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 27 Jul 2010 01:19:15 -0600
> > > > dann frazier <dannf@debian.org> wrote:
> > > > > On Tue, Jul 20, 2010 at 09:19:50PM -0700, Hugh Dickins wrote:
> > > > > > On Tue, 20 Jul 2010, dann frazier wrote:
> > > > > > > On Wed, Jul 21, 2010 at 10:51:36AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > > > On Tue, 20 Jul 2010 11:35:12 -0600
> > > > > > > > dann frazier <dannf@debian.org> wrote:
> > > > > > > > 
> > > > > > > > > Debian's ia64 autobuilders have been experiencing system crashes while
> > > > > > > > > trying to run the gdb test suite:
> > > > > > > > >   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588574
> > > > > > > > > 
> > > > > > > > > I was able to reproduce this w/ the latest git tree, and bisected it
> > > > > > > > > down to this commit, introduced in 2.6.32:
> > > > > > > > > 
> > > > > > > > >   commit 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1
> > > > > > > > >   Author: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> > > > > > > > >   Date:   Mon Sep 21 17:03:34 2009 -0700
> > > > > > > > > 
> > > > > > > > >     mm: ZERO_PAGE without PTE_SPECIAL
> > > > > > > > > 
> > > > > > > > >     Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
> > > > > > > > >     those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.
> > > > > > > > > 
> > > > > > > > >     Contrary to how I'd imagined it, there's nothing ugly about this, just a
> > > > > > > > >     zero_pfn test built into one or another block of vm_normal_page().
> > > > > > > > > 
> > > > > > > > >     But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
> > > > > > > > >     my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
> > > > > > > > >     ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
> > > > > > > > >     that: it would have to take vm_flags to weed out some cases.
> > > > > > > > > 
> > > > > > > > > fyi, I found this to not be reproducible on SLES11 SP1 (which is
> > > > > > > > > 2.6.32-based). I compared the .configs and found that the relevant
> > > > > > > > > difference is the PAGE_SIZE. It does not fail w/ 64KB pages, but
> > > > > > > > > reliably fails w/ 16KB pages.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Sorry, I have no idea...
> > > > > > > > Hmm, what is the address of empty_zero_page[] on your debian(16kb-page) ?
> > > > > > > 
> > > > > > > 
> > > > > > > dannf@krebs:~$ grep empty_zero_page /boot/System.map-2.6.32-5-mckinley 
> > > > > > > a0000001008784c0 d __ksymtab_empty_zero_page
> > > > > > > a000000100882688 d __kcrctab_empty_zero_page
> > > > > > > a000000100884ca4 r __kstrtab_empty_zero_page
> > > > > > > a000000100974000 D empty_zero_page
> > > > > > 
> > > > > > Thanks a lot for reporting this, but I too have no idea yet.
> > > > > > 
> > > > > > It is likely that the bug is not to be found in that 62eede62, but
> > > > > > rather in one of the preceding patches to mm/memory.c which 62eede62
> > > > > > was extending to ia64 and other architectures without PTE_SPECIAL.
> > > > > > 
> > > > > > I wonder, from looking at that gdb testsuite log, is it plausible
> > > > > > that all these hangs/crashes occurred when writing out a coredump?
> > > > > > Is that something you could check for us? or rule out the possibility.
> > > > > 
> > > > > Yep, seems so. I've reduced it down to this test case:
> > > > > 
> > > > > dannf@rx2600:~> cat > foo.c
> > > > > int leaf(void) {
> > > > >   return 0;
> > > > > }
> > > > > 
> > > > > int main(void) {
> > > > >   leaf();
> > > > > }
> > > > > dannf@rx2600:~> gcc -g foo.c -o foo
> > > > > dannf@rx2600:~> gdb ./foo 
> > > > > GNU gdb (GDB) SUSE (7.0-0.4.16)
> > > > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > > This is free software: you are free to change and redistribute it.
> > > > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > > > and "show warranty" for details.
> > > > > This GDB was configured as "ia64-suse-linux".
> > > > > For bug reporting instructions, please see:
> > > > > <http://www.gnu.org/software/gdb/bugs/>...
> > > > > Reading symbols from /home/dannf/foo...done.
> > > > > (gdb) break leaf
> > > > > Breakpoint 1 at 0x40000000000005c1: file foo.c, line 2.
> > > > > (gdb) run
> > > > > Starting program: /home/dannf/foo 
> > > > > Missing separate debuginfo for /lib/ld-linux-ia64.so.2
> > > > > Try: zypper install -C "debuginfo(build-id)=d5bfb8b5940e174d54b978ca515dc0df76c7618c"
> > > > > Missing separate debuginfo for /lib/libc.so.6.1
> > > > > Try: zypper install -C "debuginfo(build-id)=ca78657bd9173653d95f8504a313d2b6db8cb1d6"
> > > > > 
> > > > > Breakpoint 1, leaf () at foo.c:2
> > > > > 2	     return 0;
> > > > > (gdb) gcore /tmp/save
> > > > > 
> > > > > [bang]
> > > > > 
> > > > 
> > > > Does this happen on 2.6.34 or 2.6.35-rc kernel ?
> > > 
> > > I've been testing w/ a 2.6.35-rc4+, though it was originally reported
> > > on a 2.6.32.
> > 
> > Thanks a lot for narrowing down to that simple testcase, and
> > thanks a lot for checking it's just as bad on recent kernels.
> > 
> > I'm sorry to say that I'm still just as baffled.
> > 
> > Let's note that gdb's gcore is building up its own version of a
> > coredump, not going through the get_dump_page() code I was wondering
> > about.  If I read gcore correctly (possibly not!), it will be reading
> > selected areas from /proc/<pid>/mem i.e. using access_process_vm().
> 
> This appears to be correct. I was able to collect the following
> stacktrace using INIT:
> 
> [ 2535.074197] Backtrace of pid 4605 (gdb)
> [ 2535.074197] 
> [ 2535.074197] Call Trace:
> [ 2535.074197]  [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
> [ 2535.074197]                                 sp=e000004081c77c40 bsp=e000004081c71018
> [ 2535.074197]  [<a000000100334720>] __copy_user+0x160/0x960
> [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c71018
> [ 2535.074197]  [<a000000100176b00>] access_process_vm+0x2c0/0x380
> [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c70f60
> 

Could you show the full stack ? IIUC, ia64's gdb has to use both ptrace(PEEK) and
/proc/pid/mem to check the hidden register stack.

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-29 19:22               ` dann frazier
  2010-07-30  0:41                 ` KAMEZAWA Hiroyuki
@ 2010-07-30  2:01                 ` Hugh Dickins
  2010-07-30  4:34                   ` dann frazier
  1 sibling, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2010-07-30  2:01 UTC (permalink / raw)
  To: dann frazier
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Thu, 29 Jul 2010, dann frazier wrote:
> On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> > 
> > Let's note that gdb's gcore is building up its own version of a
> > coredump, not going through the get_dump_page() code I was wondering
> > about.  If I read gcore correctly (possibly not!), it will be reading
> > selected areas from /proc/<pid>/mem i.e. using access_process_vm().
> 
> This appears to be correct. I was able to collect the following
> stacktrace using INIT:
> 
> [ 2535.074197] Backtrace of pid 4605 (gdb)
> [ 2535.074197] 
> [ 2535.074197] Call Trace:
> [ 2535.074197]  [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
> [ 2535.074197]                                 sp=e000004081c77c40 bsp=e000004081c71018
> [ 2535.074197]  [<a000000100334720>] __copy_user+0x160/0x960
> [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c71018
> [ 2535.074197]  [<a000000100176b00>] access_process_vm+0x2c0/0x380
> [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c70f60

Thanks a lot, dann.  But it was the [vdso] line in foo's /proc/<pid>/maps
which you sent me privately, that set me thinking on the right track.
Here's what I believe is the appropriate patch: please give it a try
and let us know...

[PATCH] mm: fix ia64 crash when gcore reads gate area

Debian's ia64 autobuilders have been seeing kernel freeze or reboot
when running the gdb testsuite (Debian bug 588574): dannf bisected to
2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without
PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that
happens to use vm_normal_page() (nowadays failing on the zero page),
yet reported success even when it failed to get a page - boom when
access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting
success when not asked to get any pages - very probably safe to change,
but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
Because setup_gate() pads each 64kB of its gate area with zero pages.

Reported-by: Andreas Barth <aba@not.so.argh.org>
Bisected-by: dann frazier <dannf@debian.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
---

 mm/memory.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--- 2.6.35-rc6/mm/memory.c	2010-05-30 17:58:57.000000000 -0700
+++ linux/mm/memory.c	2010-07-29 17:57:29.000000000 -0700
@@ -1394,10 +1394,20 @@ int __get_user_pages(struct task_struct
 				return i ? : -EFAULT;
 			}
 			if (pages) {
-				struct page *page = vm_normal_page(gate_vma, start, *pte);
+				struct page *page;
+
+				page = vm_normal_page(gate_vma, start, *pte);
+				if (!page) {
+					if (!(gup_flags & FOLL_DUMP) &&
+					     is_zero_pfn(pte_pfn(*pte)))
+						page = pte_page(*pte);
+					else {
+						pte_unmap(pte);
+						return i ? : -EFAULT;
+					}
+				}
 				pages[i] = page;
-				if (page)
-					get_page(page);
+				get_page(page);
 			}
 			pte_unmap(pte);
 			if (vmas)
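
For anyone reading the diff without the surrounding kernel source, the following is a rough user-space model of the before and after behaviour, not kernel code. Every name in it (model_page, model_gup_old, model_gup_fixed, MODEL_FOLL_DUMP, GATE_SPAN) is invented for illustration, and it assumes a layout of one real gate page followed by zero-page padding out to 64kB, which is a simplification of what the description above says about setup_gate().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: none of these are the kernel's real definitions. */
#define GATE_SPAN       (64 * 1024)  /* assumed: gate area padded out to 64kB */
#define MODEL_FOLL_DUMP 0x1          /* stand-in for the kernel's FOLL_DUMP */

struct model_page { int refcount; };

static struct model_page gate_page;  /* the one page with real gate contents */
static struct model_page zero_page;  /* shared zero page used as padding */

/* In this model every page after the first is zero-page padding. */
static int model_is_zero(size_t idx) { return idx > 0; }

/* Mimics vm_normal_page() after 62eede62: NULL for the zero page. */
static struct model_page *model_normal_page(size_t idx)
{
    return model_is_zero(idx) ? NULL : &gate_page;
}

/* Old behaviour: may store NULL, yet still counts the page as obtained. */
static long model_gup_old(size_t page_size, struct model_page **pages)
{
    size_t i, n = GATE_SPAN / page_size;

    for (i = 0; i < n; i++) {
        struct model_page *page = model_normal_page(i);

        pages[i] = page;
        if (page)
            page->refcount++;
    }
    return (long)n;
}

/* Patched behaviour: hand back the zero page, or stop with an error. */
static long model_gup_fixed(size_t page_size, unsigned int flags,
                            struct model_page **pages)
{
    size_t i, n = GATE_SPAN / page_size;

    for (i = 0; i < n; i++) {
        struct model_page *page = model_normal_page(i);

        if (!page) {
            if (!(flags & MODEL_FOLL_DUMP) && model_is_zero(i))
                page = &zero_page;
            else
                return i ? (long)i : -EFAULT;
        }
        assert(page != NULL);  /* the caller never sees a NULL entry */
        pages[i] = page;
        page->refcount++;
    }
    return (long)n;
}
```

With a 64kB page size the model has no padding pages, so even the old loop fills every slot; with 16kB pages the old loop reports four pages but leaves three NULL entries behind, which corresponds to the pointer that access_process_vm() then tried to copy from.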

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ia64 hang/mca running gdb 'make check'
  2010-07-30  2:01                 ` Hugh Dickins
@ 2010-07-30  4:34                   ` dann frazier
  2010-07-30 17:52                     ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: dann frazier @ 2010-07-30  4:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Thu, Jul 29, 2010 at 07:01:56PM -0700, Hugh Dickins wrote:
> On Thu, 29 Jul 2010, dann frazier wrote:
> > On Wed, Jul 28, 2010 at 08:50:18PM -0700, Hugh Dickins wrote:
> > > 
> > > Let's note that gdb's gcore is building up its own version of a
> > > coredump, not going through the get_dump_page() code I was wondering
> > > about.  If I read gcore correctly (possibly not!), it will be reading
> > > selected areas from /proc/<pid>/mem i.e. using access_process_vm().
> > 
> > This appears to be correct. I was able to collect the following
> > stacktrace using INIT:
> > 
> > [ 2535.074197] Backtrace of pid 4605 (gdb)
> > [ 2535.074197] 
> > [ 2535.074197] Call Trace:
> > [ 2535.074197]  [<a00000010000bb00>] ia64_native_leave_kernel+0x0/0x270
> > [ 2535.074197]                                 sp=e000004081c77c40 bsp=e000004081c71018
> > [ 2535.074197]  [<a000000100334720>] __copy_user+0x160/0x960
> > [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c71018
> > [ 2535.074197]  [<a000000100176b00>] access_process_vm+0x2c0/0x380
> > [ 2535.074197]                                 sp=e000004081c77e10 bsp=e000004081c70f60
> 
> Thanks a lot, dann.  But it was the [vdso] line in foo's /proc/<pid>/maps
> which you sent me privately, that set me thinking on the right track.
> Here's what I believe is the appropriate patch: please give it a try
> and let us know...

dannf@rx2600:~> gdb foo
GNU gdb (GDB) SUSE (7.0-0.4.16)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ia64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dannf/foo...done.
(gdb) break leaf
Breakpoint 1 at 0x4000000000000401: file foo.c, line 2.
(gdb) run
Starting program: /home/dannf/foo 

Breakpoint 1, leaf () at foo.c:2
2	     return 0;
(gdb) gcore
Saved corefile core.3952
(gdb) 

good work Hugh!

     -dann

> 
> [PATCH] mm: fix ia64 crash when gcore reads gate area
> 
> Debian's ia64 autobuilders have been seeing kernel freeze or reboot
> when running the gdb testsuite (Debian bug 588574): dannf bisected to
> 2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without
> PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.
> 
> I'd missed updating the gate_vma handling in __get_user_pages(): that
> happens to use vm_normal_page() (nowadays failing on the zero page),
> yet reported success even when it failed to get a page - boom when
> access_process_vm() tried to copy that to its intermediate buffer.
> 
> Fix this, resisting cleanups: in particular, leave it for now reporting
> success when not asked to get any pages - very probably safe to change,
> but let's not risk it without testing exposure.
> 
> Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
> Because setup_gate() pads each 64kB of its gate area with zero pages.
> 
> Reported-by: Andreas Barth <aba@not.so.argh.org>
> Bisected-by: dann frazier <dannf@debian.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@kernel.org
> ---
> 
>  mm/memory.c |   16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> --- 2.6.35-rc6/mm/memory.c	2010-05-30 17:58:57.000000000 -0700
> +++ linux/mm/memory.c	2010-07-29 17:57:29.000000000 -0700
> @@ -1394,10 +1394,20 @@ int __get_user_pages(struct task_struct
>  				return i ? : -EFAULT;
>  			}
>  			if (pages) {
> -				struct page *page = vm_normal_page(gate_vma, start, *pte);
> +				struct page *page;
> +
> +				page = vm_normal_page(gate_vma, start, *pte);
> +				if (!page) {
> +					if (!(gup_flags & FOLL_DUMP) &&
> +					     is_zero_pfn(pte_pfn(*pte)))
> +						page = pte_page(*pte);
> +					else {
> +						pte_unmap(pte);
> +						return i ? : -EFAULT;
> +					}
> +				}
>  				pages[i] = page;
> -				if (page)
> -					get_page(page);
> +				get_page(page);
>  			}
>  			pte_unmap(pte);
>  			if (vmas)
> 

-- 
dann frazier



* Re: ia64 hang/mca running gdb 'make check'
  2010-07-30  4:34                   ` dann frazier
@ 2010-07-30 17:52                     ` Hugh Dickins
  2010-07-30 17:58                       ` [PATCH] mm: fix ia64 crash when gcore reads gate area Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2010-07-30 17:52 UTC (permalink / raw)
  To: dann frazier
  Cc: KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

On Thu, 29 Jul 2010, dann frazier wrote:
> 
> dannf@rx2600:~> gdb foo
> GNU gdb (GDB) SUSE (7.0-0.4.16)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ia64-suse-linux".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/dannf/foo...done.
> (gdb) break leaf
> Breakpoint 1 at 0x4000000000000401: file foo.c, line 2.
> (gdb) run
> Starting program: /home/dannf/foo 
> 
> Breakpoint 1, leaf () at foo.c:2
> 2	     return 0;
> (gdb) gcore
> Saved corefile core.3952
> (gdb) 

Many thanks for pursuing this and reporting back, dann.
Patch to Linus follows in a few moments.

Hugh


* [PATCH] mm: fix ia64 crash when gcore reads gate area
  2010-07-30 17:52                     ` Hugh Dickins
@ 2010-07-30 17:58                       ` Hugh Dickins
  0 siblings, 0 replies; 19+ messages in thread
From: Hugh Dickins @ 2010-07-30 17:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dann frazier, dann frazier, Andreas Barth, Andrew Morton,
	KAMEZAWA Hiroyuki, linux-ia64, linux-kernel, Rik van Riel,
	KOSAKI Motohiro, Nick Piggin, Mel Gorman, Minchan Kim,
	Ralf Baechle

Debian's ia64 autobuilders have been seeing kernel freeze or reboot
when running the gdb testsuite (Debian bug 588574): dannf bisected to
2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without
PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that
happens to use vm_normal_page() (nowadays failing on the zero page),
yet reported success even when it failed to get a page - boom when
access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting
success when not asked to get any pages - very probably safe to change,
but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
Because setup_gate() pads each 64kB of its gate area with zero pages.

Reported-by: Andreas Barth <aba@not.so.argh.org>
Bisected-by: dann frazier <dannf@debian.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: dann frazier <dannf@dannf.org>
Cc: stable@kernel.org
---
Please add into 2.6.32-stable, 2.6.33-stable, 2.6.34-stable.

 mm/memory.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--- 2.6.35-rc6/mm/memory.c	2010-05-30 17:58:57.000000000 -0700
+++ linux/mm/memory.c	2010-07-29 17:57:29.000000000 -0700
@@ -1394,10 +1394,20 @@ int __get_user_pages(struct task_struct
 				return i ? : -EFAULT;
 			}
 			if (pages) {
-				struct page *page = vm_normal_page(gate_vma, start, *pte);
+				struct page *page;
+
+				page = vm_normal_page(gate_vma, start, *pte);
+				if (!page) {
+					if (!(gup_flags & FOLL_DUMP) &&
+					     is_zero_pfn(pte_pfn(*pte)))
+						page = pte_page(*pte);
+					else {
+						pte_unmap(pte);
+						return i ? : -EFAULT;
+					}
+				}
 				pages[i] = page;
-				if (page)
-					get_page(page);
+				get_page(page);
 			}
 			pte_unmap(pte);
 			if (vmas)
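
For readers following the thread, the failure mode and the fix described
in the changelog above can be modelled outside the kernel.  What follows
is a user-space illustration only, not kernel code: struct page,
model_vm_normal_page(), old_gate_lookup(), new_gate_lookup() and
zero_pads_per_slot() are hypothetical names, the model handles a single
page (so it returns plain -EFAULT where the patch returns i ? : -EFAULT),
and the padding formula assumes setup_gate() fills every page of a 64kB
slot after the first with the zero page, as the changelog suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page { int refcount; };

struct page zero_page;	/* models ZERO_PAGE(0) */
struct page gate_page;	/* models the real gate page */

/* Models vm_normal_page() since 62eede62: NULL for the zero page. */
struct page *model_vm_normal_page(struct page *mapped)
{
	return mapped == &zero_page ? NULL : mapped;
}

/* Old gate_vma path: may store NULL in the slot, yet reports success. */
int old_gate_lookup(struct page *mapped, struct page **slot)
{
	struct page *page = model_vm_normal_page(mapped);

	assert(slot != NULL);
	*slot = page;
	if (page)
		page->refcount++;
	return 0;	/* success even when *slot is NULL */
}

/*
 * Patched gate_vma path: accept the zero page unless dumping,
 * otherwise fail instead of handing back a NULL page.
 */
int new_gate_lookup(struct page *mapped, int dumping, struct page **slot)
{
	struct page *page = model_vm_normal_page(mapped);

	assert(slot != NULL);
	if (!page) {
		if (!dumping && mapped == &zero_page)
			page = mapped;
		else
			return -EFAULT;
	}
	*slot = page;
	page->refcount++;	/* page is never NULL here */
	return 0;
}

/* Assumed padding rule: zero pages per 64kB slot of the gate area. */
unsigned long zero_pads_per_slot(unsigned long page_size)
{
	assert(page_size != 0 && page_size <= 64UL * 1024);
	return 64UL * 1024 / page_size - 1;
}
```

With these definitions, old_gate_lookup(&zero_page, &slot) returns 0 and
leaves slot NULL, which models the pointer access_process_vm() later
tripped over; new_gate_lookup(&zero_page, 0, &slot) returns 0 with slot
pointing at the zero page; and zero_pads_per_slot() gives 3 for 16kB
pages and 0 for 64kB pages, matching the page-size dependence reported
at the start of the thread.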



Thread overview: 19+ messages
2010-07-20 17:35 ia64 hang/mca running gdb 'make check' dann frazier
2010-07-21  1:51 ` KAMEZAWA Hiroyuki
2010-07-21  3:06   ` dann frazier
2010-07-21  4:19     ` Hugh Dickins
2010-07-21 12:54       ` KOSAKI Motohiro
2010-07-27  7:19       ` dann frazier
2010-07-27  9:03         ` KAMEZAWA Hiroyuki
2010-07-27 14:43           ` dann frazier
2010-07-29  3:50             ` Hugh Dickins
2010-07-29 19:22               ` dann frazier
2010-07-30  0:41                 ` KAMEZAWA Hiroyuki
2010-07-30  2:01                 ` Hugh Dickins
2010-07-30  4:34                   ` dann frazier
2010-07-30 17:52                     ` Hugh Dickins
2010-07-30 17:58                       ` [PATCH] mm: fix ia64 crash when gcore reads gate area Hugh Dickins
2010-07-29  7:38           ` ia64 hang/mca running gdb 'make check' Luming Yu
2010-07-29  7:58             ` KAMEZAWA Hiroyuki
2010-07-29  8:40               ` Luming Yu
2010-07-29  8:44                 ` KAMEZAWA Hiroyuki
