Re: perf bug: bad page map

* Re: perf bug: bad page map
@ 2013-11-15 18:04 Vince Weaver
  2013-11-18 15:17 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2013-11-15 18:04 UTC
  To: Vince Weaver
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo


(figured out the minicom issue).

Anyway while trying to reproduce the last bug I instead got this with
the perf_fuzzer.

Is it worth continuing to run and report these issues?  I'm losing track 
of all the open bugs.

Vince

[ 1618.118179] BUG: Bad page map in process perf_fuzzer  pte:ffff8800c4d60040 pmd:bd86a067
[ 1618.142177] addr:0000000000409000 vm_flags:00000875 anon_vma:          (null) mapping:ffff8800cb74adf0 index:9
[ 1618.172142] vma->vm_ops->fault: filemap_fault+0x0/0x358
[ 1618.187783] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x48
[ 1618.204981] CPU: 1 PID: 24819 Comm: perf_fuzzer Not tainted 3.12.0 #4
[ 1618.224256] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[ 1618.250825]  0000000000409000 ffff8800bf6dfaa8 ffffffff8151d8ec 0000000000000000
[ 1618.273081]  ffff8800c89ac928 ffff8800bf6dfaf8 ffffffff810ed692 dead000000200200
[ 1618.295345]  00000000c03df067 ffff8800bf6dfbe8 0000000000409000 ffffea0002bc2fe8
[ 1618.317603] Call Trace:
[ 1618.324951]  [<ffffffff8151d8ec>] dump_stack+0x49/0x5d
[ 1618.340355]  [<ffffffff810ed692>] print_bad_pte+0x1f5/0x213
[ 1618.357059]  [<ffffffff810ef43c>] unmap_single_vma+0x511/0x666
[ 1618.374540]  [<ffffffff810ef5c3>] unmap_vmas+0x32/0x49
[ 1618.389934]  [<ffffffff810f3804>] exit_mmap+0x84/0x10d
[ 1618.405343]  [<ffffffff8105bb15>] ? hrtimer_try_to_cancel+0x41/0x4b
[ 1618.424129]  [<ffffffff8103ac43>] mmput+0x4b/0xd1
[ 1618.438227]  [<ffffffff8103ec76>] do_exit+0x36c/0x936
[ 1618.453366]  [<ffffffff810c7312>] ? update_context_time+0x11/0x34
[ 1618.471628]  [<ffffffff8100951b>] ? native_sched_clock+0x3b/0x3d
[ 1618.489635]  [<ffffffff8106730d>] ? sched_clock_local+0x1c/0x82
[ 1618.507376]  [<ffffffff8103f2b8>] do_group_exit+0x78/0xa0
[ 1618.523563]  [<ffffffff8104c898>] get_signal_to_deliver+0x46d/0x48a
[ 1618.542347]  [<ffffffff810c8ac7>] ? ctx_sched_in+0x35/0x185
[ 1618.559051]  [<ffffffff810c8c80>] ? perf_event_sched_in+0x69/0x72
[ 1618.577318]  [<ffffffff81002513>] do_signal+0x46/0x5f5
[ 1618.592724]  [<ffffffff810c8ffe>] ? __perf_event_task_sched_in+0x3a/0x10e
[ 1618.613071]  [<ffffffff8106699f>] ? finish_task_switch+0x46/0x98
[ 1618.631075]  [<ffffffff8151f832>] ? __schedule+0x51c/0x54b
[ 1618.647516]  [<ffffffff81002aee>] do_notify_resume+0x2c/0x64
[ 1618.664486]  [<ffffffff81520ef5>] retint_signal+0x3d/0x78
[ 1618.680661] Disabling lock debugging due to kernel taint
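
A note on reading the second line of the report: the vm_flags word can be decoded by hand. The sketch below assumes the VM_* bit values from include/linux/mm.h roughly as they stood around 3.12; those bits have moved between releases, so the table is an assumption to check against the tree actually in use. The name decode_vm_flags is made up for illustration.

```python
# Assumed VM_* bit values, as in include/linux/mm.h around 3.12.
# Verify against the real source tree; bits have moved between releases.
VM_FLAGS = {
    0x00000001: "VM_READ",
    0x00000002: "VM_WRITE",
    0x00000004: "VM_EXEC",
    0x00000008: "VM_SHARED",
    0x00000010: "VM_MAYREAD",
    0x00000020: "VM_MAYWRITE",
    0x00000040: "VM_MAYEXEC",
    0x00000080: "VM_MAYSHARE",
    0x00000100: "VM_GROWSDOWN",
    0x00000400: "VM_PFNMAP",
    0x00000800: "VM_DENYWRITE",
}


def decode_vm_flags(value):
    """Return (known flag names, leftover bits not in the table)."""
    names = []
    unknown = value
    for bit, name in sorted(VM_FLAGS.items()):
        if value & bit:
            names.append(name)
            unknown &= ~bit  # clear every bit we could name
    return names, unknown


# vm_flags:00000875 from the report above
names, unknown = decode_vm_flags(0x00000875)
print(names, hex(unknown))
```

Under those assumptions, 0x875 describes a private, readable and executable, non-writable file mapping. Together with addr 0x409000 and index 9 that would be consistent with the perf_fuzzer binary's own text being mapped from ext4, which may be why ext4 shows up in the report at all. This is an inference from the flags, not something established in the thread.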



* Re: perf bug: bad page map
  2013-11-15 18:04 perf bug: bad page map Vince Weaver
@ 2013-11-18 15:17 ` Peter Zijlstra
  2013-11-18 15:36   ` Ingo Molnar
  2013-11-18 16:41   ` Vince Weaver
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2013-11-18 15:17 UTC
  To: Vince Weaver
  Cc: LKML, Ingo Molnar, Paul Mackerras, Arnaldo Carvalho de Melo,
	tytso, adilger.kernel

On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> 
> (figured out the minicom issue).
> 
> Anyway while trying to reproduce the last bug I instead got this with
> the perf_fuzzer.
> 
> Is it worth continuing to run and report these issues?  I'm losing track 
> of all the open bugs.

This looks like ext4. Not entirely sure how perf ties into this.

Anyway, yes, I do think it's useful to keep running these tests, we do
fix various issues -- although probably not at the rate you seem to be
finding them.

> [ 1618.118179] BUG: Bad page map in process perf_fuzzer  pte:ffff8800c4d60040 pmd:bd86a067
> [ 1618.142177] addr:0000000000409000 vm_flags:00000875 anon_vma:          (null) mapping:ffff8800cb74adf0 index:9
> [ 1618.172142] vma->vm_ops->fault: filemap_fault+0x0/0x358
> [ 1618.187783] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x48
> [ 1618.204981] CPU: 1 PID: 24819 Comm: perf_fuzzer Not tainted 3.12.0 #4
> [ 1618.224256] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
> [ 1618.250825]  0000000000409000 ffff8800bf6dfaa8 ffffffff8151d8ec 0000000000000000
> [ 1618.273081]  ffff8800c89ac928 ffff8800bf6dfaf8 ffffffff810ed692 dead000000200200
> [ 1618.295345]  00000000c03df067 ffff8800bf6dfbe8 0000000000409000 ffffea0002bc2fe8
> [ 1618.317603] Call Trace:
> [ 1618.324951]  [<ffffffff8151d8ec>] dump_stack+0x49/0x5d
> [ 1618.340355]  [<ffffffff810ed692>] print_bad_pte+0x1f5/0x213
> [ 1618.357059]  [<ffffffff810ef43c>] unmap_single_vma+0x511/0x666
> [ 1618.374540]  [<ffffffff810ef5c3>] unmap_vmas+0x32/0x49
> [ 1618.389934]  [<ffffffff810f3804>] exit_mmap+0x84/0x10d
> [ 1618.405343]  [<ffffffff8105bb15>] ? hrtimer_try_to_cancel+0x41/0x4b
> [ 1618.424129]  [<ffffffff8103ac43>] mmput+0x4b/0xd1
> [ 1618.438227]  [<ffffffff8103ec76>] do_exit+0x36c/0x936
> [ 1618.453366]  [<ffffffff810c7312>] ? update_context_time+0x11/0x34
> [ 1618.471628]  [<ffffffff8100951b>] ? native_sched_clock+0x3b/0x3d
> [ 1618.489635]  [<ffffffff8106730d>] ? sched_clock_local+0x1c/0x82
> [ 1618.507376]  [<ffffffff8103f2b8>] do_group_exit+0x78/0xa0
> [ 1618.523563]  [<ffffffff8104c898>] get_signal_to_deliver+0x46d/0x48a
> [ 1618.542347]  [<ffffffff810c8ac7>] ? ctx_sched_in+0x35/0x185
> [ 1618.559051]  [<ffffffff810c8c80>] ? perf_event_sched_in+0x69/0x72
> [ 1618.577318]  [<ffffffff81002513>] do_signal+0x46/0x5f5
> [ 1618.592724]  [<ffffffff810c8ffe>] ? __perf_event_task_sched_in+0x3a/0x10e
> [ 1618.613071]  [<ffffffff8106699f>] ? finish_task_switch+0x46/0x98
> [ 1618.631075]  [<ffffffff8151f832>] ? __schedule+0x51c/0x54b
> [ 1618.647516]  [<ffffffff81002aee>] do_notify_resume+0x2c/0x64
> [ 1618.664486]  [<ffffffff81520ef5>] retint_signal+0x3d/0x78
> [ 1618.680661] Disabling lock debugging due to kernel taint
> 


* Re: perf bug: bad page map
  2013-11-18 15:17 ` Peter Zijlstra
@ 2013-11-18 15:36   ` Ingo Molnar
  2013-11-18 16:41   ` Vince Weaver
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-11-18 15:36 UTC
  To: Peter Zijlstra
  Cc: Vince Weaver, LKML, Paul Mackerras, Arnaldo Carvalho de Melo,
	tytso, adilger.kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > 
> > (figured out the minicom issue).
> > 
> > Anyway while trying to reproduce the last bug I instead got this 
> > with the perf_fuzzer.
> > 
> > Is it worth continuing to run and report these issues?  I'm losing 
> > track of all the open bugs.
> 
> This looks like ext4. Not entirely sure how perf ties into this.
> 
> Anyway, yes, I do think it's useful to keep running these tests, we 
> do fix various issues -- although probably not at the rate you seem 
> to be finding them.

I'm trying to slow down the merging of kernel side features, so that 
the fixing effort has a chance to catch up...

Thanks,

	Ingo


* Re: perf bug: bad page map
  2013-11-18 15:17 ` Peter Zijlstra
  2013-11-18 15:36   ` Ingo Molnar
@ 2013-11-18 16:41   ` Vince Weaver
  2013-11-18 17:13     ` Ingo Molnar
  2013-11-18 23:05     ` One Thousand Gnomes
  1 sibling, 2 replies; 8+ messages in thread
From: Vince Weaver @ 2013-11-18 16:41 UTC
  To: Peter Zijlstra
  Cc: Vince Weaver, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, tytso, adilger.kernel

On Mon, 18 Nov 2013, Peter Zijlstra wrote:

> On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > 
> > (figured out the minicom issue).
> > 
> > Anyway while trying to reproduce the last bug I instead got this with
> > the perf_fuzzer.
> > 
> > Is it worth continuing to run and report these issues?  I'm losing track 
> > of all the open bugs.
> 
> This looks like ext4. Not entirely sure how perf ties into this.

It's believable the filesystem could have issues (it's a fuzzer machine, 
so it's had 100+ unclean shutdowns on an SSD drive in the past few months)
but as far as I know there shouldn't have been any filesystem accesses 
happening at all when the bug triggered.

I thought it might be perf related due to the perf references in the 
backtrace (and since it was being perf-fuzzed at the time).

> Anyway, yes, I do think it's useful to keep running these tests, we do
> fix various issues -- although probably not at the rate you seem to be
> finding them.
> 
> > [ 1618.118179] BUG: Bad page map in process perf_fuzzer  pte:ffff8800c4d60040 pmd:bd86a067
> > [ 1618.142177] addr:0000000000409000 vm_flags:00000875 anon_vma:          (null) mapping:ffff8800cb74adf0 index:9
> > [ 1618.172142] vma->vm_ops->fault: filemap_fault+0x0/0x358
> > [ 1618.187783] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x48
> > [ 1618.204981] CPU: 1 PID: 24819 Comm: perf_fuzzer Not tainted 3.12.0 #4
> > [ 1618.224256] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
> > [ 1618.250825]  0000000000409000 ffff8800bf6dfaa8 ffffffff8151d8ec 0000000000000000
> > [ 1618.273081]  ffff8800c89ac928 ffff8800bf6dfaf8 ffffffff810ed692 dead000000200200
> > [ 1618.295345]  00000000c03df067 ffff8800bf6dfbe8 0000000000409000 ffffea0002bc2fe8
> > [ 1618.317603] Call Trace:
> > [ 1618.324951]  [<ffffffff8151d8ec>] dump_stack+0x49/0x5d
> > [ 1618.340355]  [<ffffffff810ed692>] print_bad_pte+0x1f5/0x213
> > [ 1618.357059]  [<ffffffff810ef43c>] unmap_single_vma+0x511/0x666
> > [ 1618.374540]  [<ffffffff810ef5c3>] unmap_vmas+0x32/0x49
> > [ 1618.389934]  [<ffffffff810f3804>] exit_mmap+0x84/0x10d
> > [ 1618.405343]  [<ffffffff8105bb15>] ? hrtimer_try_to_cancel+0x41/0x4b
> > [ 1618.424129]  [<ffffffff8103ac43>] mmput+0x4b/0xd1
> > [ 1618.438227]  [<ffffffff8103ec76>] do_exit+0x36c/0x936
> > [ 1618.453366]  [<ffffffff810c7312>] ? update_context_time+0x11/0x34
> > [ 1618.471628]  [<ffffffff8100951b>] ? native_sched_clock+0x3b/0x3d
> > [ 1618.489635]  [<ffffffff8106730d>] ? sched_clock_local+0x1c/0x82
> > [ 1618.507376]  [<ffffffff8103f2b8>] do_group_exit+0x78/0xa0
> > [ 1618.523563]  [<ffffffff8104c898>] get_signal_to_deliver+0x46d/0x48a
> > [ 1618.542347]  [<ffffffff810c8ac7>] ? ctx_sched_in+0x35/0x185
> > [ 1618.559051]  [<ffffffff810c8c80>] ? perf_event_sched_in+0x69/0x72
> > [ 1618.577318]  [<ffffffff81002513>] do_signal+0x46/0x5f5
> > [ 1618.592724]  [<ffffffff810c8ffe>] ? __perf_event_task_sched_in+0x3a/0x10e
> > [ 1618.613071]  [<ffffffff8106699f>] ? finish_task_switch+0x46/0x98
> > [ 1618.631075]  [<ffffffff8151f832>] ? __schedule+0x51c/0x54b
> > [ 1618.647516]  [<ffffffff81002aee>] do_notify_resume+0x2c/0x64
> > [ 1618.664486]  [<ffffffff81520ef5>] retint_signal+0x3d/0x78
> > [ 1618.680661] Disabling lock debugging due to kernel taint
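
On the perf references in the backtrace: every perf symbol in the trace carries a "?" prefix. In x86 kernel traces that marker means the address was found on the stack but the unwinder could not confirm it as part of the live call chain, so such entries are often leftovers from earlier calls. A minimal sketch for separating the two kinds of frames follows; FRAME_RE and split_frames are illustrative names, and the regular expression assumes the frame layout printed in this report.

```python
import re

# Matches one frame of the call trace as printed in the report, e.g.
#   [ 1618.559051]  [<ffffffff810c8c80>] ? perf_event_sched_in+0x69/0x72
# Works on quoted lines too, since it searches rather than anchors.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def split_frames(log_text):
    """Split call-trace frames into (reliable, unreliable) symbol lists."""
    reliable, unreliable = [], []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue  # not a stack frame line
        target = unreliable if m.group("unreliable") else reliable
        target.append(m.group("sym"))
    return reliable, unreliable


# Four frames copied from the report above.
SAMPLE = """\
[ 1618.523563]  [<ffffffff8104c898>] get_signal_to_deliver+0x46d/0x48a
[ 1618.542347]  [<ffffffff810c8ac7>] ? ctx_sched_in+0x35/0x185
[ 1618.559051]  [<ffffffff810c8c80>] ? perf_event_sched_in+0x69/0x72
[ 1618.577318]  [<ffffffff81002513>] do_signal+0x46/0x5f5
"""

reliable, unreliable = split_frames(SAMPLE)
print("reliable:  ", reliable)
print("unreliable:", unreliable)
```

Run over the full trace, the reliable frames form the exit path (retint_signal through do_exit, mmput, exit_mmap and unmap_vmas to print_bad_pte), while the perf and scheduler symbols all land in the unreliable list. That does not rule perf out as the cause, it only means the trace itself is weak evidence for it.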


* Re: perf bug: bad page map
  2013-11-18 16:41   ` Vince Weaver
@ 2013-11-18 17:13     ` Ingo Molnar
  2013-11-18 23:05     ` One Thousand Gnomes
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-11-18 17:13 UTC
  To: Vince Weaver
  Cc: Peter Zijlstra, LKML, Paul Mackerras, Arnaldo Carvalho de Melo,
	tytso, adilger.kernel


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Mon, 18 Nov 2013, Peter Zijlstra wrote:
> 
> > On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > > 
> > > (figured out the minicom issue).
> > > 
> > > Anyway while trying to reproduce the last bug I instead got this with
> > > the perf_fuzzer.
> > > 
> > > Is it worth continuing to run and report these issues?  I'm losing track 
> > > of all the open bugs.
> > 
> > This looks like ext4. Not entirely sure how perf ties into this.
> 
> It's believable the filesystem could have issues (it's a fuzzer 
> machine, so it's had 100+ unclean shutdowns on an SSD drive in the 
> past few months) but as far as I know there shouldn't have been any 
> filesystem accesses happening at all when the bug triggered.
> 
> I thought it might be perf related due to the perf references in the 
> backtrace (and since it was being perf-fuzzed at the time).

Maybe the connection is that ext4 has lots of tracepoints?

Thanks,

	Ingo


* Re: perf bug: bad page map
  2013-11-18 16:41   ` Vince Weaver
  2013-11-18 17:13     ` Ingo Molnar
@ 2013-11-18 23:05     ` One Thousand Gnomes
  2013-11-19  1:57       ` Vince Weaver
  1 sibling, 1 reply; 8+ messages in thread
From: One Thousand Gnomes @ 2013-11-18 23:05 UTC
  To: Vince Weaver
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, tytso, adilger.kernel

On Mon, 18 Nov 2013 11:41:22 -0500 (EST)
Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Mon, 18 Nov 2013, Peter Zijlstra wrote:
> 
> > On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > > 
> > > (figured out the minicom issue).
> > > 
> > > Anyway while trying to reproduce the last bug I instead got this with
> > > the perf_fuzzer.
> > > 
> > > Is it worth continuing to run and report these issues?  I'm losing track 
> > > of all the open bugs.
> > 
> > This looks like ext4. Not entirely sure how perf ties into this.
> 
> It's believable the filesystem could have issues (it's a fuzzer machine, 
> so it's had 100+ unclean shutdowns on an SSD drive in the past few months)
> but as far as I know there shouldn't have been any filesystem accesses 
> happening at all when the bug triggered.

Obvious question: does it pass fsck currently? If it does, then
presumably it was sane at the time it went pop?


* Re: perf bug: bad page map
  2013-11-18 23:05     ` One Thousand Gnomes
@ 2013-11-19  1:57       ` Vince Weaver
  2013-11-19  7:06         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2013-11-19  1:57 UTC
  To: One Thousand Gnomes
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, tytso, adilger.kernel

On Mon, 18 Nov 2013, One Thousand Gnomes wrote:

> On Mon, 18 Nov 2013 11:41:22 -0500 (EST)
> Vince Weaver <vincent.weaver@maine.edu> wrote:
> 
> > On Mon, 18 Nov 2013, Peter Zijlstra wrote:
> > 
> > > On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > > > 
> > > > (figured out the minicom issue).
> > > > 
> > > > Anyway while trying to reproduce the last bug I instead got this with
> > > > the perf_fuzzer.
> > > > 
> > > > Is it worth continuing to run and report these issues?  I'm losing track 
> > > > of all the open bugs.
> > > 
> > > This looks like ext4. Not entirely sure how perf ties into this.
> > 
> > It's believable the filesystem could have issues (it's a fuzzer machine, 
> > so it's had 100+ unclean shutdowns on an SSD drive in the past few months)
> > but as far as I know there shouldn't have been any filesystem accesses 
> > happening at all when the bug triggered.
> 
> Obvious question: does it pass fsck currently? If it does, then
> presumably it was sane at the time it went pop?

# e2fsck -f /dev/sda1                                                           
e2fsck 1.42.8 (20-Jun-2013)                                                     
Pass 1: Checking inodes, blocks, and sizes                                      
Pass 2: Checking directory structure                                            
Pass 3: Checking directory connectivity                                         
Pass 4: Checking reference counts                                               
Pass 5: Checking group summary information                                      
/dev/sda1: 620972/3514368 files (0.5% non-contiguous), 9796212/14047744 blocks  

so it looks clean now...

Vince


* Re: perf bug: bad page map
  2013-11-19  1:57       ` Vince Weaver
@ 2013-11-19  7:06         ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-11-19  7:06 UTC
  To: Vince Weaver
  Cc: One Thousand Gnomes, Peter Zijlstra, LKML, Paul Mackerras,
	Arnaldo Carvalho de Melo, tytso, adilger.kernel


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Mon, 18 Nov 2013, One Thousand Gnomes wrote:
> 
> > On Mon, 18 Nov 2013 11:41:22 -0500 (EST)
> > Vince Weaver <vincent.weaver@maine.edu> wrote:
> > 
> > > On Mon, 18 Nov 2013, Peter Zijlstra wrote:
> > > 
> > > > On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > > > > 
> > > > > (figured out the minicom issue).
> > > > > 
> > > > > Anyway while trying to reproduce the last bug I instead got this with
> > > > > the perf_fuzzer.
> > > > > 
> > > > > Is it worth continuing to run and report these issues?  I'm losing track 
> > > > > of all the open bugs.
> > > > 
> > > > This looks like ext4. Not entirely sure how perf ties into this.
> > > 
> > > It's believable the filesystem could have issues (it's a fuzzer machine, 
> > > so it's had 100+ unclean shutdowns on an SSD drive in the past few months)
> > > but as far as I know there shouldn't have been any filesystem accesses 
> > > happening at all when the bug triggered.
> > 
> > Obvious question: does it pass fsck currently? If it does, then
> > presumably it was sane at the time it went pop?
> 
> # e2fsck -f /dev/sda1                                                           
> e2fsck 1.42.8 (20-Jun-2013)                                                     
> Pass 1: Checking inodes, blocks, and sizes                                      
> Pass 2: Checking directory structure                                            
> Pass 3: Checking directory connectivity                                         
> Pass 4: Checking reference counts                                               
> Pass 5: Checking group summary information                                      
> /dev/sda1: 620972/3514368 files (0.5% non-contiguous), 9796212/14047744 blocks  
> 
> so it looks clean now...

Also, in no way should a corrupted filesystem be able to provoke 
kernel crashes. So even if the filesystem had errors, this would still 
be a kernel bug we need to fix.

Thanks,

	Ingo

