Re: [ltt-dev] [BUG] Linux 2.6.28.4 freezing on a 32-bits x86 Thinkpad T43p

From: Ingo Molnar <mingo@elte.hu>
To: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: kvm@vger.kernel.org, Greg KH <greg@kroah.com>,
	linux-kernel@vger.kernel.org, ltt-dev@lists.casi.polymtl.ca,
	Avi Kivity <avi@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [ltt-dev] [BUG] Linux 2.6.28.4 freezing on a 32-bits x86 Thinkpad T43p
Date: Thu, 12 Feb 2009 15:43:13 +0100
Message-ID: <20090212144313.GA14616@elte.hu>
In-Reply-To: <20090212045050.GA13924@Krystal>


* Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:

> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > * Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > 
> > > > Here is a new backtrace, taken with a huge amount of debugging active, which still 
> > > > points to an interrupt handler nested over kvm_mmu_pte_write as the culprit. It's 
> > > > weird that the kvm code gets called on my modest Pentium M laptop, which I think 
> > > > has no VT-x support at all. I am not running any KVM VMs on this machine. The 
> > > > problem still happens on 2.6.28.4, and Slub redzones did not identify any memory 
> > > > corruption. This could be due to kvm_mmu_pte_write which either should not be 
> > > > called at all, or due to improper interrupt disabling in this function.
> > > 
> > > Does latest tip:master fix it? In particular this one:
> > > 
> > >   9cf161a: x86/cpa: make sure cpa is safe to call in lazy mmu mode
> > > 
> > > fixes a crasher related to KVM and mmu notifiers ...
> > > 
> > > 	Ingo
> > 
> > I'll try to apply commit 
> > 9cf161a: x86/cpa: make sure cpa is safe to call in lazy mmu mode
> > 
> > To my 2.6.28.4 kernel to change the configuration minimally and see if
> > it helps. I guess we'll have to wait a few days before the problem is
> > reproduced, and even more if it's not. :)
> > 
> 
> OK, it's been much faster to reproduce now that the patch above is
> applied. New stack trace, different this time, but still pointing to
> data corruption seen by get_next_timer_interrupt. It happens in the
> first 5 minutes after bootup.
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000
> IP: [<c1049aaa>] get_next_timer_interrupt+0x4a/0x220
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> LTT NESTING LEVEL : 0
> last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0b:02.0/rf_kill
> Modules linked in: soundcore snd snd_rawmidi serio_raw snd_seq_midi cryptoloop snd_seq_oss snd_seq_device ipw2200 psmouse unix snd_timer snd_seq usbhid loop nvram pcmcia joydev aes_i586 snd_seq_dummy evdev i2c_i801 snd_seq_midi_event blowfish rsrc_nonstatic led_class ide_generic rfkill ide_cd_mod edd acpi_cpufreq hid_logitech sir_dev pcmcia_core thinkpad_acpi ltt_control ltt_statedump dm_mod snd_intel8x0m irtty_sir yenta_socket snd_mixer_oss ac97_bus agpgart floppy snd_pcm button dm_log dm_region_hash dm_mirror dm_snapshot snd_pcm_oss vfat thermal fat intel_agp snd_intel8x0 nls_cp437 crc_ccitt irda nls_iso8859_1 snd_ac97_codec lp parport ppdev bluetooth af_packet binfmt_misc parport_pc l2cap drm nsc_ircc ac rfcomm output video radeon battery lockd libphy ntfs ipv6 tg3 snd_page_alloc sunrpc nfs
> 
> Pid: 0, comm: swapper Not tainted (2.6.28.4-trace-00235-g6523760-dirty #15) 2687D5U
> EIP: 0060:[<c1049aaa>] EFLAGS: 00010002 CPU: 0
> EIP is at get_next_timer_interrupt+0x4a/0x220
> EAX: 0000006c EBX: c14f2b84 ECX: 00000000 EDX: 00000000
> ESI: c14f2800 EDI: 0000006c EBP: c1489ec8 ESP: c1489e90
>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c1488000 task=c14473a0 task.ti=c1488000)
> Stack:
>  ffffbe6c ffffbe6b c14f2800 0000001e 00000000 00000030 00000000 c1489ec0
>  c105a696 00000000 00000030 0000001e 00000001 ffffbe6b c1489f10 c1060bf8
>  c1044f67 00000046 c1077601 00000001 c25f5d4f 0000001e c25e9f80 0000001e
> Call Trace:
>  [<c105a696>] ? sched_clock_cpu+0xc6/0x120
>  [<c1060bf8>] ? tick_nohz_stop_sched_tick+0x158/0x370
>  [<c1044f67>] ? __do_softirq+0x177/0x1f0
>  [<c1077601>] ? handle_edge_irq+0xd1/0x130
>  [<c104523e>] ? irq_exit+0x7e/0x90
>  [<c1021aad>] ? do_IRQ+0x7d/0x90
>  [<c10207b4>] ? common_interrupt+0x28/0x30
>  [<c1195d15>] ? acpi_idle_enter_simple+0x175/0x1e2
>  [<c124b7ad>] ? cpuidle_idle_call+0x6d/0xb0
>  [<c101ea15>] ? cpu_idle+0x55/0xb0
>  [<c12f9781>] ? rest_init+0x61/0x70
> Code: 0f b6 f9 89 4d c8 89 f8 8b 75 d0 8b 54 c6 24 8b 0a 0f 18 01 90 8d 5c c6 24 39 da 75 1c e9 f9 00 00 00 8d b4 26 00 00 00 00 89 ca <8b> 09 0f 18 01 90 39 da 0f 84 e2 00 00 00 f6 42 14 01 75 ea 85
> EIP: [<c1049aaa>] get_next_timer_interrupt+0x4a/0x220 SS:ESP 0068:c1489e90
> ---[ end trace 32ebcf3d2f51bd62 ]---
> Kernel panic - not syncing: Attempted to kill the idle task!
> BUG: spinlock lockup on CPU#0, swapper/0, c14f2800
> Pid: 0, comm: swapper Tainted: G      D    2.6.28.4-trace-00235-g6523760-dirty #15
> Call Trace:
>  [<c115ab5b>] _raw_spin_lock+0x10b/0x120
>  [<c1302cb9>] _spin_lock_irq+0x49/0x50
>  [<c1049309>] ? run_timer_softirq+0x29/0x1b0
>  [<c1049309>] run_timer_softirq+0x29/0x1b0
>  [<c101fcd0>] ? restore_nocheck_notrace+0x0/0xe
>  [<c1044ebe>] __do_softirq+0xce/0x1f0
>  [<c1058cf5>] ? hrtimer_interrupt+0x185/0x1a0
>  [<c104504d>] do_softirq+0x6d/0x80
>  [<c1045245>] irq_exit+0x85/0x90
>  [<c102ecb5>] smp_apic_timer_interrupt+0xd5/0x130
>  [<c10207e9>] apic_timer_interrupt+0x2d/0x34
>  [<c12ffd14>] ? panic+0x7b/0xf3
>  [<c10430de>] do_exit+0x68e/0x810
>  [<c103f98a>] ? print_oops_end_marker+0x2a/0x30
>  [<c12ffdeb>] ? printk+0x5f/0x6c
>  [<c103f98a>] ? print_oops_end_marker+0x2a/0x30
>  [<c1304191>] oops_end+0xa1/0xb0
>  [<c1022164>] die+0x54/0x70
>  [<c1305670>] ? do_page_fault+0x0/0xa60
>  [<c1305ac7>] do_page_fault+0x457/0xa60
>  [<c1302a19>] ? _spi....
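
A rough sketch (Python, not part of the original thread) of one way to cross-check the quoted oops: it extracts the fault address, the register dump, and the byte marked `<..>` on the Code: line, then lists which registers hold the faulting address. The helper names are hypothetical and the sample text is copied from the trace above, with the Code: line shortened to the bytes around the marker.

```python
import re

# Lines copied from the quoted oops; the Code: line is shortened.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 00000000
EAX: 0000006c EBX: c14f2b84 ECX: 00000000 EDX: 00000000
ESI: c14f2800 EDI: 0000006c EBP: c1489ec8 ESP: c1489e90
Code: 8d b4 26 00 00 00 00 89 ca <8b> 09 0f 18 01 90 39 da 0f 84 e2 00 00 00
"""


def fault_address(text):
    """Return the address from the 'dereference at' banner line."""
    m = re.search(r"dereference at ([0-9a-f]{8})", text)
    return int(m.group(1), 16)


def registers(text):
    """Map 32-bit register names (EAX, EBX, ...) to their dumped values."""
    pairs = re.findall(r"\b(E[A-Z]{2}): ([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Split the Code: bytes into (before EIP, from EIP onward).

    The oops printer wraps the byte at the faulting EIP in angle brackets.
    """
    code = re.search(r"^Code: (.*)$", text, re.M).group(1).split()
    idx = next(i for i, b in enumerate(code) if b.startswith("<"))
    before = [int(b, 16) for b in code[:idx]]
    after = [int(b.strip("<>"), 16) for b in code[idx:]]
    return before, after


regs = registers(OOPS)
addr = fault_address(OOPS)
before, after = faulting_bytes(OOPS)
suspects = sorted(name for name, value in regs.items() if value == addr)
print("registers equal to the fault address:", suspects)
print("bytes before EIP:", " ".join(f"{b:02x}" for b in before[-2:]))
print("bytes at EIP:", " ".join(f"{b:02x}" for b in after[:2]))
```

The bytes at EIP, `8b 09`, decode to `mov (%ecx),%ecx`, and the preceding `89 ca` is `mov %ecx,%edx`. With ECX equal to the fault address, this looks like a list walk following a NULL next pointer, although the oops alone does not show which list was damaged or by whom.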

hm, corrupted timer list? Have you tried my suggestions: debugobjects, pagealloc,
etc?

	Ingo
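
For reference, a sketch of the debug options the suggestion above probably refers to, written as a .config fragment. The symbols exist in the kernel's Kconfig for this series, but mapping "debugobjects" and "pagealloc" onto them is an assumption rather than something stated in the thread, and CONFIG_DEBUG_LIST is an extra that nobody mentions here. The "DEBUG_PAGEALLOC" tag in the quoted oops banner suggests CONFIG_DEBUG_PAGEALLOC is already enabled in the reporter's build.

```
# Assumed mapping of the suggestions to Kconfig symbols
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_PAGEALLOC=y
# Not mentioned in the thread; checks list_head linkage on add/delete
CONFIG_DEBUG_LIST=y
```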

