From: Fengguang Wu <fengguang.wu@intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	xen-devel@lists.xenproject.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Date: Tue, 8 Oct 2013 16:20:58 +0800
Message-ID: <20131008082057.GA19657@localhost>
In-Reply-To: <20131008075816.GA6346@gmail.com>

On Tue, Oct 08, 2013 at 09:58:16AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, Oct 7, 2013 at 1:35 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> > > On Mon, Oct 07, 2013 at 01:12:17AM -0700, Linus Torvalds wrote:
> > >
> > > My pleasure! Here are 100 randomly selected call traces. Also attached
> > > several full dmesgs and the kconfig.
> > 
> > Ok, they may be randomly selected, but they are all the same. Which is
> > good, I guess, we're only talking about one bug.
> > 
> > Anyway, they all have RIP:run_timer_softirq+0x12c/0x1b8, and the code is
> > 
> >    0: 8b 65 c8             mov    -0x38(%rbp),%esp
> >    3: 4d 39 ec             cmp    %r13,%r12
> >    6: 0f 84 2f ff ff ff     je     0xffffffffffffff3b
> >    c: 41 8b 4c 24 18       mov    0x18(%r12),%ecx
> >   11: 4d 8b 74 24 20       mov    0x20(%r12),%r14
> >   16: 4d 8b 7c 24 28       mov    0x28(%r12),%r15
> >   1b: 4c 89 63 38           mov    %r12,0x38(%rbx)
> >   1f: 49 8b 44 24 08       mov    0x8(%r12),%rax
> >   24: 49 8b 14 24           mov    (%r12),%rdx
> >   28: 83 e1 02             and    $0x2,%ecx
> >   2b:* 48 89 42 08           mov    %rax,0x8(%rdx) <-- trapping instruction
> >   2f: 48 89 10             mov    %rdx,(%rax)
> >   32: 48 b8 00 02 20 00 00 movabs $0xdead000000200200,%rax
> > 
> > where that constant is LIST_POISON2 and the "and $2" seems to be 
> > TIMER_IRQSAFE. So the trapping instruction *looks* like it's doing 
> > __list_del() on the timer, and timer->next is NULL.
> > 
> > So somebody added a timer, and then deallocated/cleared the structure 
> > before it triggered. The problem is, I can't see a way to figure out 
> > _who_ did that.
> 
> I think CONFIG_DEBUG_OBJECTS_TIMERS=y should be able to detect that?

It did help: it exposes more information, and does so earlier.

Without debugobjects, we hit the "BUG: ..." directly:

[    2.964097] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    2.966666] IP: [<ffffffff81098f60>] run_timer_softirq+0x126/0x1da
[    2.968060] PGD 0
[    2.968060] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[    2.968060] CPU: 0 PID: 95 Comm: kworker/0:2 Not tainted 3.11.0-rc2-00010-gc817a67-dirty #5
[    2.968060] Workqueue: events flush_to_ldisc
[    2.968060] task: ffff8800068544c0 ti: ffff880006856000 task.ti: ffff880006856000
[    2.968060] RIP: 0010:[<ffffffff81098f60>]  [<ffffffff81098f60>] run_timer_softirq+0x126/0x1da
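
For reference, the fault address 0000000000000008 above fits the
__list_del() reading quoted earlier: with a NULL next pointer, the store
"next->prev = prev" lands at offset 8 on a 64-bit build. Below is a minimal
userspace sketch of that arithmetic, not the kernel's code. struct list_head
is a stand-in here, and SKETCH_POISON2 and the sketch_* helpers are
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head (sketch only). */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* Poison value copied from the movabs constant in the quoted disassembly. */
#define SKETCH_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

/*
 * Hypothetical helper: the address that the store "next->prev = prev"
 * would write to. The address is only computed, never dereferenced.
 */
static uintptr_t sketch_unlink_store_addr(const struct list_head *entry)
{
	assert(entry != NULL);
	return (uintptr_t)entry->next + offsetof(struct list_head, prev);
}

/* Hypothetical helper: unlink a healthy entry, then poison its prev pointer. */
static void sketch_list_del(struct list_head *entry)
{
	struct list_head *next = entry->next;
	struct list_head *prev = entry->prev;

	next->prev = prev;	/* this is the store that traps when next is NULL */
	prev->next = next;
	entry->prev = SKETCH_POISON2;
}
```

Calling sketch_unlink_store_addr() on an entry whose pointers are both NULL
returns offsetof(struct list_head, prev), which is 8 with 8-byte pointers.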

With CONFIG_DEBUG_OBJECTS_TIMERS=y enabled, the kernel first issues a WARNING, followed by the "BUG: ...":

[    2.802167] parport_pc 00:04: reported by Plug and Play ACPI
[    2.803818] parport0: PC-style at 0x378, irq 7 [PCSPP(,...)]
[    2.806035] kobject: 'parport_pc.956' (ffff880006dc3820): kobject_release, parent           (null) (delayed)
[    2.808626] ------------[ cut here ]------------
[    2.809776] WARNING: CPU: 1 PID: 1 at /c/wfg/linux/lib/debugobjects.c:260 debug_print_object+0x7c/0x8d()
[    2.812433] ODEBUG: init active (active state 0) object type: timer_list hint:           (null)
......
[    3.796079] BUG: unable to handle kernel NULL pointer dereference at           (null)

> Debugobjects hooks into deallocation paths and complains immediately if a 
> live timer is zapped that way.
> 
> If the corruption does not involve deallocation then it might be more 
> difficult to detect but not impossible either: for example if an object is 
> not freed but reused incorrectly then a repeat use of any timer function 
> will cause the debugobjects (and/or the timer code) to complain.
> 
> So I'd suggest trying debugobjects, it should catch a fair number of 
> non-exotic object corruption patterns.

Good to know that, thanks for the info!
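
To illustrate the state tracking described above, here is a rough userspace
sketch of the idea, assuming a tiny fixed-size table. It is not the
lib/debugobjects.c implementation, and every sketch_* name is hypothetical.
Re-initialising or freeing an object that is still marked active bumps a
warning counter, much like the "ODEBUG: init active" line in the log:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_OBJS 16

enum sketch_state { SKETCH_NONE, SKETCH_INIT, SKETCH_ACTIVE };

/* Hypothetical tracker: remembers the last known state of each object. */
struct sketch_tracker {
	const void *addr[SKETCH_MAX_OBJS];
	enum sketch_state state[SKETCH_MAX_OBJS];
	size_t used;
	unsigned int warnings;
};

/* Find the state slot for obj, adding a new slot on first sight. */
static enum sketch_state *sketch_lookup(struct sketch_tracker *t, const void *obj)
{
	size_t i;

	assert(t != NULL && obj != NULL);
	for (i = 0; i < t->used; i++)
		if (t->addr[i] == obj)
			return &t->state[i];
	if (t->used == SKETCH_MAX_OBJS)
		return NULL;	/* table full: object stays untracked */
	t->addr[t->used] = obj;
	t->state[t->used] = SKETCH_NONE;
	return &t->state[t->used++];
}

static void sketch_warn(struct sketch_tracker *t, const char *op)
{
	t->warnings++;
	fprintf(stderr, "sketch: %s active\n", op);
}

/* Initialising an object that is still active is reported and refused. */
static void sketch_init(struct sketch_tracker *t, const void *obj)
{
	enum sketch_state *st = sketch_lookup(t, obj);

	if (!st)
		return;
	if (*st == SKETCH_ACTIVE) {
		sketch_warn(t, "init");
		return;
	}
	*st = SKETCH_INIT;
}

static void sketch_activate(struct sketch_tracker *t, const void *obj)
{
	enum sketch_state *st = sketch_lookup(t, obj);

	if (st)
		*st = SKETCH_ACTIVE;
}

/* Freeing an object that is still active is reported, then forgotten. */
static void sketch_free(struct sketch_tracker *t, const void *obj)
{
	enum sketch_state *st = sketch_lookup(t, obj);

	if (!st)
		return;
	if (*st == SKETCH_ACTIVE)
		sketch_warn(t, "free");
	*st = SKETCH_NONE;
}
```

In this sketch, calling sketch_init() twice with a sketch_activate() in
between raises one warning at the second init, which is the point where the
misuse becomes visible instead of at a later list corruption.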

Regards,
Fengguang
