public inbox for linux-kernel@vger.kernel.org
From: Philippe Elie <phil.el@wanadoo.fr>
To: Linux kernel Mailing List <linux-kernel@vger.kernel.org>,
	oprofile Mailing List <oprofile-list@lists.sourceforge.net>
Cc: Sami Farin <safari-kernel@safari.iki.fi>, Andi Kleen <ak@suse.de>
Subject: Re: 2.6.22.6 + oprofile oops
Date: Thu, 11 Oct 2007 16:36:10 +0200
Message-ID: <20071011143610.GA2881@zaniah>
In-Reply-To: <20070929170523.mepbfbdqwhnyrx3z@m.safari.iki.fi>

[-- Attachment #1: Type: text/plain, Size: 1669 bytes --]

On Sat, 29 Sep 2007 at 20:05 +0000, Sami Farin wrote:

> > x86_64 SMP kernel v2.6.22.6 (not using callgraph).
> > sometimes oprofile works for a longer time... but not this time.
> > 
> > 2007-09-22 13:53:32.527237777 <1>[ 3372.390188] Unable to handle kernel NULL pointer dereference at 0000000000000650 RIP: 
> > 2007-09-22 13:53:32.527245948 <1>[ 3372.390195]  [<ffffffff80652f44>] _spin_lock+0x4/0x20
...
> 2007-09-22 13:53:32.527390314 <4>[ 3372.390457]  [<ffffffff80232b88>] get_task_mm+0x18/0x60

On the per-CPU buffer writer side, oprofile_add_sample() uses profile_pc()
to get the eip. profile_pc() can return ~0lu, but an eip == ~0lu is a
magic value, ESCAPE_CODE. The per-CPU reader side in buffer_sync.c uses
this value to know that the associated data is a task pointer, but here
the associated data is a counter number.
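
A minimal sketch of the collision described above, written as a
user-space model rather than kernel code. Only ESCAPE_CODE and its
value ~0lu come from this mail; struct sample, make_sample(),
make_task_switch() and reader_sees_escape() are hypothetical names
chosen for illustration, and the real buffer handles more cases.

```c
#include <stdbool.h>

#define ESCAPE_CODE (~0UL)	/* magic eip value named in the mail */

/* Hypothetical, simplified stand-in for one per-CPU buffer slot. */
struct sample {
	unsigned long eip;
	unsigned long event;
};

/* Writer, normal sample: eip is the sampled pc, event is a counter number. */
static struct sample make_sample(unsigned long pc, unsigned long counter)
{
	struct sample s = { pc, counter };
	return s;
}

/* Writer, context switch: eip is the magic value, event carries a pointer. */
static struct sample make_task_switch(const void *task)
{
	struct sample s = { ESCAPE_CODE, (unsigned long)task };
	return s;
}

/*
 * Reader: the eip field alone decides how the event field is interpreted.
 * A normal sample whose pc happens to be ~0UL is therefore indistinguishable
 * from a context-switch record, and its counter number gets used as a pointer.
 */
static bool reader_sees_escape(const struct sample *s)
{
	return s->eip == ESCAPE_CODE;
}
```

In this model, make_sample(~0UL, 0) yields a slot for which
reader_sees_escape() is true even though it was written as an ordinary
sample, which is the situation the oops trace suggests.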

This was already reported two years ago by Jesse Barnes on the same
sort of box, a Pentium D. It is not reproducible on a duo core, nor was I
able to reproduce it on a P4 box two years ago; I don't know why. Anyway,
profile_pc() is broken on both i386 and x86_64 w/o frame pointer. For i386:

000000b0 <_spin_lock_bh>:
  b0:   53                      push   %ebx /* break profile_pc() */
  b1:   89 c3                   mov    %eax,%ebx

On x86_64 it's broken with or w/o frame pointer.

I understand the motivation for reporting the eip of the caller of a
spinlock function, but that's a cheat and it has a price. Besides that,
the trouble is also on the oprofile side: magic values are evil. This bug
has existed since at least 2.6.13.

Sami, can you try the attached patch? The chunk in buffer_sync.c is
there only to avoid oopsing if another problem exists somewhere.
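
The range test in the buffer_sync.c chunk of the attached patch relies
on unsigned wraparound to reject both very small and very large values
with a single comparison. A minimal standalone sketch of that test
follows; plausible_task_pointer() is a hypothetical name, and the
constants are copied from the patch, which assumes 4 KiB pages.

```c
#include <stdbool.h>

/*
 * Same expression as in the patch: event + 0x1000 > 0x2000.
 * Values in the first page (and 0x1000 itself) give a sum of at most
 * 0x2000, and values in the last page wrap around to a small number,
 * so both ends of the address space are rejected.
 */
static bool plausible_task_pointer(unsigned long event)
{
	return event + 0x1000UL > 0x2000UL;
}
```

With this check, the faulting value from the oops, 0x650, is rejected,
as are 0 and ~0UL, while an ordinary kernel address passes.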


-- 
Phe

[-- Attachment #2: oprof-fix-profile_pc-use.patch --]
[-- Type: text/plain, Size: 1616 bytes --]

--- drivers/oprofile/cpu_buffer.c.orig	2007-10-03 21:36:08.000000000 +0200
+++ drivers/oprofile/cpu_buffer.c	2007-10-11 12:41:57.000000000 +0200
@@ -241,7 +241,7 @@
 void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
 {
 	int is_kernel = !user_mode(regs);
-	unsigned long pc = profile_pc(regs);
+	unsigned long pc = instruction_pointer(regs);
 
 	oprofile_add_ext_sample(pc, regs, event, is_kernel);
 }
--- drivers/oprofile/buffer_sync.c.orig	2007-10-04 10:15:27.000000000 +0200
+++ drivers/oprofile/buffer_sync.c	2007-10-11 12:49:22.000000000 +0200
@@ -530,11 +530,30 @@
 				/* userspace context switch */
 				new = (struct task_struct *)s->event;
 
-				release_mm(oldmm);
-				mm = take_tasks_mm(new);
-				if (mm != oldmm)
-					cookie = get_exec_dcookie(mm);
-				add_user_ctx_switch(new, cookie);
+				/* Ugly: let's say a task pointer can't be
+				   in the last page nor in the first page */
+				if (s->event + 0x1000 > 0x2000) {
+					release_mm(oldmm);
+					mm = take_tasks_mm(new);
+					if (mm != oldmm)
+						cookie = get_exec_dcookie(mm);
+					add_user_ctx_switch(new, cookie);
+				} else {
+					static int show_it = 0;
+					if (show_it++ < 5) {
+						printk(KERN_INFO
+						       "oprofile: Invalid "
+						       "event %lu, head %lu, "
+						       "tail %lu, eip %lu, "
+						       "available %lu, i %u "
+						       "state %d %d\n",
+						       s->event,
+						       cpu_buf->head_pos,
+						       cpu_buf->tail_pos,
+						       s->eip, available, i,
+						       state, in_kernel);
+					}
+				}
 			}
 		} else {
 			if (state >= sb_bt_start &&
