[PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Kirill Korotaev <dev@sw.ru>
To: Roel van der Made <roel@telegraafnet.nl>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org, torvalds@osdl.org,
	wli@holomorphy.com
Subject: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops
Date: Mon, 13 Sep 2004 12:06:39 +0400	[thread overview]
Message-ID: <4145550F.8030601@sw.ru> (raw)
In-Reply-To: <20040912184804.GC19067@telegraafnet.nl>

[-- Attachment #1: Type: text/plain, Size: 3242 bytes --]

Roel van der Made wrote:
> Hi there,
> 
> This morning one of our (MySQL-)database serves crashed with the
> following kernel trace. Anyone has an idea what could've caused it?
> The machine is an SMP Xeon 2.8Ghz with 4G internal Reg. ECC ram running
> 4 scsi disks in sw raid 5 on a Debian (almost sid-)distribution.

> The trace:
> 
> ------------[ cut here ]------------
> kernel BUG at kernel/exit.c:852!
> invalid operand: 0000 [#1]
> SMP
> Modules linked in: ip_vs_wlc af_packet ipt_MARK iptable_mangle ip_tables ip_vs tg3 e1000 e100 eepro100 mii
> nfsd exportfs nfs lockd sunrpc unix
> CPU:    0
> EIP:    0060:[<c011df03>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.9-rc1-mm4-fw-xeon.1)
> EIP is at next_thread+0xc/0x41
> eax: 00000000   ebx: 00000001   ecx: 00000001   edx: e93c3aa0
> esi: 00000000   edi: e93c3aa0   ebp: 00000000   esp: f3893dd8
> ds: 007b   es: 007b   ss: 0068
> Process snmpd (pid: 1182, threadinfo=f3892000 task=f3fa1550)
> Stack: c0182368 f3893f14 e93c3aa0 c016cecb c30c8a00 c011542b c03bfbe0 c30c8a00
>        c017fcf6 e18d6eb0 e93c3aa0 0000000d c017fdad e93c3aa0 4143bbb4 247966f0
>        c016c653 c03bfbe0 e18d6eb0 c03a4bc5 c01802a0 f3e56c20 e18d6eb0 0000000d
> Call Trace:
>  [<c0182368>] do_task_stat+0x279/0x752
>  [<c016cecb>] alloc_inode+0x1b/0x146
>  [<c011542b>] do_page_fault+0x19d/0x5c7
>  [<c017fcf6>] task_dumpable+0x39/0x4a
>  [<c017fdad>] proc_pid_make_inode+0xa6/0xe5
>  [<c016c653>] d_rehash+0x55/0x79
>  [<c01802a0>] proc_pident_lookup+0x100/0x26c
>  [<c0161586>] real_lookup+0xcd/0xf0
>  [<c016b468>] dput+0x24/0x209
>  [<c0162247>] link_path_walk+0xa3e/0xd89
>  [<c0182883>] proc_tgid_stat+0x1f/0x23
>  [<c017f3ed>] proc_info_read+0x6a/0x9f
>  [<c015417f>] vfs_read+0xbc/0x127
>  [<c015444d>] sys_read+0x51/0x80
>  [<c0105cdf>] syscall_call+0x7/0xb
> Code: 8b 44 24 0c 89 04 24 e8 1d fc ff ff 83 ec 04 0f b6 44 24 08 c1 e0 08 89 04 24 e8 0a fc ff ff 89 c2
> 8b 80 d0 04 00 00 85 c0 75 08 <0f> 0b 54 03 e5

It looks like an incorrect BUG() in next_thread().

Description
~~~~~~~~~~~

Note, that during exit process there can be a thread in the system with 
tsk->sighand == NULL, since the following call trace:

release_task()
{
	....
	__exit_sighand()	<<< makes tsk->sighand == NULL;
	__unhash_process()	<<< unhashes thread
	....
}

next, we see that next_thread checks for tsk->sighand != NULL:

task_t fastcall *next_thread(const task_t *p)
{
  #ifdef CONFIG_SMP
        if (!p->sighand)
                BUG();		<<< BUG happened here!!!
        if (!spin_is_locked(&p->sighand->siglock) &&
                                !rwlock_is_locked(&tasklist_lock))
	....
}

So the question is why next_thread() should check for
(p->sighand != NULL) && spin_is_locked(&p->sighand->siglock)?

I think these checks are invalid. For example do_task_stat() (which 
called next_thread() in this BUG) checks for tsk->sighand != NULL 
explicitly.
And moreover, next_thread() DOES always works correctly, whether there 
are threads or none.

This patch removes sighand checks from the next_thread(), since they are 
incorrect and has nothing to do with the next_thread() function. So they 
could trigger BUG() when there were no actually bug at all.

Signed-Off-By: Kirill Korotaev <dev@sw.ru>

Kirill

[-- Attachment #2: diff-next_thread --]
[-- Type: text/plain, Size: 491 bytes --]

--- ./kernel/exit.c.nt	2004-09-13 11:18:26.000000000 +0400
+++ ./kernel/exit.c	2004-09-13 11:53:23.611075360 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
 task_t fastcall *next_thread(const task_t *p)
 {
 #ifdef CONFIG_SMP
-	if (!p->sighand)
-		BUG();
-	if (!spin_is_locked(&p->sighand->siglock) &&
-				!rwlock_is_locked(&tasklist_lock))
+	if (!rwlock_is_locked(&tasklist_lock))
 		BUG();
 #endif
 	return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);

next prev parent reply	other threads:[~2004-09-13  7:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-12 18:48 kernel 2.6.9-rc1-mm4 oops Roel van der Made
2004-09-13  8:06 ` Kirill Korotaev [this message]
2004-09-13  8:05   ` [PATCH]: " William Lee Irwin III
2004-09-13  8:31   ` Ingo Molnar
2004-09-13  9:15     ` Kirill Korotaev
2004-09-13  9:24       ` Ingo Molnar
2004-09-13 13:34         ` Roel van der Made
2004-09-13 13:38           ` Ingo Molnar
2004-09-13 13:42             ` Roel van der Made
2004-09-13 15:03               ` Kirill Korotaev
2004-09-13 14:39         ` Kirill Korotaev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4145550F.8030601@sw.ru \
    --to=dev@sw.ru \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=roel@telegraafnet.nl \
    --cc=torvalds@osdl.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox