Oops when read/write or mount/unmount continuously ~ 600.000 times

From: Hong Tran Duc <hongtd2k@gmail.com>
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-ide@vger.kernel.org
Subject: Oops when read/write or mount/unmount continuously ~ 600.000 times
Date: Sun, 03 Aug 2008 19:49:50 +0700
Message-ID: <4895A96E.2040303@gmail.com>

Hi all,

I’m using kernel 2.4.20 with full preemption enabled (preemption patch
applied and the CONFIG option set). My CPU is a PowerPC 750FX, with an
80GB HDD and 512MB of RAM.

I get many Oopses when I mount/unmount or read/write on an ATA HDD
continuously, about 600,000 times (over several hours). The Oops usually
occurs when the CPU traps with SIGSEGV or SIGILL, sometimes in the page
management code, sometimes in the scheduler, block I/O handling, or the
filesystem.

The Oops happens most frequently in:
Block I/O : make_request, generic_make_request, submit_bh, bdfind, bmap, 
__wait_on_buffer ..
Filesystem: journal_commit_transaction, kill_super, invalidate_inode, 
invalidate_list ..

In almost every case the cause is a broken linked list in those
functions, e.g. linkedlist->next or linkedlist->prev is NULL or set to an
invalid address.
In the SIGILL cases, the instruction pointer (NIP) is the same as the
link register (LR).
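
To illustrate the kind of list corruption described above, here is a minimal userspace sketch in C of a consistency check for a circular doubly linked list. It is not kernel code: `struct list_node`, `list_init`, `list_add_tail`, and `list_is_consistent` are hypothetical names that only mimic the shape of the kernel's `struct list_head`. The check catches NULL links and mismatched back-pointers; it cannot safely detect a link that points to an arbitrary invalid address, since following it would fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node at the tail of the circular list headed by head. */
static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Walk the list and verify n->next->prev == n for the head and up to
 * max_nodes entries. Returns false on a NULL link, a broken
 * back-pointer, or a walk that does not come back to head in time.
 */
static bool list_is_consistent(const struct list_node *head, size_t max_nodes)
{
    const struct list_node *n = head;
    size_t i;

    for (i = 0; i <= max_nodes; i++) {
        if (n->next == NULL || n->prev == NULL)
            return false;
        if (n->next->prev != n)
            return false;
        n = n->next;
        if (n == head)
            return true;
    }
    return false; /* longer than expected: possibly a corrupted cycle */
}
```

Calling such a check before and after each list manipulation in a test build is one way to narrow down when a list first becomes inconsistent, at the cost of extra run time.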

The newest Oops I got is in __wait_on_buffer(). The main steps of
__wait_on_buffer() are:
1. get_bh -> atomic_inc(&bh->b_count);
2. add to the wait queue
3. loop: do some task state manipulation, call *schedule()*
4. remove from the wait queue
5. put_bh -> atomic_dec(&bh->b_count); *<- Got the Oops here, SEGV because
&bh->b_count = r25 = 0x02*
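
The five steps above can be modelled in userspace as follows. This is a rough sketch under stated assumptions, not the 2.4.20 source: `struct model_bh` and every `model_*` function are hypothetical, the wait queue is reduced to comments, and "I/O completion" is simulated with a counter. It only shows the reference-count pattern: the same `bh` pointer is used to take the reference in step 1 and to drop it in step 5, so that pointer has to survive the sleep in step 3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct buffer_head. */
struct model_bh {
    atomic_int b_count;  /* reference count */
    int io_polls_left;   /* model only: sleeps until the "I/O" finishes */
};

static void model_get_bh(struct model_bh *bh)
{
    atomic_fetch_add(&bh->b_count, 1);
}

static void model_put_bh(struct model_bh *bh)
{
    atomic_fetch_sub(&bh->b_count, 1);
}

/* The buffer counts as locked while simulated I/O is still pending. */
static bool model_buffer_locked(const struct model_bh *bh)
{
    return bh->io_polls_left > 0;
}

/* Stand-in for schedule(): each "sleep" brings the I/O closer to done. */
static void model_schedule(struct model_bh *bh)
{
    bh->io_polls_left--;
}

/* Returns how many times the model slept before the buffer unlocked. */
static int model_wait_on_buffer(struct model_bh *bh)
{
    int sleeps = 0;

    model_get_bh(bh);                 /* step 1: take a reference */
    /* step 2: the real code adds the task to the wait queue here */
    while (model_buffer_locked(bh)) { /* step 3: sleep until unlocked */
        model_schedule(bh);
        sleeps++;
    }
    /* step 4: the real code removes the task from the wait queue here */
    model_put_bh(bh);                 /* step 5: drop the reference */
    return sleeps;
}
```

In this model the count always returns to its starting value, because nothing can touch the local `bh` between steps 1 and 5. In the kernel the pointer lives in a register across the sleep, so a bad value at step 5 suggests the saved copy was overwritten somewhere in between.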

After analyzing the assembly code at this point (uploaded to the pastebin
below), I found that:
* At step (1), the address of bh->b_count is stored in register r25.
* Across steps (2) to (4), any code that uses r25 restores it from the
stack afterwards (r25 is a non-volatile register in the ABI gcc uses).
* At step (5), *r25 = 0x02 ??? I don’t know why r25 changed. Maybe the
stack was corrupted somewhere?*

This Oops is very difficult to reproduce (it takes about 2 hours of
running the stress test program). I tried increasing and reducing HZ by a
factor of 10, but the frequency of the bug did not change. I also tried
both ext2 and ext3, with the same result.

I’m really confused now. I don’t know where the real problem is, what
affects the frequency of the Oops, or how to debug and track down this
bug.

I’m posting my situation to this ML in the hope of getting some advice from you.

I uploaded some of the Oopses to pastebin here:
http://vnoss.net/p/5783
http://vnoss.net/p/5785

Source and assembly of __wait_on_buffer():
http://vnoss.net/p/5784

Thanks for your help,

-- 
nm.

GPG Key ID: 0xDD253B25
Fingerprint: 2B17 D64A 9561 A443 2ABC 1302 4641 D0B7 DD25 3B25
