Re: 2.6.20.3 AMD64 oops in CFQ code

From: Jens Axboe <jens.axboe@oracle.com>
To: linux@horizon.com
Cc: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.20.3 AMD64 oops in CFQ code
Date: Thu, 22 Mar 2007 19:41:57 +0100
Message-ID: <20070322184155.GY19922@kernel.dk>
In-Reply-To: <20070322123821.7843.qmail@science.horizon.com>

On Thu, Mar 22 2007, linux@horizon.com wrote:
> This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
> over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
> stable for a long time, but has been acting up of late since I upgraded
> to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
> errors.
> 
> Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
> actually in use as I stole the serial port for a console).
> 
> It takes half a day to reproduce the problem, so bisecting would be painful.
> 
> BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.
> 
> 
> Here are two oopses, a few minutes (16:31, to be precise) apart.
> Unusually, it oopsed twice *without* locking up the system.  Usually,
> I see this followed by an error from drivers/input/keyboard/atkbd.c:
>                         printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
>                                "Some program might be trying access hardware directly.\n",
> emitted at 1 Hz with the keyboard LEDs flashing and the system
> unresponsive to keyboard or pings.
> (I think it was spurious ACK on serio/input0, but my memory may be faulty.)
> 
> 
> If anyone has any suggestions, they'd be gratefully received.
> 
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP: 
>  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> PGD 777e9067 PUD 78774067 PMD 0 
> Oops: 0000 [1] 
> CPU 0 
> Modules linked in: ecb
> Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
> RIP: 0010:[<ffffffff8031504a>]  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> RSP: 0018:ffff8100770bbaf8  EFLAGS: 00010092
> RAX: ffff81007fb36c80 RBX: 0000000000000000 RCX: 0000000000000001
> RDX: 000000010003e4e7 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff81007fb37a00 R08: 00000000ffffffff R09: ffff81005d390298
> R10: ffff81007fcb4f80 R11: ffff81007fcb4f80 R12: ffff81007facd280
> R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
> FS:  00002b322d120d30(0000) GS:ffffffff805de000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000098 CR3: 000000007bcf0000 CR4: 00000000000006e0
> Process BackupPC_dump (pid: 2837, threadinfo ffff8100770ba000, task ffff81007fc5d8e0)
> Stack:  0000000000000000 ffff8100770f39f0 0000000000000000 0000000000000004
>  0000000000000001 ffffffff80315253 ffffffff803b2607 ffff81005da2bc40
>  ffff81007fac3800 ffff81007facd280 ffff81007facd280 ffff81005d390298
> Call Trace:
>  [<ffffffff80315253>] cfq_dispatch_requests+0x152/0x512
>  [<ffffffff803b2607>] scsi_done+0x0/0x18
>  [<ffffffff8030d9f1>] elv_next_request+0x137/0x147
>  [<ffffffff803b7ce0>] scsi_request_fn+0x6a/0x33a
>  [<ffffffff8024d407>] generic_unplug_device+0xa/0xe
>  [<ffffffff80407ced>] unplug_slaves+0x5b/0x94
>  [<ffffffff80223d65>] sync_page+0x0/0x40
>  [<ffffffff80223d9b>] sync_page+0x36/0x40
>  [<ffffffff80256d45>] __wait_on_bit_lock+0x36/0x65
>  [<ffffffff80237496>] __lock_page+0x5e/0x64
>  [<ffffffff8028061d>] wake_bit_function+0x0/0x23
>  [<ffffffff802074de>] find_get_page+0xe/0x2d
>  [<ffffffff8020b38e>] do_generic_mapping_read+0x1c2/0x40d
>  [<ffffffff8020bd80>] file_read_actor+0x0/0x118
>  [<ffffffff8021422e>] generic_file_aio_read+0x15c/0x19e
>  [<ffffffff8020bafa>] do_sync_read+0xc9/0x10c
>  [<ffffffff80210342>] may_open+0x5b/0x1c6
>  [<ffffffff802805ef>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8020a857>] vfs_read+0xaa/0x152
>  [<ffffffff8020faf3>] sys_read+0x45/0x6e
>  [<ffffffff8025041e>] system_call+0x7e/0x83

3 (I think) separate instances of this, each involving raid5. Is your
array degraded or fully operational?


-- 
Jens Axboe


Thread overview: 16+ messages
2007-03-22 12:38 2.6.20.3 AMD64 oops in CFQ code linux
2007-03-22 18:41 ` Jens Axboe [this message]
2007-03-22 18:54   ` linux
2007-03-22 19:00     ` Jens Axboe
2007-03-22 23:59       ` Neil Brown
2007-03-23  0:31         ` Dan Williams
2007-03-23  0:33           ` Dan Williams
2007-03-23  0:44           ` Neil Brown
2007-03-23 17:46             ` linux
2007-04-03  5:49               ` Tejun Heo
2007-04-03 13:03                 ` linux
2007-04-03 13:11                   ` Tejun Heo
2007-04-04 23:22                 ` Bill Davidsen
2007-04-05  4:13                   ` Lee Revell
2007-04-05  4:29                     ` Tejun Heo
2007-03-22 18:43 ` Aristeu Sergio Rozanski Filho