public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <jens.axboe@oracle.com>
To: Brad Campbell <brad@wasp.net.au>
Cc: Neil Brown <neilb@suse.de>, Chuck Ebbert <cebbert@redhat.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Date: Mon, 23 Apr 2007 09:35:43 +0200
Message-ID: <20070423073543.GE5311@kernel.dk>
In-Reply-To: <462B10C3.1030906@wasp.net.au>

On Sun, Apr 22 2007, Brad Campbell wrote:
> Jens Axboe wrote:
> >
> >Thanks for testing Brad, be sure to use the next patch I sent instead.
> >The one from this mail shouldn't even get you booted. So double check
> >that you are still using CFQ :-)
> >
> 
> [184901.576773] BUG: unable to handle kernel NULL pointer dereference at 
> virtual address 0000005c
> [184901.602612]  printing eip:
> [184901.610990] c0205399
> [184901.617796] *pde = 00000000
> [184901.626421] Oops: 0000 [#1]
> [184901.635044] Modules linked in:
> [184901.644500] CPU:    0
> [184901.644501] EIP:    0060:[<c0205399>]    Not tainted VLI
> [184901.644503] EFLAGS: 00010082   (2.6.21-rc7 #7)
> [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70
> [184901.696168] eax: f7f078e0   ebx: f7ca2794   ecx: 00000004   edx: 
> 00000000
> [184901.716743] esi: c1acaa1c   edi: f7c9c6c0   ebp: 00000000   esp: 
> dbaefde0
> [184901.737316] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 
> task.ti=dbaee000)
> [184901.777422] Stack: 00000000 c1acaa1c f7c9c6c0 00000000 c0205509 
> e6b61bd8 c0133451 00001000
> [184901.803121]        00000008 00000000 00000004 0e713800 c1acaa1c 
> f7c9c6c0 c1acaa1c 00000000
> [184901.828837]        c0205749 f7ca2794 f7ca2794 f79bc000 00000282 
> c01fb829 00000000 c016ea8d
> [184901.854552] Call Trace:
> [184901.862723]  [<c0205509>] __cfq_dispatch_requests+0x79/0x170
> [184901.879971]  [<c0133451>] do_generic_mapping_read+0x281/0x470
> [184901.897478]  [<c0205749>] cfq_dispatch_requests+0x69/0x90
> [184901.913946]  [<c01fb829>] elv_next_request+0x39/0x130
> [184901.929375]  [<c016ea8d>] bio_endio+0x5d/0x90
> [184901.942725]  [<c0270375>] scsi_request_fn+0x45/0x280
> [184901.957896]  [<c01fde92>] blk_run_queue+0x32/0x70
> [184901.972286]  [<c026f8e0>] scsi_next_command+0x30/0x50
> [184901.987716]  [<c026f9cb>] scsi_end_request+0x9b/0xc0
> [184902.002886]  [<c026fb31>] scsi_io_completion+0x81/0x330
> [184902.018835]  [<c026daeb>] scsi_delete_timer+0xb/0x20
> [184902.034006]  [<c027ee35>] ata_scsi_qc_complete+0x65/0xd0
> [184902.050214]  [<c02751bb>] sd_rw_intr+0x8b/0x220
> [184902.064085]  [<c0280c0c>] ata_altstatus+0x1c/0x20
> [184902.078475]  [<c027b68d>] ata_hsm_move+0x14d/0x3f0
> [184902.093126]  [<c026bcc0>] scsi_finish_command+0x40/0x60
> [184902.109075]  [<c02702bf>] scsi_softirq_done+0x6f/0xe0
> [184902.124506]  [<c0285f61>] sil_interrupt+0x81/0x90
> [184902.138895]  [<c01ffa78>] blk_done_softirq+0x58/0x70
> [184902.154066]  [<c011721f>] __do_softirq+0x6f/0x80
> [184902.181806]  [<c0104cee>] do_IRQ+0x3e/0x80
> [184902.194380]  [<c010322f>] common_interrupt+0x23/0x28
> [184902.209551]  =======================
> [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 
> 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c 
> <8b> 72 5c 8b 78
> 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80
> [184902.280564] EIP: [<c0205399>] cfq_dispatch_insert+0x19/0x70 SS:ESP 
> 0068:dbaefde0
> [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt
> [184902.322746] Rebooting in 60 seconds..
> 
> Ok, it's taken me _ages_ to get the system to a point I can reproduce this, 
> but I think it's now reproducible with a couple of hours beating. The bad 
> news is it looks like it has not tickled any of your debugging markers! 
> This was the 1st thing printed on a clean serial console, so nothing above 
> that for days.
> 
> I did double check and I was/am certainly running the kernel with the debug 
> patch compiled in.

Ok, can you try and reproduce with this one applied? It'll keep the
system running (unless there are other corruptions going on), so it
should help you a bit as well. It will dump some cfq state info when the
condition triggers that can perhaps help diagnose this. So if you can
apply this patch and reproduce + send the output, I'd much appreciate
it!
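
For readers who want the idea without the kernel context, the fixup hunk in
the patch below can be sketched in userspace roughly as follows. This is only
an illustration under stated assumptions: toy_request, toy_queue, toy_first()
and toy_pick() are hypothetical names, and a sorted linked list stands in for
the rbtree behind cfqq->sort_list. It is not the kernel code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct request. */
struct toy_request {
	int sector;
	struct toy_request *next;	/* sorted list, models the rbtree */
};

/* Hypothetical stand-in for struct cfq_queue. */
struct toy_queue {
	struct toy_request *sorted;	/* models cfqq->sort_list */
	struct toy_request *next_rq;	/* models the cached cfqq->next_rq */
};

/* Models rb_first(): lowest-sorted pending request, or NULL if none. */
static struct toy_request *toy_first(struct toy_queue *q)
{
	return q->sorted;
}

/*
 * Pick the request to dispatch. If the cached pointer is NULL although
 * requests are still queued, report it and fall back to the first sorted
 * entry instead of dereferencing NULL. *fixed_up is set to 1 only when
 * the fallback actually supplied a request.
 */
static struct toy_request *toy_pick(struct toy_queue *q, int *fixed_up)
{
	struct toy_request *rq = q->next_rq;

	*fixed_up = 0;
	if (!rq) {
		fprintf(stderr, "toy: next_rq == NULL, trying first sorted entry\n");
		rq = toy_first(q);
		if (!rq) {
			fprintf(stderr, "toy: nothing queued either\n");
			return NULL;
		}
		*fixed_up = 1;
	}
	return rq;
}
```

A caller would dispatch whatever toy_pick() returns and treat a NULL result as
"nothing to do", which mirrors the "return 0" in the patch when rb_first()
finds nothing.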

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("  %d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n", cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq, cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0], cfqq->allocated[1], cfqq->on_dispatch[0], cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues, cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq cur_rr:\n");
+	list_for_each_entry(cfqq, &cfqd->cur_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+}
+
 static int
 __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 			int max_dispatch)
@@ -964,6 +994,30 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		if ((rq = cfq_check_fifo(cfqq)) == NULL)
 			rq = cfqq->next_rq;
 
+		if (unlikely(!rq)) {
+			/*
+			 * fixup that weird condition that happens with
+			 * md, where ->next_rq == NULL while the rbtree
+			 * is non-empty. dump some info that'll perhaps
+			 * help find this issue.
+			 */
+			struct rb_node *n;
+
+			printk("cfq: rbroot not empty, but ->next_rq"
+				" == NULL! Fixing up, report the"
+				" issue to lkml@vger.kernel.org\n");
+
+			cfq_dump_state(cfqd);
+
+			n = rb_first(&cfqq->sort_list);
+			if (!n) {
+				printk("cfq: rb_first() found nothing\n");
+				return 0;
+			}
+
+			rq = rb_entry(n, struct request, rb_node);
+		}
+
 		/*
 		 * finally, insert request into driver dispatch list
 		 */

-- 
Jens Axboe

