Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

From: Jens Axboe <jens.axboe@oracle.com>
To: Brad Campbell <brad@wasp.net.au>
Cc: Neil Brown <neilb@suse.de>, Chuck Ebbert <cebbert@redhat.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Date: Mon, 23 Apr 2007 09:35:43 +0200
Message-ID: <20070423073543.GE5311@kernel.dk>
In-Reply-To: <462B10C3.1030906@wasp.net.au>

On Sun, Apr 22 2007, Brad Campbell wrote:
> Jens Axboe wrote:
> >
> >Thanks for testing Brad, be sure to use the next patch I sent instead.
> >The one from this mail shouldn't even get you booted. So double check
> >that you are still using CFQ :-)
> >
> 
> [184901.576773] BUG: unable to handle kernel NULL pointer dereference at virtual address 0000005c
> [184901.602612]  printing eip:
> [184901.610990] c0205399
> [184901.617796] *pde = 00000000
> [184901.626421] Oops: 0000 [#1]
> [184901.635044] Modules linked in:
> [184901.644500] CPU:    0
> [184901.644501] EIP:    0060:[<c0205399>]    Not tainted VLI
> [184901.644503] EFLAGS: 00010082   (2.6.21-rc7 #7)
> [184901.681294] EIP is at cfq_dispatch_insert+0x19/0x70
> [184901.696168] eax: f7f078e0   ebx: f7ca2794   ecx: 00000004   edx: 00000000
> [184901.716743] esi: c1acaa1c   edi: f7c9c6c0   ebp: 00000000   esp: dbaefde0
> [184901.737316] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [184901.755032] Process md5sum (pid: 4268, ti=dbaee000 task=f794a5a0 task.ti=dbaee000)
> [184901.777422] Stack: 00000000 c1acaa1c f7c9c6c0 00000000 c0205509 e6b61bd8 c0133451 00001000
> [184901.803121]        00000008 00000000 00000004 0e713800 c1acaa1c f7c9c6c0 c1acaa1c 00000000
> [184901.828837]        c0205749 f7ca2794 f7ca2794 f79bc000 00000282 c01fb829 00000000 c016ea8d
> [184901.854552] Call Trace:
> [184901.862723]  [<c0205509>] __cfq_dispatch_requests+0x79/0x170
> [184901.879971]  [<c0133451>] do_generic_mapping_read+0x281/0x470
> [184901.897478]  [<c0205749>] cfq_dispatch_requests+0x69/0x90
> [184901.913946]  [<c01fb829>] elv_next_request+0x39/0x130
> [184901.929375]  [<c016ea8d>] bio_endio+0x5d/0x90
> [184901.942725]  [<c0270375>] scsi_request_fn+0x45/0x280
> [184901.957896]  [<c01fde92>] blk_run_queue+0x32/0x70
> [184901.972286]  [<c026f8e0>] scsi_next_command+0x30/0x50
> [184901.987716]  [<c026f9cb>] scsi_end_request+0x9b/0xc0
> [184902.002886]  [<c026fb31>] scsi_io_completion+0x81/0x330
> [184902.018835]  [<c026daeb>] scsi_delete_timer+0xb/0x20
> [184902.034006]  [<c027ee35>] ata_scsi_qc_complete+0x65/0xd0
> [184902.050214]  [<c02751bb>] sd_rw_intr+0x8b/0x220
> [184902.064085]  [<c0280c0c>] ata_altstatus+0x1c/0x20
> [184902.078475]  [<c027b68d>] ata_hsm_move+0x14d/0x3f0
> [184902.093126]  [<c026bcc0>] scsi_finish_command+0x40/0x60
> [184902.109075]  [<c02702bf>] scsi_softirq_done+0x6f/0xe0
> [184902.124506]  [<c0285f61>] sil_interrupt+0x81/0x90
> [184902.138895]  [<c01ffa78>] blk_done_softirq+0x58/0x70
> [184902.154066]  [<c011721f>] __do_softirq+0x6f/0x80
> [184902.181806]  [<c0104cee>] do_IRQ+0x3e/0x80
> [184902.194380]  [<c010322f>] common_interrupt+0x23/0x28
> [184902.209551]  =======================
> [184902.220512] Code: 0e e3 ef ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 1a fa ff ff 8b 45 14 89 ea 25 01 80
> [184902.280564] EIP: [<c0205399>] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:dbaefde0
> [184902.303418] Kernel panic - not syncing: Fatal exception in interrupt
> [184902.322746] Rebooting in 60 seconds..
> 
> Ok, it's taken me _ages_ to get the system to a point I can reproduce this, 
> but I think it's now reproducible with a couple of hours beating. The bad 
> news is it looks like it has not tickled any of your debugging markers! 
> This was the 1st thing printed on a clean serial console, so nothing above 
> that for days.
> 
> I did double check and I was/am certainly running the kernel with the debug 
> patch compiled in.

Ok, can you try and reproduce with this one applied? It'll keep the
system running (unless there are other corruptions going on), so it
should help you a bit as well. It will dump some cfq state info when the
condition triggers that can perhaps help diagnose this. So if you can
apply this patch and reproduce + send the output, I'd much appreciate
it!

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..2aba928 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -947,6 +947,36 @@ keep_queue:
 	return cfqq;
 }
 
+static void cfq_dump_queue(struct cfq_queue *cfqq)
+{
+	printk("  %d: sort=%d,next=%p,q=%d/%d,a=%d/%d,d=%d/%d,f=%x\n", cfqq->key, RB_EMPTY_ROOT(&cfqq->sort_list), cfqq->next_rq, cfqq->queued[0], cfqq->queued[1], cfqq->allocated[0], cfqq->allocated[1], cfqq->on_dispatch[0], cfqq->on_dispatch[1], cfqq->flags);
+}
+
+static void cfq_dump_state(struct cfq_data *cfqd)
+{
+	struct cfq_queue *cfqq;
+	int i;
+
+	printk("cfq: busy=%d,drv=%d,timer=%d\n", cfqd->busy_queues, cfqd->rq_in_driver, timer_pending(&cfqd->idle_slice_timer));
+
+	printk("cfq rr_list:\n");
+	for (i = 0; i < CFQ_PRIO_LISTS; i++)
+		list_for_each_entry(cfqq, &cfqd->rr_list[i], cfq_list)
+			cfq_dump_queue(cfqq);
+
+	printk("cfq busy_list:\n");
+	list_for_each_entry(cfqq, &cfqd->busy_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq idle_list:\n");
+	list_for_each_entry(cfqq, &cfqd->idle_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+
+	printk("cfq cur_rr:\n");
+	list_for_each_entry(cfqq, &cfqd->cur_rr, cfq_list)
+		cfq_dump_queue(cfqq);
+}
+
 static int
 __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 			int max_dispatch)
@@ -964,6 +994,30 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		if ((rq = cfq_check_fifo(cfqq)) == NULL)
 			rq = cfqq->next_rq;
 
+		if (unlikely(!rq)) {
+			/*
+			 * fixup that weird condition that happens with
+			 * md, where ->next_rq == NULL while the rbtree
+			 * is non-empty. dump some info that'll perhaps
+			 * help find this issue.
+			 */
+			struct rb_node *n;
+
+			printk("cfq: rbroot not empty, but ->next_rq"
+				" == NULL! Fixing up, report the"
+				" issue to linux-kernel@vger.kernel.org\n");
+
+			cfq_dump_state(cfqd);
+
+			n = rb_first(&cfqq->sort_list);
+			if (!n) {
+				printk("cfq: rb_first() found nothing\n");
+				return 0;
+			}
+
+			rq = rb_entry(n, struct request, rb_node);
+		}
+
 		/*
 		 * finally, insert request into driver dispatch list
 		 */

-- 
Jens Axboe
