[PATCH] i2o_block Fix, possible CFQ elevator problem?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] i2o_block Fix, possible CFQ elevator problem?
@ 2004-04-19 11:37 Warren Togami
  2004-04-19 12:12 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-19 11:37 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel, Arjan van de Ven, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

Test Systems:
x86 AMD Duron with PM3754 DPT controller, Two disks
x86-64 2x Opteron with ASR-3010S Adaptec controller, Four disks

Software:
Gentoo 1.4.3.13 x86 with gcc-3.3.2
Fedora Core 2 x86-64 development with gcc-3.3.3-7

The i2o_block driver currently in kernel-2.6.5 has a problem when there
is more than one logical block device on the I2O controller. Two or
more devices causes kernel memory corruption,[1] sometimes accompanied
by an oops, sometimes fatally in IRQ context and livelocks. In SMP mode
this tended to cause panic=5 reboot to fail.

http://lwn.net/Articles/58199/
Markus finally figured out what was going wrong when he read in this LWN
article that in the 2.6 kernel, the generic block layer handles
partition code and there is almost nothing the individual drivers have
to do. The attached patch from Markus removes this unused partition
logic and seemed to make things work extremely well.

After some testing we began to realize why the kernel memory corruption
that we saw with 2 or more block arrays was happening. The first array
always had a valid partition table, while second and more arrays never
did after they were split from the original single large array. The
unused partition logic was copying the corrupted partition table from
the 2nd and higher arrays into kernel memory, causing the trouble in
kernel-2.6.5.

Subsequent testing was going extremely well, with simultaneous heavy
disk usage on two I2O block devices with bonnie++ going stable. We then
chopped the controller into four I2O block devices, but unfortunately
this ran us into trouble. bonnie++ on three devices simultaneously
would panic. Below is a photo of the call trace from this panic.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg

I noticed that the call trace contained "cfq_next_request", so I was
curious what would happened if we changed to the deadline scheduler.
Booted with the same kernel but with "elevator=deadline". To our
surprise, bonnie++ ran simultaneously on all four I2O block devices
without crashing the server. For another test we tried "elevator=as"
and it too remained stable.

Possible CFQ I/O scheduler problem?

This research has been a joint venture of Markus Lidel and Warren
Togami, with thanks to Mid-Pacific Institute for the loan of hardware.

Warren Togami
wtogami@redhat.com
Markus Lidel
Markus.Lidel@shadowconnect.com

[1]
With two block arrays, ls in the filesystem would sometimes have
corrupted looking characters as the directories and files, making the
system unusable. With four block arrays this happened, also with an
accompanying kernel oops.

[-- Attachment #2: i2o_block-cleanup.patch --]
[-- Type: text/x-patch, Size: 12712 bytes --]

--- i2o_block.c	2004-04-16 10:56:37.000000000 -1000
+++ i2o_block.c.new	2004-04-17 04:14:05.459068416 -1000
@@ -117,9 +117,6 @@
  *	Some of these can be made smaller later
  */
 
-static int i2ob_media_change_flag[MAX_I2OB];
-static u32 i2ob_max_sectors[MAX_I2OB<<4];
-
 static int i2ob_context;
 
 /*
@@ -143,6 +140,8 @@
 	int wcache;
 	int power;
 	int index;
+	int media_change_flag;
+	u32 max_sectors;
 };
 
 /*
@@ -162,9 +161,9 @@
 };
 
 /*
- * Per IOP requst queue information
+ * Per IOP request queue information
  *
- * We have a separate requeust_queue_t per IOP so that a heavilly
+ * We have a separate request_queue_t per IOP so that a heavilly
  * loaded I2O block device on an IOP does not starve block devices
  * across all I2O controllers.
  * 
@@ -183,7 +182,7 @@
  *	Each I2O disk is one of these.
  */
 
-static struct i2ob_device i2ob_dev[MAX_I2OB<<4];
+static struct i2ob_device i2ob_dev[MAX_I2OB];
 static int i2ob_dev_count = 0;
 static struct gendisk *i2ob_disk[MAX_I2OB];
 
@@ -312,7 +311,6 @@
 	/* 
 	 * Mask out partitions from now on
 	 */
-	unit &= 0xF0;
 		
 	/* This can be optimised later - just want to be sure its right for
 	   starters */
@@ -453,8 +451,8 @@
 	struct i2ob_request *ireq = NULL;
 	u8 st;
 	u32 *m = (u32 *)msg;
-	u8 unit = (m[2]>>8)&0xF0;	/* low 4 bits are partition */
-	struct i2ob_device *dev = &i2ob_dev[(unit&0xF0)];
+	u8 unit = m[2]>>8;
+	struct i2ob_device *dev = &i2ob_dev[unit];
 
 	/*
 	 * FAILed message
@@ -599,7 +597,6 @@
 	unsigned int evt;
 	unsigned long flags;
 	int unit;
-	int i;
 	//The only event that has data is the SCSI_SMART event.
 	struct i2o_reply {
 		u32 header[4];
@@ -648,7 +645,7 @@
 			 */
 			case I2O_EVT_IND_BSA_VOLUME_LOAD:
 			{
-				struct gendisk *p = i2ob_disk[unit>>4];
+				struct gendisk *p = i2ob_disk[unit];
 				i2ob_install_device(i2ob_dev[unit].i2odev->controller, 
 					i2ob_dev[unit].i2odev, unit);
 				add_disk(p);
@@ -663,11 +660,10 @@
 			 */
 			case I2O_EVT_IND_BSA_VOLUME_UNLOAD:
 			{
-				struct gendisk *p = i2ob_disk[unit>>4];
+				struct gendisk *p = i2ob_disk[unit];
 				del_gendisk(p);
-				for(i = unit; i <= unit+15; i++)
-					blk_queue_max_sectors(i2ob_dev[i].req_queue, 0);
-				i2ob_media_change_flag[unit] = 1;
+				blk_queue_max_sectors(i2ob_dev[unit].req_queue, 0);
+				i2ob_dev[unit].media_change_flag = 1;
 				break;
 			}
 
@@ -695,7 +691,7 @@
 					i2ob_query_device(&i2ob_dev[unit], 0x0000, 4, &size, 8);
 
 				spin_lock_irqsave(I2O_LOCK(unit), flags);	
-				set_capacity(i2ob_disk[unit>>4], size>>9);
+				set_capacity(i2ob_disk[unit], size>>9);
 				spin_unlock_irqrestore(I2O_LOCK(unit), flags);	
 				break;
 			}
@@ -803,7 +799,7 @@
 		i2ob_queues[dev->unit]->i2ob_qhead = ireq->next;
 		ireq->req = req;
 
-		i2ob_send(m, dev, ireq, (dev->unit&0xF0));
+		i2ob_send(m, dev, ireq, dev->index);
 	}
 }
 
@@ -1065,7 +1061,8 @@
 	u16 power;
 	u32 flags, status;
 	struct i2ob_device *dev=&i2ob_dev[unit];
-	int i;
+	request_queue_t *q;
+	int segments;
 
 	/*
 	 * For logging purposes...
@@ -1089,11 +1086,11 @@
 	 * code so that we can directly get the queue ptr from the
 	 * device instead of having to go the IOP data structure.
 	 */
-	dev->req_queue = i2ob_queues[unit]->req_queue;
+	dev->req_queue = i2ob_queues[c->unit]->req_queue;
 
 	/* initialize gendik structure */
-	i2ob_disk[unit>>4]->private_data = dev;
-	i2ob_disk[unit>>4]->queue = dev->req_queue;
+	i2ob_disk[unit]->private_data = dev;
+	i2ob_disk[unit]->queue = dev->req_queue;
 
 	/*
 	 *	Ask for the current media data. If that isn't supported
@@ -1110,7 +1107,7 @@
 		power = 0;
 	i2ob_query_device(dev, 0x0000, 5, &flags, 4);
 	i2ob_query_device(dev, 0x0000, 6, &status, 4);
-	set_capacity(i2ob_disk[unit>>4], size>>9);
+	set_capacity(i2ob_disk[unit], size>>9);
 
 	/*
 	 * Max number of Scatter-Gather Elements
@@ -1119,44 +1116,41 @@
 	i2ob_dev[unit].power = power;	/* Save power state in device proper */
 	i2ob_dev[unit].flags = flags;
 
-	for(i=unit;i<=unit+15;i++)
-	{
-		request_queue_t *q = i2ob_dev[unit].req_queue;
-		int segments = (d->controller->status_block->inbound_frame_size - 7) / 2;
+	q = i2ob_dev[unit].req_queue;
+	segments = (d->controller->status_block->inbound_frame_size - 7) / 2;
 
-		if(segments > 16)
-			segments = 16;
-					
-		i2ob_dev[i].power = power;	/* Save power state */
-		i2ob_dev[unit].flags = flags;	/* Keep the type info */
+	if(segments > 16)
+		segments = 16;
+				
+	i2ob_dev[unit].power = power;	/* Save power state */
+	i2ob_dev[unit].flags = flags;	/* Keep the type info */
 		
-		blk_queue_max_sectors(q, 96);	/* 256 might be nicer but many controllers 
+	blk_queue_max_sectors(q, 96);	/* 256 might be nicer but many controllers 
 						   explode on 65536 or higher */
-		blk_queue_max_phys_segments(q, segments);
-		blk_queue_max_hw_segments(q, segments);
+	blk_queue_max_phys_segments(q, segments);
+	blk_queue_max_hw_segments(q, segments);
 		
-		i2ob_dev[i].rcache = CACHE_SMARTFETCH;
-		i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	i2ob_dev[unit].rcache = CACHE_SMARTFETCH;
+	i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 		
-		if(d->controller->battery == 0)
-			i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	if(d->controller->battery == 0)
+		i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 
-		if(d->controller->promise)
-			i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	if(d->controller->promise)
+		i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 
-		if(d->controller->short_req)
-		{
-			blk_queue_max_sectors(q, 8);
-			blk_queue_max_phys_segments(q, 8);
-			blk_queue_max_hw_segments(q, 8);
-		}
+	if(d->controller->short_req)
+	{
+		blk_queue_max_sectors(q, 8);
+		blk_queue_max_phys_segments(q, 8);
+		blk_queue_max_hw_segments(q, 8);
 	}
 
-	strcpy(d->dev_name, i2ob_disk[unit>>4]->disk_name);
-	strcpy(i2ob_disk[unit>>4]->devfs_name, i2ob_disk[unit>>4]->disk_name);
+	strcpy(d->dev_name, i2ob_disk[unit]->disk_name);
+	strcpy(i2ob_disk[unit]->devfs_name, i2ob_disk[unit]->disk_name);
 
 	printk(KERN_INFO "%s: Max segments %d, queue depth %d, byte limit %d.\n",
-		 d->dev_name, i2ob_dev[unit].max_segments, i2ob_dev[unit].depth, i2ob_max_sectors[unit]<<9);
+		 d->dev_name, i2ob_dev[unit].max_segments, i2ob_dev[unit].depth, i2ob_dev[unit].max_sectors<<9);
 
 	i2ob_query_device(dev, 0x0000, 0, &type, 1);
 
@@ -1197,7 +1191,7 @@
 	}
 	printk(".\n");
 	printk(KERN_INFO "%s: Maximum sectors/read set to %d.\n", 
-		d->dev_name, i2ob_max_sectors[unit]);
+		d->dev_name, i2ob_dev[unit].max_sectors);
 
 	/*
 	 * Register for the events we're interested in and that the
@@ -1323,7 +1317,7 @@
 
 			i2o_release_device(d, &i2o_block_handler);
 
-			if(scan_unit<MAX_I2OB<<4)
+			if(scan_unit<MAX_I2OB)
 			{
  				/*
 				 * Get the device and fill in the
@@ -1339,8 +1333,8 @@
 					printk(KERN_WARNING "Could not install I2O block device\n");
 				else
 				{
-					add_disk(i2ob_disk[scan_unit>>4]);
-					scan_unit+=16;
+					add_disk(i2ob_disk[scan_unit]);
+					scan_unit++;
 					i2ob_dev_count++;
 
 					/* We want to know when device goes away */
@@ -1350,7 +1344,7 @@
 			else
 			{
 				if(!warned++)
-					printk(KERN_WARNING "i2o_block: too many device, registering only %d.\n", scan_unit>>4);
+					printk(KERN_WARNING "i2o_block: too many device, registering only %d.\n", scan_unit);
 			}
 		}
 		i2o_unlock_controller(c);
@@ -1399,12 +1393,12 @@
 	printk(KERN_INFO "   Controller %d Tid %d\n",c->unit, d->lct_data.tid);
 
 	/* Check for available space */
-	if(i2ob_dev_count>=MAX_I2OB<<4)
+	if(i2ob_dev_count>=MAX_I2OB)
 	{
 		printk(KERN_ERR "i2o_block: No more devices allowed!\n");
 		return;
 	}
-	for(unit = 0; unit < (MAX_I2OB<<4); unit += 16)
+	for(unit = 0; unit < MAX_I2OB; unit ++)
 	{
 		if(!i2ob_dev[unit].i2odev)
 			break;
@@ -1425,7 +1419,7 @@
 		printk(KERN_ERR "i2o_block: Could not install new device\n");
 	else	
 	{
-		add_disk(i2ob_disk[unit>>4]);
+		add_disk(i2ob_disk[unit]);
 		i2ob_dev_count++;
 		i2o_device_notify_on(d, &i2o_block_handler);
 	}
@@ -1443,7 +1437,6 @@
 void i2ob_del_device(struct i2o_controller *c, struct i2o_device *d)
 {	
 	int unit = 0;
-	int i = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(I2O_LOCK(c->unit), flags);
@@ -1456,7 +1449,7 @@
 
 	printk(KERN_INFO "I2O Block Device Deleted\n");
 
-	for(unit = 0; unit < MAX_I2OB<<4; unit += 16)
+	for(unit = 0; unit < MAX_I2OB; unit ++)
 	{
 		if(i2ob_dev[unit].i2odev == d)
 		{
@@ -1465,7 +1458,7 @@
 			break;
 		}
 	}
-	if(unit >= MAX_I2OB<<4)
+	if(unit >= MAX_I2OB)
 	{
 		printk(KERN_ERR "i2ob_del_device called, but not in dev table!\n");
 		spin_unlock_irqrestore(I2O_LOCK(c->unit), flags);
@@ -1476,13 +1469,10 @@
 	 * This will force errors when i2ob_get_queue() is called
 	 * by the kenrel.
 	 */
-	del_gendisk(i2ob_disk[unit>>4]);
+	del_gendisk(i2ob_disk[unit]);
 	i2ob_dev[unit].req_queue = NULL;
-	for(i = unit; i <= unit+15; i++)
-	{
-		i2ob_dev[i].i2odev = NULL;
-		blk_queue_max_sectors(i2ob_dev[i].req_queue, 0);
-	}
+	i2ob_dev[unit].i2odev = NULL;
+	blk_queue_max_sectors(i2ob_dev[unit].req_queue, 0);
 	spin_unlock_irqrestore(I2O_LOCK(c->unit), flags);
 
 	/*
@@ -1493,14 +1483,13 @@
 		MOD_DEC_USE_COUNT;
 
 	i2ob_dev[unit].refcnt = 0;
-	
-	i2ob_dev[i].tid = 0;
+	i2ob_dev[unit].tid = 0;
 
 	/* 
 	 * Do we need this?
 	 * The media didn't really change...the device is just gone
 	 */
-	i2ob_media_change_flag[unit] = 1;
+	i2ob_dev[unit].media_change_flag = 1;
 
 	i2ob_dev_count--;	
 }
@@ -1511,10 +1500,9 @@
 static int i2ob_media_change(struct gendisk *disk)
 {
 	struct i2ob_device *p = disk->private_data;
-	int i = p->index;
-	if(i2ob_media_change_flag[i])
+	if(p->media_change_flag)
 	{
-		i2ob_media_change_flag[i]=0;
+		p->media_change_flag=0;
 		return 1;
 	}
 	return 0;
@@ -1523,7 +1511,7 @@
 static int i2ob_revalidate(struct gendisk *disk)
 {
 	struct i2ob_device *p = disk->private_data;
-	return i2ob_install_device(p->controller, p->i2odev, p->index<<4);
+	return i2ob_install_device(p->controller, p->i2odev, p->index);
 }
 
 /*
@@ -1536,7 +1524,7 @@
 	
 	for(i=0;i<MAX_I2OB;i++)
 	{
-		struct i2ob_device *dev=&i2ob_dev[(i<<4)];
+		struct i2ob_device *dev=&i2ob_dev[i];
 		
 		if(dev->refcnt!=0)
 		{
@@ -1598,23 +1586,35 @@
 	if (register_blkdev(MAJOR_NR, "i2o_block"))
 		return -EIO;
 
-	for (i = 0; i < MAX_I2OB; i++) {
-		struct gendisk *disk = alloc_disk(16);
-		if (!disk)
-			goto oom;
-		i2ob_dev[i<<4].index = i;
-		disk->queue = i2ob_dev[i<<4].req_queue;
-		i2ob_disk[i] = disk;
-	}
 #ifdef MODULE
 	printk(KERN_INFO "i2o_block: registered device at major %d\n", MAJOR_NR);
 #endif
 
 	/*
+	 *	Set up the queue
+	 */
+	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
+	{
+		i2ob_queues[i] = NULL;
+		i2ob_init_iop(i);
+	}
+
+	/*
 	 *	Now fill in the boiler plate
 	 */
 	 
-	for (i = 0; i < MAX_I2OB << 4; i++) {
+	for (i = 0; i < MAX_I2OB; i++) {
+		struct gendisk *disk = alloc_disk(16);
+		if (!disk)
+			goto oom;
+		i2ob_disk[i] = disk;
+
+		disk->major = MAJOR_NR;
+		disk->first_minor = i<<4;
+		disk->fops = &i2ob_fops;
+		sprintf(disk->disk_name, "i2o/hd%c", 'a' + i);
+
+		i2ob_dev[i].index = i;
 		i2ob_dev[i].refcnt = 0;
 		i2ob_dev[i].flags = 0;
 		i2ob_dev[i].controller = NULL;
@@ -1623,26 +1623,10 @@
 		i2ob_dev[i].head = NULL;
 		i2ob_dev[i].tail = NULL;
 		i2ob_dev[i].depth = MAX_I2OB_DEPTH;
-		i2ob_max_sectors[i] = 2;
-	}
-	
-	for (i = 0; i < MAX_I2OB; i++) {
-		struct gendisk *disk = i2ob_disk[i];
-		disk->major = MAJOR_NR;
-		disk->first_minor = i<<4;
-		disk->fops = &i2ob_fops;
-		sprintf(disk->disk_name, "i2o/hd%c", 'a' + i);
+		i2ob_dev[i].max_sectors = 2;
 	}
 	
 	/*
-	 *	Set up the queue
-	 */
-	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
-	{
-		i2ob_queues[i] = NULL;
-	}
-
-	/*
 	 *	Register the OSM handler as we will need this to probe for
 	 *	drives, geometry and other goodies.
 	 */
@@ -1701,10 +1685,10 @@
 	if(i2ob_dev_count) {
 		struct i2o_device *d;
 		for(i = 0; i < MAX_I2OB; i++)
-		if((d=i2ob_dev[i<<4].i2odev)) {
+		if((d=i2ob_dev[i].i2odev)) {
 			i2o_device_notify_off(d, &i2o_block_handler);
 			i2o_event_register(d->controller, d->lct_data.tid, 
-				i2ob_context, i<<4, 0);
+				i2ob_context, i, 0);
 		}
 	}
 	
@@ -1726,14 +1710,26 @@
 
 	i2o_remove_handler(&i2o_block_handler);
 		 
-	for (i = 0; i < MAX_I2OB; i++)
+	for (i = 0; i < MAX_I2OB; i++) {
+		struct gendisk *disk = i2ob_disk[i];
+		if(disk->flags & GENHD_FL_UP)
+			del_gendisk(i2ob_disk[i]);
 		put_disk(i2ob_disk[i]);
+	}
 
 	/*
 	 *	Return the block device
 	 */
 	if (unregister_blkdev(MAJOR_NR, "i2o_block") != 0)
 		printk("i2o_block: cleanup_module failed\n");
+
+	/*
+	 *	release request queue
+	 */
+	for (i = 0; i < MAX_I2O_CONTROLLERS; i ++) {
+		blk_cleanup_queue(i2ob_queues[i]->req_queue);
+		kfree(i2ob_queues[i]);
+	}
 }
 
 MODULE_AUTHOR("Red Hat");


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-19 11:37 [PATCH] i2o_block Fix, possible CFQ elevator problem? Warren Togami
@ 2004-04-19 12:12 ` Jens Axboe
  2004-04-20  0:42   ` Warren Togami
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-19 12:12 UTC (permalink / raw)
  To: Warren Togami; +Cc: linux-scsi, linux-kernel, Arjan van de Ven, Alan Cox

On Mon, Apr 19 2004, Warren Togami wrote:
> Test Systems:
> x86 AMD Duron with PM3754 DPT controller, Two disks
> x86-64 2x Opteron with ASR-3010S Adaptec controller, Four disks
> 
> Software:
> Gentoo 1.4.3.13 x86 with gcc-3.3.2
> Fedora Core 2 x86-64 development with gcc-3.3.3-7
> 
> The i2o_block driver currently in kernel-2.6.5 has a problem when there
> is more than one logical block device on the I2O controller. Two or
> more devices causes kernel memory corruption,[1] sometimes accompanied
> by an oops, sometimes fatally in IRQ context and livelocks. In SMP mode
> this tended to cause panic=5 reboot to fail.
> 
> http://lwn.net/Articles/58199/
> Markus finally figured out what was going wrong when he read in this LWN
> article that in the 2.6 kernel, the generic block layer handles
> partition code and there is almost nothing the individual drivers have
> to do. The attached patch from Markus removes this unused partition
> logic and seemed to make things work extremely well.
> 
> After some testing we began to realize why the kernel memory corruption
> that we saw with 2 or more block arrays was happening. The first array
> always had a valid partition table, while second and more arrays never
> did after they were split from the original single large array. The
> unused partition logic was copying the corrupted partition table from
> the 2nd and higher arrays into kernel memory, causing the trouble in
> kernel-2.6.5.
> 
> Subsequent testing was going extremely well, with simultaneous heavy
> disk usage on two I2O block devices with bonnie++ going stable. We then
> chopped the controller into four I2O block devices, but unfortunately
> this ran us into trouble. bonnie++ on three devices simultaneously
> would panic. Below is a photo of the call trace from this panic.
> 
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
> 
> I noticed that the call trace contained "cfq_next_request", so I was
> curious what would happened if we changed to the deadline scheduler.
> Booted with the same kernel but with "elevator=deadline". To our
> surprise, bonnie++ ran simultaneously on all four I2O block devices
> without crashing the server. For another test we tried "elevator=as"
> and it too remained stable.
> 
> Possible CFQ I/O scheduler problem?

That looks pretty damn strange. Any chance you can capture a full oops,
with a serial console or similar?

A quick look at i2o_block doesn't show anything that should cause this.
There are a number of request handling badnesses in there, though:

- kill check fo RQ_INACTIVE, it's pointless. even if it wasn't, the
  inactive check return is buggy.

- you must not just clear req->waiting! remove that code.

- ->queue_depth should not be an expensive atomic type, you have the
  device lock when you look at/inc/dec it.

>  	/*
> +	 *	Set up the queue
> +	 */
> +	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
> +	{
> +		i2ob_queues[i] = NULL;
> +		i2ob_init_iop(i);
> +	}

Only initialize queues that will be used. Each allocated but unusued
queues wastes memory.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-19 12:12 ` Jens Axboe
@ 2004-04-20  0:42   ` Warren Togami
       [not found]     ` <40848159.7090605@togami.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-20  0:42 UTC (permalink / raw)
  To: axboe; +Cc: Arjan van de Ven, linux-scsi, linux-kernel, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 5138 bytes --]

Jens Axboe wrote:
>>
>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
>>
>>I noticed that the call trace contained "cfq_next_request", so I was
>>curious what would happened if we changed to the deadline scheduler.
>>Booted with the same kernel but with "elevator=deadline". To our
>>surprise, bonnie++ ran simultaneously on all four I2O block devices
>>without crashing the server. For another test we tried "elevator=as"
>>and it too remained stable.
>>
>>Possible CFQ I/O scheduler problem?
> 
> 
> That looks pretty damn strange. Any chance you can capture a full oops,
> with a serial console or similar?

Fedora Core 2 x86_64 SMP (2 x Opteron) with kernel-2.6.5-1.326 (based on
2.6.6-rc1)
gcc-3.3.3-7 plus attached patch.  This panic was captured by netconsole.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:407
invalid operand: 0000 [1] SMP
CPU 0
Pid: 1523, comm: bonnie++ Not tainted 2.6.5-1.326custom
RIP: 0010:[<ffffffff80234f93>] <ffffffff80234f93>{cfq_next_request+66}
RSP: 0000:ffffffff804a9048  EFLAGS: 00010016
RAX: 000001006cebf578 RBX: 000001007f3f1e00 RCX: 000001006ceb6340
RDX: 000001006cebf540 RSI: 0000000000000001 RDI: 000001007a225000
RBP: 000001007a225000 R08: ffffffff804a8ec8 R09: 0000000000000000
R10: 000001000289cb60 R11: 0000000000000000 R12: 000001007a225000
R13: 000001007a225000 R14: 000001001703bf58 R15: 0000000000509010
FS:  0000002a9555f360(0000) GS:ffffffff804a4e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95558000 CR3: 0000000000101000 CR4: 00000000000006e0
Process bonnie++ (pid: 1523, stackpage=1007fd5e1c0)
Stack: 000001007aa01118 000001007d95abc0 000001007aa01118 ffffffff8022b444
         000001007d95abc0 000001007aa01118 ffffffffa0137bc0 ffffffffa0133b82
         0000000000000001 000001007d95abc0
Call Trace:<IRQ> <ffffffff8022b444>{elv_next_request+238}
<ffffffffa0133b82>{:i2o_block:i2ob_request+233}
         <ffffffffa013370e>{:i2o_block:i2o_block_reply+1162}
         <ffffffff80113d85>{handle_IRQ_event+44}
<ffffffffa00893ac>{:tg3:tg3_enable_ints+23}
         <ffffffff8013e7f9>{update_wall_time+12}
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
         <ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9}
<ffffffff80113d85>{handle_IRQ_event+44}
         <ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
         <ffffffff8011185f>{ret_from_intr+0}  <EOI>

Code: 0f 0b 74 3a 32 80 ff ff ff ff 97 01 48 89 c8 eb 0e 48 89 de
RIP <ffffffff80234f93>{cfq_next_request+66} RSP <ffffffff804a9048>
   <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
   <0>Rebooting in 5 seconds..Badness in panic at kernel/panic.c:87

Call Trace:<IRQ> <ffffffff80136024>{panic+416}
<ffffffff8011411d>{do_IRQ+386}
         <ffffffff8011185f>{ret_from_intr+0} <ffffffff80138cd3>{do_exit+87}
         <ffffffff80112ab5>{oops_end+80} <ffffffff80112bc5>{do_trap+0}
         <ffffffff80112cc2>{do_trap+253}
<ffffffff80112f45>{do_invalid_op+145}
         <ffffffff80234f93>{cfq_next_request+66}
<ffffffff8012ff68>{recalc_task_prio+343}
         <ffffffff8012ff68>{recalc_task_prio+343}
<ffffffff80111d31>{error_exit+0}
         <ffffffff80234f93>{cfq_next_request+66}
<ffffffff80156bea>{mempool_free+227}
         <ffffffff8022b444>{elv_next_request+238}
<ffffffffa0133b82>{:i2o_block:i2ob_request+233}
         <ffffffffa013370e>{:i2o_block:i2o_block_reply+1162}
         <ffffffff80113d85>{handle_IRQ_event+44}
<ffffffffa00893ac>{:tg3:tg3_enable_ints+23}
         <ffffffff8013e7f9>{update_wall_time+12}
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
         <ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9}
<ffffffff80113d85>{handle_IRQ_event+44}
         <ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
         <ffffffff8011185f>{ret_from_intr+0}  <EOI>


> 
> A quick look at i2o_block doesn't show anything that should cause this.
> There are a number of request handling badnesses in there, though:
> 
> - kill check fo RQ_INACTIVE, it's pointless. even if it wasn't, the
>   inactive check return is buggy.
> 
> - you must not just clear req->waiting! remove that code.
> 
> - ->queue_depth should not be an expensive atomic type, you have the
>   device lock when you look at/inc/dec it.
> 
> 
>> 	/*
>>+	 *	Set up the queue
>>+	 */
>>+	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
>>+	{
>>+		i2ob_queues[i] = NULL;
>>+		i2ob_init_iop(i);
>>+	}
> 
> 
> Only initialize queues that will be used. Each allocated but unusued
> queues wastes memory.
> 

Markus implemented all of your recommendations except the atomic part,
which seemed to cause a kernel panic while loading the i2o_block driver.
   Attached is the latest i2o_block.c cleanup, while at this below URL is
his attempt of removing the atomic type.  He is not sure if he
implemented it properly.

http://i2o.shadowconnect.com/i2o_block-cleanup.patch-broke

Your recommendations would be greatly appreciated!

Thank you,
Warren Togami
wtogami@redhat.com
Markus Lidel
Markus.Lidel@shadowconnect.com

[-- Attachment #2: i2o_block-cleanup.patch --]
[-- Type: text/x-patch, Size: 13449 bytes --]

--- /root/linux-2.6.5-1.322/drivers/message/i2o/i2o_block.c.old	2004-04-15 12:05:05.000000000 -1000
+++ i2o_block.c	2004-04-20 12:46:38.000000000 -1000
@@ -117,9 +117,6 @@
  *	Some of these can be made smaller later
  */
 
-static int i2ob_media_change_flag[MAX_I2OB];
-static u32 i2ob_max_sectors[MAX_I2OB<<4];
-
 static int i2ob_context;
 
 /*
@@ -143,6 +140,8 @@
 	int wcache;
 	int power;
 	int index;
+	int media_change_flag;
+	u32 max_sectors;
 };
 
 /*
@@ -162,9 +161,9 @@
 };
 
 /*
- * Per IOP requst queue information
+ * Per IOP request queue information
  *
- * We have a separate requeust_queue_t per IOP so that a heavilly
+ * We have a separate request_queue_t per IOP so that a heavilly
  * loaded I2O block device on an IOP does not starve block devices
  * across all I2O controllers.
  * 
@@ -183,7 +182,7 @@
  *	Each I2O disk is one of these.
  */
 
-static struct i2ob_device i2ob_dev[MAX_I2OB<<4];
+static struct i2ob_device i2ob_dev[MAX_I2OB];
 static int i2ob_dev_count = 0;
 static struct gendisk *i2ob_disk[MAX_I2OB];
 
@@ -312,7 +311,6 @@
 	/* 
 	 * Mask out partitions from now on
 	 */
-	unit &= 0xF0;
 		
 	/* This can be optimised later - just want to be sure its right for
 	   starters */
@@ -453,8 +451,8 @@
 	struct i2ob_request *ireq = NULL;
 	u8 st;
 	u32 *m = (u32 *)msg;
-	u8 unit = (m[2]>>8)&0xF0;	/* low 4 bits are partition */
-	struct i2ob_device *dev = &i2ob_dev[(unit&0xF0)];
+	u8 unit = m[2]>>8;
+	struct i2ob_device *dev = &i2ob_dev[unit];
 
 	/*
 	 * FAILed message
@@ -599,7 +597,6 @@
 	unsigned int evt;
 	unsigned long flags;
 	int unit;
-	int i;
 	//The only event that has data is the SCSI_SMART event.
 	struct i2o_reply {
 		u32 header[4];
@@ -648,7 +645,7 @@
 			 */
 			case I2O_EVT_IND_BSA_VOLUME_LOAD:
 			{
-				struct gendisk *p = i2ob_disk[unit>>4];
+				struct gendisk *p = i2ob_disk[unit];
 				i2ob_install_device(i2ob_dev[unit].i2odev->controller, 
 					i2ob_dev[unit].i2odev, unit);
 				add_disk(p);
@@ -663,11 +660,10 @@
 			 */
 			case I2O_EVT_IND_BSA_VOLUME_UNLOAD:
 			{
-				struct gendisk *p = i2ob_disk[unit>>4];
+				struct gendisk *p = i2ob_disk[unit];
 				del_gendisk(p);
-				for(i = unit; i <= unit+15; i++)
-					blk_queue_max_sectors(i2ob_dev[i].req_queue, 0);
-				i2ob_media_change_flag[unit] = 1;
+				blk_queue_max_sectors(i2ob_dev[unit].req_queue, 0);
+				i2ob_dev[unit].media_change_flag = 1;
 				break;
 			}
 
@@ -695,7 +691,7 @@
 					i2ob_query_device(&i2ob_dev[unit], 0x0000, 4, &size, 8);
 
 				spin_lock_irqsave(I2O_LOCK(unit), flags);	
-				set_capacity(i2ob_disk[unit>>4], size>>9);
+				set_capacity(i2ob_disk[unit], size>>9);
 				spin_unlock_irqrestore(I2O_LOCK(unit), flags);	
 				break;
 			}
@@ -765,14 +761,6 @@
 	u32 m;
 	
 	while ((req = elv_next_request(q)) != NULL) {
-		/*
-		 *	On an IRQ completion if there is an inactive
-		 *	request on the queue head it means it isnt yet
-		 *	ready to dispatch.
-		 */
-		if(req->rq_status == RQ_INACTIVE)
-			return;
-
 		dev = req->rq_disk->private_data;
 
 		/* 
@@ -797,13 +785,12 @@
 		 */
 		req->errors = 0;
 		blkdev_dequeue_request(req);	
-		req->waiting = NULL;
 		
 		ireq = i2ob_queues[dev->unit]->i2ob_qhead;
 		i2ob_queues[dev->unit]->i2ob_qhead = ireq->next;
 		ireq->req = req;
 
-		i2ob_send(m, dev, ireq, (dev->unit&0xF0));
+		i2ob_send(m, dev, ireq, dev->index);
 	}
 }
 
@@ -1065,7 +1052,8 @@
 	u16 power;
 	u32 flags, status;
 	struct i2ob_device *dev=&i2ob_dev[unit];
-	int i;
+	request_queue_t *q;
+	int segments;
 
 	/*
 	 * For logging purposes...
@@ -1079,8 +1067,8 @@
 	 * before any I/O can be performed. If it fails, this
 	 * device is useless.
 	 */
-	if(!i2ob_queues[unit]) {
-		if(i2ob_init_iop(unit))
+	if(!i2ob_queues[c->unit]) {
+		if(i2ob_init_iop(c->unit))
 			return 1;
 	}
 
@@ -1089,11 +1077,11 @@
 	 * code so that we can directly get the queue ptr from the
 	 * device instead of having to go the IOP data structure.
 	 */
-	dev->req_queue = i2ob_queues[unit]->req_queue;
+	dev->req_queue = i2ob_queues[c->unit]->req_queue;
 
 	/* initialize gendik structure */
-	i2ob_disk[unit>>4]->private_data = dev;
-	i2ob_disk[unit>>4]->queue = dev->req_queue;
+	i2ob_disk[unit]->private_data = dev;
+	i2ob_disk[unit]->queue = dev->req_queue;
 
 	/*
 	 *	Ask for the current media data. If that isn't supported
@@ -1110,7 +1098,7 @@
 		power = 0;
 	i2ob_query_device(dev, 0x0000, 5, &flags, 4);
 	i2ob_query_device(dev, 0x0000, 6, &status, 4);
-	set_capacity(i2ob_disk[unit>>4], size>>9);
+	set_capacity(i2ob_disk[unit], size>>9);
 
 	/*
 	 * Max number of Scatter-Gather Elements
@@ -1119,44 +1107,41 @@
 	i2ob_dev[unit].power = power;	/* Save power state in device proper */
 	i2ob_dev[unit].flags = flags;
 
-	for(i=unit;i<=unit+15;i++)
-	{
-		request_queue_t *q = i2ob_dev[unit].req_queue;
-		int segments = (d->controller->status_block->inbound_frame_size - 7) / 2;
+	q = i2ob_dev[unit].req_queue;
+	segments = (d->controller->status_block->inbound_frame_size - 7) / 2;
 
-		if(segments > 16)
-			segments = 16;
-					
-		i2ob_dev[i].power = power;	/* Save power state */
-		i2ob_dev[unit].flags = flags;	/* Keep the type info */
+	if(segments > 16)
+		segments = 16;
+				
+	i2ob_dev[unit].power = power;	/* Save power state */
+	i2ob_dev[unit].flags = flags;	/* Keep the type info */
 		
-		blk_queue_max_sectors(q, 96);	/* 256 might be nicer but many controllers 
+	blk_queue_max_sectors(q, 96);	/* 256 might be nicer but many controllers 
 						   explode on 65536 or higher */
-		blk_queue_max_phys_segments(q, segments);
-		blk_queue_max_hw_segments(q, segments);
+	blk_queue_max_phys_segments(q, segments);
+	blk_queue_max_hw_segments(q, segments);
 		
-		i2ob_dev[i].rcache = CACHE_SMARTFETCH;
-		i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	i2ob_dev[unit].rcache = CACHE_SMARTFETCH;
+	i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 		
-		if(d->controller->battery == 0)
-			i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	if(d->controller->battery == 0)
+		i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 
-		if(d->controller->promise)
-			i2ob_dev[i].wcache = CACHE_WRITETHROUGH;
+	if(d->controller->promise)
+		i2ob_dev[unit].wcache = CACHE_WRITETHROUGH;
 
-		if(d->controller->short_req)
-		{
-			blk_queue_max_sectors(q, 8);
-			blk_queue_max_phys_segments(q, 8);
-			blk_queue_max_hw_segments(q, 8);
-		}
+	if(d->controller->short_req)
+	{
+		blk_queue_max_sectors(q, 8);
+		blk_queue_max_phys_segments(q, 8);
+		blk_queue_max_hw_segments(q, 8);
 	}
 
-	strcpy(d->dev_name, i2ob_disk[unit>>4]->disk_name);
-	strcpy(i2ob_disk[unit>>4]->devfs_name, i2ob_disk[unit>>4]->disk_name);
+	strcpy(d->dev_name, i2ob_disk[unit]->disk_name);
+	strcpy(i2ob_disk[unit]->devfs_name, i2ob_disk[unit]->disk_name);
 
 	printk(KERN_INFO "%s: Max segments %d, queue depth %d, byte limit %d.\n",
-		 d->dev_name, i2ob_dev[unit].max_segments, i2ob_dev[unit].depth, i2ob_max_sectors[unit]<<9);
+		 d->dev_name, i2ob_dev[unit].max_segments, i2ob_dev[unit].depth, i2ob_dev[unit].max_sectors<<9);
 
 	i2ob_query_device(dev, 0x0000, 0, &type, 1);
 
@@ -1197,7 +1182,7 @@
 	}
 	printk(".\n");
 	printk(KERN_INFO "%s: Maximum sectors/read set to %d.\n", 
-		d->dev_name, i2ob_max_sectors[unit]);
+		d->dev_name, i2ob_dev[unit].max_sectors);
 
 	/*
 	 * Register for the events we're interested in and that the
@@ -1323,7 +1308,7 @@
 
 			i2o_release_device(d, &i2o_block_handler);
 
-			if(scan_unit<MAX_I2OB<<4)
+			if(scan_unit<MAX_I2OB)
 			{
  				/*
 				 * Get the device and fill in the
@@ -1339,8 +1324,8 @@
 					printk(KERN_WARNING "Could not install I2O block device\n");
 				else
 				{
-					add_disk(i2ob_disk[scan_unit>>4]);
-					scan_unit+=16;
+					add_disk(i2ob_disk[scan_unit]);
+					scan_unit++;
 					i2ob_dev_count++;
 
 					/* We want to know when device goes away */
@@ -1350,7 +1335,7 @@
 			else
 			{
 				if(!warned++)
-					printk(KERN_WARNING "i2o_block: too many device, registering only %d.\n", scan_unit>>4);
+					printk(KERN_WARNING "i2o_block: too many device, registering only %d.\n", scan_unit);
 			}
 		}
 		i2o_unlock_controller(c);
@@ -1399,12 +1384,12 @@
 	printk(KERN_INFO "   Controller %d Tid %d\n",c->unit, d->lct_data.tid);
 
 	/* Check for available space */
-	if(i2ob_dev_count>=MAX_I2OB<<4)
+	if(i2ob_dev_count>=MAX_I2OB)
 	{
 		printk(KERN_ERR "i2o_block: No more devices allowed!\n");
 		return;
 	}
-	for(unit = 0; unit < (MAX_I2OB<<4); unit += 16)
+	for(unit = 0; unit < MAX_I2OB; unit ++)
 	{
 		if(!i2ob_dev[unit].i2odev)
 			break;
@@ -1425,7 +1410,7 @@
 		printk(KERN_ERR "i2o_block: Could not install new device\n");
 	else	
 	{
-		add_disk(i2ob_disk[unit>>4]);
+		add_disk(i2ob_disk[unit]);
 		i2ob_dev_count++;
 		i2o_device_notify_on(d, &i2o_block_handler);
 	}
@@ -1443,7 +1428,6 @@
 void i2ob_del_device(struct i2o_controller *c, struct i2o_device *d)
 {	
 	int unit = 0;
-	int i = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(I2O_LOCK(c->unit), flags);
@@ -1456,7 +1440,7 @@
 
 	printk(KERN_INFO "I2O Block Device Deleted\n");
 
-	for(unit = 0; unit < MAX_I2OB<<4; unit += 16)
+	for(unit = 0; unit < MAX_I2OB; unit ++)
 	{
 		if(i2ob_dev[unit].i2odev == d)
 		{
@@ -1465,7 +1449,7 @@
 			break;
 		}
 	}
-	if(unit >= MAX_I2OB<<4)
+	if(unit >= MAX_I2OB)
 	{
 		printk(KERN_ERR "i2ob_del_device called, but not in dev table!\n");
 		spin_unlock_irqrestore(I2O_LOCK(c->unit), flags);
@@ -1476,13 +1460,10 @@
 	 * This will force errors when i2ob_get_queue() is called
 	 * by the kenrel.
 	 */
-	del_gendisk(i2ob_disk[unit>>4]);
+	del_gendisk(i2ob_disk[unit]);
 	i2ob_dev[unit].req_queue = NULL;
-	for(i = unit; i <= unit+15; i++)
-	{
-		i2ob_dev[i].i2odev = NULL;
-		blk_queue_max_sectors(i2ob_dev[i].req_queue, 0);
-	}
+	i2ob_dev[unit].i2odev = NULL;
+	blk_queue_max_sectors(i2ob_dev[unit].req_queue, 0);
 	spin_unlock_irqrestore(I2O_LOCK(c->unit), flags);
 
 	/*
@@ -1493,14 +1474,13 @@
 		MOD_DEC_USE_COUNT;
 
 	i2ob_dev[unit].refcnt = 0;
-	
-	i2ob_dev[i].tid = 0;
+	i2ob_dev[unit].tid = 0;
 
 	/* 
 	 * Do we need this?
 	 * The media didn't really change...the device is just gone
 	 */
-	i2ob_media_change_flag[unit] = 1;
+	i2ob_dev[unit].media_change_flag = 1;
 
 	i2ob_dev_count--;	
 }
@@ -1511,10 +1491,9 @@
 static int i2ob_media_change(struct gendisk *disk)
 {
 	struct i2ob_device *p = disk->private_data;
-	int i = p->index;
-	if(i2ob_media_change_flag[i])
+	if(p->media_change_flag)
 	{
-		i2ob_media_change_flag[i]=0;
+		p->media_change_flag=0;
 		return 1;
 	}
 	return 0;
@@ -1523,7 +1502,7 @@
 static int i2ob_revalidate(struct gendisk *disk)
 {
 	struct i2ob_device *p = disk->private_data;
-	return i2ob_install_device(p->controller, p->i2odev, p->index<<4);
+	return i2ob_install_device(p->controller, p->i2odev, p->index);
 }
 
 /*
@@ -1536,7 +1515,7 @@
 	
 	for(i=0;i<MAX_I2OB;i++)
 	{
-		struct i2ob_device *dev=&i2ob_dev[(i<<4)];
+		struct i2ob_device *dev=&i2ob_dev[i];
 		
 		if(dev->refcnt!=0)
 		{
@@ -1598,23 +1577,32 @@
 	if (register_blkdev(MAJOR_NR, "i2o_block"))
 		return -EIO;
 
-	for (i = 0; i < MAX_I2OB; i++) {
-		struct gendisk *disk = alloc_disk(16);
-		if (!disk)
-			goto oom;
-		i2ob_dev[i<<4].index = i;
-		disk->queue = i2ob_dev[i<<4].req_queue;
-		i2ob_disk[i] = disk;
-	}
 #ifdef MODULE
 	printk(KERN_INFO "i2o_block: registered device at major %d\n", MAJOR_NR);
 #endif
 
 	/*
+	 *	Set up the queue
+	 */
+	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
+		i2ob_queues[i] = NULL;
+
+	/*
 	 *	Now fill in the boiler plate
 	 */
 	 
-	for (i = 0; i < MAX_I2OB << 4; i++) {
+	for (i = 0; i < MAX_I2OB; i++) {
+		struct gendisk *disk = alloc_disk(16);
+		if (!disk)
+			goto oom;
+		i2ob_disk[i] = disk;
+
+		disk->major = MAJOR_NR;
+		disk->first_minor = i<<4;
+		disk->fops = &i2ob_fops;
+		sprintf(disk->disk_name, "i2o/hd%c", 'a' + i);
+
+		i2ob_dev[i].index = i;
 		i2ob_dev[i].refcnt = 0;
 		i2ob_dev[i].flags = 0;
 		i2ob_dev[i].controller = NULL;
@@ -1623,26 +1614,10 @@
 		i2ob_dev[i].head = NULL;
 		i2ob_dev[i].tail = NULL;
 		i2ob_dev[i].depth = MAX_I2OB_DEPTH;
-		i2ob_max_sectors[i] = 2;
-	}
-	
-	for (i = 0; i < MAX_I2OB; i++) {
-		struct gendisk *disk = i2ob_disk[i];
-		disk->major = MAJOR_NR;
-		disk->first_minor = i<<4;
-		disk->fops = &i2ob_fops;
-		sprintf(disk->disk_name, "i2o/hd%c", 'a' + i);
+		i2ob_dev[i].max_sectors = 2;
 	}
 	
 	/*
-	 *	Set up the queue
-	 */
-	for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
-	{
-		i2ob_queues[i] = NULL;
-	}
-
-	/*
 	 *	Register the OSM handler as we will need this to probe for
 	 *	drives, geometry and other goodies.
 	 */
@@ -1701,10 +1676,10 @@
 	if(i2ob_dev_count) {
 		struct i2o_device *d;
 		for(i = 0; i < MAX_I2OB; i++)
-		if((d=i2ob_dev[i<<4].i2odev)) {
+		if((d=i2ob_dev[i].i2odev)) {
 			i2o_device_notify_off(d, &i2o_block_handler);
 			i2o_event_register(d->controller, d->lct_data.tid, 
-				i2ob_context, i<<4, 0);
+				i2ob_context, i, 0);
 		}
 	}
 	
@@ -1726,14 +1701,27 @@
 
 	i2o_remove_handler(&i2o_block_handler);
 		 
-	for (i = 0; i < MAX_I2OB; i++)
+	for (i = 0; i < MAX_I2OB; i++) {
+		struct gendisk *disk = i2ob_disk[i];
+		if(disk->flags & GENHD_FL_UP)
+			del_gendisk(i2ob_disk[i]);
 		put_disk(i2ob_disk[i]);
+	}
 
 	/*
 	 *	Return the block device
 	 */
 	if (unregister_blkdev(MAJOR_NR, "i2o_block") != 0)
 		printk("i2o_block: cleanup_module failed\n");
+
+	/*
+	 *	release request queue
+	 */
+	for (i = 0; i < MAX_I2O_CONTROLLERS; i ++)
+		if(i2ob_queues[i]) {
+			blk_cleanup_queue(i2ob_queues[i]->req_queue);
+			kfree(i2ob_queues[i]);
+		}
 }
 
 MODULE_AUTHOR("Red Hat");

^ permalink raw reply	[flat|nested] 13+ messages in thread

[parent not found: <40848159.7090605@togami.com>]

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
       [not found]     ` <40848159.7090605@togami.com>
@ 2004-04-20  7:08       ` Jens Axboe
  2004-04-20  7:58         ` Warren Togami
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-20  7:08 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

On Mon, Apr 19 2004, Warren Togami wrote:
> Warren Togami wrote:
> >Jens Axboe wrote:
> >
> >>>
> >>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
> >>>
> >>>I noticed that the call trace contained "cfq_next_request", so I was
> >>>curious what would happened if we changed to the deadline scheduler.
> >>>Booted with the same kernel but with "elevator=deadline". To our
> >>>surprise, bonnie++ ran simultaneously on all four I2O block devices
> >>>without crashing the server. For another test we tried "elevator=as"
> >>>and it too remained stable.
> >>>
> >>>Possible CFQ I/O scheduler problem?
> >>
> >>
> >>
> >>That looks pretty damn strange. Any chance you can capture a full oops,
> >>with a serial console or similar?
> >
> >
> >Fedora Core 2 x86_64 SMP (2 x Opteron) with kernel-2.6.5-1.326 (based on
> >2.6.6-rc1)
> >gcc-3.3.3-7 plus attached patch.  This panic was captured by netconsole.
> >
> >http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
> 
> Next we tested cfq with the following section of code commented out. 
> With this change the kernel no longer panics and seems to survive with 
> four simultaneous bonnie++'s on all four block devices.
> 
> --- cfq-iosched.c       2004-04-20 13:52:55.000000000 -1000
> +++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20 
> 14:09:43.000000000 -1000
> @@ -401,10 +401,12 @@
>  dispatch:
>                 rq = list_entry_rq(cfqd->dispatch->next);
> 
> +/*
>                 BUG_ON(q->last_merge == rq);
>                 crq = RQ_DATA(rq);
>                 if (crq)
>                         BUG_ON(ON_MHASH(crq));
> +*/
> 
>                 return rq;
>         }

This is not safe, the BUG_ON is there for a reason. If the request in on
the merge hash when handed to the driver, you risk corrupting data. The
fix would be figuring out why this is happening. Maybe it's looking at
bad data, could you test with this patch applied and see if the oops
still triggers?

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 09:07:20 2004
@@ -403,7 +403,7 @@
 
 		BUG_ON(q->last_merge == rq);
 		crq = RQ_DATA(rq);
-		if (crq)
+		if (blk_fs_request(rq) && crq)
 			BUG_ON(ON_MHASH(crq));
 
 		return rq;

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20  7:08       ` Jens Axboe
@ 2004-04-20  7:58         ` Warren Togami
  2004-04-20  8:03           ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-20  7:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

Jens Axboe wrote:
>>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
>>
>>Next we tested cfq with the following section of code commented out. 
>>With this change the kernel no longer panics and seems to survive with 
>>four simultaneous bonnie++'s on all four block devices.
>>
>>--- cfq-iosched.c       2004-04-20 13:52:55.000000000 -1000
>>+++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20 
>>14:09:43.000000000 -1000
>>@@ -401,10 +401,12 @@
>> dispatch:
>>                rq = list_entry_rq(cfqd->dispatch->next);
>>
>>+/*
>>                BUG_ON(q->last_merge == rq);
>>                crq = RQ_DATA(rq);
>>                if (crq)
>>                        BUG_ON(ON_MHASH(crq));
>>+*/
>>
>>                return rq;
>>        }
> 
> 
> This is not safe, the BUG_ON is there for a reason. If the request in on
> the merge hash when handed to the driver, you risk corrupting data. The
> fix would be figuring out why this is happening. Maybe it's looking at
> bad data, could you test with this patch applied and see if the oops
> still triggers?
> 
> ===== drivers/block/cfq-iosched.c 1.1 vs edited =====
> --- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
> +++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 09:07:20 2004
> @@ -403,7 +403,7 @@
>  
>  		BUG_ON(q->last_merge == rq);
>  		crq = RQ_DATA(rq);
> -		if (crq)
> +		if (blk_fs_request(rq) && crq)
>  			BUG_ON(ON_MHASH(crq));
>  
>  		return rq;
> 

We figured removing error handling was not safe, the previous post was 
only reporting test results to ask for more suggestions.  I have now 
tested your suggested patch above and it seems to crash in the same way 
as originally.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt

This makes me curious, the other elevators lacked this type of error 
checking.  Did this mean they were possibly allowing data corruption to 
happen with buggy drivers like this?  Kind of scary!  We were lucky to 
test this now, because this was one of the first FC kernels that 
included cfq by default.

Do you have any advice regarding the atomic type removal problem that we 
experienced from our previous post?

Warren Togami
wtogami@redhat.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20  7:58         ` Warren Togami
@ 2004-04-20  8:03           ` Jens Axboe
  2004-04-20  8:59             ` Warren Togami
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-20  8:03 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

On Mon, Apr 19 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
> >>
> >>Next we tested cfq with the following section of code commented out. 
> >>With this change the kernel no longer panics and seems to survive with 
> >>four simultaneous bonnie++'s on all four block devices.
> >>
> >>--- cfq-iosched.c       2004-04-20 13:52:55.000000000 -1000
> >>+++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20 
> >>14:09:43.000000000 -1000
> >>@@ -401,10 +401,12 @@
> >>dispatch:
> >>               rq = list_entry_rq(cfqd->dispatch->next);
> >>
> >>+/*
> >>               BUG_ON(q->last_merge == rq);
> >>               crq = RQ_DATA(rq);
> >>               if (crq)
> >>                       BUG_ON(ON_MHASH(crq));
> >>+*/
> >>
> >>               return rq;
> >>       }
> >
> >
> >This is not safe, the BUG_ON is there for a reason. If the request in on
> >the merge hash when handed to the driver, you risk corrupting data. The
> >fix would be figuring out why this is happening. Maybe it's looking at
> >bad data, could you test with this patch applied and see if the oops
> >still triggers?
> >
> >===== drivers/block/cfq-iosched.c 1.1 vs edited =====
> >--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
> >+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 09:07:20 2004
> >@@ -403,7 +403,7 @@
> > 
> > 		BUG_ON(q->last_merge == rq);
> > 		crq = RQ_DATA(rq);
> >-		if (crq)
> >+		if (blk_fs_request(rq) && crq)
> > 			BUG_ON(ON_MHASH(crq));
> > 
> > 		return rq;
> >
> 
> We figured removing error handling was not safe, the previous post was 
> only reporting test results to ask for more suggestions.  I have now 
> tested your suggested patch above and it seems to crash in the same way 
> as originally.
> 
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt

As a temporary safe work-around, you can apply this patch.

> This makes me curious, the other elevators lacked this type of error 
> checking.  Did this mean they were possibly allowing data corruption to 
> happen with buggy drivers like this?  Kind of scary!  We were lucky to 
> test this now, because this was one of the first FC kernels that 
> included cfq by default.

Not necessarily, it's most likely a CFQ bug. Otherwise it would have
surfaced before :-)

> Do you have any advice regarding the atomic type removal problem that we 
> experienced from our previous post?

Just change the type to an unsigned integer instead. Double check that
all decrements/increments and reads of that integer are inside the
device lock, it looked like they were.

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 10:02:01 2004
@@ -404,7 +404,7 @@
 		BUG_ON(q->last_merge == rq);
 		crq = RQ_DATA(rq);
 		if (crq)
-			BUG_ON(ON_MHASH(crq));
+			cfq_remove_merge_hints(q, crq);
 
 		return rq;
 	}

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20  8:03           ` Jens Axboe
@ 2004-04-20  8:59             ` Warren Togami
  2004-04-20  9:05               ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-20  8:59 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

Jens Axboe wrote:
>>We figured removing error handling was not safe, the previous post was 
>>only reporting test results to ask for more suggestions.  I have now 
>>tested your suggested patch above and it seems to crash in the same way 
>>as originally.
>>
>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt
> 
> 
> As a temporary safe work-around, you can apply this patch.
> 
> 
>>This makes me curious, the other elevators lacked this type of error 
>>checking.  Did this mean they were possibly allowing data corruption to 
>>happen with buggy drivers like this?  Kind of scary!  We were lucky to 
>>test this now, because this was one of the first FC kernels that 
>>included cfq by default.
> 
> 
> Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> surfaced before :-)
> 

I forgot to mention in the previous reports:

Prior to three of your original suggested cleanups of i2o_block, four 
simultaneous bonnie++'s on four independent arrays would almost 
immediately cause the crash while running elevator=cfq.  After those 
three cleanups four simultaneous bonnie++ would survive for a while 
without crashing... until you run "sync" in another terminal.  We 
however did not test it enough times to determine if without "sync" it 
can survive the test run.  Do you want this tested without "sync"?

With the deadline scheduler "sync" would take maybe 30 seconds and 
return.  With the cfq scheduler "sync" would be stuck there for much 
longer, then trigger the crash.  Markus has suspected that it crashes 
when sync returns, but we have not confirmed that.

With the cfq scheduler with the error checking completely removed, 
"sync" would be stuck there for a minute or more but eventually return 
without crashing.  (The array is reformatted between every test so we 
don't really care about data corruption.)

I hope this data is helpful.

Warren Togami
wtogami@redhat.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20  8:59             ` Warren Togami
@ 2004-04-20  9:05               ` Jens Axboe
  2004-04-20 10:53                 ` Warren Togami
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-20  9:05 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

On Mon, Apr 19 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>We figured removing error handling was not safe, the previous post was 
> >>only reporting test results to ask for more suggestions.  I have now 
> >>tested your suggested patch above and it seems to crash in the same way 
> >>as originally.
> >>
> >>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt
> >
> >
> >As a temporary safe work-around, you can apply this patch.
> >
> >
> >>This makes me curious, the other elevators lacked this type of error 
> >>checking.  Did this mean they were possibly allowing data corruption to 
> >>happen with buggy drivers like this?  Kind of scary!  We were lucky to 
> >>test this now, because this was one of the first FC kernels that 
> >>included cfq by default.
> >
> >
> >Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> >surfaced before :-)
> >
> 
> I forgot to mention in the previous reports:
> 
> Prior to three of your original suggested cleanups of i2o_block, four 
> simultaneous bonnie++'s on four independent arrays would almost 
> immediately cause the crash while running elevator=cfq.  After those 
> three cleanups four simultaneous bonnie++ would survive for a while 
> without crashing... until you run "sync" in another terminal.  We 
> however did not test it enough times to determine if without "sync" it 
> can survive the test run.  Do you want this tested without "sync"?

Repeat the tests that made it crash. The last patch I sent should work
for you, at least until the real issue is found.

> With the deadline scheduler "sync" would take maybe 30 seconds and 
> return.  With the cfq scheduler "sync" would be stuck there for much 
> longer, then trigger the crash.  Markus has suspected that it crashes 
> when sync returns, but we have not confirmed that.

Probably not the case. Running sync only initiates the dirty data
flushing, the actual write out happens out of that context. So it's
probably just running the sync that changes the timings a little bit,
triggering the bug.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20  9:05               ` Jens Axboe
@ 2004-04-20 10:53                 ` Warren Togami
  2004-04-20 10:56                   ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-20 10:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

Jens Axboe wrote:
>>>Not necessarily, it's most likely a CFQ bug. Otherwise it would have
>>>surfaced before :-)
>>>
>>
>>I forgot to mention in the previous reports:
>>
>>Prior to three of your original suggested cleanups of i2o_block, four 
>>simultaneous bonnie++'s on four independent arrays would almost 
>>immediately cause the crash while running elevator=cfq.  After those 
>>three cleanups four simultaneous bonnie++ would survive for a while 
>>without crashing... until you run "sync" in another terminal.  We 
>>however did not test it enough times to determine if without "sync" it 
>>can survive the test run.  Do you want this tested without "sync"?
> 
> 
> Repeat the tests that made it crash. The last patch I sent should work
> for you, at least until the real issue is found.
> 

Tested your patch, it indeed does seem to keep the system stable.  If I 
am understanding it right, the patch disables merging in the case where 
it would have caused a BUG condition?  (Less efficiency.)

In any case, for now we are doing our i2o development using the other 
schedulers like deadline.  Let us know if you have updated cfq patches 
to try, and we will.

Warren Togami
wtogami@redhat.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20 10:53                 ` Warren Togami
@ 2004-04-20 10:56                   ` Jens Axboe
  2004-04-20 11:29                     ` Warren Togami
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-20 10:56 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

On Tue, Apr 20 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> >>>surfaced before :-)
> >>>
> >>
> >>I forgot to mention in the previous reports:
> >>
> >>Prior to three of your original suggested cleanups of i2o_block, four 
> >>simultaneous bonnie++'s on four independent arrays would almost 
> >>immediately cause the crash while running elevator=cfq.  After those 
> >>three cleanups four simultaneous bonnie++ would survive for a while 
> >>without crashing... until you run "sync" in another terminal.  We 
> >>however did not test it enough times to determine if without "sync" it 
> >>can survive the test run.  Do you want this tested without "sync"?
> >
> >
> >Repeat the tests that made it crash. The last patch I sent should work
> >for you, at least until the real issue is found.
> >
> 
> Tested your patch, it indeed does seem to keep the system stable.  If I 
> am understanding it right, the patch disables merging in the case where 
> it would have caused a BUG condition?  (Less efficiency.)

Not quite, it just removes the crq from the hash before it can do any
damage. It doesn't impact efficiency, the request is removed from the io
scheduler at this point anyway.

> In any case, for now we are doing our i2o development using the other 
> schedulers like deadline.  Let us know if you have updated cfq patches 
> to try, and we will.

I'll see if I can reproduce it here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20 10:56                   ` Jens Axboe
@ 2004-04-20 11:29                     ` Warren Togami
  2004-04-20 11:34                       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Warren Togami @ 2004-04-20 11:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

Jens Axboe wrote:
>>>
>>>Repeat the tests that made it crash. The last patch I sent should work
>>>for you, at least until the real issue is found.
>>>
>>
>>Tested your patch, it indeed does seem to keep the system stable.  If I 
>>am understanding it right, the patch disables merging in the case where 
>>it would have caused a BUG condition?  (Less efficiency.)
> 
> 

Bad news... much later during the test the system locked up.  During 
this test we did not use "sync" but just let all four bonnie++'s run.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:404
invalid operand: 0000 [1] SMP
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.5-1.326changeme
RIP: 0010:[<ffffffff80234f74>] <ffffffff80234f74>{cfq_next_request+35}
RSP: 0018:ffffffff804a9048  EFLAGS: 00010046
RAX: 000001007f2c6000 RBX: 000001004483b340 RCX: 0000000000000000
RDX: 000001007f2ca600 RSI: 0000000000180000 RDI: 000001007f2c6000
RBP: 000001007f2c6000 R08: 0000010037ffa568 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: 000001007f2c6000
R13: 000001007f2c6000 R14: ffffffff804adf08 R15: 0000000000000000
FS:  0000002a9555f360(0000) GS:ffffffff804a4e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000856348 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, stackpage=ffffffff803cf5c0)
Stack: ffffffffa003a880 00000100348b2300 ffffffffa003a880 ffffffff8022b444
        00000100348b2300 ffffffffa003a880 000000000001ac00 ffffffffa0036b82
        ffffffff803f40a0 00000100629a5e80
Call Trace:<IRQ> <ffffffff8022b444>{elv_next_request+238} 
<ffffffffa0036b82>{:i2o_block:i2ob_request+233}
        <ffffffffa003670e>{:i2o_block:i2o_block_reply+1162}
        <ffffffff8013e13a>{__mod_timer+807} 
<ffffffff801310b5>{load_balance+1046}
        <ffffffff8013e7f9>{update_wall_time+12} 
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
        <ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9} 
<ffffffff80113d85>{handle_IRQ_event+44}
        <ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
        <ffffffff8010f590>{default_idle+0} 
<ffffffff8011185f>{ret_from_intr+0}
         <EOI> <ffffffff8010f590>{default_idle+0} 
<ffffffff8010f5b4>{default_idle+36}
        <ffffffff8010f627>{cpu_idle+24} <ffffffff804af7ea>{start_kernel+512}
        <ffffffff804af17c>{x86_64_start_kernel+144}

Code: 0f 0b 78 3a 32 80 ff ff ff ff 94 01 f6 43 10 10 48 8b 73 68
RIP <ffffffff80234f74>{cfq_next_request+35} RSP <ffffffff804a9048>
  <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
  NMI Watchdog detected LOCKUP on CPU1, registers
CPU 1
Pid: 1525, comm: bonnie++ Not tainted 2.6.5-1.326changeme
RIP: 0010:[<ffffffff8022f332>] <ffffffff8022f332>{.text.lock.ll_rw_blk+95}
RSP: 0018:000001005b553808  EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000010037ffb118 RCX: 0000010064b61800
RDX: 0000000000000000 RSI: 000001005b553830 RDI: 000001007f2c6000
RBP: 000001007f2c6000 R08: 0000000000000000 R09: 00000100498666c0
R10: 00000100498666c0 R11: 00000100498666c0 R12: 0000000000000060
R13: 0000000000000008 R14: 00000000007ca9b7 R15: 0000000000000000
FS:  0000002a9555f360(0000) GS:ffffffff804a4f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9555a000 CR3: 000000007fe2f000 CR4: 00000000000006e0
Process bonnie++ (pid: 1525, stackpage=1007e5363c0)
Stack: 0000010000000000 0000000000000001 000001005cc8d6e8 0000000000000000
        0000000000001000 00000100583a0cc0 000001005b553898 0000000000000000
        00000100583a0cc0 0000000000000060
Call Trace:<ffffffff8022e4a7>{generic_make_request+331} 
<ffffffff8022e5b1>{submit_bio+247}
        <ffffffff801961c6>{mpage_bio_submit+27} 
<ffffffff801964d7>{do_mpage_readpage+493}
        <ffffffff801b3f63>{ext2_get_block+0} 
<ffffffff801b3f63>{ext2_get_block+0}
        <ffffffff801b3f63>{ext2_get_block+0} 
<ffffffff8015428c>{add_to_page_cache+129}
        <ffffffff801b3f63>{ext2_get_block+0} 
<ffffffff80196653>{mpage_readpages+161}
        <ffffffff8015a5c3>{read_pages+57} 
<ffffffff80158035>{buffered_rmqueue+510}
        <ffffffff801580fe>{__alloc_pages+158} 
<ffffffff8015ab04>{do_page_cache_readahead+515}
        <ffffffff801545e8>{__lock_page+213} 
<ffffffff8015acbd>{page_cache_readahead+396}
        <ffffffff80154c04>{do_generic_mapping_read+193} 
<ffffffff80154e28>{file_read_actor+0}
        <ffffffff80155084>{__generic_file_aio_read+360} 
<ffffffff80155157>{generic_file_read+116}
        <ffffffff80181a75>{link_path_walk+2806} 
<ffffffff80165bf4>{vma_merge+166}
        <ffffffff802f5707>{thread_return+0} <ffffffff80172e01>{vfs_read+208}
        <ffffffff80173006>{sys_read+57} <ffffffff801112ba>{system_call+126}


Code: 7e f9 e9 aa ea ff ff 90 90 90 41 57 41 56 41 55 49 89 fd bf
console shuts up ...


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20 11:29                     ` Warren Togami
@ 2004-04-20 11:34                       ` Jens Axboe
  2004-04-20 11:38                         ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2004-04-20 11:34 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox

On Tue, Apr 20 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>
> >>>Repeat the tests that made it crash. The last patch I sent should work
> >>>for you, at least until the real issue is found.
> >>>
> >>
> >>Tested your patch, it indeed does seem to keep the system stable.  If I 
> >>am understanding it right, the patch disables merging in the case where 
> >>it would have caused a BUG condition?  (Less efficiency.)
> >
> >
> 
> Bad news... much later during the test the system locked up.  During 
> this test we did not use "sync" but just let all four bonnie++'s run.
> 
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at cfq_iosched:404
> invalid operand: 0000 [1] SMP

Sorry about that, that's actually expected when we know this bug
exists. You need to move the cfq_remove_merge_hints(q, crq) before the
BUG_ON(q->last_merge == rq) check, or (better) just remove it
completely. There's no way that q->last_merge could be set to this
request after cfq_remove_merge_hints() was called.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
  2004-04-20 11:34                       ` Jens Axboe
@ 2004-04-20 11:38                         ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2004-04-20 11:38 UTC (permalink / raw)
  To: Warren Togami
  Cc: Markus Lidel, Arjan van de Ven, linux-scsi, linux-kernel,
	Alan Cox, Linus Torvalds

On Tue, Apr 20 2004, Jens Axboe wrote:
> On Tue, Apr 20 2004, Warren Togami wrote:
> > Jens Axboe wrote:
> > >>>
> > >>>Repeat the tests that made it crash. The last patch I sent should work
> > >>>for you, at least until the real issue is found.
> > >>>
> > >>
> > >>Tested your patch, it indeed does seem to keep the system stable.  If I 
> > >>am understanding it right, the patch disables merging in the case where 
> > >>it would have caused a BUG condition?  (Less efficiency.)
> > >
> > >
> > 
> > Bad news... much later during the test the system locked up.  During 
> > this test we did not use "sync" but just let all four bonnie++'s run.
> > 
> > http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at cfq_iosched:404
> > invalid operand: 0000 [1] SMP
> 
> Sorry about that, that's actually expected when we know this bug
> exists. You need to move the cfq_remove_merge_hints(q, crq) before the
> BUG_ON(q->last_merge == rq) check, or (better) just remove it
> completely. There's no way that q->last_merge could be set to this
> request after cfq_remove_merge_hints() was called.

In short, this patch. I can see this happening for an aliased request,
but you should not be hitting that with bonnie (you are not doing any
form of raw or O_DIRECT io, are you?).

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 13:37:33 2004
@@ -401,10 +401,9 @@
 dispatch:
 		rq = list_entry_rq(cfqd->dispatch->next);
 
-		BUG_ON(q->last_merge == rq);
 		crq = RQ_DATA(rq);
 		if (crq)
-			BUG_ON(ON_MHASH(crq));
+			cfq_remove_merge_hints(q, crq);
 
 		return rq;
 	}

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-04-20 11:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-19 11:37 [PATCH] i2o_block Fix, possible CFQ elevator problem? Warren Togami
2004-04-19 12:12 ` Jens Axboe
2004-04-20  0:42   ` Warren Togami
     [not found]     ` <40848159.7090605@togami.com>
2004-04-20  7:08       ` Jens Axboe
2004-04-20  7:58         ` Warren Togami
2004-04-20  8:03           ` Jens Axboe
2004-04-20  8:59             ` Warren Togami
2004-04-20  9:05               ` Jens Axboe
2004-04-20 10:53                 ` Warren Togami
2004-04-20 10:56                   ` Jens Axboe
2004-04-20 11:29                     ` Warren Togami
2004-04-20 11:34                       ` Jens Axboe
2004-04-20 11:38                         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox