calling scsi_adjust_queue_depth() during I/O...

* calling scsi_adjust_queue_depth() during I/O...
@ 2005-08-04 23:41 Andrew Vasquez
  2005-08-05  7:57 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Vasquez @ 2005-08-04 23:41 UTC (permalink / raw)
  To: Linux-SCSI Mailing List

All,

While adding support for the new change_queue_depth/type() callbacks,

	static int
	qla2x00_change_queue_depth(struct scsi_device *sdev, int qdepth)
	{
		scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), qdepth);
		return sdev->queue_depth;
	}

and updating the queue-depth:

	# echo 16 > /sys/class/scsi_device/3:0:0:0/device/queue_depth

while I/O is running, I'm hitting a reproducible WARN_ON() triggering
within as_completed_request():

	static void as_completed_request(request_queue_t *q, struct request *rq)
	{
		struct as_data *ad = q->elevator->elevator_data;
		struct as_rq *arq = RQ_DATA(rq);

		WARN_ON(!list_empty(&rq->queuelist));
		...

and a subsequent panic:

	Badness in as_completed_request at drivers/block/as-iosched.c:951

	Call Trace: <IRQ> <ffffffff8024883a>{as_completed_request+63} <ffffffff8024098d>{elv_completed_request+44}
	       <ffffffff8024272a>{__blk_put_request+73} <ffffffff80280781>{scsi_end_request+164}
	       <ffffffff802809eb>{scsi_io_completion+584} <ffffffff80297059>{sd_rw_intr+709}
	       <ffffffff8027aa08>{scsi_finish_command+182} <ffffffff8027b2dc>{scsi_softirq+255}
	       <ffffffff801291ea>{__do_softirq+110} <ffffffff8010eb13>{call_softirq+31}
	       <ffffffff801101be>{do_softirq+54} <ffffffff80110211>{do_IRQ+74}
	       <ffffffff8010deba>{ret_from_intr+0}  <EOI> <ffffffff8010c2fd>{mwait_idle+86}
	       <ffffffff8021aef0>{acpi_processor_idle+310} <ffffffff8010cacb>{cpu_idle+79}
	       <ffffffff804cecbf>{start_secondary+1017}
	----------- [cut here ] --------- [please bite here ] ---------
	Kernel BUG at "drivers/block/ll_rw_blk.c":2361
	invalid operand: 0000 [1] SMP
	CPU 2
	Modules linked in: qla2xxx
	Pid: 0, comm: swapper Not tainted 2.6.13-rc5
	RIP: 0010:[<ffffffff80242734>] <ffffffff80242734>{__blk_put_request+83}
	RSP: 0018:ffff8100021bbde8  EFLAGS: 00010087
	RAX: 0000000000000000 RBX: ffff81002dc738b0 RCX: 0000000000008000
	RDX: 0000000000004e6b RSI: 0000000000000004 RDI: ffff81003e091778
	RBP: ffff81003f8fa600 R08: 0000000000000000 R09: 0000000000000003
	R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
	R13: 0000000000000001 R14: ffff81003f8fa600 R15: ffff81003f8fa600
	FS:  0000000000000000(0000) GS:ffffffff804b6900(0000) knlGS:0000000000000000
	CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
	CR2: 00002aaaaaac1000 CR3: 0000000037f05000 CR4: 00000000000006e0
	Process swapper (pid: 0, threadinfo ffff8100021b6000, task ffff8100021b54f0)
	Stack: ffff81002dc738b0 ffff81002c1cd7c0 0000000000000286 ffffffff80280781
	       0000000000000001 ffff81002c1cd7c0 ffff81002dc738b0 0000000000000000
	       0000000000080000 ffffffff802809eb
	Call Trace: <IRQ> <ffffffff80280781>{scsi_end_request+164} <ffffffff802809eb>{scsi_io_completion+584}
	       <ffffffff80297059>{sd_rw_intr+709} <ffffffff8027aa08>{scsi_finish_command+182}
	       <ffffffff8027b2dc>{scsi_softirq+255} <ffffffff801291ea>{__do_softirq+110}
	       <ffffffff8010eb13>{call_softirq+31} <ffffffff801101be>{do_softirq+54}
	       <ffffffff80110211>{do_IRQ+74} <ffffffff8010deba>{ret_from_intr+0}
		<EOI> <ffffffff8010c2fd>{mwait_idle+86} <ffffffff8021aef0>{acpi_processor_idle+310}
	       <ffffffff8010cacb>{cpu_idle+79} <ffffffff804cecbf>{start_secondary+1017}

	Code: 0f 0b a3 0b f2 32 80 ff ff ff ff c2 39 09 48 89 de 48 89 ef
	RIP <ffffffff80242734>{__blk_put_request+83} RSP <ffff8100021bbde8>
	 <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
	in_atomic():1, irqs_disabled():1

	Call Trace: <IRQ> <ffffffff8011e2d7>{__might_sleep+199} <ffffffff80125316>{profile_task_exit+34}
	       <ffffffff80126fe2>{do_exit+34} <ffffffff801fc7d0>{vgacon_cursor+231}
	       <ffffffff8010f653>{kernel_math_error+0} <ffffffff8010fa09>{do_trap+264}
	       <ffffffff8010feb9>{do_invalid_op+145} <ffffffff80242734>{__blk_put_request+83}
	       <ffffffff801245d7>{printk+141} <ffffffff8010e415>{error_exit+0}
	       <ffffffff80242734>{__blk_put_request+83} <ffffffff8024272a>{__blk_put_request+73}
	       <ffffffff80280781>{scsi_end_request+164} <ffffffff802809eb>{scsi_io_completion+584}
	       <ffffffff80297059>{sd_rw_intr+709} <ffffffff8027aa08>{scsi_finish_command+182}
	       <ffffffff8027b2dc>{scsi_softirq+255} <ffffffff801291ea>{__do_softirq+110}
	       <ffffffff8010eb13>{call_softirq+31} <ffffffff801101be>{do_softirq+54}
	       <ffffffff80110211>{do_IRQ+74} <ffffffff8010deba>{ret_from_intr+0}
		<EOI> <ffffffff8010c2fd>{mwait_idle+86} <ffffffff8021aef0>{acpi_processor_idle+310}
	       <ffffffff8010cacb>{cpu_idle+79} <ffffffff804cecbf>{start_secondary+1017}

	Kernel panic - not syncing: Aiee, killing interrupt handler!

Adding scsi_target_quiesce() and scsi_target_resume() barriers around
the scsi_adjust_queue_depth() call appears to help (e.g. dropping
from 32 -> 24):

	# echo 24 > /sys/class/scsi_device/3\:0\:0\:0/device/queue_depth

and dropping down again to 16:

	# echo 16 > /sys/class/scsi_device/3\:0\:0\:0/device/queue_depth

but occasionally, while trying another depth drop:

	# echo 10 > /sys/class/scsi_device/3\:0\:0\:0/device/queue_depth

I'll either get a panic (I haven't captured a good one yet, only a
couple of lines of the trace):

	eip: ffffffff80248a62
	----------- [cut here ] --------- [please bite here ] ---------
	Kernel BUG at "include/asm/spinlock.h":121

or I get the following slab-error:

	slab error in cache_free_debugcheck(): cache `size-128': double free, or memory outside object was overwritten

	Call Trace:<ffffffff8014930c>{cache_free_debugcheck+290} <ffffffff8014975c>{kfree+136}
	       <ffffffff80244e65>{blk_queue_resize_tags+119} <ffffffff8027a826>{scsi_adjust_queue_depth+68}
	       <ffffffff88000133>{:qla2xxx:qla2x00_change_queue_depth+71}
	       <ffffffff80283666>{sdev_store_queue_depth_rw+82} <ffffffff8023a9a2>{dev_attr_store+31}
	       <ffffffff80191e95>{sysfs_write_file+200} <ffffffff80160dba>{vfs_write+172}
	       <ffffffff80160ed8>{sys_write+69} <ffffffff8010d8f6>{system_call+126}

	ffff8100389baba8: redzone 1: 0x170fc2a5, redzone 2: 0x0.

I'm using a fairly recent snapshot of Linus' GIT tree (sync done
earlier today).

Two questions:

 - must the target be quiesced before adjusting the queue-depth?

 - any ideas on why successive lowering of the depth borks the
   machine?

Thanks,
Andrew Vasquez

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-04 23:41 calling scsi_adjust_queue_depth() during I/O Andrew Vasquez
@ 2005-08-05  7:57 ` Jens Axboe
  2005-08-05 11:09   ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2005-08-05  7:57 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: Linux-SCSI Mailing List, Tejun Heo

On Thu, Aug 04 2005, Andrew Vasquez wrote:
> All,
> 
> While adding support for the new change_queue_depth/type() callbacks,
> 
> 	static int
> 	qla2x00_change_queue_depth(struct scsi_device *sdev, int qdepth)
> 	{
> 		scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), qdepth);
> 		return sdev->queue_depth;
> 	}
> 
> and updating the queue-depth:
> 
> 	# echo 16 > /sys/class/scsi_device/3:0:0:0/device/queue_depth
> 
> while I/O is running, I'm hitting a reproducible WARN_ON() triggering
> within as_completed_request():
> 
> 	static void as_completed_request(request_queue_t *q, struct request *rq)
> 	{
> 		struct as_data *ad = q->elevator->elevator_data;
> 		struct as_rq *arq = RQ_DATA(rq);
> 
> 		WARN_ON(!list_empty(&rq->queuelist));

Tejun, can you take a look at this please?


-- 
Jens Axboe



* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05  7:57 ` Jens Axboe
@ 2005-08-05 11:09   ` Tejun Heo
  2005-08-05 11:43     ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2005-08-05 11:09 UTC (permalink / raw)
  To: Jens Axboe, Andrew Vasquez; +Cc: Linux-SCSI Mailing List

 Hello, Andrew.  Hello, Jens.

On Fri, Aug 05, 2005 at 09:57:52AM +0200, Jens Axboe wrote:
> On Thu, Aug 04 2005, Andrew Vasquez wrote:
> > All,
> > 
> > While adding support for the new change_queue_depth/type() callbacks,
> > 
> > 	static int
> > 	qla2x00_change_queue_depth(struct scsi_device *sdev, int qdepth)
> > 	{
> > 		scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), qdepth);
> > 		return sdev->queue_depth;
> > 	}
> > 
> > and updating the queue-depth:
> > 
> > 	# echo 16 > /sys/class/scsi_device/3:0:0:0/device/queue_depth
> > 
> > while I/O is running, I'm hitting a reproducible WARN_ON() triggering
> > within as_completed_request():
> > 
> > 	static void as_completed_request(request_queue_t *q, struct request *rq)
> > 	{
> > 		struct as_data *ad = q->elevator->elevator_data;
> > 		struct as_rq *arq = RQ_DATA(rq);
> > 
> > 		WARN_ON(!list_empty(&rq->queuelist));
> 
> Tejun, can you take a look at this please?
> 

 Sure.

> > Two questions:
> > 
> >  - must the target be quiesced before adjusting the queue-depth?
> > 
> >  - any ideas on why successive lowering of the depth borks the
> >    machine?

 I think it's caused by indexing tag_index past its end.  The slab
corruption supports that.  I made an incorrect attempt at fixing this
in the following post.

http://marc.theaimsgroup.com/?l=linux-kernel&m=111399756324813&w=2

 Good thing it didn't make it into the tree, as the tag map should
never be shrunk.  Thanks for not committing it, Jens.  :-) Andrew,
please try the following quick fix (it only fixes shrinking) and let
me know how it works.  If this is the right fix, I'll generate a
proper patch fixing both shrinking and enlarging (this word sounds
weird these days w/ all those spams...).


diff --git a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c
+++ b/drivers/block/ll_rw_blk.c
@@ -784,16 +784,17 @@ init_tag_map(request_queue_t *q, struct 
 				__FUNCTION__, depth);
 	}
 
-	tag_index = kmalloc(depth * sizeof(struct request *), GFP_ATOMIC);
+	bits = (depth / BLK_TAGS_PER_LONG) + 1;
+
+	tag_index = kmalloc(bits * sizeof(struct request *), GFP_ATOMIC);
 	if (!tag_index)
 		goto fail;
 
-	bits = (depth / BLK_TAGS_PER_LONG) + 1;
 	tag_map = kmalloc(bits * sizeof(unsigned long), GFP_ATOMIC);
 	if (!tag_map)
 		goto fail;
 
-	memset(tag_index, 0, depth * sizeof(struct request *));
+	memset(tag_index, 0, bits * sizeof(struct request *));
 	memset(tag_map, 0, bits * sizeof(unsigned long));
 	tags->max_depth = depth;
 	tags->real_max_depth = bits * BITS_PER_LONG;
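
The word-count arithmetic in the patch above can be tried in isolation.
What follows is only a user-space sketch, not kernel code: the helper
names tag_map_words() and tag_map_capacity() are hypothetical, and the
bits_per_long parameter stands in for BLK_TAGS_PER_LONG / BITS_PER_LONG
on the assumption (not shown in this thread) that both equal the number
of bits in an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of unsigned longs reserved for a tag bitmap of the given depth,
 * using the same "divide, then add one" arithmetic as the patch above.
 * bits_per_long is a parameter (an assumption for illustration) so the
 * result does not depend on the host's word size.
 */
static size_t tag_map_words(size_t depth, size_t bits_per_long)
{
	assert(bits_per_long > 0);
	return depth / bits_per_long + 1;
}

/* Number of tag bits those words can actually hold. */
static size_t tag_map_capacity(size_t depth, size_t bits_per_long)
{
	return tag_map_words(depth, bits_per_long) * bits_per_long;
}
```

With 64-bit words, a requested depth of 10 gives one word and room for
64 tag bits, so the bitmap capacity is generally larger than the
requested depth.  That larger figure is what the patch stores in
real_max_depth.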


* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 11:09   ` Tejun Heo
@ 2005-08-05 11:43     ` Tejun Heo
  2005-08-05 12:33       ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2005-08-05 11:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, Andrew Vasquez, Linux-SCSI Mailing List

Tejun Heo wrote:
>  Hello, Andrew.  Hello, Jens.
> 
>  I think it's caused by using tag_index over its end.  The slab
> corruption supports that.  I tried to fix this incorrectly in the
> following post.
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111399756324813&w=2
> 

  Oops, forget about the previous mail.  The above patch did make it
into the tree and it's the source of the problem.  My git HEAD was
pointing at the latest update but I hadn't updated my cache, so I was
looking at the old source tree.  My apologies for the hassle and the
bug.

  The original code was broken in the following two ways.

  * tag_index wasn't allocated fully
  * tag_map's extra bits were always initialized w/ 1's.

  The first bug is critical and the second bug prevents proper
enlarging of the tag map.  However, the second bug effectively masks
the first one, avoiding the critical problem.  My above-mentioned patch
broke things seriously when reducing the tag map size on the fly.
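
  The shrink failure can be modeled outside the kernel.  The sketch
below is a user-space illustration under stated assumptions, not the
block layer code: struct tag_table and the tag_table_* helpers are
hypothetical names, and the "naive" resize simply reallocates the
per-tag array to exactly the new depth.  A request that still holds a
tag at or above the new depth then has no slot left to complete into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-queue tag state. */
struct tag_table {
	void **tag_index;	/* one slot per tag, indexed by tag number */
	size_t slots;		/* number of slots actually allocated */
};

static int tag_table_init(struct tag_table *t, size_t depth)
{
	assert(depth > 0);
	t->tag_index = calloc(depth, sizeof(*t->tag_index));
	if (!t->tag_index)
		return -1;
	t->slots = depth;
	return 0;
}

/* Naive resize: reallocate to exactly new_depth slots, keep what fits. */
static int tag_table_resize_naive(struct tag_table *t, size_t new_depth)
{
	void **fresh;
	size_t keep = new_depth < t->slots ? new_depth : t->slots;

	assert(new_depth > 0);
	fresh = calloc(new_depth, sizeof(*fresh));
	if (!fresh)
		return -1;
	memcpy(fresh, t->tag_index, keep * sizeof(*fresh));
	free(t->tag_index);
	t->tag_index = fresh;
	t->slots = new_depth;
	return 0;
}

/* True if completing the request holding 'tag' would stay in bounds. */
static bool tag_completion_in_bounds(const struct tag_table *t, size_t tag)
{
	return tag < t->slots;
}
```

  For example, with 32 slots and a request in flight on tag 20, a naive
resize to 16 leaves tag 20 outside the array.  The model only reports
this; in real code the equivalent store would land past the end of the
allocation.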

  Again, my apologies; a patch will soon follow.

-- 
tejun


* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 11:43     ` Tejun Heo
@ 2005-08-05 12:33       ` Tejun Heo
  2005-08-05 15:55         ` Andrew Vasquez
  2005-08-05 16:32         ` James Bottomley
  0 siblings, 2 replies; 12+ messages in thread
From: Tejun Heo @ 2005-08-05 12:33 UTC (permalink / raw)
  To: Jens Axboe, Andrew Vasquez; +Cc: Linux-SCSI Mailing List

>  Oops, forget about the previous mail.  The above patch made it into the
> tree and it's the source of the problem.  My git HEAD was pointing at
> the latest update but I hadn't updated my cache, so I was looking at
> the old source tree.  My apologies for the hassle and the bug.
> 
>  The original code was broken in the following two ways.
> 
>  * tag_index wasn't allocated fully
>  * tag_map's extra bits were always initialized to 1's.
> 
>  The first bug is critical, and the second bug prevents the tag map from
> being enlarged properly.  However, the second bug effectively masks the
> first one, avoiding the critical problem.  My above-mentioned patch broke
> things seriously when reducing the tag depth with requests in flight.
> 
>  Again, my apologies; a patch will follow soon.

 Here's the fix.  It basically revives bqt->real_max_depth sans
allocation optimization in init_tag_map.  I've also added a comment
explicitly noting that tag map cannot be shrunk to prevent other
morons like me.  :-( Please try this one and let me know how it works.
If this is the correct fix, I'll repost properly to Jens and lkml with
detailed explanation on how it was broken in the original code and how
I broke it with my previous patch.  Sorry.


diff --git a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c
+++ b/drivers/block/ll_rw_blk.c
@@ -719,7 +719,7 @@ struct request *blk_queue_find_tag(reque
 {
 	struct blk_queue_tag *bqt = q->queue_tags;
 
-	if (unlikely(bqt == NULL || tag >= bqt->max_depth))
+	if (unlikely(bqt == NULL || tag >= bqt->real_max_depth))
 		return NULL;
 
 	return bqt->tag_index[tag];
@@ -798,6 +798,7 @@ init_tag_map(request_queue_t *q, struct 
 
 	memset(tag_index, 0, depth * sizeof(struct request *));
 	memset(tag_map, 0, nr_ulongs * sizeof(unsigned long));
+	tags->real_max_depth = depth;
 	tags->max_depth = depth;
 	tags->tag_index = tag_index;
 	tags->tag_map = tag_map;
@@ -872,11 +873,22 @@ int blk_queue_resize_tags(request_queue_
 		return -ENXIO;
 
 	/*
+	 * if we already have large enough real_max_depth.  just
+	 * adjust max_depth.  *NOTE* as requests with tag value
+	 * between new_depth and real_max_depth can be in-flight, tag
+	 * map cannot be shrunk.
+	 */
+	if (new_depth <= bqt->real_max_depth) {
+		bqt->max_depth = new_depth;
+		return 0;
+	}
+
+	/*
 	 * save the old state info, so we can copy it back
 	 */
 	tag_index = bqt->tag_index;
 	tag_map = bqt->tag_map;
-	max_depth = bqt->max_depth;
+	max_depth = bqt->real_max_depth;
 
 	if (init_tag_map(q, bqt, new_depth))
 		return -ENOMEM;
@@ -913,7 +925,7 @@ void blk_queue_end_tag(request_queue_t *
 
 	BUG_ON(tag == -1);
 
-	if (unlikely(tag >= bqt->max_depth))
+	if (unlikely(tag >= bqt->real_max_depth))
 		/*
 		 * This can happen after tag depth has been reduced.
 		 * FIXME: how about a warning or info message here?
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -301,6 +301,7 @@ struct blk_queue_tag {
 	struct list_head busy_list;	/* fifo list of busy tags */
 	int busy;			/* current depth */
 	int max_depth;			/* what we will send to device */
+	int real_max_depth;		/* what the array can hold */
 	atomic_t refcnt;		/* map can be shared */
 };
 

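The idea behind the patch can be modelled in user space.  The sketch
below is not the kernel code: struct tag_model, MODEL_CAPACITY and the
model_*() helpers are hypothetical names, and the array has a fixed
capacity instead of being reallocated on growth.  It only shows why new
tags are bounded by max_depth while lookups and completions are bounded
by real_max_depth.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CAPACITY 64	/* hypothetical fixed capacity of the model */

/* Simplified stand-in for struct blk_queue_tag; not the kernel structure. */
struct tag_model {
	void *tag_index[MODEL_CAPACITY];	/* tag -> request, NULL when free */
	int max_depth;		/* limit applied when handing out new tags */
	int real_max_depth;	/* number of slots that may hold a request */
};

static void model_init(struct tag_model *t, int depth)
{
	assert(depth > 0 && depth <= MODEL_CAPACITY);
	for (int i = 0; i < MODEL_CAPACITY; i++)
		t->tag_index[i] = NULL;
	t->max_depth = depth;
	t->real_max_depth = depth;
}

/* Hand out the lowest free tag below max_depth, or -1 if none is free. */
static int model_start_tag(struct tag_model *t, void *rq)
{
	for (int tag = 0; tag < t->max_depth; tag++) {
		if (t->tag_index[tag] == NULL) {
			t->tag_index[tag] = rq;
			return tag;
		}
	}
	return -1;
}

/* Lowering the depth only lowers the allocation limit; slots stay valid. */
static int model_resize(struct tag_model *t, int new_depth)
{
	if (new_depth <= 0 || new_depth > MODEL_CAPACITY)
		return -1;
	if (new_depth > t->real_max_depth)
		t->real_max_depth = new_depth;	/* the model grows in place */
	t->max_depth = new_depth;
	return 0;
}

/* Lookups are bounded by real_max_depth, so in-flight high tags resolve. */
static void *model_find_tag(const struct tag_model *t, int tag)
{
	if (tag < 0 || tag >= t->real_max_depth)
		return NULL;
	return t->tag_index[tag];
}

/* Completions are also bounded by real_max_depth, not max_depth. */
static int model_end_tag(struct tag_model *t, int tag)
{
	if (tag < 0 || tag >= t->real_max_depth || t->tag_index[tag] == NULL)
		return -1;
	t->tag_index[tag] = NULL;
	return 0;
}
```

In this model, a request holding tag 3 can still be found and completed
after the depth is lowered from 4 to 2, while no new request is given a
tag of 2 or higher.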

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 12:33       ` Tejun Heo
@ 2005-08-05 15:55         ` Andrew Vasquez
  2005-08-05 15:59           ` Jens Axboe
  2005-08-05 16:32         ` James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Vasquez @ 2005-08-05 15:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, Linux-SCSI Mailing List

On Fri, 05 Aug 2005, Tejun Heo wrote:

> >  Oops, forget about the previous mail.  The above patch made it into the
> > tree and it's the source of the problem.  My git HEAD was pointing at
> > the latest update but I hadn't updated my cache, so I was looking at
> > the old source tree.  My apologies for the hassle and the bug.
> > 
> >  The original code was broken in the following two ways.
> > 
> >  * tag_index wasn't allocated fully
> >  * tag_map's extra bits were always initialized to 1's.
> > 
> >  The first bug is critical, and the second bug prevents the tag map from
> > being enlarged properly.  However, the second bug effectively masks the
> > first one, avoiding the critical problem.  My above-mentioned patch broke
> > things seriously when reducing the tag depth with requests in flight.
> > 
> >  Again, my apologies; a patch will follow soon.
> 
>  Here's the fix.  It basically revives bqt->real_max_depth sans
> allocation optimization in init_tag_map.  I've also added a comment
> explicitly noting that tag map cannot be shrunk to prevent other
> morons like me.  :-( Please try this one and let me know how it works.
> If this is the correct fix, I'll repost properly to Jens and lkml with
> detailed explanation on how it was broken in the original code and how
> I broke it with my previous patch.  Sorry.

OK, 20 minutes into lowering and raising the queue-depth and
everything appears to be working fine.  I'll continue banging away
with my configuration and let you know if anything else comes up.
Looks good so far.

Thanks,
Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 15:55         ` Andrew Vasquez
@ 2005-08-05 15:59           ` Jens Axboe
  2005-08-05 17:15             ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2005-08-05 15:59 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: Tejun Heo, Linux-SCSI Mailing List

On Fri, Aug 05 2005, Andrew Vasquez wrote:
> On Fri, 05 Aug 2005, Tejun Heo wrote:
> 
> > >  Oops, forget about the previous mail.  The above patch made it into the
> > > tree and it's the source of the problem.  My git HEAD was pointing at
> > > the latest update but I hadn't updated my cache, so I was looking at
> > > the old source tree.  My apologies for the hassle and the bug.
> > > 
> > >  The original code was broken in the following two ways.
> > > 
> > >  * tag_index wasn't allocated fully
> > >  * tag_map's extra bits were always initialized to 1's.
> > > 
> > >  The first bug is critical, and the second bug prevents the tag map from
> > > being enlarged properly.  However, the second bug effectively masks the
> > > first one, avoiding the critical problem.  My above-mentioned patch broke
> > > things seriously when reducing the tag depth with requests in flight.
> > > 
> > >  Again, my apologies; a patch will follow soon.
> > 
> >  Here's the fix.  It basically revives bqt->real_max_depth sans
> > allocation optimization in init_tag_map.  I've also added a comment
> > explicitly noting that tag map cannot be shrunk to prevent other
> > morons like me.  :-( Please try this one and let me know how it works.
> > If this is the correct fix, I'll repost properly to Jens and lkml with
> > detailed explanation on how it was broken in the original code and how
> > I broke it with my previous patch.  Sorry.
> 
> OK, 20 minutes into lowering and raising the queue-depth and
> everything appears to be working fine.  I'll continue banging away
> with my configuration and let you know if anything else comes up.
> Looks good so far.

Thanks for fixing it so quickly, Tejun! I'll be on vacation next week,
can you make sure it gets to Andrew?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 12:33       ` Tejun Heo
  2005-08-05 15:55         ` Andrew Vasquez
@ 2005-08-05 16:32         ` James Bottomley
  2005-08-05 17:10           ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2005-08-05 16:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, Andrew Vasquez, Linux-SCSI Mailing List

On Fri, 2005-08-05 at 21:33 +0900, Tejun Heo wrote:
>  Here's the fix.  It basically revives bqt->real_max_depth sans
> allocation optimization in init_tag_map.  I've also added a comment
> explicitly noting that tag map cannot be shrunk to prevent other
> morons like me.  :-( Please try this one and let me know how it works.
> If this is the correct fix, I'll repost properly to Jens and lkml with
> detailed explanation on how it was broken in the original code and how
> I broke it with my previous patch.  Sorry.

Actually, if you really want to adjust the array size downwards, there's
a way we can do it:

- If the bits that would be lost on shrinkage are all zero at the time
blk_queue_resize_tags() is called, that means that there are no
outstanding tags up there and the array can be shrunk immediately.

- If there are outstanding tags between the new and the old depth, the
array can be shrunk when the last one of these returns, say in
blk_queue_end_tag().
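
The first condition amounts to scanning the bitmap range that would be
dropped.  A minimal user-space sketch of that check follows; the helper
names tags_busy_in_range() and can_shrink_now() are hypothetical and not
part of the block layer, and the kernel would use its own bitmap helpers
under the queue lock.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if any bit in [from, to) is set in map, 0 otherwise. */
static int tags_busy_in_range(const unsigned long *map, size_t from, size_t to)
{
	for (size_t bit = from; bit < to; bit++) {
		if (map[bit / BITS_PER_ULONG] & (1UL << (bit % BITS_PER_ULONG)))
			return 1;
	}
	return 0;
}

/*
 * Shrinking from old_depth to new_depth is safe right away only when no
 * tag in [new_depth, old_depth) is outstanding.  Otherwise the shrink
 * would have to be deferred until the last such tag completes.
 */
static int can_shrink_now(const unsigned long *map, size_t old_depth,
			  size_t new_depth)
{
	return new_depth < old_depth &&
	       !tags_busy_in_range(map, new_depth, old_depth);
}
```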

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 16:32         ` James Bottomley
@ 2005-08-05 17:10           ` Tejun Heo
  2005-08-05 17:20             ` James Bottomley
  2005-08-05 17:24             ` Andrew Vasquez
  0 siblings, 2 replies; 12+ messages in thread
From: Tejun Heo @ 2005-08-05 17:10 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, Linux-SCSI Mailing List

On Fri, Aug 05, 2005 at 11:32:07AM -0500, James Bottomley wrote:
> On Fri, 2005-08-05 at 21:33 +0900, Tejun Heo wrote:
> >  Here's the fix.  It basically revives bqt->real_max_depth sans
> > allocation optimization in init_tag_map.  I've also added a comment
> > explicitly noting that tag map cannot be shrunk to prevent other
> > morons like me.  :-( Please try this one and let me know how it works.
> > If this is the correct fix, I'll repost properly to Jens and lkml with
> > detailed explanation on how it was broken in the original code and how
> > I broke it with my previous patch.  Sorry.
> 
> Actually, if you really want to adjust the array size downwards, there's
> a way we can do it:
> 
> - If the bits that would be lost on shrinkage are all zero at the time
> blk_queue_resize_tags() is called, that means that there are no
> outstanding tags up there and the array can be shrunk immediately.
> 
> - If there are outstanding tags between the new and the old depth, the
> array can be shrunk when the last one of these returns, say in
> blk_queue_end_tag().
> 

 Hello, James.

 Yes, we can do that, but I'm not sure if that would be necessary.
AFAIK, queues are normally not very deep and a tag only occupies one
pointer and one bit.  Also, the shrinking operation isn't very common,
at least for traditional SPI devices and SATA drives, I think.

 Are newer SCSI devices (say, SAS/iSCSI) different, e.g. having very
deep queues and needing dynamic queue depth adjustment?  If that's the
case, I think I can implement shrinking in a separate patch (and try
not to screw up this time ;-).

 Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 15:59           ` Jens Axboe
@ 2005-08-05 17:15             ` Tejun Heo
  0 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2005-08-05 17:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Vasquez, Linux-SCSI Mailing List

On Fri, Aug 05, 2005 at 05:59:06PM +0200, Jens Axboe wrote:
> On Fri, Aug 05 2005, Andrew Vasquez wrote:
> > On Fri, 05 Aug 2005, Tejun Heo wrote:
> > 
> > > >  Oops, forget about the previous mail.  The above patch made it into the
> > > > tree and it's the source of the problem.  My git HEAD was pointing at
> > > > the latest update but I hadn't updated my cache, so I was looking at
> > > > the old source tree.  My apologies for the hassle and the bug.
> > > > 
> > > >  The original code was broken in the following two ways.
> > > > 
> > > >  * tag_index wasn't allocated fully
> > > >  * tag_map's extra bits were always initialized to 1's.
> > > > 
> > > >  The first bug is critical, and the second bug prevents the tag map from
> > > > being enlarged properly.  However, the second bug effectively masks the
> > > > first one, avoiding the critical problem.  My above-mentioned patch broke
> > > > things seriously when reducing the tag depth with requests in flight.
> > > > 
> > > >  Again, my apologies; a patch will follow soon.
> > > 
> > >  Here's the fix.  It basically revives bqt->real_max_depth sans
> > > allocation optimization in init_tag_map.  I've also added a comment
> > > explicitly noting that tag map cannot be shrunk to prevent other
> > > morons like me.  :-( Please try this one and let me know how it works.
> > > If this is the correct fix, I'll repost properly to Jens and lkml with
> > > detailed explanation on how it was broken in the original code and how
> > > I broke it with my previous patch.  Sorry.
> > 
> > OK, 20 minutes into lowering and raising the queue-depth and
> > everything appears to be working fine.  I'll continue banging away
> > with my configuration and let you know if anything else comes up.
> > Looks good so far.
> 
> Thanks for fixing it so quickly, Tejun! I'll be on vacation next week,
> can you make sure it gets to Andrew?
> 

 Meaning... Andrew Morton, right?  I'll make sure of that.  Have fun on
your vacation.  :-)

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 17:10           ` Tejun Heo
@ 2005-08-05 17:20             ` James Bottomley
  2005-08-05 17:24             ` Andrew Vasquez
  1 sibling, 0 replies; 12+ messages in thread
From: James Bottomley @ 2005-08-05 17:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, Andrew Vasquez, Linux-SCSI Mailing List

On Sat, 2005-08-06 at 02:10 +0900, Tejun Heo wrote:
>  Yes, we can do that, but I'm not sure if that would be necessary.
> AFAIK, queues are normally not very deep and a tag only occupies one
> pointer and one bit.  Also, the shrinking operation isn't very common,
> at least for traditional SPI devices and SATA drives, I think.
> 
>  Are newer SCSI devices (say, SAS/iSCSI) different, e.g. having very
> deep queues and needing dynamic queue depth adjustment?  If that's the
> case, I think I can implement shrinking in a separate patch (and try
> not to screw up this time ;-).

Well, yes, there are reasons for wanting deeper queues, but I'd leave it
for the time being.

What I'm looking into is support for aic7xxx/aic79xx queueing.  There,
the sequencer has to have a globally unique tag (from which it generates
the device locally unique tag internally).  That gives TCQ depths of up
to 512 I believe.  However, still probably not a significant waste of
memory to worry about.
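
As a rough check of that estimate, the footprint of a tag table can be
computed from the two allocations visible in init_tag_map(): one pointer
per tag plus one bit per tag, rounded up to whole unsigned longs.  The
function name tag_table_bytes() is hypothetical, and the sketch ignores
allocator overhead.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes used by the tag_index array and the tag_map bitmap for a depth. */
static size_t tag_table_bytes(size_t depth)
{
	const size_t bits_per_ulong = sizeof(unsigned long) * CHAR_BIT;
	/* Round the bitmap up to a whole number of unsigned longs. */
	size_t nr_ulongs = (depth + bits_per_ulong - 1) / bits_per_ulong;

	return depth * sizeof(void *) + nr_ulongs * sizeof(unsigned long);
}
```

With 8-byte pointers, a depth of 512 comes to 4096 bytes of pointers
plus 64 bytes of bitmap.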

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: calling scsi_adjust_queue_depth() during I/O...
  2005-08-05 17:10           ` Tejun Heo
  2005-08-05 17:20             ` James Bottomley
@ 2005-08-05 17:24             ` Andrew Vasquez
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Vasquez @ 2005-08-05 17:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: James Bottomley, Jens Axboe, Linux-SCSI Mailing List

On Sat, 06 Aug 2005, Tejun Heo wrote:

> On Fri, Aug 05, 2005 at 11:32:07AM -0500, James Bottomley wrote:
> > On Fri, 2005-08-05 at 21:33 +0900, Tejun Heo wrote:
> > >  Here's the fix.  It basically revives bqt->real_max_depth sans
> > > allocation optimization in init_tag_map.  I've also added a comment
> > > explicitly noting that tag map cannot be shrunk to prevent other
> > > morons like me.  :-( Please try this one and let me know how it works.
> > > If this is the correct fix, I'll repost properly to Jens and lkml with
> > > detailed explanation on how it was broken in the original code and how
> > > I broke it with my previous patch.  Sorry.
> > 
> > Actually, if you really want to adjust the array size downwards, there's
> > a way we can do it:
> > 
> > - If the bits that would be lost on shrinkage are all zero at the time
> > blk_queue_resize_tags() is called, that means that there are no
> > outstanding tags up there and the array can be shrunk immediately.
> > 
> > - If there are outstanding tags between the new and the old depth, the
> > array can be shrunk when the last one of these returns, say in
> > blk_queue_end_tag().
> > 
> 
>  Hello, James.
> 
>  Yes, we can do that, but I'm not sure if that would be necessary.
> AFAIK, queues are normally not very deep and a tag only occupies one
> pointer and one bit.  Also, the shrinking operation isn't very common,
> at least for traditional SPI devices and SATA drives, I think.
> 
>  Are newer SCSI devices (say, SAS/iSCSI) different, e.g. having very
> deep queues and needing dynamic queue depth adjustment?  If that's the
> case, I think I can implement shrinking in a separate patch (and try
> not to screw up this time ;-).

Well from the fibre-channel side of the storage world, a piece of
storage (RAID box) is generally parcelled out to a large number of
hosts.  These boxes tend to have a finite amount of resources
available to service requests to those hosts, so depending of course
on the amount of traffic being directed to the storage, QUEUE_FULL
cases may arise causing a particular host (or a set of hosts), to
throttle down their queue-depths for some period of time.

The trick though, is to dynamically throttle the depth up so as to
fully utilise the shared resources of the storage.
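
One way to picture that throttling is a small per-device policy that
drops the depth on QUEUE_FULL and raises it again after a run of clean
completions.  This is only an illustrative sketch under assumed rules
(decrement by one, ramp up by one every ramp_up_interval completions);
struct depth_policy and both functions are hypothetical and do not
describe any driver's actual behaviour.

```c
/* Hypothetical per-device state for a ramp-down / ramp-up policy. */
struct depth_policy {
	int cur_depth;		/* depth currently in effect */
	int min_depth;		/* never throttle below this */
	int max_depth;		/* never ramp above this */
	int good_completions;	/* clean completions since the last change */
	int ramp_up_interval;	/* clean completions needed to raise depth */
};

/* Called when the target reports QUEUE_FULL: back off by one. */
static int policy_queue_full(struct depth_policy *p)
{
	p->good_completions = 0;
	if (p->cur_depth > p->min_depth)
		p->cur_depth--;
	return p->cur_depth;
}

/* Called on a clean completion: ramp up slowly towards max_depth. */
static int policy_completion(struct depth_policy *p)
{
	if (p->cur_depth >= p->max_depth)
		return p->cur_depth;
	if (++p->good_completions >= p->ramp_up_interval) {
		p->good_completions = 0;
		p->cur_depth++;
	}
	return p->cur_depth;
}
```

A driver following such a policy would pass the returned value to its
change_queue_depth() path, which is exactly the kind of repeated
lowering and raising that exercised the tag map bug in this thread.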

--
Andrew Vasquez

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-08-05 17:24 UTC | newest]

Thread overview: 12+ messages
2005-08-04 23:41 calling scsi_adjust_queue_depth() during I/O Andrew Vasquez
2005-08-05  7:57 ` Jens Axboe
2005-08-05 11:09   ` Tejun Heo
2005-08-05 11:43     ` Tejun Heo
2005-08-05 12:33       ` Tejun Heo
2005-08-05 15:55         ` Andrew Vasquez
2005-08-05 15:59           ` Jens Axboe
2005-08-05 17:15             ` Tejun Heo
2005-08-05 16:32         ` James Bottomley
2005-08-05 17:10           ` Tejun Heo
2005-08-05 17:20             ` James Bottomley
2005-08-05 17:24             ` Andrew Vasquez
