From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aboo Valappil
Subject: Re: Linux Virtual SCSI HBAs and Virtual disks
Date: Sun, 21 Jan 2007 20:53:01 +1100
Message-ID: <45B337FD.1050501@aboo.org>
References: <45ACA765.3030701@aboo.org> <45AD80F3.6020100@torque.net> <45ADE01F.3020708@s5r6.in-berlin.de> <45ADF979.7090003@aboo.org> <45AEA118.5030608@torque.net> <45B336D3.5010106@aboo.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from ppp245-155.static.internode.on.net ([59.167.245.155]:42109 "EHLO goobu.aboo.org" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751319AbXAUJ6a (ORCPT ); Sun, 21 Jan 2007 04:58:30 -0500
In-Reply-To: <45B336D3.5010106@aboo.org>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Aboo Valappil
Cc: dougg@torque.net, Stefan Richter, linux-scsi@vger.kernel.org

Also, the modified version is available at
http://vscsihba.aboo.org/vscsihbav202.tgz. I actually use the "openers"
field in scsi_disk to find out whether anyone has the scsi_device open.

Aboo

Aboo Valappil wrote:
> Hi Doug Gilbert,
>
> Sorry for the late reply. I am in the process of implementing sense
> code and I will make it available.
>
> I tried sdparm, and it failed, but not because of missing sense code
> and status. What happened was that the user space SCSI target died due
> to an unsupported SCSI command (a bug in the user space target). When
> it crashed, the SCSI disk served by that user space target was still
> held open by sdparm. The driver removed the scsi_host that was attached
> to the user space target, thinking that the last registered user space
> part had died. I think those stack traces are due to the EH thread
> trying to perform some sort of recovery on the SCSI command after the
> scsi_host had already been removed!
>
> To prevent this, I implemented some checks. When the last user space
> application attached to the scsi_host goes away, the driver will check
> to make sure that there are no open SCSI devices on that scsi_host. If
> any devices are open, it will not remove the scsi_host. This kind of
> approach should be OK because my design supports re-attaching a new
> user space target to an existing scsi_host. The design also allows
> multiple user space targets to be started to serve one scsi_host in the
> kernel, and there is no issue at all even if one dies. Whoever gets the
> SCSI command will serve it, and sleep if nothing is available.
>
> Thanks
>
> Aboo
>
>
> Douglas Gilbert wrote:
>> Aboo Valappil wrote:
>>
>>> Hi All,
>>>
>>> Thanks, everyone, for having a look at this.
>>>
>>> I think I have modified it to support the latest kernel.
>>> Unfortunately I could not test it with the 2.6.20 kernel due to some
>>> issues with my laptop and the 2.6.20 kernel. But it should work with
>>> 2.6.20 with this modification.
>>>
>>> The modified version is available through
>>> http://vscsihba.aboo.org/vscsihbav202.tgz.
>>>
>>> 1. I fixed the kmem_cache issue for sure.
>>> 2. I think I got around the INIT_WORK problem ... Made the following
>>> modifications ...
>>>
>>
>> Perhaps you could get some of my scsi tools (e.g.
>> sdparm and sg3_utils) and make sure that vscsihba
>> can handle everything they can throw at it.
>> If the user space doesn't support a SCSI command then
>> your driver should fail gracefully (i.e. CHECK CONDITION,
>> etc).
>>
>> Here is a worrying example: sdparm sends an INQUIRY
>> and a couple of MODE SENSE(10) commands to a device.
>> /dev/sda was created by your script:
>> $ ./start_target.sh id=3 -files zz_lun0
>>
>> $ sdparm /dev/sda
>> /dev/sda: VirtualH VHD 0
>>
>> $
>>
>>
>> However dmesg showed this:
>>
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> vscsihba:3: In Reset Device
>> sd 0:0:0:0: SCSI error: return code = 0x00000002
>> end_request: I/O error, dev sda, sector 10240
>> Buffer I/O error on device sda, logical block 10240
>> BUG: at kernel/sched.c:3388 sub_preempt_count()
>> [] scsitap_eh_abort+0x1c/0x90 [vscsihba]
>> [] scsi_error_handler+0x3e2/0xbe0
>> [] __sched_text_start+0x2f1/0x660
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>> vscsihba:3: Abortng command serial number : 94
>> BUG: scheduling while atomic: scsi_eh_0/0x00000001/4749
>> [] __sched_text_start+0x484/0x660
>> [] autoremove_wake_function+0x1b/0x50
>> [] lock_timer_base+0x28/0x70
>> [] __mod_timer+0x92/0xd0
>> [] schedule_timeout+0x4b/0xd0
>> [] process_timeout+0x0/0x10
>> [] wait_for_completion_timeout+0x9c/0x130
>> [] default_wake_function+0x0/0x10
>> [] scsi_send_eh_cmnd+0x1b9/0x390
>> [] vprintk+0x1fe/0x3a0
>> [] scsi_delete_timer+0x15/0x60
>> [] scsi_eh_tur+0x34/0xa0
>> [] scsi_error_handler+0x429/0xbe0
>> [] __sched_text_start+0x2f1/0x660
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>> vscsihba:3: Abortng command serial number : 95
>> vscsihba:3: In Reset Device
>> BUG: scheduling while atomic: scsi_eh_0/0x00000001/4749
>> [] __sched_text_start+0x484/0x660
>> [] vprintk+0x1fe/0x3a0
>> [] lock_timer_base+0x28/0x70
>> [] __mod_timer+0x92/0xd0
>> [] schedule_timeout+0x4b/0xd0
>> [] process_timeout+0x0/0x10
>> [] wait_for_completion_timeout+0x9c/0x130
>> [] default_wake_function+0x0/0x10
>> [] scsi_send_eh_cmnd+0x1b9/0x390
>> [] scsi_delete_timer+0x15/0x60
>> [] scsi_eh_tur+0x34/0xa0
>> [] scsitap_eh_device_reset+0x1d/0x30 [vscsihba]
>> [] scsi_error_handler+0x968/0xbe0
>> [] __sched_text_start+0x2f1/0x660
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>> vscsihba:3: Abortng command serial number : 96
>> vscsihba:3: In Reset Host
>> BUG: scheduling while atomic: scsi_eh_0/0x00000001/4749
>> [] __sched_text_start+0x484/0x660
>> [] lock_timer_base+0x28/0x70
>> [] __mod_timer+0x92/0xd0
>> [] schedule_timeout+0x4b/0xd0
>> [] process_timeout+0x0/0x10
>> [] msleep+0x25/0x30
>> [] scsi_try_host_reset+0xa1/0xd0
>> [] scsi_error_handler+0x710/0xbe0
>> [] __sched_text_start+0x2f1/0x660
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>> BUG: scheduling while atomic: scsi_eh_0/0x00000001/4749
>> [] __sched_text_start+0x484/0x660
>> [] lock_timer_base+0x28/0x70
>> [] __mod_timer+0x92/0xd0
>> [] schedule_timeout+0x4b/0xd0
>> [] process_timeout+0x0/0x10
>> [] wait_for_completion_timeout+0x9c/0x130
>> [] default_wake_function+0x0/0x10
>> [] scsi_send_eh_cmnd+0x1b9/0x390
>> [] __sched_text_start+0x2f1/0x660
>> [] __mod_timer+0x92/0xd0
>> [] schedule_timeout+0x52/0xd0
>> [] scsi_eh_tur+0x34/0xa0
>> [] scsi_error_handler+0x760/0xbe0
>> [] __sched_text_start+0x2f1/0x660
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>> vscsihba:3: Abortng command serial number : 97
>> sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
>> BUG: scheduling while atomic: scsi_eh_0/0x00000001/4749
>> [] __sched_text_start+0x484/0x660
>> [] module_put+0x31/0x60
>> [] scsi_device_put+0x3e/0x40
>> [] __scsi_iterate_devices+0x6f/0x90
>> [] scsi_error_handler+0x46/0xbe0
>> [] scsi_error_handler+0x0/0xbe0
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x18
>> =======================
>>
>>
>> Doug Gilbert
>>
>
>
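
For anyone following the thread, here is a minimal user space sketch of the teardown rule described in the reply at the top of this message: the scsi_host is kept while another target is still attached or while any of its devices is open. The struct and function names (vhost_model, vhost_detach_target, and so on) are hypothetical and are not taken from the vscsihba source. A real driver would also need locking around these counters and would have to decide what happens when the last opener closes a host that has no target; both are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one virtual host's bookkeeping. These are NOT the
 * vscsihba driver's real structures or field names. */
struct vhost_model {
    int attached_targets; /* user space targets currently registered */
    int open_devices;     /* devices on this host that someone holds open */
    bool registered;      /* true while the host is still registered */
};

void vhost_init(struct vhost_model *h)
{
    h->attached_targets = 0;
    h->open_devices = 0;
    h->registered = true;
}

/* A user space target attaches (or re-attaches) to the host. */
void vhost_attach_target(struct vhost_model *h)
{
    h->attached_targets++;
}

/* Something such as sdparm opens or closes a device on the host. */
void vhost_device_open(struct vhost_model *h)
{
    h->open_devices++;
}

void vhost_device_close(struct vhost_model *h)
{
    assert(h->open_devices > 0);
    h->open_devices--;
}

/* A target exits or crashes. Returns true only if the host was removed. */
bool vhost_detach_target(struct vhost_model *h)
{
    assert(h->attached_targets > 0);
    h->attached_targets--;

    if (h->attached_targets > 0)
        return false; /* another target can still serve commands */
    if (h->open_devices > 0)
        return false; /* keep the host so a new target can re-attach */

    h->registered = false;
    return true;
}
```

With one target attached and /dev/sda held open, a target crash makes vhost_detach_target() return false and the host stays registered, which is the situation the sdparm run above would have needed.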