linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Request for review of Linux iSCSI driver version 4.0.0.1
@ 2003-10-23 12:04 Krishna Murthy
  2003-10-27 15:39 ` Christoph Hellwig
  0 siblings, 1 reply; 102+ messages in thread
From: Krishna Murthy @ 2003-10-23 12:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: davmyers

Hi,
	Linux iSCSI driver 4.0.0.1 (linux-iscsi-4.0.0.1.tgz) is available at

http://sf.net/projects/linux-iscsi/

	 Could you please review the same and let us know whether it is OK for 
inclusion into the linux 2-6 kernel?

The following are the changes w.r.t  version 3.4 of our driver which we 
had earlier sent for review:
 1)Compatible with linux -2.6.0 kernel. 
 2)Review comments from linux-scsi have been incorporated:
     - Indentation /Coding style has been taken care of.
     - Device probing and /dev/iscsi links creation have been moved from
	kernel to user mode.
     - linked list macros from list.h is being used now.
     -few of our kernel files were including <endian.h>. This has been 
rectified.



Thanx
N.C.Krishna Murthy

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-23 12:04 Request for review of Linux iSCSI driver version 4.0.0.1 Krishna Murthy
@ 2003-10-27 15:39 ` Christoph Hellwig
  2003-10-29 13:23   ` Surekha.PC
                     ` (6 more replies)
  0 siblings, 7 replies; 102+ messages in thread
From: Christoph Hellwig @ 2003-10-27 15:39 UTC (permalink / raw)
  To: Krishna Murthy; +Cc: linux-scsi, davmyers

On Thu, Oct 23, 2003 at 05:34:10PM +0530, Krishna Murthy wrote:
> Hi,
> 	Linux iSCSI driver 4.0.0.1 (linux-iscsi-4.0.0.1.tgz) is available at
> 
> http://sf.net/projects/linux-iscsi/

Here's a list of a few comments I've come up with.  I haven't yet looked
at the data structurws much, but it seems there's a complete lack of
refcounting.  Also there's lots of comments were ysou complain about scsi
midlayer misbehaviour or bugs - please submit patches or submit workarounds
when you encounter them, that's what open source is all about!

Also the same question that went to the unh folks goes to you:  why do
you think your implementation is better than theirs and why do you think
cooperation on one iscsi stack is impossible.

here's the list:


highlevel:
	- kill md5.c and iscsi-crc.c and use common cryptoapi
	md5 / common crc code instead.
	- please kill you thousands of atoi/aton reimplementations
	and look for proper linux functions like simple_strtoul.

README:
	- very outdated with 2.4/vendor issues

Makefile:
	- lots of unused kernel detection stuff

iscis-kernel.h:
	- only backward-compat cruft, should go away.

iscsi.c:
	- doesn't need at least smp_lock.h - please audit header
	usage.
	- kill stupid MODULE/MODULE_LICENSE ifdefs.
	- kill = 0 / = NULL initializers for static variables, not needed.
	- kill LINK_DIR/session->target_link_dir - kernel must
	  not know about device pathes.
	- kill max_*_devices - not needed in kernelspace (and wrong
	  in 2.6)
	- fix up strange indentation artefacts from indent, like:

static
struct iscsi_trace_entry
    trace_table[ISCSI_TRACE_COUNT];

should be:

static struct iscsi_trace_entry trace_table[ISCSI_TRACE_COUNT];

	- first argument to memset is void * - don't cast pointers
	to char *
	- dito for memcpy
	- lose stupid braces, e.g.

list_del_init(&(task->task_group_link));

should be just

list_del_init(&task->task_group_link);

	- don't use <hosts.h> but <scsi/scsi_host.h>, similarly
	use <scsi/scsi*.h> instead of scsi.h if possible.  There's
	still some stuff not moved away from scsi.h yet, but you
	should at least avoid anything explicitly marked deprecated
	in that file and not in <scsi/scsi_*.h>
	- no need to implement ->slave_alloc and ->slave_destroy if
	they are noops anyway.
	- kill that slab name randomization - if kmem_cache_destroy
	fails you module is buggy and needs to be fixed.
	- instead of the single linked list of commands use a 
	lists.h list in the scsi_pointer
	- host and hba can't be NULL in ->queuecommand per defintion,
	similarly id, lun cdbsize and use_sg can't be bigger than you
	set in the template.
	- iscsi_info is far too noisy.  I'd suggest killing it and
	just setting the name field in the template propely
	- the procfs code is a mess.  Please move it over to
	proper per-device / per-host sysfs attributes as procfs
	support in HBA drivers is deprecated.
	- kill ctl_open/ctl_release and move add an owner to the
	fops structure for refcounting.
	- the ifdef MODULE around iscsi_init/iscsi_cleanup is bogus
	- lots of ifdef mess - iscsi_kernel.h still has lots of
  	  old kernel compat stuff - kill it.
	- target_reset_occured() is a dup of scsi_report_device_reset()
	- initialization is a total mess.  You try to emulated the
	  pre-2.6 init ordering.  Use something like:

	shost = scsi_host_alloc();
	<content of ->detect>
	scsi_add_host()

	instead, and read the comment abote .legacy_hosts in the
	scsi_host_template defintion!  Similar inline the code
	in iscsi_release into the exitfunc.
	- iscsi.c is _far_ too big.  Needs to be split into a few
	files.  I'd suggest avoiding the name iscsi.c so you can
	build into iscsi.ko
	- the ioctl API looks very broken, I need more time to
	audit it.  As a start slit ctl_ioctl into a function per ioctl
	subcommand.
	- why can't you store a pointer to the session in host-private
	data of the scsi_device so you don't have to search in each
	->queuecommand?
	- kill the is_root_disk special casing.  drivers can't know
	what the root disks is, there might not even be a single one.
	- compilation fails for me in line 480, looks like this is because
	the left-over ifdefs are wrong - just use set_user_nice unconditionally.
	Or even better stop messing with priorities, kill iscsi_daemonize and
	use daemonize() directly.
	- having multiple kmap() in the same process at the same time can kmap(),
	you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
	for the tx path use ->sendpage to avoid a data copy and kmapping altogether.
	- the explanation of PREVENT_DATA_CORRUPTION doesn't make sense.
	- why do you have SLAB_NO_REAP optional?  Either it works or it doesn't..
	- please stop that preallocation, you destroy the effect of per-cpu
	caches with that.  
	- you're only allocating one hba struct but still do complex list walking
	to find.  Either move to multiple hbas (and proper standard lists) or
	kill all that overhead.
	- INVALID_ORDERING_ASSUMPTIONS should probably a rutime option (?)
	- what the heck is iscsi_set_if_addr?  You have absolutely no business
	messing with network device configuration.
	- device_flags() is strange.  Better put a separately allocated struct
	into the scsi_device (allocated in slave_alloc, freed in slave_destroy)
	that only contains the bitmask for now.  (and maybe the session, see
	above)
	- similar for command flags - put your own overlay over the scsi_pointer,
	including a bitmask.
	- kfree(NULL) is fine, don't check for variables beeing non-NULL before
	freeing them.

memset(session, 0, sizeof (*session));
kfree(session);

	is rather bad for the cache.

	- this comment indicates you don't want normal EH:

 * check condition with no sense.  We need to avoid this, 
 * since the Linux SCSI code could put the command in 
 * SCSI_STATE_FAILED, which it's error recovery doesn't appear 
 * to handle correctly, and even if it does, we're trying to 
 * bypass all of the Linux error recovery code
 * to avoid blocking all I/O to the HBA.  Fake some sense 
 * data that gets a retry from Linux.

	so don't use scsi_error.c and define your own eh_strategy_handler method.

	- please explain on this:

 * Philsophically we ought to complete the command and let the
 * midlayer or high-level driver deal with retries.  Since the
 * way the midlayer does retries is undesirable, we instead
 * keep the command in the driver, but requeue it for the same
 * cases the midlayer checks for retries.  This lets us ignore
 * the command's retry count, and do retries until the command
 * timer expires.

	why can't you simply mess with the retry count instead
	of duplicating all the retry logic.  really, the midlayer
	queues are there for a reason.

	- this doesn't seem to be an issue for a 2.6-only driver:

 * not to log?  In 2.5 we can put a pointer in
 * Scsi_Device->hostdata, but 2.4 doesn't
  ...


	- your probably want to rewrite your TCQ code to use the
	block layer tcq code, see include/linux/scsi_tcq.h

	- there's lots of if (sc->device) checks where sc is a
	scsi_cmnd - but the device field never can be NULL for
	those, that's guaranteed by scsi_get_command.  Indeed you
	have far too many checks for impossible conditions.  Just
	use BUG_ON for those if you don't trust the midlayer, but
	trying to handle those conditions is just insane.

	- replace iscsi_inet_aton with in_aton?

 *  FIXME: it'd probably be cleaner to move the timeout logic to the rx thread.
 *         The only danger is if the rx thread somehow blocks indefinately.
 *         Doing timeouts here makes sure the timeouts get checked, at the
 *         cost of having this code constantly loop.

	- yes, you really need to reduce the amount of threads in the driver..
	- shost->can_queue is not supposed to be changed once the host is online,
	  as it's not locked down.
	- please produce a simple testcase for the bug mentioned above
	  iscsi_eh_device_reset.
	- the bug mentioned above iscsi_eh_bus_reset should be long fixed.  if
	not please report to linux-scsi.
	- is the iscsi fake geometry documented somewhere?  If not why don't
	you simply use the default, CAM one?


iscsi-probe.c:
	- lots of unchecked kmalloc()s
	- you can't allocate scsi_cmnds yourself but _must_ use scsi_get_command
	- dito for scsi_device.
	- the whole iscsi_do_cmnd use looks very very strange.  you're probably
	much better of to use the scsi_request based interfaces and use midlayer
	queueing and sometimes even higher level interfaces, e.g. your iscsi_send_tur
	coudld be replaced by an scsi_ioctl(sdp, SCSI_IOCTL_TEST_UNIT_READY, NULL)
	I still don't understand, though why the heck you need to do a TUR in
	a lowlevel driver and even more need a thread do a single TUR, started from
	another kernel tread.  there more I dig into it your probing and thread
	abuse looks horribly screwed.
	- reinitilialize disk is another such candidate.  again why do you
	need to send START_STOP commands from the LLDD, why do you need another
	thread instead of a state machine and again life would be a lot simpler
	by using the midlayer queueing infrastructure.
	- dito for the reports luns and inquiry code in iscsi_detect_luns.
	I'm very very unhappy to see a lowlevel driver mess with all this.
	Even if you need to do it from a LLDD use the midlayer functionality for
	it.  But I still want to see a very good explanation why you need this at all.

iscsi-kernel-event.c:
	- sleep_on is racy.
	- the concept of this is screwed.  (See below)

iscsi-event-handler.c:
	- the kernel-event mechanism is totally bogus.  you're not going
	to wait in kernelspace for an userspace upcall doing procfs
	stuff.  As I already said procfs might not even be mounted.
	use scsi_add_device/scsi_remove_device and kill that stupid
	symlinking crap.  Really, if you driver has to deal with
	pathnames something is horribly wrong. If you really want
	devices to show up in /dev/iscsi/ submit patches to udev.

iscsi.h:
	- Linux now has (lowecase) versions of DRIVER_BYTE & co.
	- Linux now has unshifted status codes

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
@ 2003-10-29 13:23   ` Surekha.PC
  2003-10-29 13:45     ` 'Christoph Hellwig'
  2003-11-06  9:42   ` Sachin Mhatre (smhatre)
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Surekha.PC @ 2003-10-29 13:23 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi, davmyers


Hi,

 I have interspersed my answers below for 2 of your comments.

>	- kill the is_root_disk special casing.  drivers can't know
>	what the root disks is, there might not even be a single one.

This flag is needed for our iSCSI network boot implementation to check
if we are booted on iSCSI disk. This is used while trying to shutdown
the iSCSI service.
	
>	- what the heck is iscsi_set_if_addr?  You have absolutely no
business
>	messing with network device configuration.

This is again needed for network boot. The n/w driver and iSCSI driver
are loaded during early boot. Since the n/w interface is not setup at
that time, we need to bring up the interface through this call in iSCSI
driver.

Thanks,
surekha
 

	


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 13:23   ` Surekha.PC
@ 2003-10-29 13:45     ` 'Christoph Hellwig'
  2003-10-29 17:28       ` Mike Christie
  2003-11-11 11:56       ` Naveen Burmi
  0 siblings, 2 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-10-29 13:45 UTC (permalink / raw)
  To: Surekha.PC; +Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Wed, Oct 29, 2003 at 06:53:43PM +0530, Surekha.PC wrote:
> 
> Hi,
> 
>  I have interspersed my answers below for 2 of your comments.
> 
> >	- kill the is_root_disk special casing.  drivers can't know
> >	what the root disks is, there might not even be a single one.
> 
> This flag is needed for our iSCSI network boot implementation to check
> if we are booted on iSCSI disk. This is used while trying to shutdown
> the iSCSI service.

What if the root device changes (pivot_root syscall, namespaces)?  You need
to find a way to do what you want without having knowledge about beeing
a root device in the LLDD - there might not be a single root device and it
could change all the time.

> This is again needed for network boot. The n/w driver and iSCSI driver
> are loaded during early boot. Since the n/w interface is not setup at
> that time, we need to bring up the interface through this call in iSCSI
> driver.

That's bogus.  Look at the ip autoconfig code used by rootnfs and reuse
that instead of creating such bogus hacks.  A driver has no business dealing
with the network configuration.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 13:45     ` 'Christoph Hellwig'
@ 2003-10-29 17:28       ` Mike Christie
  2003-10-29 18:45         ` Clay Haapala
  2003-10-30 13:34         ` jd
  2003-11-11 11:56       ` Naveen Burmi
  1 sibling, 2 replies; 102+ messages in thread
From: Mike Christie @ 2003-10-29 17:28 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: Surekha.PC, linux-scsi, davmyers, Richard Bealkowski

Hi Surekha,

>>This is again needed for network boot. The n/w driver and iSCSI driver
>>are loaded during early boot. Since the n/w interface is not setup at
>>that time, we need to bring up the interface through this call in iSCSI
>>driver.
> 
> 
> That's bogus.  Look at the ip autoconfig code used by rootnfs and reuse
> that instead of creating such bogus hacks.  A driver has no business dealing
> with the network configuration.
> 

Your driver does not even need to mess with that code. Just modify your 
usersapce tools in the linux-iscsi package that run in the initrd to do 
this in userspace.


Mike


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 17:28       ` Mike Christie
@ 2003-10-29 18:45         ` Clay Haapala
  2003-10-29 19:01           ` Mike Christie
  2003-10-30 13:34         ` jd
  1 sibling, 1 reply; 102+ messages in thread
From: Clay Haapala @ 2003-10-29 18:45 UTC (permalink / raw)
  To: linux-scsi

On Wed, 29 Oct 2003, Mike Christie outgrape:
> Hi Surekha,
> 
>>>This is again needed for network boot. The n/w driver and iSCSI
>>>driver are loaded during early boot. Since the n/w interface is not
>>>setup at that time, we need to bring up the interface through this
>>>call in iSCSI driver.
>> That's bogus.  Look at the ip autoconfig code used by rootnfs and
>> reuse that instead of creating such bogus hacks.  A driver has no
>> business dealing with the network configuration.
>>
> 
> Your driver does not even need to mess with that code. Just modify
> your usersapce tools in the linux-iscsi package that run in the
> initrd to do this in userspace.

Mike,

This may turn out to be a discussion about where this code resides and
what code is best to (re)use, not if it is needed.  "Early boot" here
is from PXE -- setting up the iSCSI initiator to reach a target at
some IP where the rest of the boot (initrd and friends) is.

I have not yet looked at the ip autoconfig code that Chris mentions
above, though.  Can't comment on that.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
      "They're selling independence, Actors in the White House,
       Acid indigestion, Mortage on my life."
         -- from "Ramble Tamble" by CCR 

 "Vote for me if you want to live" -- the Gubernator

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 18:45         ` Clay Haapala
@ 2003-10-29 19:01           ` Mike Christie
  2003-10-29 19:17             ` Clay Haapala
  0 siblings, 1 reply; 102+ messages in thread
From: Mike Christie @ 2003-10-29 19:01 UTC (permalink / raw)
  To: Clay Haapala; +Cc: linux-scsi


> This may turn out to be a discussion about where this code resides and
> what code is best to (re)use, not if it is needed.  "Early boot" here
> is from PXE -- setting up the iSCSI initiator to reach a target at
> some IP where the rest of the boot (initrd and friends) is.
> 

Oops. I forgot you guys use iNBP.com. I was thinking of pxelinux.

Mike


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 19:01           ` Mike Christie
@ 2003-10-29 19:17             ` Clay Haapala
  2003-10-29 19:33               ` Mike Christie
  0 siblings, 1 reply; 102+ messages in thread
From: Clay Haapala @ 2003-10-29 19:17 UTC (permalink / raw)
  To: Mike Christie; +Cc: linux-scsi

On Wed, 29 Oct 2003, Mike Christie outgrape:
> 
>> This may turn out to be a discussion about where this code resides
>> and what code is best to (re)use, not if it is needed.  "Early
>> boot" here is from PXE -- setting up the iSCSI initiator to reach a
>> target at some IP where the rest of the boot (initrd and friends)
>> is.
>>
> 
> Oops. I forgot you guys use iNBP.com. I was thinking of pxelinux.
> 
> Mike
Oh, now I have to find out about pxelinux ... google ... OK. :-)

Yeah, like that only different -- using iSCSI to a target to fetch
blocks rather than TFTP of files from a filestore.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
      "They're selling independence, Actors in the White House,
       Acid indigestion, Mortage on my life."
         -- from "Ramble Tamble" by CCR 

 "Vote for me if you want to live" -- the Gubernator

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 19:17             ` Clay Haapala
@ 2003-10-29 19:33               ` Mike Christie
  2003-10-30 23:42                 ` Andre Hedrick
  0 siblings, 1 reply; 102+ messages in thread
From: Mike Christie @ 2003-10-29 19:33 UTC (permalink / raw)
  To: Clay Haapala; +Cc: linux-scsi

Clay Haapala wrote:

> On Wed, 29 Oct 2003, Mike Christie outgrape:
> 
>>>This may turn out to be a discussion about where this code resides
>>>and what code is best to (re)use, not if it is needed.  "Early
>>>boot" here is from PXE -- setting up the iSCSI initiator to reach a
>>>target at some IP where the rest of the boot (initrd and friends)
>>>is.
>>>
>>
>>Oops. I forgot you guys use iNBP.com. I was thinking of pxelinux.
>>
>>Mike
> 
> Oh, now I have to find out about pxelinux ... google ... OK. :-)
> 
> Yeah, like that only different -- using iSCSI to a target to fetch
> blocks rather than TFTP of files from a filestore.

Once inbp.com reads in the kernel and initrd, it looks like your linuxrc 
just insmods the netwrok driver and iscsi mod. At this point (after you 
insmod the network driver but before you insmod iscsi.ko.) you can start 
the networking interface from userspace. Include a statically compiled 
dhcp program and run it from the linuxrc.

Mike


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 17:28       ` Mike Christie
  2003-10-29 18:45         ` Clay Haapala
@ 2003-10-30 13:34         ` jd
  1 sibling, 0 replies; 102+ messages in thread
From: jd @ 2003-10-30 13:34 UTC (permalink / raw)
  To: Mike Christie, 'Christoph Hellwig'
  Cc: Surekha.PC, linux-scsi, davmyers, Richard Bealkowski


--- Mike Christie <mikenc@us.ibm.com> wrote:
> Hi Surekha,
> 
> >>This is again needed for network boot. The n/w driver and iSCSI
> driver
> >>are loaded during early boot. Since the n/w interface is not setup
> at
> >>that time, we need to bring up the interface through this call in
> iSCSI
> >>driver.
> > 
> > 
> > That's bogus.  Look at the ip autoconfig code used by rootnfs and
> reuse
> > that instead of creating such bogus hacks.  A driver has no
> business dealing
> > with the network configuration.
> > 
> 
> Your driver does not even need to mess with that code. Just modify
> your 
> usersapce tools in the linux-iscsi package that run in the initrd to
> do 
> this in userspace.
> 
> 
> Mike
> 

  Hi Surekha,

  You can use a set of tools from "busybox" that 
  includes a small statically linked "shell" and
  dhcp client utility program(s) and create a initrd 
  (ramdisk) that download via pxelinux.

   The linuxrc script would then bring up the 
   network something like :

   insmod <your nic>
   ifconfig up eth0 my_ip_addr
   dhcpl  >  target_info   # query the server 
   insmod iscsi_mod 
   iscsi_conf   cat `target_info`  # user utility to start iscsi

  Ping me offline and I can share the 
  diskless boot sequence I did with unh-iscsi
  ramdisk if your interested.

  
 John Donnelly  AATTTT   HH PP  dott COM 


__________________________________
Do you Yahoo!?
Exclusive Video Premiere - Britney Spears
http://launch.yahoo.com/promos/britneyspears/

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 19:33               ` Mike Christie
@ 2003-10-30 23:42                 ` Andre Hedrick
  0 siblings, 0 replies; 102+ messages in thread
From: Andre Hedrick @ 2003-10-30 23:42 UTC (permalink / raw)
  To: Mike Christie; +Cc: Clay Haapala, linux-scsi


How about if IBM releases the patent clauses on iBOOT so that any body can
use the INT13/INT19 BIOS callers so Linux can be found as a bootable iscsi
disk during POST?

Just a comment, given IBM's statements of "open spec" and/or "open
standards".  Clearly this would cause IBM to lose a market position, and
would be bad business practices, but see the fore mentioned position the
organization promotes.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 29 Oct 2003, Mike Christie wrote:

> Clay Haapala wrote:
> 
> > On Wed, 29 Oct 2003, Mike Christie outgrape:
> > 
> >>>This may turn out to be a discussion about where this code resides
> >>>and what code is best to (re)use, not if it is needed.  "Early
> >>>boot" here is from PXE -- setting up the iSCSI initiator to reach a
> >>>target at some IP where the rest of the boot (initrd and friends)
> >>>is.
> >>>
> >>
> >>Oops. I forgot you guys use iNBP.com. I was thinking of pxelinux.
> >>
> >>Mike
> > 
> > Oh, now I have to find out about pxelinux ... google ... OK. :-)
> > 
> > Yeah, like that only different -- using iSCSI to a target to fetch
> > blocks rather than TFTP of files from a filestore.
> 
> Once inbp.com reads in the kernel and initrd, it looks like your linuxrc 
> just insmods the netwrok driver and iscsi mod. At this point (after you 
> insmod the network driver but before you insmod iscsi.ko.) you can start 
> the networking interface from userspace. Include a statically compiled 
> dhcp program and run it from the linuxrc.
> 
> Mike
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
  2003-10-29 13:23   ` Surekha.PC
@ 2003-11-06  9:42   ` Sachin Mhatre (smhatre)
  2003-11-06 10:09     ` 'Christoph Hellwig'
  2003-11-06 11:10     ` Andre Hedrick
  2003-11-13 14:30   ` Sachin Mhatre (smhatre)
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 102+ messages in thread
From: Sachin Mhatre (smhatre) @ 2003-11-06  9:42 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi, davmyers


Hi,
	The following is the reason for sending the TEST_UNIT_READY and
START_STOP initialization command in the iSCSI driver:- 

The iSCSI driver receives a CHECK CONDITION with sense key, NOT READY
and ASC/ASCQ=04/02 (i.e. an initialization command required). The Linux
SCSI mid-layer does not handle this condition. Hence the TUR and START
STOP initialization commands are sent from the iSCSI driver.

Should this condition be necessarily handled in the SCSI mid-layer or
the low-level driver?

Thanks,
Sachin

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Christoph Hellwig
Sent: Monday, October 27, 2003 9:10 PM
To: Krishna Murthy
Cc: linux-scsi@vger.kernel.org; davmyers@cisco.com
Subject: Re: Request for review of Linux iSCSI driver version 4.0.0.1


On Thu, Oct 23, 2003 at 05:34:10PM +0530, Krishna Murthy wrote:
> Hi,
> 	Linux iSCSI driver 4.0.0.1 (linux-iscsi-4.0.0.1.tgz) is
available at
> 
> http://sf.net/projects/linux-iscsi/

Here's a list of a few comments I've come up with.  I haven't yet looked
at the data structurws much, but it seems there's a complete lack of
refcounting.  Also there's lots of comments were ysou complain about
scsi midlayer misbehaviour or bugs - please submit patches or submit
workarounds when you encounter them, that's what open source is all
about!

Also the same question that went to the unh folks goes to you:  why do
you think your implementation is better than theirs and why do you think
cooperation on one iscsi stack is impossible.

here's the list:


iscsi-probe.c:
	- lots of unchecked kmalloc()s
	- you can't allocate scsi_cmnds yourself but _must_ use
scsi_get_command
	- dito for scsi_device.
	- the whole iscsi_do_cmnd use looks very very strange.  you're
probably
	much better of to use the scsi_request based interfaces and use
midlayer
	queueing and sometimes even higher level interfaces, e.g. your
iscsi_send_tur
	coudld be replaced by an scsi_ioctl(sdp,
SCSI_IOCTL_TEST_UNIT_READY, NULL)
	I still don't understand, though why the heck you need to do a
TUR in
	a lowlevel driver and even more need a thread do a single TUR,
started from
	another kernel tread.  there more I dig into it your probing and
thread
	abuse looks horribly screwed.
	- reinitilialize disk is another such candidate.  again why do
you
	need to send START_STOP commands from the LLDD, why do you need
another
	thread instead of a state machine and again life would be a lot
simpler
	by using the midlayer queueing infrastructure.
	- dito for the reports luns and inquiry code in
iscsi_detect_luns.
	I'm very very unhappy to see a lowlevel driver mess with all
this.
	Even if you need to do it from a LLDD use the midlayer
functionality for
	it.  But I still want to see a very good explanation why you
need this at all.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org More majordomo info
at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-06  9:42   ` Sachin Mhatre (smhatre)
@ 2003-11-06 10:09     ` 'Christoph Hellwig'
  2003-11-07  8:55       ` Douglas Gilbert
  2003-11-06 11:10     ` Andre Hedrick
  1 sibling, 1 reply; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-06 10:09 UTC (permalink / raw)
  To: Sachin Mhatre (smhatre); +Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Thu, Nov 06, 2003 at 03:12:51PM +0530, Sachin Mhatre (smhatre) wrote:
> 
> Hi,
> 	The following is the reason for sending the TEST_UNIT_READY and
> START_STOP initialization command in the iSCSI driver:- 
> 
> The iSCSI driver receives a CHECK CONDITION with sense key, NOT READY
> and ASC/ASCQ=04/02 (i.e. an initialization command required). The Linux
> SCSI mid-layer does not handle this condition. Hence the TUR and START
> STOP initialization commands are sent from the iSCSI driver.
> 
> Should this condition be necessarily handled in the SCSI mid-layer or
> the low-level driver?

I think we should handle it in the midlayer.  Could you come up with a
patch to implement it?

--
Christoph Hellwig <hch@lst.de>		-	Freelance Hacker
Contact me for driver hacking and kernel development consulting

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-06  9:42   ` Sachin Mhatre (smhatre)
  2003-11-06 10:09     ` 'Christoph Hellwig'
@ 2003-11-06 11:10     ` Andre Hedrick
  2003-11-06 11:14       ` 'Christoph Hellwig'
  1 sibling, 1 reply; 102+ messages in thread
From: Andre Hedrick @ 2003-11-06 11:10 UTC (permalink / raw)
  To: Sachin Mhatre (smhatre); +Cc: 'Christoph Hellwig', linux-scsi, davmyers


It needs to be handled in the iSCSI driver.
Extending the operation to the entire midlayer will cause great grief to
all the HBA's which do not even support the abridged version of SBC call.
Some support only a subset of the subset of SCSI commands.  Until all
HBA code has a means to report a positive list of commands supported, only
exception callers in the local real/virtual HBA appears doable.

Christoph, until you deal with an iSCSI-FC router and the double transport
callers to sanely address FC appearing as DAS, you will not see the reason
for nested threads for TRU.

Yeah FC local looks like DAS, but there are several timing layers that
local DAS verses ip-storage do not match up.

Just my $0.02

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Thu, 6 Nov 2003, Sachin Mhatre (smhatre) wrote:

> 
> Hi,
> 	The following is the reason for sending the TEST_UNIT_READY and
> START_STOP initialization command in the iSCSI driver:- 
> 
> The iSCSI driver receives a CHECK CONDITION with sense key, NOT READY
> and ASC/ASCQ=04/02 (i.e. an initialization command required). The Linux
> SCSI mid-layer does not handle this condition. Hence the TUR and START
> STOP initialization commands are sent from the iSCSI driver.
> 
> Should this condition be necessarily handled in the SCSI mid-layer or
> the low-level driver?
> 
> Thanks,
> Sachin
> 
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Monday, October 27, 2003 9:10 PM
> To: Krishna Murthy
> Cc: linux-scsi@vger.kernel.org; davmyers@cisco.com
> Subject: Re: Request for review of Linux iSCSI driver version 4.0.0.1
> 
> 
> On Thu, Oct 23, 2003 at 05:34:10PM +0530, Krishna Murthy wrote:
> > Hi,
> > 	Linux iSCSI driver 4.0.0.1 (linux-iscsi-4.0.0.1.tgz) is
> available at
> > 
> > http://sf.net/projects/linux-iscsi/
> 
> Here's a list of a few comments I've come up with.  I haven't yet looked
> at the data structurws much, but it seems there's a complete lack of
> refcounting.  Also there's lots of comments were ysou complain about
> scsi midlayer misbehaviour or bugs - please submit patches or submit
> workarounds when you encounter them, that's what open source is all
> about!
> 
> Also the same question that went to the unh folks goes to you:  why do
> you think your implementation is better than theirs and why do you think
> cooperation on one iscsi stack is impossible.
> 
> here's the list:
> 
> 
> iscsi-probe.c:
> 	- lots of unchecked kmalloc()s
> 	- you can't allocate scsi_cmnds yourself but _must_ use
> scsi_get_command
> 	- dito for scsi_device.
> 	- the whole iscsi_do_cmnd use looks very very strange.  you're
> probably
> 	much better of to use the scsi_request based interfaces and use
> midlayer
> 	queueing and sometimes even higher level interfaces, e.g. your
> iscsi_send_tur
> 	coudld be replaced by an scsi_ioctl(sdp,
> SCSI_IOCTL_TEST_UNIT_READY, NULL)
> 	I still don't understand, though why the heck you need to do a
> TUR in
> 	a lowlevel driver and even more need a thread do a single TUR,
> started from
> 	another kernel tread.  there more I dig into it your probing and
> thread
> 	abuse looks horribly screwed.
> 	- reinitilialize disk is another such candidate.  again why do
> you
> 	need to send START_STOP commands from the LLDD, why do you need
> another
> 	thread instead of a state machine and again life would be a lot
> simpler
> 	by using the midlayer queueing infrastructure.
> 	- dito for the reports luns and inquiry code in
> iscsi_detect_luns.
> 	I'm very very unhappy to see a lowlevel driver mess with all
> this.
> 	Even if you need to do it from a LLDD use the midlayer
> functionality for
> 	it.  But I still want to see a very good explanation why you
> need this at all.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org More majordomo info
> at  http://vger.kernel.org/majordomo-info.html
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-06 11:10     ` Andre Hedrick
@ 2003-11-06 11:14       ` 'Christoph Hellwig'
  0 siblings, 0 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-06 11:14 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Sachin Mhatre (smhatre), 'Christoph Hellwig', linux-scsi,
	davmyers

On Thu, Nov 06, 2003 at 03:10:59AM -0800, Andre Hedrick wrote:
> 
> It needs to be handled in the iSCSI driver.
> Extending the operation to the entire midlayer will cause great grief to
> all the HBA's which do not even support the abridged version of SBC call.
> Some support only a subset of the subset of SCSI commands.  Until all
> HBA code has a means to report a positive list of commands supported, only
> exception callers in the local real/virtual HBA appears doable.

in the midlayer != enabled unconditionally.  We just have much better
infrastructure for this than a LLDD.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-06 10:09     ` 'Christoph Hellwig'
@ 2003-11-07  8:55       ` Douglas Gilbert
  2003-11-07  9:30         ` 'Christoph Hellwig'
  2003-11-10 17:43         ` Patrick Mansfield
  0 siblings, 2 replies; 102+ messages in thread
From: Douglas Gilbert @ 2003-11-07  8:55 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: Sachin Mhatre (smhatre), linux-scsi, davmyers

'Christoph Hellwig' wrote:
> On Thu, Nov 06, 2003 at 03:12:51PM +0530, Sachin Mhatre (smhatre) wrote:
> 
>>Hi,
>>	The following is the reason for sending the TEST_UNIT_READY and
>>START_STOP initialization command in the iSCSI driver:- 
>>
>>The iSCSI driver receives a CHECK CONDITION with sense key, NOT READY
>>and ASC/ASCQ=04/02 (i.e. an initialization command required). The Linux
>>SCSI mid-layer does not handle this condition. Hence the TUR and START
>>STOP initialization commands are sent from the iSCSI driver.
>>
>>Should this condition be necessarily handled in the SCSI mid-layer or
>>the low-level driver?
> 
> 
> I think we should handle it in the midlayer.  Could you come up with a
> patch to implement it?

Interesting challenge. For a start START_STOP UNIT ** is a SBC-2
command so the mid-layer shouldn't be generating it, the
sd driver should. The sd_rw_intr() function in the sd driver
could be made to detect ASC/ASCQ=04/02 but it is pretty obviously
executing in interrupt context. The sd driver has a tailor made
routine to spin up a disk called sd_spinup_disk() but since it
calls schedule_timeout() it must be called from process context.

Also once a spinup has commenced, it will take about 30
seconds to complete. In that time an impatient user could
try and mount a different partition on the same disk.
That process should see a ASC/ASCQ=04/01 ("Logical unit is in
process of becoming ready") and it should wait till it
clears (and not issue another spinup).

Seems like some infrastructure changes would be needed.

Testing with lk 2.6.0-test9: spinning down a disk that
the scsi subsystem had already seen but with nothing mounted
on it, then trying to mount a partition on it, causes a lot
of ugly messages in my log (at least the machine didn't crash).


** There is also the Power Condition mode page that all SCSI
devices may optionally support. Trouble is with my SCSI-3 disks
they either don't support it or the one that does gives
ASC/ASCQ=04/02 when the disk is spun down (violating SPC-3).

Doug Gilbert




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-07  8:55       ` Douglas Gilbert
@ 2003-11-07  9:30         ` 'Christoph Hellwig'
  2003-11-10 17:43         ` Patrick Mansfield
  1 sibling, 0 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-07  9:30 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: Sachin Mhatre (smhatre), linux-scsi, davmyers

On Fri, Nov 07, 2003 at 06:55:01PM +1000, Douglas Gilbert wrote:
> > I think we should handle it in the midlayer.  Could you come up with a
> > patch to implement it?
> 
> Interesting challenge. For a start START_STOP UNIT ** is a SBC-2
> command so the mid-layer shouldn't be generating it, the
> sd driver should. The sd_rw_intr() function in the sd driver
> could be made to detect ASC/ASCQ=04/02 but it is pretty obviously
> executing in interrupt context. The sd driver has a tailor made
> routine to spin up a disk called sd_spinup_disk() but since it
> calls schedule_timeout() it must be called from process context.

We could call sd_spinup_disk from schedule_work.  This still seems
better than having a thread in a LLDD just for this.

> Also once a spinup has commenced, it will take about 30
> seconds to complete. In that time an impatient user could
> try and mount a different partition on the same disk.
> That process should see a ASC/ASCQ=04/01 ("Logical unit is in
> process of becoming ready") and it should wait till it
> clears (and not issue another spinup).
> 
> Seems like some infrastructure changes would be needed.
> 
> Testing with lk 2.6.0-test9: spinning down a disk that
> the scsi subsystem had already seen but with nothing mounted
> on it, then trying to mount a partition on it, causes a lot
> of ugly messages in my log (at least the machine didn't crash).

Looks like we need to handle that case anyway..

> ** There is also the Power Condition mode page that all SCSI
> devices may optionally support. Trouble is with my SCSI-3 disks
> they either don't support it or the one that does gives
> ASC/ASCQ=04/02 when the disk is spun down (violating SPC-3).

Interesting.  Now that the scsi_device is part of the driver model
we could actually do some fancy things about spinning down disks
in a controlled mannor on suspend and wake them up again on
resume.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-07  8:55       ` Douglas Gilbert
  2003-11-07  9:30         ` 'Christoph Hellwig'
@ 2003-11-10 17:43         ` Patrick Mansfield
  1 sibling, 0 replies; 102+ messages in thread
From: Patrick Mansfield @ 2003-11-10 17:43 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: 'Christoph Hellwig', Sachin Mhatre (smhatre), linux-scsi,
	davmyers

On Fri, Nov 07, 2003 at 06:55:01PM +1000, Douglas Gilbert wrote:
> 'Christoph Hellwig' wrote:
> > On Thu, Nov 06, 2003 at 03:12:51PM +0530, Sachin Mhatre (smhatre) wrote:

> >>The iSCSI driver receives a CHECK CONDITION with sense key, NOT READY
> >>and ASC/ASCQ=04/02 (i.e. an initialization command required). The Linux
> >>SCSI mid-layer does not handle this condition. Hence the TUR and START
> >>STOP initialization commands are sent from the iSCSI driver.
> >>
> >>Should this condition be necessarily handled in the SCSI mid-layer or
> >>the low-level driver?
> > 
> > 
> > I think we should handle it in the midlayer.  Could you come up with a
> > patch to implement it?

Why doesn't the iscsi target handle the problem?

We already spun the drive up in sd.c during scan + probe, or for removable
or write protected media, on open (sd_open calls check_disk_change).

That is, whoever spun down the drive should spin it up.

In Doug's example, he knows he spun it down, and can spin it up again before
mounting the disk.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-29 13:45     ` 'Christoph Hellwig'
  2003-10-29 17:28       ` Mike Christie
@ 2003-11-11 11:56       ` Naveen Burmi
  2003-11-11 17:36         ` 'Christoph Hellwig'
  2003-11-11 17:40         ` James Bottomley
  1 sibling, 2 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-11-11 11:56 UTC (permalink / raw)
  Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Wednesday 29 October 2003 07:15 pm, 'Christoph Hellwig' wrote:
> On Wed, Oct 29, 2003 at 06:53:43PM +0530, Surekha.PC wrote:
> > Hi,
> >
> >  I have interspersed my answers below for 2 of your comments.
> >
> > >	- kill the is_root_disk special casing.  drivers can't know
> > >	what the root disks is, there might not even be a single one.
> >
> > This flag is needed for our iSCSI network boot implementation to check
> > if we are booted on iSCSI disk. This is used while trying to shutdown
> > the iSCSI service.
>
> What if the root device changes (pivot_root syscall, namespaces)?  You need
> to find a way to do what you want without having knowledge about beeing
> a root device in the LLDD - there might not be a single root device and it
> could change all the time.

During shutdown iSCSI driver gets signal to stop itself. In turn iSCSI driver 
umounts all the partitions present of iSCSI disks, makes all the iSCSI disks 
unavailble to the system and gracefully terminate. 
But if iSCSI disk contains the root partition OR is part of root partition 
then iSCSI driver must not try to umount the root partition and the iSCSI 
disks must be available until the buffer cache gets flushed properly. For 
this to happen iSCSI driver must be aware of all the iSCSI disk(s) that 
constitute the root partition/devices.

	Q. What kernel data structures/API's are available that provide the 
information about root device(s)?

>
> > This is again needed for network boot. The n/w driver and iSCSI driver
> > are loaded during early boot. Since the n/w interface is not setup at
> > that time, we need to bring up the interface through this call in iSCSI
> > driver.
>
> That's bogus.  Look at the ip autoconfig code used by rootnfs and reuse
> that instead of creating such bogus hacks.  A driver has no business
> dealing with the network configuration.

During shutdown iSCSI driver must bring up the network interface, if it goes 
down, so to make available the iSCSI disk(s) on which the root partition 
exists. iscsi_set_if_addr is used to bring up the newtwork interface at the 
time of shutdown. 
Q. Is ip autoconfig code helps us to bring up the newtwork interface at the 
time of shutdown? 

Thanks,
Naveen.

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-11 11:56       ` Naveen Burmi
@ 2003-11-11 17:36         ` 'Christoph Hellwig'
  2003-11-19 12:40           ` Krishna Murthy
  2003-11-11 17:40         ` James Bottomley
  1 sibling, 1 reply; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-11 17:36 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: linux-scsi, davmyers

On Tue, Nov 11, 2003 at 05:26:36PM +0530, Naveen Burmi wrote:
> During shutdown iSCSI driver gets signal to stop itself. In turn iSCSI driver 
> umounts all the partitions present of iSCSI disks, makes all the iSCSI disks 
> unavailble to the system and gracefully terminate. 
> But if iSCSI disk contains the root partition OR is part of root partition 
> then iSCSI driver must not try to umount the root partition and the iSCSI 
> disks must be available until the buffer cache gets flushed properly. For 
> this to happen iSCSI driver must be aware of all the iSCSI disk(s) that 
> constitute the root partition/devices.

I think you'd rather want a reboot notifier for after the final sync..

> 	Q. What kernel data structures/API's are available that provide the 
> information about root device(s)?

there is no such information.  you can only know the root device of a
particular process and that information is not available to drivers.

> > that instead of creating such bogus hacks.  A driver has no business
> > dealing with the network configuration.
> 
> During shutdown iSCSI driver must bring up the network interface, if it goes 
> down, so to make available the iSCSI disk(s) on which the root partition 
> exists. iscsi_set_if_addr is used to bring up the newtwork interface at the 
> time of shutdown. 

you don't need to shut down interfaces on system shutdown.

> Q. Is ip autoconfig code helps us to bring up the newtwork interface at the 
> time of shutdown?

no.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-11 11:56       ` Naveen Burmi
  2003-11-11 17:36         ` 'Christoph Hellwig'
@ 2003-11-11 17:40         ` James Bottomley
  1 sibling, 0 replies; 102+ messages in thread
From: James Bottomley @ 2003-11-11 17:40 UTC (permalink / raw)
  To: naveenb; +Cc: 'Christoph Hellwig', SCSI Mailing List, davmyers

On Tue, 2003-11-11 at 03:56, Naveen Burmi wrote:
> During shutdown iSCSI driver gets signal to stop itself. In turn iSCSI driver 
> umounts all the partitions present of iSCSI disks, makes all the iSCSI disks 
> unavailble to the system and gracefully terminate. 
> But if iSCSI disk contains the root partition OR is part of root partition 
> then iSCSI driver must not try to umount the root partition and the iSCSI 
> disks must be available until the buffer cache gets flushed properly. For 
> this to happen iSCSI driver must be aware of all the iSCSI disk(s) that 
> constitute the root partition/devices.

I see what you're trying to do, but I don't see how this is any
different from the NFS root problem, which has already been long solved
in distribution startup systems.

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
  2003-10-29 13:23   ` Surekha.PC
  2003-11-06  9:42   ` Sachin Mhatre (smhatre)
@ 2003-11-13 14:30   ` Sachin Mhatre (smhatre)
  2003-11-13 14:54     ` Matthew Wilcox
  2003-11-19 13:04   ` Sachin Mhatre (smhatre)
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Sachin Mhatre (smhatre) @ 2003-11-13 14:30 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi, davmyers

Hi,
	We need the md5 and crc functions from the user level as well as
from the kernel. The crypto API and the CRC32 kernel  library functions
can solve the problem of md5 and crc computation in the kernel. But its
not possible to use these APIs from the user level. One way could be to
use system calls. But wouldn't that make the code cumbersome?

Thanks,
Sachin

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Christoph Hellwig
Sent: Monday, October 27, 2003 9:10 PM
To: Krishna Murthy
Cc: linux-scsi@vger.kernel.org; davmyers@cisco.com
Subject: Re: Request for review of Linux iSCSI driver version 4.0.0.1


On Thu, Oct 23, 2003 at 05:34:10PM +0530, Krishna Murthy wrote:
> Hi,
> 	Linux iSCSI driver 4.0.0.1 (linux-iscsi-4.0.0.1.tgz) is
available at
> 
> http://sf.net/projects/linux-iscsi/

Here's a list of a few comments I've come up with.  I haven't yet looked
at the data structurws much, but it seems there's a complete lack of
refcounting.  Also there's lots of comments were ysou complain about
scsi midlayer misbehaviour or bugs - please submit patches or submit
workarounds when you encounter them, that's what open source is all
about!

Also the same question that went to the unh folks goes to you:  why do
you think your implementation is better than theirs and why do you think
cooperation on one iscsi stack is impossible.

here's the list:


highlevel:
	- kill md5.c and iscsi-crc.c and use common cryptoapi
	md5 / common crc code instead.
	- please kill you thousands of atoi/aton reimplementations
	and look for proper linux functions like simple_strtoul.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org More majordomo info
at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-13 14:30   ` Sachin Mhatre (smhatre)
@ 2003-11-13 14:54     ` Matthew Wilcox
  0 siblings, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2003-11-13 14:54 UTC (permalink / raw)
  To: Sachin Mhatre (smhatre); +Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Thu, Nov 13, 2003 at 08:00:11PM +0530, Sachin Mhatre (smhatre) wrote:
> 	We need the md5 and crc functions from the user level as well as
> from the kernel. The crypto API and the CRC32 kernel  library functions
> can solve the problem of md5 and crc computation in the kernel. But its
> not possible to use these APIs from the user level. One way could be to
> use system calls. But wouldn't that make the code cumbersome?

Uhm, your user code is free to do whatever it likes, including using
its own md5 routines instead of the ones in libssl.  But the kernel code
should use the kernel facilities.

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-11 17:36         ` 'Christoph Hellwig'
@ 2003-11-19 12:40           ` Krishna Murthy
  2003-11-19 12:49             ` Matthew Wilcox
  2003-11-19 13:38             ` 'Christoph Hellwig'
  0 siblings, 2 replies; 102+ messages in thread
From: Krishna Murthy @ 2003-11-19 12:40 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi, davmyers

Hi ,
	The following was one of your comments on the iSCSI driver:

"having multiple  kmap() in the same process at the same time can kmap(),
you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
for the tx path use ->sendpage to avoid a data copy and kmapping altogether"


In the first line did you intend to say having multiple kmap() can deadlock?
Regarding usage of sendpage and avoiding kmapping altogether, 
we need to access the data, to calculate data digest.  If we do  not use kmap
how else do we achieve it?

Please let us know.

Thanx
N.C.Krishna Murthy



> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 12:40           ` Krishna Murthy
@ 2003-11-19 12:49             ` Matthew Wilcox
  2003-11-19 13:38             ` 'Christoph Hellwig'
  1 sibling, 0 replies; 102+ messages in thread
From: Matthew Wilcox @ 2003-11-19 12:49 UTC (permalink / raw)
  To: Krishna Murthy; +Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Wed, Nov 19, 2003 at 06:10:30PM +0530, Krishna Murthy wrote:
> Hi ,
> 	The following was one of your comments on the iSCSI driver:
> 
> "having multiple  kmap() in the same process at the same time can kmap(),
> you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
> for the tx path use ->sendpage to avoid a data copy and kmapping altogether"
> 
> 
> In the first line did you intend to say having multiple kmap() can deadlock?

Yes, he did.

> Regarding usage of sendpage and avoiding kmapping altogether, 
> we need to access the data, to calculate data digest.  If we do  not use kmap
> how else do we achieve it?

Use the digest API.  See include/linux/crypto.h.  For example,
crypto_digest_update() takes a scatterlist.  Scatterlists include
a page/offset, so your code doesn't need to kmap() at all.

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
                     ` (2 preceding siblings ...)
  2003-11-13 14:30   ` Sachin Mhatre (smhatre)
@ 2003-11-19 13:04   ` Sachin Mhatre (smhatre)
  2003-11-19 13:10     ` 'Christoph Hellwig'
  2003-11-19 14:48   ` Naveen Burmi
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Sachin Mhatre (smhatre) @ 2003-11-19 13:04 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi, davmyers

Hi Christopher,
	Could you please elaborate on this review comment? 
	- kill ctl_open/ctl_release and move add an owner to the fops
structure for refcounting.

Especially, what do you mean by "move add an owner to the fops structure
for refcounting" ? We already use the .owner field in the
file_operations template. What else are you refering to?


Thanks,
Sachin




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 13:04   ` Sachin Mhatre (smhatre)
@ 2003-11-19 13:10     ` 'Christoph Hellwig'
  0 siblings, 0 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-19 13:10 UTC (permalink / raw)
  To: Sachin Mhatre (smhatre); +Cc: linux-scsi, davmyers

On Wed, Nov 19, 2003 at 06:34:39PM +0530, Sachin Mhatre (smhatre) wrote:
> Hi Christopher,
> 	Could you please elaborate on this review comment? 
> 	- kill ctl_open/ctl_release and move add an owner to the fops
> structure for refcounting.
> 
> Especially, what do you mean by "move add an owner to the fops structure
> for refcounting" ? We already use the .owner field in the
> file_operations template. What else are you refering to?

I currently do not have the code in front of me, but IIRC ctl_open/ctl_release
was only doing MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT (or even a broken
equivalent).  These days all refcounting is done by higher levels if you
provide a owner (=THIS_MODULE) field in struct file_operations and these
methods could be removed.  If you already have the owner set that's even
easier..


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 12:40           ` Krishna Murthy
  2003-11-19 12:49             ` Matthew Wilcox
@ 2003-11-19 13:38             ` 'Christoph Hellwig'
  1 sibling, 0 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-19 13:38 UTC (permalink / raw)
  To: Krishna Murthy; +Cc: 'Christoph Hellwig', linux-scsi, davmyers

On Wed, Nov 19, 2003 at 06:10:30PM +0530, Krishna Murthy wrote:
> Hi ,
> 	The following was one of your comments on the iSCSI driver:
> 
> "having multiple  kmap() in the same process at the same time can kmap(),
> you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
> for the tx path use ->sendpage to avoid a data copy and kmapping altogether"
> 
> 
> In the first line did you intend to say having multiple kmap() can deadlock?

Yes.

> Regarding usage of sendpage and avoiding kmapping altogether, 
> we need to access the data, to calculate data digest.  If we do  not use kmap
> how else do we achieve it?

Well, with software crypto someone will have to do the kmap (or
kmap_atomic), but I'd rather leave that as an implementation detail to
the crypto API as I already suggested earlier.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 14:48   ` Naveen Burmi
@ 2003-11-19 14:48     ` Christoph Hellwig
  2003-11-19 15:19       ` Naveen Burmi
  2003-12-01 12:10       ` Krishna Murthy
  2003-11-19 17:17     ` Patrick Mansfield
  1 sibling, 2 replies; 102+ messages in thread
From: Christoph Hellwig @ 2003-11-19 14:48 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: Christoph Hellwig, linux-scsi, davmyers

On Wed, Nov 19, 2003 at 08:18:32PM +0530, Naveen Burmi wrote:
> On Monday 27 October 2003 09:09 pm, Christoph Hellwig wrote:
> 
> > iscsi-probe.c:
> > 	- the whole iscsi_do_cmnd use looks very very strange.  
> 
> After a device gets detected by iSCSI LLD, it uses iscsi_do_cmnd to send scsi 
> commands to that device even before registering it with the scsi mid-layer.
>  
> For example user can configure in iscsi LLD configuration (iscsi.conf) file 
> what various LUNs of a particular target needs to be activated. To acheive 
> this we need to execute REPORT LUN command on target, to find out what 
> various luns are present and selectively issue scsi_add_device for luns that 
> needs to be activated.

Couldn't you just use scsi_scan_host_selected from scsi_scan.c for that
if it was exported?  We already have all that big and perfectly multi-lun
capable probing code in the scsi midlayer and it would be a shame to not
use it.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
                     ` (3 preceding siblings ...)
  2003-11-19 13:04   ` Sachin Mhatre (smhatre)
@ 2003-11-19 14:48   ` Naveen Burmi
  2003-11-19 14:48     ` Christoph Hellwig
  2003-11-19 17:17     ` Patrick Mansfield
  2003-11-21 16:42   ` Request for review of Linux iSCSI driver version 4.0.0.1 Clay Haapala
  2003-11-24  6:09   ` Surekha.PC
  6 siblings, 2 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-11-19 14:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, davmyers

On Monday 27 October 2003 09:09 pm, Christoph Hellwig wrote:

> iscsi-probe.c:
> 	- the whole iscsi_do_cmnd use looks very very strange.  

After a device gets detected by iSCSI LLD, it uses iscsi_do_cmnd to send scsi 
commands to that device even before registering it with the scsi mid-layer.
 
For example user can configure in iscsi LLD configuration (iscsi.conf) file 
what various LUNs of a particular target needs to be activated. To acheive 
this we need to execute REPORT LUN command on target, to find out what 
various luns are present and selectively issue scsi_add_device for luns that 
needs to be activated.

>          you're probably
> 	much better of to use the scsi_request based interfaces and use midlayer
> 	queueing and sometimes even higher level interfaces

All these interfaces requires scsi_device instance which implies that device 
must be registered with the scsi mid-layer,  before using these interfaces. 

Naveen.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 14:48     ` Christoph Hellwig
@ 2003-11-19 15:19       ` Naveen Burmi
  2003-11-19 15:20         ` Christoph Hellwig
  2003-12-01 12:10       ` Krishna Murthy
  1 sibling, 1 reply; 102+ messages in thread
From: Naveen Burmi @ 2003-11-19 15:19 UTC (permalink / raw)
  Cc: Christoph Hellwig, linux-scsi, davmyers

On Wednesday 19 November 2003 08:18 pm, Christoph Hellwig wrote:
> On Wed, Nov 19, 2003 at 08:18:32PM +0530, Naveen Burmi wrote:
> > On Monday 27 October 2003 09:09 pm, Christoph Hellwig wrote:
> > > iscsi-probe.c:
> > > 	- the whole iscsi_do_cmnd use looks very very strange.
> >
> > After a device gets detected by iSCSI LLD, it uses iscsi_do_cmnd to send
> > scsi commands to that device even before registering it with the scsi
> > mid-layer.
> >
> > For example user can configure in iscsi LLD configuration (iscsi.conf)
> > file what various LUNs of a particular target needs to be activated. To
> > acheive this we need to execute REPORT LUN command on target, to find out
> > what various luns are present and selectively issue scsi_add_device for
> > luns that needs to be activated.
>
> Couldn't you just use scsi_scan_host_selected from scsi_scan.c for that
> if it was exported?  We already have all that big and perfectly multi-lun
> capable probing code in the scsi midlayer and it would be a shame to not
> use it.

scsi_scan_host_selected can be used for activating LUNs, selectively.
But that is the second step.
First step is to execute the REPORT LUN command without registering a device 
with scsi mid-layer. That's what we are acheiving through iscsi_do_cmnd .
Once we know what all luns are available for a target,   
scsi_scan_host_selected can be used to activate the luns, selectively.


> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 15:19       ` Naveen Burmi
@ 2003-11-19 15:20         ` Christoph Hellwig
  0 siblings, 0 replies; 102+ messages in thread
From: Christoph Hellwig @ 2003-11-19 15:20 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: Christoph Hellwig, linux-scsi, davmyers

On Wed, Nov 19, 2003 at 08:49:36PM +0530, Naveen Burmi wrote:
> scsi_scan_host_selected can be used for activating LUNs, selectively.
> But that is the second step.
> First step is to execute the REPORT LUN command without registering a device 
> with scsi mid-layer. That's what we are acheiving through iscsi_do_cmnd .
> Once we know what all luns are available for a target,   
> scsi_scan_host_selected can be used to activate the luns, selectively.

Why do you insist on not using a struct scsi_device?  You acn still free
it if you don't want it. scsi_report_lun is doing all the REPORT_LUNS work
for you, no need to duplicate it.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 14:48   ` Naveen Burmi
  2003-11-19 14:48     ` Christoph Hellwig
@ 2003-11-19 17:17     ` Patrick Mansfield
  2003-11-20 13:32       ` Naveen Burmi
  2003-11-22  8:16       ` How to generate ILI condtion on a tape device Kallol Biswas
  1 sibling, 2 replies; 102+ messages in thread
From: Patrick Mansfield @ 2003-11-19 17:17 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: Christoph Hellwig, linux-scsi, davmyers

On Wed, Nov 19, 2003 at 08:18:32PM +0530, Naveen Burmi wrote:

> After a device gets detected by iSCSI LLD, it uses iscsi_do_cmnd to send scsi 
> commands to that device even before registering it with the scsi mid-layer.
>  
> For example user can configure in iscsi LLD configuration (iscsi.conf) file 
> what various LUNs of a particular target needs to be activated. To acheive 
> this we need to execute REPORT LUN command on target, to find out what 
> various luns are present and selectively issue scsi_add_device for luns that 
> needs to be activated.

If you already have the LUN specified, why do you need to issue a REPORT
LUN command?

I don't know iSCSI specification details, but I would want to specify a
target, not a LUN, in a configuration file and have all LUNs on the target
be automatically configured.

The iSCSI target itself, or actual storage (if iSCSI is just passing
through commands to an actual disk drive) could control what LUNs are seen
by an individual host.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 17:17     ` Patrick Mansfield
@ 2003-11-20 13:32       ` Naveen Burmi
  2003-11-20 13:34         ` Christoph Hellwig
  2003-11-22  8:16       ` How to generate ILI condtion on a tape device Kallol Biswas
  1 sibling, 1 reply; 102+ messages in thread
From: Naveen Burmi @ 2003-11-20 13:32 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Christoph Hellwig, linux-scsi, davmyers

On Wednesday 19 November 2003 10:47 pm, Patrick Mansfield wrote:

Lun masking feature is provided by the linux-iscsi driver. In the linux-iscsi 
driver configuration file user can specify the luns of a target that needs to 
be activated upon the detection of that target.

> If you already have the LUN specified, why do you need to issue a REPORT
> LUN command?

User has specified the LUNs that needs to be activated upon the detection of 
a target. Once that target gets detected then to find out weather the LUNs 
specified by the user are actually present, we need to issue REPORT LUN 
command.
 
>
> I don't know iSCSI specification details, but I would want to specify a
> target, not a LUN, in a configuration file and have all LUNs on the target
> be automatically configured.
>

User might not want to configure all the LUNs of a target. 

> The iSCSI target itself, or actual storage (if iSCSI is just passing
> through commands to an actual disk drive) could control what LUNs are seen
> by an individual host.
>
> -- Patrick Mansfield

Naveen

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-20 13:32       ` Naveen Burmi
@ 2003-11-20 13:34         ` Christoph Hellwig
  2003-11-20 14:53           ` Naveen Burmi
  0 siblings, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2003-11-20 13:34 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: Patrick Mansfield, Christoph Hellwig, linux-scsi, davmyers

On Thu, Nov 20, 2003 at 07:02:06PM +0530, Naveen Burmi wrote:
> Lun masking feature is provided by the linux-iscsi driver. In the linux-iscsi 
> driver configuration file user can specify the luns of a target that needs to 
> be activated upon the detection of that target.
> 
> > If you already have the LUN specified, why do you need to issue a REPORT
> > LUN command?
> 
> User has specified the LUNs that needs to be activated upon the detection of 
> a target. Once that target gets detected then to find out weather the LUNs 
> specified by the user are actually present, we need to issue REPORT LUN 
> command.
>  
> >
> > I don't know iSCSI specification details, but I would want to specify a
> > target, not a LUN, in a configuration file and have all LUNs on the target
> > be automatically configured.
> >
> 
> User might not want to configure all the LUNs of a target. 

I'm not sure I agree with that.  Users might also want whatever but it's not
sensible to do that.  Ignore the user request now, if we see real-world use
for that we'll implement it in the midlayer.  And if you think it's really
important implement it in the midlayer now and keep the patch around for
users that want it.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-20 13:34         ` Christoph Hellwig
@ 2003-11-20 14:53           ` Naveen Burmi
  0 siblings, 0 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-11-20 14:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Patrick Mansfield, linux-scsi, davmyers

On Thursday 20 November 2003 07:04 pm, Christoph Hellwig wrote:
> On Thu, Nov 20, 2003 at 07:02:06PM +0530, Naveen Burmi wrote:
> > Lun masking feature is provided by the linux-iscsi driver. In the
> > linux-iscsi driver configuration file user can specify the luns of a
> > target that needs to be activated upon the detection of that target.
> >
> > > If you already have the LUN specified, why do you need to issue a
> > > REPORT LUN command?
> >
> > User has specified the LUNs that needs to be activated upon the detection
> > of a target. Once that target gets detected then to find out weather the
> > LUNs specified by the user are actually present, we need to issue REPORT
> > LUN command.
> >
> > > I don't know iSCSI specification details, but I would want to specify a
> > > target, not a LUN, in a configuration file and have all LUNs on the
> > > target be automatically configured.
> >
> > User might not want to configure all the LUNs of a target.
>
> I'm not sure I agree with that.  Users might also want whatever but it's
> not sensible to do that.  Ignore the user request now, if we see real-world
> use for that we'll implement it in the midlayer.  And if you think it's
> really important implement it in the midlayer now and keep the patch around
> for users that want it.

we are actually trying to do  optimization by sending REPORT LUN command 
before actually configuring the target in the mid layer and configure only 
the devices that are actually present.
Agreed its not sensible to do it right now. Let's the need of it should be 
felt in the real-world, after that it can be implemented in the mid-layer.

We will remove iscsi_do_cmnd as pointed out by you from iscsi LLD.

Thanks,
Naveen. 

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
@ 2003-11-21 11:40 Shashi Kiran T.R
  2003-11-21 17:56 ` Patrick Mansfield
  0 siblings, 1 reply; 102+ messages in thread
From: Shashi Kiran T.R @ 2003-11-21 11:40 UTC (permalink / raw)
  To: linux-scsi

>- the whole iscsi_do_cmnd use looks very very strange.  you're probably
> much better of to use the scsi_request based interfaces and use midlayer
 >queueing and sometimes even higher level interfaces, e.g. your
iscsi_send_tur
 >coudld be replaced by an scsi_ioctl(sdp, SCSI_IOCTL_TEST_UNIT_READY, NULL)
> I still don't understand, though why the heck you need to do a TUR in
 >a lowlevel driver and even more need a thread do a single TUR, started
from
 >another kernel tread.  there more I dig into it your probing and thread
 >abuse looks horribly screwed.
 >- reinitilialize disk is another such candidate.  again why do you
 >need to send START_STOP commands from the LLDD, why do you need another
 >thread instead of a state machine and again life would be a lot simpler
> by using the midlayer queueing infrastructure.

In the existing code upon seeing ASC/ASCQ 04/02, the failed I/O are retried
in the iSCSI driver after issuing START_STOP using iscsi_do_cmnd(). This
ensures that application does not see an I/O error.

 We tried replacing  iscsi_do_cmnd() call  by kernel_scsi_ioctl() , but  the
ioctl call never completed. Looks like scsi mid layer prevents the
processing of START_STOP command until responses for earlier commands are
obtained. The response is deferred upstream from iSCSI driver since it
retries any failed I/O. Probably this is creating a deadlock situation
making this alternative infeasible!.

-Shashi


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
                     ` (4 preceding siblings ...)
  2003-11-19 14:48   ` Naveen Burmi
@ 2003-11-21 16:42   ` Clay Haapala
  2003-11-21 17:32     ` Matthew Wilcox
  2003-11-24  6:09   ` Surekha.PC
  6 siblings, 1 reply; 102+ messages in thread
From: Clay Haapala @ 2003-11-21 16:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Krishna Murthy, linux-scsi, davmyers

On Mon, 27 Oct 2003, Christoph Hellwig said:
> 
> highlevel:
> 	- kill md5.c and iscsi-crc.c and use common cryptoapi
> 	md5 / common crc code instead.

md5: That appears straight-forward, and has been implemented, subject
     to testing.  SELECT CRYPTO and SELECT MD5 makes it available
     in the kernel.

crc32: Ah, the iSCSI crc32 uses a different polynomial (stronger) than
     Ethernet and the rest.  We argued during the RFC process for just
     using Ethernet's as good enough, but were voted down.  So we are
     stuck with a separate implementation in this case.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
     Well, looks like hypocrisy is back on the airwaves.
       C'mon, Rush!  Do the crime, do the time, right?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-21 16:42   ` Request for review of Linux iSCSI driver version 4.0.0.1 Clay Haapala
@ 2003-11-21 17:32     ` Matthew Wilcox
  2003-11-21 18:18       ` Clay Haapala
  0 siblings, 1 reply; 102+ messages in thread
From: Matthew Wilcox @ 2003-11-21 17:32 UTC (permalink / raw)
  To: Clay Haapala; +Cc: Christoph Hellwig, Krishna Murthy, linux-scsi, davmyers

On Fri, Nov 21, 2003 at 10:42:40AM -0600, Clay Haapala wrote:
> On Mon, 27 Oct 2003, Christoph Hellwig said:
> > 
> > highlevel:
> > 	- kill md5.c and iscsi-crc.c and use common cryptoapi
> > 	md5 / common crc code instead.
> 
> md5: That appears straight-forward, and has been implemented, subject
>      to testing.  SELECT CRYPTO and SELECT MD5 makes it available
>      in the kernel.
> 
> crc32: Ah, the iSCSI crc32 uses a different polynomial (stronger) than
>      Ethernet and the rest.  We argued during the RFC process for just
>      using Ethernet's as good enough, but were voted down.  So we are
>      stuck with a separate implementation in this case.

OK, but please convert it to live in the crypto/ directory and use
the crypto interfaces.  That way if some unrelated project decides to
implement the same algorithm, we don't end up with two copies of it.

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-21 11:40 Shashi Kiran T.R
@ 2003-11-21 17:56 ` Patrick Mansfield
  0 siblings, 0 replies; 102+ messages in thread
From: Patrick Mansfield @ 2003-11-21 17:56 UTC (permalink / raw)
  To: Shashi Kiran T.R; +Cc: linux-scsi

On Fri, Nov 21, 2003 at 05:10:09PM +0530, Shashi Kiran T.R wrote:

> In the existing code upon seeing ASC/ASCQ 04/02, the failed I/O are retried
> in the iSCSI driver after issuing START_STOP using iscsi_do_cmnd(). This
> ensures that application does not see an I/O error.
> 
>  We tried replacing  iscsi_do_cmnd() call  by kernel_scsi_ioctl() , but  the
> ioctl call never completed. Looks like scsi mid layer prevents the
> processing of START_STOP command until responses for earlier commands are
> obtained. The response is deferred upstream from iSCSI driver since it
> retries any failed I/O. Probably this is creating a deadlock situation
> making this alternative infeasible!.

So we sent a START_STOP via sd.c to spin up the drive, and since then
it has been spun down. Correct?

What spun it down? 

Whatever caused the spin down should spin it back up.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-21 17:32     ` Matthew Wilcox
@ 2003-11-21 18:18       ` Clay Haapala
  2003-11-26 13:41         ` Christoph Hellwig
  0 siblings, 1 reply; 102+ messages in thread
From: Clay Haapala @ 2003-11-21 18:18 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, Krishna Murthy, linux-scsi, davmyers

On Fri, 21 Nov 2003, Matthew Wilcox spake thusly:
> On Fri, Nov 21, 2003 at 10:42:40AM -0600, Clay Haapala wrote:
>> On Mon, 27 Oct 2003, Christoph Hellwig said:
>> > 
>> > highlevel:
>> > 	- kill md5.c and iscsi-crc.c and use common cryptoapi
>> > 	md5 / common crc code instead.
>> 
>> md5: That appears straight-forward, and has been implemented,
>> subject to testing.  SELECT CRYPTO and SELECT MD5 makes it
>> available in the kernel.
>> 
>> crc32: Ah, the iSCSI crc32 uses a different polynomial (stronger)
>> than Ethernet and the rest.  We argued during the RFC process for
>> just using Ethernet's as good enough, but were voted down.  So we
>> are stuck with a separate implementation in this case.
> 
> OK, but please convert it to live in the crypto/ directory and use
> the crypto interfaces.  That way if some unrelated project decides
> to implement the same algorithm, we don't end up with two copies of
> it.

I'm happy to do that.  Having established that it is not a redundant
implementation, though, I would like to have it not be a factor with
respect to driver inclusion, as there are bigger fish to fry.

That being said, I'll probably jump on it, anyway.

Question:  Should the implementation use the crypto api
or be added to the lib/crc32 code?
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
     Well, looks like hypocrisy is back on the airwaves.
       C'mon, Rush!  Do the crime, do the time, right?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* How to generate ILI condtion on a tape device
  2003-11-19 17:17     ` Patrick Mansfield
  2003-11-20 13:32       ` Naveen Burmi
@ 2003-11-22  8:16       ` Kallol Biswas
  2003-11-24  8:31         ` Josef Möllers
  1 sibling, 1 reply; 102+ messages in thread
From: Kallol Biswas @ 2003-11-22  8:16 UTC (permalink / raw)
  Cc: linux-scsi

Hi,
  I was wondering how to generate ILI condition while reading from a
tape.

A few lines from read_tape(), file st.c
.....
 if (SRpnt->sr_sense_buffer[2] & 0x20) { /* ILI */
             if (STp->block_size == 0) {
                  if (transfer < 0) {
                         if (STps->drv_block >= 0)
                             STps->drv_block+= 1;
                             return (-ENOMEM);
                  }
                  (STp->buffer)->buffer_bytes = bytes - transfer;
              } else {
                    scsi_release_request(SRpnt);
                    SRpnt = *aSRpnt = NULL;
                    if (transfer == blks) { /* We did not get anything,
error */
                       printk(KERN_NOTICE "st%d: Incorrect block
size.\n", dev);
                                                                      
                if(STps->drv_block>=0)                                                                               STps->drv_block+= blks - transfer + 1;
                                                                                st_int_ioctl(STp, MTBSR, 1);
                        return (-EIO);
                      }
 /* We have some data, deliver it */
                                               
(STp->buffer)->buffer_bytes = (blks - transfer) * STp->block_size;
                
        DEBC(printk(ST_DEB_MSG                                                        "st%d: ILI but enough data received %ld %d.\n",
               dev, count, (STp->buffer)->buffer_bytes));
         if (STps->drv_block >= 0)
                STps->drv_block += 1;
        if (st_int_ioctl(STp, MTBSR, 1))    return (-EIO);
       }

I want to make the driver print the message "ILI but enough data
received." to generate an error condition that my new driver (similar to
tape driver) encountering while running with a software from Veritas.



-- 
Kallol Biswas <nucleodyne@comcast.net>
www.nucleodyne.com


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-10-27 15:39 ` Christoph Hellwig
                     ` (5 preceding siblings ...)
  2003-11-21 16:42   ` Request for review of Linux iSCSI driver version 4.0.0.1 Clay Haapala
@ 2003-11-24  6:09   ` Surekha.PC
  2003-11-24  7:48     ` 'Christoph Hellwig'
  6 siblings, 1 reply; 102+ messages in thread
From: Surekha.PC @ 2003-11-24  6:09 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: linux-scsi


Hi,

>>  	- this comment indicates you don't want normal EH:

>> * check condition with no sense.  We need to avoid this, 
>> * since the Linux SCSI code could put the command in 
>> * SCSI_STATE_FAILED, which it's error recovery doesn't appear 
>> * to handle correctly, and even if it does, we're trying to 
>> * bypass all of the Linux error recovery code
>> * to avoid blocking all I/O to the HBA.  Fake some sense 
>> * data that gets a retry from Linux.

>	so don't use scsi_error.c and define your own
eh_strategy_handler method.

By implementing the iSCSI driver specific strategy_handler we will need
to duplicate most of the mid layer strategy handler logic only to
accommodate the above condition.

Is that justified ? Or is it ok, to fix the above case in scsi_error.c ?

Thanks,
surekha


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-24  6:09   ` Surekha.PC
@ 2003-11-24  7:48     ` 'Christoph Hellwig'
  2003-11-24 20:45       ` Patrick Mansfield
  2003-12-11 12:31       ` Naveen Burmi
  0 siblings, 2 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-24  7:48 UTC (permalink / raw)
  To: Surekha.PC; +Cc: 'Christoph Hellwig', linux-scsi

On Mon, Nov 24, 2003 at 11:39:48AM +0530, Surekha.PC wrote:
> 
> Hi,
> 
> >>  	- this comment indicates you don't want normal EH:
> 
> >> * check condition with no sense.  We need to avoid this, 
> >> * since the Linux SCSI code could put the command in 
> >> * SCSI_STATE_FAILED, which it's error recovery doesn't appear 
> >> * to handle correctly, and even if it does, we're trying to 
> >> * bypass all of the Linux error recovery code
> >> * to avoid blocking all I/O to the HBA.  Fake some sense 
> >> * data that gets a retry from Linux.
> 
> >	so don't use scsi_error.c and define your own
> eh_strategy_handler method.
> 
> By implementing the iSCSI driver specific strategy_handler we will need
> to duplicate most of the mid layer strategy handler logic only to
> accommodate the above condition.
> 
> Is that justified ? Or is it ok, to fix the above case in scsi_error.c ?

I haven't followed the code closely, but the comment reads like scsi_error.c
is mostly useless for you - if that were true I'd suggest using
eh_strategy_handler, if on the other hand you can fix scsi_error.c easily
it's better to use it.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: How to generate ILI condtion on a tape device
  2003-11-22  8:16       ` How to generate ILI condtion on a tape device Kallol Biswas
@ 2003-11-24  8:31         ` Josef Möllers
  2003-11-25  7:58           ` Kallol Biswas
  0 siblings, 1 reply; 102+ messages in thread
From: Josef Möllers @ 2003-11-24  8:31 UTC (permalink / raw)
  To: Kallol Biswas; +Cc: linux-scsi

> Kallol Biswas wrote:
> 
> Hi,
>   I was wondering how to generate ILI condition while reading from a
> tape.

IIRC all you need is a tape with variable block size and you just read
the wrong blocksize.


Josef
-- 
Josef Möllers (Pinguinpfleger bei FSC)
	If failure had no penalty success would not be a prize
						-- T.  Pratchett
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-24  7:48     ` 'Christoph Hellwig'
@ 2003-11-24 20:45       ` Patrick Mansfield
  2003-11-26 13:45         ` 'Christoph Hellwig'
  2003-12-11 12:31       ` Naveen Burmi
  1 sibling, 1 reply; 102+ messages in thread
From: Patrick Mansfield @ 2003-11-24 20:45 UTC (permalink / raw)
  To: 'Christoph Hellwig'; +Cc: Surekha.PC, linux-scsi

On Mon, Nov 24, 2003 at 07:48:30AM +0000, 'Christoph Hellwig' wrote:
> On Mon, Nov 24, 2003 at 11:39:48AM +0530, Surekha.PC wrote:
> > 
> > Hi,
> > 
> > >>  	- this comment indicates you don't want normal EH:
> > 
> > >> * check condition with no sense.  We need to avoid this, 
> > >> * since the Linux SCSI code could put the command in 
> > >> * SCSI_STATE_FAILED, which it's error recovery doesn't appear 
> > >> * to handle correctly, and even if it does, we're trying to 
> > >> * bypass all of the Linux error recovery code
> > >> * to avoid blocking all I/O to the HBA.  Fake some sense 
> > >> * data that gets a retry from Linux.
> > 
> > >	so don't use scsi_error.c and define your own
> > eh_strategy_handler method.
> > 
> > By implementing the iSCSI driver specific strategy_handler we will need
> > to duplicate most of the mid layer strategy handler logic only to
> > accommodate the above condition.
> > 
> > Is that justified ? Or is it ok, to fix the above case in scsi_error.c ?
> 
> I haven't followed the code closely, but the comment reads like scsi_error.c
> is mostly useless for you - if that were true I'd suggest using
> eh_strategy_handler, if on the other hand you can fix scsi_error.c easily
> it's better to use it.

The eh_strategy_handler only allows changing what happens after error
handling has started, but cannot modify when the error handler runs.

Using your own won't "avoid blocking all I/O to the HBA".

i.e., we wake up the error handler in scsi_error.c via:

void scsi_eh_wakeup(struct Scsi_Host *shost)
{
	if (shost->host_busy == shost->host_failed) {
		up(shost->eh_wait);
		SCSI_LOG_ERROR_RECOVERY(5,
				printk("Waking error handler thread\n"));
	}
}

We need a new timeout + canceller mechanism, so that a single command can
be cancelled without waiting for all other IO on an adapter to complete.
Errors must still escalate some how so device, bus, or adapter resets can
(under some conditions) still be issued.

The qla2xxx hacks around this by adding its own timer that is shorter than
the scsi cmd timeout. Is that a good or a bad hack? I don't know.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: How to generate ILI condtion on a tape device
  2003-11-24  8:31         ` Josef Möllers
@ 2003-11-25  7:58           ` Kallol Biswas
  0 siblings, 0 replies; 102+ messages in thread
From: Kallol Biswas @ 2003-11-25  7:58 UTC (permalink / raw)
  To: Josef Möllers; +Cc: scsi

Thank you. I already have tested the code path with dd command.

On Mon, 2003-11-24 at 00:31, Josef Möllers wrote:
> > Kallol Biswas wrote:
> > 
> > Hi,
> >   I was wondering how to generate ILI condition while reading from a
> > tape.
> 
> IIRC all you need is a tape with variable block size and you just read
> the wrong blocksize.
> 
> 
> Josef
-- 
Kallol Biswas <nucleodyne@comcast.net>

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-21 18:18       ` Clay Haapala
@ 2003-11-26 13:41         ` Christoph Hellwig
  0 siblings, 0 replies; 102+ messages in thread
From: Christoph Hellwig @ 2003-11-26 13:41 UTC (permalink / raw)
  To: Clay Haapala
  Cc: Matthew Wilcox, Christoph Hellwig, Krishna Murthy, linux-scsi,
	davmyers

On Fri, Nov 21, 2003 at 12:18:17PM -0600, Clay Haapala wrote:
> Question:  Should the implementation use the crypto api
> or be added to the lib/crc32 code?

I've talen a quick look at the code and it looks like you're doing the crc
not only for control but also for data packages, thus it seems the best idea
to make it a digest in the crypto framework.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-24 20:45       ` Patrick Mansfield
@ 2003-11-26 13:45         ` 'Christoph Hellwig'
  0 siblings, 0 replies; 102+ messages in thread
From: 'Christoph Hellwig' @ 2003-11-26 13:45 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: 'Christoph Hellwig', Surekha.PC, linux-scsi

On Mon, Nov 24, 2003 at 12:45:43PM -0800, Patrick Mansfield wrote:
> We need a new timeout + canceller mechanism, so that a single command can
> be cancelled without waiting for all other IO on an adapter to complete.
> Errors must still escalate some how so device, bus, or adapter resets can
> (under some conditions) still be issued.
> 
> The qla2xxx hacks around this by adding its own timer that is shorter than
> the scsi cmd timeout. Is that a good or a bad hack? I don't know.

Well, it's a hack :)  Although one that works.   I agree with your above
statement that we need to add support for that to the midlayer.  


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-19 14:48     ` Christoph Hellwig
  2003-11-19 15:19       ` Naveen Burmi
@ 2003-12-01 12:10       ` Krishna Murthy
  2003-12-01 15:22         ` James Bottomley
  2003-12-01 15:27         ` Christoph Hellwig
  1 sibling, 2 replies; 102+ messages in thread
From: Krishna Murthy @ 2003-12-01 12:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, davmyers

Hi,
	We could use probing code from mid layer provided 
scsi_scan_host_selected is exported. 

	iscsi driver does probe for devices at two places
a)Once after session establishment.
b)Whenever an iSCSI async message indicates a change in
"REPORTED LUNS DATA". (addition/deletion of luns on the target) 

	There are no issues in calling scsi_scan_host_selected() at a).
	Addition of luns could be handled by calling scsi_scan_host_selected 
with rescan parameter set to 1.
	Deletion of luns is not handled. scsi_scan_target does not worry about 
devices which were previously configured, but currently not available.Would 
you recommend handling this in scsi_scan_target (or a new API) or handle in 
iSCSI using scsi_request based interfaces?

Please let us know.

Thanx
N.C.Krishna Murthy
		

On Wednesday 19 Nov 2003 8:18 pm, Christoph Hellwig wrote:
> On Wed, Nov 19, 2003 at 08:18:32PM +0530, Naveen Burmi wrote:
> > On Monday 27 October 2003 09:09 pm, Christoph Hellwig wrote:
> > > iscsi-probe.c:
> > > 	- the whole iscsi_do_cmnd use looks very very strange.
> >
> > After a device gets detected by iSCSI LLD, it uses iscsi_do_cmnd to send
> > scsi commands to that device even before registering it with the scsi
> > mid-layer.
> >
> > For example user can configure in iscsi LLD configuration (iscsi.conf)
> > file what various LUNs of a particular target needs to be activated. To
> > acheive this we need to execute REPORT LUN command on target, to find out
> > what various luns are present and selectively issue scsi_add_device for
> > luns that needs to be activated.
>
> Couldn't you just use scsi_scan_host_selected from scsi_scan.c for that
> if it was exported?  We already have all that big and perfectly multi-lun
> capable probing code in the scsi midlayer and it would be a shame to not
> use it.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
@ 2003-12-01 12:30 Naveen Burmi
  2003-12-01 14:08 ` Naveen Burmi
  2003-12-01 16:20 ` Roman Zippel
  0 siblings, 2 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-01 12:30 UTC (permalink / raw)
  To: hch; +Cc: linux-scsi


Christoph,

You have following review comment for linux-iscsi driver:
"the explanation of  PREVENT_DATA_CORRUPTION does'nt make sense".

Following is the explanation of the requirement of  "PREVENT_DATA_CORRUPTION":

In linux kernel there is a rare occurrence of data corruption. This data can 
be buffer cache data as well as raw I/O data. Don't know the actual root 
cause of the problem, but the fix that we put under macro 
"PREVENT_DATA_CORRUPTION" is capable of resolving this problem.

Following is the scenario where we have observed this problem (still persist  
in test9, 2.6 linux kernel):

On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC 
card it generates  the sense data with sense key as HARDWARE ERROR
and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
Upon receving this sense data iSCSI initiator driver is not retrying this 
command and failing the command to the upper layers of SCSI subsystem.
But linux SCSI subsystem does not perform any error recovery for this kind of
sense data.

Following is the fix:
1) Protecting the original buffer cache and raw I/O data from getting
garbled. This is acheived by copying the incoming I/O buffer temporarily 
in an internal buffer and then sending the copied data down to TCP.

2) Before failing the scsi command to upper SCSI layers, iSCSI driver
is retrying the scsi command upon receiving the sense data with sense key
as HARDWARE ERROR.


Are you aware of this issue OR any related issue?

Thanks,
Naveen.
 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 12:30 Naveen Burmi
@ 2003-12-01 14:08 ` Naveen Burmi
  2003-12-01 18:48   ` Andre Hedrick
  2003-12-01 19:23   ` Andre Hedrick
  2003-12-01 16:20 ` Roman Zippel
  1 sibling, 2 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-01 14:08 UTC (permalink / raw)
  To: hch; +Cc: linux-scsi


Following is a simpler explanation:

iSCSI driver computes checksum on the buffer cache data, pointed to by 
sglist. It passes the checksum and list of data buffers to TCP layer for 
transmission to the iSCSI target. On very rare occasions data present in the 
buffer gets corrupted, after iSCSI driver gives the computed checksum and 
list of data buffers to TCP layers. At the iSCSI target side checksum is 
again calculated and compared with checksum sent along with data by iSCSI 
driver. iSCSI target detects data crc error in case of corrupted data.

Since the data buffers(present in buffer cache) has got corrupted even if 
iSCSI initiator driver re-tries to send data and computed checksum to target 
and if target's checksum matches with the initiator ckecksum, the garbled 
data gets written on to the disk(scsi target).

To prevent writing garbled data onto disk(scsi target) iSCSI driver copies 
the buffer cache data into locally allocated buffers, computes the checksum 
and give the list of locally allocated buffers to the TCP layer  for 
transmission of data to iSCSI target.

Thanks,
Naveen.


On Monday 01 December 2003 06:00 pm, Naveen Burmi wrote:
> Christoph,
>
> You have following review comment for linux-iscsi driver:
> "the explanation of  PREVENT_DATA_CORRUPTION does'nt make sense".
>
> Following is the explanation of the requirement of 
> "PREVENT_DATA_CORRUPTION":
>
> In linux kernel there is a rare occurrence of data corruption. This data
> can be buffer cache data as well as raw I/O data. Don't know the actual
> root cause of the problem, but the fix that we put under macro
> "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
>
> Following is the scenario where we have observed this problem (still
> persist in test9, 2.6 linux kernel):
>
> On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> card it generates  the sense data with sense key as HARDWARE ERROR
> and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> Upon receving this sense data iSCSI initiator driver is not retrying this
> command and failing the command to the upper layers of SCSI subsystem.
> But linux SCSI subsystem does not perform any error recovery for this kind
> of sense data.
>
> Following is the fix:
> 1) Protecting the original buffer cache and raw I/O data from getting
> garbled. This is acheived by copying the incoming I/O buffer temporarily
> in an internal buffer and then sending the copied data down to TCP.
>
> 2) Before failing the scsi command to upper SCSI layers, iSCSI driver
> is retrying the scsi command upon receiving the sense data with sense key
> as HARDWARE ERROR.
>
>
> Are you aware of this issue OR any related issue?
>
> Thanks,
> Naveen.
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 12:10       ` Krishna Murthy
@ 2003-12-01 15:22         ` James Bottomley
  2003-12-04 12:30           ` N.C.Krishna Murthy
  2003-12-01 15:27         ` Christoph Hellwig
  1 sibling, 1 reply; 102+ messages in thread
From: James Bottomley @ 2003-12-01 15:22 UTC (permalink / raw)
  To: Krishna Murthy; +Cc: Christoph Hellwig, SCSI Mailing List, davmyers

On Mon, 2003-12-01 at 06:10, Krishna Murthy wrote:
> Hi,
> 	We could use probing code from mid layer provided 
> scsi_scan_host_selected is exported. 
> 
> 	iscsi driver does probe for devices at two places
> a)Once after session establishment.
> b)Whenever an iSCSI async message indicates a change in
> "REPORTED LUNS DATA". (addition/deletion of luns on the target) 

This sounds like an async event which could be handled by the hotplug
system.  Would there be a problem with simply transmitting this to a
user level utility and having it trigger the rescan through the exported
sysfs interfaces?

We could do with a generic infrastructure within SCSI for generating
these events, anyway, if you want to go that route...

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 12:10       ` Krishna Murthy
  2003-12-01 15:22         ` James Bottomley
@ 2003-12-01 15:27         ` Christoph Hellwig
  2003-12-02  5:52           ` N.C.Krishna Murthy
  1 sibling, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2003-12-01 15:27 UTC (permalink / raw)
  To: Krishna Murthy; +Cc: linux-scsi, davmyers

On Mon, Dec 01, 2003 at 05:40:09PM +0530, Krishna Murthy wrote:
> scsi_scan_host_selected is exported. 
> 
> 	iscsi driver does probe for devices at two places
> a)Once after session establishment.
> b)Whenever an iSCSI async message indicates a change in
> "REPORTED LUNS DATA". (addition/deletion of luns on the target) 
> 
> 	There are no issues in calling scsi_scan_host_selected() at a).
> 	Addition of luns could be handled by calling scsi_scan_host_selected 
> with rescan parameter set to 1.
> 	Deletion of luns is not handled. scsi_scan_target does not worry about 
> devices which were previously configured, but currently not available.Would 
> you recommend handling this in scsi_scan_target (or a new API) or handle in 
> iSCSI using scsi_request based interfaces?

Case b) could easily use scsi_add_device/scsi_remove_device, couldn't it?


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 12:30 Naveen Burmi
  2003-12-01 14:08 ` Naveen Burmi
@ 2003-12-01 16:20 ` Roman Zippel
  2003-12-01 17:19   ` Scott M. Ferris
                     ` (2 more replies)
  1 sibling, 3 replies; 102+ messages in thread
From: Roman Zippel @ 2003-12-01 16:20 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: hch, linux-scsi

Hi,

On Mon, 1 Dec 2003, Naveen Burmi wrote:

> In linux kernel there is a rare occurrence of data corruption. This data can
> be buffer cache data as well as raw I/O data. Don't know the actual root
> cause of the problem, but the fix that we put under macro
> "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.

You probably mean the page cache and it's normal that a page can be
modified, while it's written out.

> On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> card it generates  the sense data with sense key as HARDWARE ERROR
> and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> Upon receving this sense data iSCSI initiator driver is not retrying this
> command and failing the command to the upper layers of SCSI subsystem.
> But linux SCSI subsystem does not perform any error recovery for this kind of
> sense data.

The answer from the target is weird, why does it react with a scsi error
to an iSCSI transport problem?
At ERL0 the target should just drop the packet and/or close the
connection. Recovery of such case would require ERL1, where the target
could request to retransmit the corrupted data pdu or send a CHECK
CONDITION (but also with a different sense data).

bye, Roman

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 16:20 ` Roman Zippel
@ 2003-12-01 17:19   ` Scott M. Ferris
  2003-12-01 20:06     ` Clay Haapala
  2003-12-02  3:46   ` Andre Hedrick
  2003-12-02 11:56   ` Naveen Burmi
  2 siblings, 1 reply; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-01 17:19 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Naveen Burmi, hch, linux-scsi

Roman Zippel wrote:
> Hi,
> 
> On Mon, 1 Dec 2003, Naveen Burmi wrote:
> 
> > In linux kernel there is a rare occurrence of data corruption. This data can
> > be buffer cache data as well as raw I/O data. Don't know the actual root
> > cause of the problem, but the fix that we put under macro
> > "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
> 
> You probably mean the page cache and it's normal that a page can be
> modified, while it's written out.
> 
> > On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> > card it generates  the sense data with sense key as HARDWARE ERROR
> > and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> > Upon receving this sense data iSCSI initiator driver is not retrying this
> > command and failing the command to the upper layers of SCSI subsystem.
> > But linux SCSI subsystem does not perform any error recovery for this kind of
> > sense data.
> 
> The answer from the target is weird, why does it react with a scsi error
> to an iSCSI transport problem?

I don't think he's describing an iSCSI transport problem.  It's not
the iSCSI CRC that was bad.  A Fibre Channel HBA in an iSCSI<->FCP
gateway detected a Fibre Channel CRC error, and the gateway decided to
report it as a HARDWARE ERROR to the iSCSI initiator.  Linux doesn't
retry HARDWARE ERRORs, so they have a problem.

I think that's a bad choice of sense key by the gateway. COMMAND
ABORTED would be a better choice if the gateway wants the initiator to
retry the command.  0B/47xx and 0B/48xx are commonly used to report
CRC and parity errors.  Fixing the gateway seems like the best
solution to me.

Copying the data is not a good solution to anything.  If they think
there's a bug in the TCP stack that is corrupting the data, they ought
to go fix the TCP stack.  If the TCP stack isn't corrupting the data,
then copying the data is a waste of time, and puts memory allocation
in the I/O path, which is really not a good idea.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 14:08 ` Naveen Burmi
@ 2003-12-01 18:48   ` Andre Hedrick
  2003-12-01 19:23   ` Andre Hedrick
  1 sibling, 0 replies; 102+ messages in thread
From: Andre Hedrick @ 2003-12-01 18:48 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: hch, linux-scsi



Andre Hedrick
LAD Storage Consulting Group

On Mon, 1 Dec 2003, Naveen Burmi wrote:

> 
> Following is a simpler explanation:
> 
> iSCSI driver computes checksum on the buffer cache data, pointed to by 
> sglist. It passes the checksum and list of data buffers to TCP layer for 
> transmission to the iSCSI target. On very rare occasions data present in the 
> buffer gets corrupted, after iSCSI driver gives the computed checksum and 
> list of data buffers to TCP layers. At the iSCSI target side checksum is 
> again calculated and compared with checksum sent along with data by iSCSI 
> driver. iSCSI target detects data crc error in case of corrupted data.
> 
> Since the data buffers(present in buffer cache) has got corrupted even if 
> iSCSI initiator driver re-tries to send data and computed checksum to target 
> and if target's checksum matches with the initiator ckecksum, the garbled 
> data gets written on to the disk(scsi target).
> 
> To prevent writing garbled data onto disk(scsi target) iSCSI driver copies 
> the buffer cache data into locally allocated buffers, computes the checksum 
> and give the list of locally allocated buffers to the TCP layer  for 
> transmission of data to iSCSI target.
> 
> Thanks,
> Naveen.
> 
> 
> On Monday 01 December 2003 06:00 pm, Naveen Burmi wrote:
> > Christoph,
> >
> > You have following review comment for linux-iscsi driver:
> > "the explanation of  PREVENT_DATA_CORRUPTION does'nt make sense".
> >
> > Following is the explanation of the requirement of 
> > "PREVENT_DATA_CORRUPTION":
> >
> > In linux kernel there is a rare occurrence of data corruption. This data
> > can be buffer cache data as well as raw I/O data. Don't know the actual
> > root cause of the problem, but the fix that we put under macro
> > "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
> >
> > Following is the scenario where we have observed this problem (still
> > persist in test9, 2.6 linux kernel):
> >
> > On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> > card it generates  the sense data with sense key as HARDWARE ERROR
> > and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> > Upon receving this sense data iSCSI initiator driver is not retrying this
> > command and failing the command to the upper layers of SCSI subsystem.
> > But linux SCSI subsystem does not perform any error recovery for this kind
> > of sense data.
> >
> > Following is the fix:
> > 1) Protecting the original buffer cache and raw I/O data from getting
> > garbled. This is acheived by copying the incoming I/O buffer temporarily
> > in an internal buffer and then sending the copied data down to TCP.
> >
> > 2) Before failing the scsi command to upper SCSI layers, iSCSI driver
> > is retrying the scsi command upon receiving the sense data with sense key
> > as HARDWARE ERROR.
> >
> >
> > Are you aware of this issue OR any related issue?
> >
> > Thanks,
> > Naveen.
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 14:08 ` Naveen Burmi
  2003-12-01 18:48   ` Andre Hedrick
@ 2003-12-01 19:23   ` Andre Hedrick
  1 sibling, 0 replies; 102+ messages in thread
From: Andre Hedrick @ 2003-12-01 19:23 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: hch, linux-scsi


Sorry spaz attack on the first reply.

Naveen, all your memcpy of buffers did was insure the content you want
down never got there.  Committing CRC32C's on the fly in a ZERO Copy call
will require the higher levels of ERL support. ERL < 1 is a "JOKE".
Sync-and-Steering with at least ERL >= 1 will allow for content in each
PDU to be independent from the other.  Additionally one can logically
retry segments and not the entire command.

If your initiator can not tell or process the difference and know to
retry, to put it nicely "EWW".  Anyone who has ever dealt with the storage
stack in Linux knows, failing commands back up to originator is a last
effort and basically hopeless.  There are no paths back for direct
notification on real or virtual HBA to the caller.  Only if you lose a
channel or device would one abort and fail itself.

Additionally, since the Cisco 4.0.0 is just a virtual HBA, where does the
Qlogic error occur?  If the target is not capable of mapping and retrying
(real disk + real hba) transport errors to notify the initiator to retry,
well lets not go there because it will hurt.

Well I think I am going off task, sigh.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Mon, 1 Dec 2003, Naveen Burmi wrote:

> 
> Following is a simpler explanation:
> 
> iSCSI driver computes checksum on the buffer cache data, pointed to by 
> sglist. It passes the checksum and list of data buffers to TCP layer for 
> transmission to the iSCSI target. On very rare occasions data present in the 
> buffer gets corrupted, after iSCSI driver gives the computed checksum and 
> list of data buffers to TCP layers. At the iSCSI target side checksum is 
> again calculated and compared with checksum sent along with data by iSCSI 
> driver. iSCSI target detects data crc error in case of corrupted data.
> 
> Since the data buffers(present in buffer cache) has got corrupted even if 
> iSCSI initiator driver re-tries to send data and computed checksum to target 
> and if target's checksum matches with the initiator ckecksum, the garbled 
> data gets written on to the disk(scsi target).
> 
> To prevent writing garbled data onto disk(scsi target) iSCSI driver copies 
> the buffer cache data into locally allocated buffers, computes the checksum 
> and give the list of locally allocated buffers to the TCP layer  for 
> transmission of data to iSCSI target.
> 
> Thanks,
> Naveen.
> 
> 
> On Monday 01 December 2003 06:00 pm, Naveen Burmi wrote:
> > Christoph,
> >
> > You have following review comment for linux-iscsi driver:
> > "the explanation of  PREVENT_DATA_CORRUPTION does'nt make sense".
> >
> > Following is the explanation of the requirement of 
> > "PREVENT_DATA_CORRUPTION":
> >
> > In linux kernel there is a rare occurrence of data corruption. This data
> > can be buffer cache data as well as raw I/O data. Don't know the actual
> > root cause of the problem, but the fix that we put under macro
> > "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
> >
> > Following is the scenario where we have observed this problem (still
> > persist in test9, 2.6 linux kernel):
> >
> > On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> > card it generates  the sense data with sense key as HARDWARE ERROR
> > and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> > Upon receving this sense data iSCSI initiator driver is not retrying this
> > command and failing the command to the upper layers of SCSI subsystem.
> > But linux SCSI subsystem does not perform any error recovery for this kind
> > of sense data.
> >
> > Following is the fix:
> > 1) Protecting the original buffer cache and raw I/O data from getting
> > garbled. This is acheived by copying the incoming I/O buffer temporarily
> > in an internal buffer and then sending the copied data down to TCP.
> >
> > 2) Before failing the scsi command to upper SCSI layers, iSCSI driver
> > is retrying the scsi command upon receiving the sense data with sense key
> > as HARDWARE ERROR.
> >
> >
> > Are you aware of this issue OR any related issue?
> >
> > Thanks,
> > Naveen.
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 17:19   ` Scott M. Ferris
@ 2003-12-01 20:06     ` Clay Haapala
  2003-12-01 20:31       ` Andre Hedrick
  0 siblings, 1 reply; 102+ messages in thread
From: Clay Haapala @ 2003-12-01 20:06 UTC (permalink / raw)
  To: Scott M. Ferris; +Cc: Roman Zippel, Naveen Burmi, hch, linux-scsi

On Mon, 1 Dec 2003, Scott M. Ferris verbalised:
> Roman Zippel wrote:
>> Hi,
>> 
>> On Mon, 1 Dec 2003, Naveen Burmi wrote:
>> 
>> > In linux kernel there is a rare occurrence of data
>> > corruption. This data can be buffer cache data as well as raw I/O
>> > data. Don't know the actual root cause of the problem, but the
>> > fix that we put under macro "PREVENT_DATA_CORRUPTION" is capable
>> > of resolving this problem.
>> 
>> You probably mean the page cache and it's normal that a page can be
>> modified, while it's written out.
>> 
>> > On iSCSI target, 5428-2, upon detecting the CRC data error by the
>> > QlogicFC card it generates the sense data with sense key as
>> > HARDWARE ERROR and Additional sense data as LOGICAL UNIT
>> > COMMUNICATION FAILURE.  Upon receving this sense data iSCSI
>> > initiator driver is not retrying this command and failing the
>> > command to the upper layers of SCSI subsystem.  But linux SCSI
>> > subsystem does not perform any error recovery for this kind of
>> > sense data.
>> 
>> The answer from the target is weird, why does it react with a scsi
>> error to an iSCSI transport problem?
> 
> I don't think he's describing an iSCSI transport problem.  It's not
> the iSCSI CRC that was bad.  A Fibre Channel HBA in an iSCSI<->FCP
> gateway detected a Fibre Channel CRC error, and the gateway decided
> to report it as a HARDWARE ERROR to the iSCSI initiator.  Linux
> doesn't retry HARDWARE ERRORs, so they have a problem.
> 
> I think that's a bad choice of sense key by the gateway. COMMAND
> ABORTED would be a better choice if the gateway wants the initiator
> to retry the command.  0B/47xx and 0B/48xx are commonly used to
> report CRC and parity errors.  Fixing the gateway seems like the
> best solution to me.
> 
> Copying the data is not a good solution to anything.  If they think
> there's a bug in the TCP stack that is corrupting the data, they
> ought to go fix the TCP stack.  If the TCP stack isn't corrupting
> the data, then copying the data is a waste of time, and puts memory
> allocation in the I/O path, which is really not a good idea.

We need to make sure which type of error we are really talking about,
here, iSCSI-CRC32c digest error or Fibre Channel link error.

Using the Cisco SN 5428-2 as the example gateway here, in the case of
a Fibre Channel CRC error, the command is retried a [configurable]
number of times.  If no success, then the ususal SCSI sense codes are
sent back to the initiator, and they should indicate HARDWARE ERROR.

In the case of an error in the iSCSI-CRC32c digest, this is computed
in the SN 5428-2 by the QL 2320 HBA.  (That's where the references to
QLogic came from.)  Early releases of the SN 5428 returned the
HARDWARE ERROR (LUN failure) status, but that has been changed to
match the iSCSI spec in an upcoming release and return the 0B/4705
code (COMMAND ABORT) in that case.  (That fix for the 2320-offload
lagged the fix in the software computation.)

So, Naveen, which type of error are we talking about here?  If it is
the digest error, and COMMAND ABORT happens instead of HARDWARE ERROR,
is the buffer copy still necessary?  I can see the case above where
the page buffer is modified while it is being written (and,
presumably, still marked dirty so another write of it will happen),
but I dimly recall discussion of a case where the buffer got thrown
away.  If a COMMAND ABORT is returned rather than the other code, does
this case still obtain?
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
     Well, looks like hypocrisy is back on the airwaves.
       C'mon, Rush!  Do the crime, do the time, right?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 20:06     ` Clay Haapala
@ 2003-12-01 20:31       ` Andre Hedrick
  2003-12-01 20:58         ` Clay Haapala
  0 siblings, 1 reply; 102+ messages in thread
From: Andre Hedrick @ 2003-12-01 20:31 UTC (permalink / raw)
  To: Clay Haapala; +Cc: Scott M. Ferris, Roman Zippel, Naveen Burmi, hch, linux-scsi


Well it sounds like you are passing the FC error back over the iSCSI
transport, as a CRC32C would only invoke a retry or session reinstatement
in worst case (well absolute worst case, TPGT migration blab blah).

It is a bad target unless it was under a task management command set where
the frontend to scsi management would require the full senes data back.
Otherwise, all things considered in storage is a lie, the response is
faked.

Cheers,

Andre
--------------------------------------
If liberals could make it on the airwaves they would not whine; however,
they have the LA Times.


On Mon, 1 Dec 2003, Clay Haapala wrote:

> On Mon, 1 Dec 2003, Scott M. Ferris verbalised:
> > Roman Zippel wrote:
> >> Hi,
> >> 
> >> On Mon, 1 Dec 2003, Naveen Burmi wrote:
> >> 
> >> > In linux kernel there is a rare occurrence of data
> >> > corruption. This data can be buffer cache data as well as raw I/O
> >> > data. Don't know the actual root cause of the problem, but the
> >> > fix that we put under macro "PREVENT_DATA_CORRUPTION" is capable
> >> > of resolving this problem.
> >> 
> >> You probably mean the page cache and it's normal that a page can be
> >> modified, while it's written out.
> >> 
> >> > On iSCSI target, 5428-2, upon detecting the CRC data error by the
> >> > QlogicFC card it generates the sense data with sense key as
> >> > HARDWARE ERROR and Additional sense data as LOGICAL UNIT
> >> > COMMUNICATION FAILURE.  Upon receving this sense data iSCSI
> >> > initiator driver is not retrying this command and failing the
> >> > command to the upper layers of SCSI subsystem.  But linux SCSI
> >> > subsystem does not perform any error recovery for this kind of
> >> > sense data.
> >> 
> >> The answer from the target is weird, why does it react with a scsi
> >> error to an iSCSI transport problem?
> > 
> > I don't think he's describing an iSCSI transport problem.  It's not
> > the iSCSI CRC that was bad.  A Fibre Channel HBA in an iSCSI<->FCP
> > gateway detected a Fibre Channel CRC error, and the gateway decided
> > to report it as a HARDWARE ERROR to the iSCSI initiator.  Linux
> > doesn't retry HARDWARE ERRORs, so they have a problem.
> > 
> > I think that's a bad choice of sense key by the gateway. COMMAND
> > ABORTED would be a better choice if the gateway wants the initiator
> > to retry the command.  0B/47xx and 0B/48xx are commonly used to
> > report CRC and parity errors.  Fixing the gateway seems like the
> > best solution to me.
> > 
> > Copying the data is not a good solution to anything.  If they think
> > there's a bug in the TCP stack that is corrupting the data, they
> > ought to go fix the TCP stack.  If the TCP stack isn't corrupting
> > the data, then copying the data is a waste of time, and puts memory
> > allocation in the I/O path, which is really not a good idea.
> 
> We need to make sure which type of error we are really talking about,
> here, iSCSI-CRC32c digest error or Fibre Channel link error.
> 
> Using the Cisco SN 5428-2 as the example gateway here, in the case of
> a Fibre Channel CRC error, the command is retried a [configurable]
> number of times.  If no success, then the ususal SCSI sense codes are
> sent back to the initiator, and they should indicate HARDWARE ERROR.
> 
> In the case of an error in the iSCSI-CRC32c digest, this is computed
> in the SN 5428-2 by the QL 2320 HBA.  (That's where the references to
> QLogic came from.)  Early releases of the SN 5428 returned the
> HARDWARE ERROR (LUN failure) status, but that has been changed to
> match the iSCSI spec in an upcoming release and return the 0B/4705
> code (COMMAND ABORT) in that case.  (That fix for the 2320-offload
> lagged the fix in the software computation.)
> 
> So, Naveen, which type of error are we talking about here?  If it is
> the digest error, and COMMAND ABORT happens instead of HARDWARE ERROR,
> is the buffer copy still necessary?  I can see the case above where
> the page buffer is modified while it is being written (and,
> presumably, still marked dirty so another write of it will happen),
> but I dimly recall discussion of a case where the buffer got thrown
> away.  If a COMMAND ABORT is returned rather than the other code, does
> this case still obtain?
> -- 
> Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
>    6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
>      Well, looks like hypocrisy is back on the airwaves.
>        C'mon, Rush!  Do the crime, do the time, right?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 20:31       ` Andre Hedrick
@ 2003-12-01 20:58         ` Clay Haapala
  0 siblings, 0 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-01 20:58 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Scott M. Ferris, Roman Zippel, Naveen Burmi, hch, linux-scsi

On Mon, 1 Dec 2003, Andre Hedrick stated:
> 
> Well it sounds like you are passing the FC error back over the iSCSI
> transport,
You assume too much -- that is what I asked Naveen to specify.

> as a CRC32C would only invoke a retry or session
> reinstatement in worst case (well absolute worst case, TPGT
> migration blab blah).
> 
As I pointed out, the behavior of the target has changed, relatively
recently, properly to return a COMMAND ABORT code for a CRC32C error,
so maybe the "would only invoke a retry..." behavior should now happen.

> It is a bad target unless it was under a task management command set
> where the frontend to scsi management would require the full senes
> data back.  Otherwise, all things considered in storage is a lie,
> the response is faked.
> 
> Cheers,
> 
> Andre
> --------------------------------------
> If liberals could make it on the airwaves they would not whine;
> however, they have the LA Times.
> 
> 
> On Mon, 1 Dec 2003, Clay Haapala wrote:
> 
>> On Mon, 1 Dec 2003, Scott M. Ferris verbalised:
>> > Roman Zippel wrote:
>> >> Hi,
>> >> 
>> >> On Mon, 1 Dec 2003, Naveen Burmi wrote:
>> >> 
>> >> > In linux kernel there is a rare occurrence of data
>> >> > corruption. This data can be buffer cache data as well as raw
>> >> > I/O data. Don't know the actual root cause of the problem, but
>> >> > the fix that we put under macro "PREVENT_DATA_CORRUPTION" is
>> >> > capable of resolving this problem.
>> >> 
>> >> You probably mean the page cache and it's normal that a page can
>> >> be modified, while it's written out.
>> >> 
>> >> > On iSCSI target, 5428-2, upon detecting the CRC data error by
>> >> > the QlogicFC card it generates the sense data with sense key
>> >> > as HARDWARE ERROR and Additional sense data as LOGICAL UNIT
>> >> > COMMUNICATION FAILURE.  Upon receving this sense data iSCSI
>> >> > initiator driver is not retrying this command and failing the
>> >> > command to the upper layers of SCSI subsystem.  But linux SCSI
>> >> > subsystem does not perform any error recovery for this kind of
>> >> > sense data.
>> >> 
>> >> The answer from the target is weird, why does it react with a
>> >> scsi error to an iSCSI transport problem?
>> > 
>> > I don't think he's describing an iSCSI transport problem.  It's
>> > not the iSCSI CRC that was bad.  A Fibre Channel HBA in an
>> > iSCSI<->FCP gateway detected a Fibre Channel CRC error, and the
>> > gateway decided to report it as a HARDWARE ERROR to the iSCSI
>> > initiator.  Linux doesn't retry HARDWARE ERRORs, so they have a
>> > problem.
>> > 
>> > I think that's a bad choice of sense key by the gateway. COMMAND
>> > ABORTED would be a better choice if the gateway wants the
>> > initiator to retry the command.  0B/47xx and 0B/48xx are commonly
>> > used to report CRC and parity errors.  Fixing the gateway seems
>> > like the best solution to me.
>> > 
>> > Copying the data is not a good solution to anything.  If they
>> > think there's a bug in the TCP stack that is corrupting the data,
>> > they ought to go fix the TCP stack.  If the TCP stack isn't
>> > corrupting the data, then copying the data is a waste of time,
>> > and puts memory allocation in the I/O path, which is really not a
>> > good idea.
>> 
>> We need to make sure which type of error we are really talking
>> about, here, iSCSI-CRC32c digest error or Fibre Channel link error.
>> 
>> Using the Cisco SN 5428-2 as the example gateway here, in the case
>> of a Fibre Channel CRC error, the command is retried a
>> [configurable] number of times.  If no success, then the ususal
>> SCSI sense codes are sent back to the initiator, and they should
>> indicate HARDWARE ERROR.
>> 
>> In the case of an error in the iSCSI-CRC32c digest, this is
>> computed in the SN 5428-2 by the QL 2320 HBA.  (That's where the
>> references to QLogic came from.)  Early releases of the SN 5428
>> returned the HARDWARE ERROR (LUN failure) status, but that has been
>> changed to match the iSCSI spec in an upcoming release and return
>> the 0B/4705 code (COMMAND ABORT) in that case.  (That fix for the
>> 2320-offload lagged the fix in the software computation.)
>> 
>> So, Naveen, which type of error are we talking about here?  If it
>> is the digest error, and COMMAND ABORT happens instead of HARDWARE
>> ERROR, is the buffer copy still necessary?  I can see the case
>> above where the page buffer is modified while it is being written
>> (and, presumably, still marked dirty so another write of it will
>> happen), but I dimly recall discussion of a case where the buffer
>> got thrown away.  If a COMMAND ABORT is returned rather than the
>> other code, does this case still obtain?
>> -- 
>> Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1
>> 763-398-1056 6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP:
>> C89240AD Well, looks like hypocrisy is back on the airwaves.
>> C'mon, Rush!  Do the crime, do the time, right?  - To unsubscribe
>> from this list: send the line "unsubscribe linux-scsi" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>> 

-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
     Well, looks like hypocrisy is back on the airwaves.
       C'mon, Rush!  Do the crime, do the time, right?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 16:20 ` Roman Zippel
  2003-12-01 17:19   ` Scott M. Ferris
@ 2003-12-02  3:46   ` Andre Hedrick
  2003-12-02 12:02     ` Naveen Burmi
  2003-12-02 13:57     ` Roman Zippel
  2003-12-02 11:56   ` Naveen Burmi
  2 siblings, 2 replies; 102+ messages in thread
From: Andre Hedrick @ 2003-12-02  3:46 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Naveen Burmi, hch, linux-scsi



The rare cases of "data corruption" under iSCSI is because most iSCSI
products fail to follow the rules of DAS.  Operating from the mindset of
DAS, local buffers are permitted to change regardless.  This includes
execution to platter.  It includes pre and operationial IO execution or
commits to the media.

If a "memcpy" is held and the original content changes with the top level
caller not marking the request dirty, BOOM you just commited crap to
platter.

If you are not zero-copy to the network stack, you will never catch the
changes between calculation of the CRC32C and pumping to the wire.

iSCSI(MC)

BLOCK->SCSI->iSCSI->MEMCPY->CRC32C->TCP->WIRE
                              ^^^^^^^^^^^^^^
                              BOOM memory buffers change

iSCSI(ZC)

BLOCK->SCSI->iSCSI->CRC32C->TCP->WIRE
                      ^^^^^^^^^^^^^^
                      BOOM memory buffers change

iSCSI(MC) passes CRC32C

iSCSI(ZC) fails CRC32C, and retries the failed PDU.

It should be clear that any TARGET, INITIATOR, or PAIR doing iSCSI(MC)
will roast your data.

Now iSCSI(ZC) requires ERL >= 1 w/ Sync-and-Steering.

Real iSCSI does ERL >= 2, but then again > 2 means new async-connections.
To learn more join the IETF reflector.  Please be kind to the folks who
are only "network centric" they mean well but just don't get storage.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Mon, 1 Dec 2003, Roman Zippel wrote:

> Hi,
> 
> On Mon, 1 Dec 2003, Naveen Burmi wrote:
> 
> > In linux kernel there is a rare occurrence of data corruption. This data can
> > be buffer cache data as well as raw I/O data. Don't know the actual root
> > cause of the problem, but the fix that we put under macro
> > "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
> 
> You probably mean the page cache and it's normal that a page can be
> modified, while it's written out.
> 
> > On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> > card it generates  the sense data with sense key as HARDWARE ERROR
> > and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> > Upon receving this sense data iSCSI initiator driver is not retrying this
> > command and failing the command to the upper layers of SCSI subsystem.
> > But linux SCSI subsystem does not perform any error recovery for this kind of
> > sense data.
> 
> The answer from the target is weird, why does it react with a scsi error
> to an iSCSI transport problem?
> At ERL0 the target should just drop the packet and/or close the
> connection. Recovery of such case would require ERL1, where the target
> could request to retransmit the corrupted data pdu or send a CHECK
> CONDITION (but also with a different sense data).
> 
> bye, Roman
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 15:27         ` Christoph Hellwig
@ 2003-12-02  5:52           ` N.C.Krishna Murthy
  2003-12-02 16:59             ` Patrick Mansfield
  0 siblings, 1 reply; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-02  5:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, davmyers

Hi,
	The iSCSI async message does not provide information about which
lun(s) got added or deleted. The iSCSI async message only contains
a  sense data indicating that  "reported luns data" changed.(ASC/ASCQ = 
3F/0E). Upon receiving this we would have to do a report luns/inquiry, 
compare with list of luns which were configured, find out the list of luns 
which got added or deleted and subsequently use scsi_add_device / 
scsi_remove_device.

Doing this in iSCSI driver would duplicated some code from 
scsi_report_lun_scan/scsi_sequential_scan. 

Instead if we can have scsi_report_lun_scan and scsi_sequential_scan return
back a list of luns they configured(includes discovered and already configured 
luns),  scsi_scan_target can traverse the list of devices on this target and 
use scsi_remove_device for devices (provided device access count = 0)
that to do not have an entry in the list. This takes care of both 
added/deleted luns


Please let me know.

Thanx
N.C.Krishna Murthy



On Monday 01 Dec 2003 8:57 pm, Christoph Hellwig wrote:
> On Mon, Dec 01, 2003 at 05:40:09PM +0530, Krishna Murthy wrote:
> > scsi_scan_host_selected is exported.
> >
> > 	iscsi driver does probe for devices at two places
> > a)Once after session establishment.
> > b)Whenever an iSCSI async message indicates a change in
> > "REPORTED LUNS DATA". (addition/deletion of luns on the target)
> >
> > 	There are no issues in calling scsi_scan_host_selected() at a).
> > 	Addition of luns could be handled by calling scsi_scan_host_selected
> > with rescan parameter set to 1.
> > 	Deletion of luns is not handled. scsi_scan_target does not worry about
> > devices which were previously configured, but currently not
> > available.Would you recommend handling this in scsi_scan_target (or a new
> > API) or handle in iSCSI using scsi_request based interfaces?
>
> Case b) could easily use scsi_add_device/scsi_remove_device, couldn't it?
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 16:20 ` Roman Zippel
  2003-12-01 17:19   ` Scott M. Ferris
  2003-12-02  3:46   ` Andre Hedrick
@ 2003-12-02 11:56   ` Naveen Burmi
  2003-12-02 14:11     ` Roman Zippel
  2003-12-02 16:37     ` James Bottomley
  2 siblings, 2 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-02 11:56 UTC (permalink / raw)
  To: Roman Zippel; +Cc: hch, linux-scsi

On Monday 01 December 2003 09:50 pm, Roman Zippel wrote:

> You probably mean the page cache and it's normal that a page can be
> modified, while it's written out.

Our assumption so far was that if a buffer is given to SCSI HBA driver, then 
nobody can touch the buffer until the HBA says that he is done with the 
buffer. It seems that this assumption isn't true. Can you give an instance 
where somebody (probably buffer cache) will modify the buffer which is handed
over to an HBA driver?

Thanks,
Naveen.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02  3:46   ` Andre Hedrick
@ 2003-12-02 12:02     ` Naveen Burmi
  2003-12-02 13:57     ` Roman Zippel
  1 sibling, 0 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-02 12:02 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: hch, linux-scsi, Roman Zippel

On Tuesday 02 December 2003 09:16 am, Andre Hedrick wrote:
> Operating from the mindset of DAS, local buffers are permitted to change 
>regardless. 

Please elaborate more on this.

Thanks,
Naveen.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02  3:46   ` Andre Hedrick
  2003-12-02 12:02     ` Naveen Burmi
@ 2003-12-02 13:57     ` Roman Zippel
  1 sibling, 0 replies; 102+ messages in thread
From: Roman Zippel @ 2003-12-02 13:57 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Naveen Burmi, hch, linux-scsi

Hi,

On Mon, 1 Dec 2003, Andre Hedrick wrote:

> If a "memcpy" is held and the original content changes with the top level
> caller not marking the request dirty, BOOM you just commited crap to
> platter.

This is irrelevant for this discussion, it's the responsibility of the
caller to ensure data integrity, when it's modifying the data while it's
written. The iSCSI driver is only responsible for the checksum generation.

> If you are not zero-copy to the network stack, you will never catch the
> changes between calculation of the CRC32C and pumping to the wire.
>
> iSCSI(MC)
>
> BLOCK->SCSI->iSCSI->MEMCPY->CRC32C->TCP->WIRE
>                               ^^^^^^^^^^^^^^
>                               BOOM memory buffers change

If the driver uses a private buffer, there is nobody who can change it
(otherwise it's simply a bug).

> iSCSI(MC) passes CRC32C
>
> iSCSI(ZC) fails CRC32C, and retries the failed PDU.
>
> It should be clear that any TARGET, INITIATOR, or PAIR doing iSCSI(MC)
> will roast your data.
>
> Now iSCSI(ZC) requires ERL >= 1 w/ Sync-and-Steering.

Wrong, the iSCSI standard allows to escalate the error handling to the
SCSI layer. It's a different problem whether our SCSI layer can deal with
it or not.
There are multiple valid ways to recover from a digest error, there is not
just one true way as you suggest.

> Real iSCSI does ERL >= 2, but then again > 2 means new async-connections.

ERL0 is also "real iSCSI".

bye, Roman

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 11:56   ` Naveen Burmi
@ 2003-12-02 14:11     ` Roman Zippel
  2003-12-02 16:37     ` James Bottomley
  1 sibling, 0 replies; 102+ messages in thread
From: Roman Zippel @ 2003-12-02 14:11 UTC (permalink / raw)
  To: Naveen Burmi; +Cc: hch, linux-scsi

Hi,

On Tue, 2 Dec 2003, Naveen Burmi wrote:

> > You probably mean the page cache and it's normal that a page can be
> > modified, while it's written out.
>
> Our assumption so far was that if a buffer is given to SCSI HBA driver, then
> nobody can touch the buffer until the HBA says that he is done with the
> buffer. It seems that this assumption isn't true. Can you give an instance
> where somebody (probably buffer cache) will modify the buffer which is handed
> over to an HBA driver?

You only need to mmap a file and write to it. The vm will try to write the
dirty pages back to the storage, but it will not unmap the pages, so the
the process can continue writing to them.

bye, Roman

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 11:56   ` Naveen Burmi
  2003-12-02 14:11     ` Roman Zippel
@ 2003-12-02 16:37     ` James Bottomley
  2003-12-02 17:42       ` Mike Anderson
                         ` (3 more replies)
  1 sibling, 4 replies; 102+ messages in thread
From: James Bottomley @ 2003-12-02 16:37 UTC (permalink / raw)
  To: naveenb; +Cc: Roman Zippel, hch, SCSI Mailing List

On Tue, 2003-12-02 at 05:56, Naveen Burmi wrote:
> Our assumption so far was that if a buffer is given to SCSI HBA driver, then 
> nobody can touch the buffer until the HBA says that he is done with the 
> buffer. It seems that this assumption isn't true. Can you give an instance 
> where somebody (probably buffer cache) will modify the buffer which is handed
> over to an HBA driver?

This is an incorrect assumption.  The Linux SCSI subsystem is
architected for zero copy, meaning that if the user maps a copy of the
data, they can alter it at will, even if it is in flight within the
driver.  The only thing you can guarantee is that you will get another
write request for any page the user dirtied.

Also note that glibc uses mmaping to handle file descriptors in linux,
so almost every application can alter in-flight data depending on how it
works.

James




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02  5:52           ` N.C.Krishna Murthy
@ 2003-12-02 16:59             ` Patrick Mansfield
  0 siblings, 0 replies; 102+ messages in thread
From: Patrick Mansfield @ 2003-12-02 16:59 UTC (permalink / raw)
  To: N.C.Krishna Murthy; +Cc: Christoph Hellwig, linux-scsi, davmyers

On Tue, Dec 02, 2003 at 11:22:47AM +0530, N.C.Krishna Murthy wrote:
> Hi,
> 	The iSCSI async message does not provide information about which
> lun(s) got added or deleted. The iSCSI async message only contains
> a  sense data indicating that  "reported luns data" changed.(ASC/ASCQ = 
> 3F/0E). Upon receiving this we would have to do a report luns/inquiry, 
> compare with list of luns which were configured, find out the list of luns 
> which got added or deleted and subsequently use scsi_add_device / 
> scsi_remove_device.
> 
> Doing this in iSCSI driver would duplicated some code from 
> scsi_report_lun_scan/scsi_sequential_scan. 
> 
> Instead if we can have scsi_report_lun_scan and scsi_sequential_scan return
> back a list of luns they configured(includes discovered and already configured 
> luns),  scsi_scan_target can traverse the list of devices on this target and 
> use scsi_remove_device for devices (provided device access count = 0)
> that to do not have an entry in the list. This takes care of both 
> added/deleted luns

That sounds like a good idea, but do not bother with scsi_sequential_scan,
since sparse LUNs usage could cause problems - removing a LUN that still
exists and is in use.

So, the calls to scsi_remove_device would only be triggered via report lun
scanning. 

Their should probably be a different code path and flags/arguments to
conditionally trigger the check and possible removal of LUNs.

And the caller needs to have process context, so we can't just call the
into the code from the mid-layer.

It would be best to trigger a hotplug callout for the event, and let user
land trigger the scan/delete. But, we need early boot time handling or
something else (always rescan with removal via startup scripts?) to handle
LUNs changing while booting.

Code would also need fixing so we always try a report lun scan if the
target exists (code snippet mentioned by James in another post). So if LUN
0 goes away, we still remove devices.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 16:37     ` James Bottomley
@ 2003-12-02 17:42       ` Mike Anderson
  2003-12-02 23:55         ` James Bottomley
  2003-12-02 23:41       ` Clay Haapala
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 102+ messages in thread
From: Mike Anderson @ 2003-12-02 17:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List

James Bottomley [James.Bottomley@steeleye.com] wrote:
> On Tue, 2003-12-02 at 05:56, Naveen Burmi wrote:
> > Our assumption so far was that if a buffer is given to SCSI HBA driver, then 
> > nobody can touch the buffer until the HBA says that he is done with the 
> > buffer. It seems that this assumption isn't true. Can you give an instance 
> > where somebody (probably buffer cache) will modify the buffer which is handed
> > over to an HBA driver?
> 
> This is an incorrect assumption.  The Linux SCSI subsystem is
> architected for zero copy, meaning that if the user maps a copy of the
> data, they can alter it at will, even if it is in flight within the
> driver.  The only thing you can guarantee is that you will get another
> write request for any page the user dirtied.


The user would need to generate another write request and the
"guarantee" is up to the user (correct?)?

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 16:37     ` James Bottomley
  2003-12-02 17:42       ` Mike Anderson
@ 2003-12-02 23:41       ` Clay Haapala
  2003-12-03 14:06       ` Naveen Burmi
  2003-12-06 19:37       ` Andre Hedrick
  3 siblings, 0 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-02 23:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List

On 02 Dec 2003, James Bottomley said:
> On Tue, 2003-12-02 at 05:56, Naveen Burmi wrote:
>> Our assumption so far was that if a buffer is given to SCSI HBA
>> driver, then nobody can touch the buffer until the HBA says that he
>> is done with the buffer. It seems that this assumption isn't
>> true. Can you give an instance where somebody (probably buffer
>> cache) will modify the buffer which is handed over to an HBA
>> driver?
> 
> This is an incorrect assumption.  The Linux SCSI subsystem is
> architected for zero copy, meaning that if the user maps a copy of
> the data, they can alter it at will, even if it is in flight within
> the driver.  The only thing you can guarantee is that you will get
> another write request for any page the user dirtied.
> 
> Also note that glibc uses mmaping to handle file descriptors in
> linux, so almost every application can alter in-flight data
> depending on how it works.
> 
> James

I'm trying to drive to what a proper design should be, and I am sure
that I'll be a lot smarter when this discussion is done than I am now.
I also kind of feel that we are in Stage 3 Problem-Solving* now.

There are two reasons that the digest could fail:

 1) the network actually silently corrupted the data, which is why
    iSCSI *has* a digest in its RFC, or

 2) (assuming zero-copy) the page(s) in question were modified by the
    user while the digest was being calculated, which is possible,
    indeed probable, as James points out above.

A digest error should return a 0B/4705 COMMAND ABORT and an
initiator is supposed to be free to retry that operation.

Can the iSCSI initiator retry using the identical page it sent before?
No, not if we are in case #2.  Does it matter?  Also No.  The user has
already made the data previously attempted "stale", and it doesn't
matter if it made it to the device or not since the user processes, be
they the filesystem, mmap, or paging, want that data replaced with
whatever is current.

So, given a digest failure, the iSCSI initiator can recalculate the
digest using the current data and retry the operation.  (Yes, and the
page may *still* be being modified, so we go to the top of the loop.)
What would be missing here from a data perspective?

>From a big-picture perspective, would such re-trying impact system
performance?  I would tend to think not, since we are dealing with the
network stack here, and would that not naturally allow other things to
run?  Of course, if little else can run, then the user process
modifying the page won't be able to, and the retries will stop.
Forgive me if I am wrong about that.

Would it be a possible, and perhaps desired, optimization if the
initiator simply returned a "completion OK" in the face of a digest
error IFF it detects that the "guaranteed write request" that James
mentions above is pending for the pages in question?  As in, "Digest
failure on stale data: No Big Deal."  Can it detect the "new" pending
request for those pages?

Digressing, I can see the value of digests for apps like DBMS who are
doing checkpoints and commits and definitely do not pee on their data
after requesting a write of it.  For paging, mmap, etc.?  Naw.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
   Active dissent has been part of American government since the
         Boston Tea Party, and will be until its end.

*Stages of Problem-Solving:

1) What problem?
2) Oh, I guess there may be a problem.
3) How the hell did it ever work??
4) Fix the problem.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 17:42       ` Mike Anderson
@ 2003-12-02 23:55         ` James Bottomley
  0 siblings, 0 replies; 102+ messages in thread
From: James Bottomley @ 2003-12-02 23:55 UTC (permalink / raw)
  To: Mike Anderson; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List

On Tue, 2003-12-02 at 11:42, Mike Anderson wrote:
> The user would need to generate another write request and the
> "guarantee" is up to the user (correct?)?

Well, not necessarily.  There are APIs to flush mmapped data which the
user can call.  However, the most likely thing (especially if the user
is careless---how many times do you code your apps to do fflush() for
instance?) is simply that automatic reaping will flush the page for you,
or failing that it will be flushed on process exit.

For mmapped file descriptors, there are other events which trigger
flushing, but I forget exactly what they are.

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 16:37     ` James Bottomley
  2003-12-02 17:42       ` Mike Anderson
  2003-12-02 23:41       ` Clay Haapala
@ 2003-12-03 14:06       ` Naveen Burmi
  2003-12-03 15:09         ` James Bottomley
  2003-12-06 19:37       ` Andre Hedrick
  3 siblings, 1 reply; 102+ messages in thread
From: Naveen Burmi @ 2003-12-03 14:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: Roman Zippel, hch, SCSI Mailing List, davmyers


PREVENT_DATA_CORRUPTION was based on our assumption which is not correct and 
Linux SCSI subsystem is architected for zero copy, therefore we are removing 
PREVENT_DATA_CORRUPTION.

Thanks,
Naveen.

On Tuesday 02 December 2003 10:07 pm, James Bottomley wrote:
> On Tue, 2003-12-02 at 05:56, Naveen Burmi wrote:
> > Our assumption so far was that if a buffer is given to SCSI HBA driver,
> > then nobody can touch the buffer until the HBA says that he is done with
> > the buffer. It seems that this assumption isn't true. Can you give an
> > instance where somebody (probably buffer cache) will modify the buffer
> > which is handed over to an HBA driver?
>
> This is an incorrect assumption.  The Linux SCSI subsystem is
> architected for zero copy, meaning that if the user maps a copy of the
> data, they can alter it at will, even if it is in flight within the
> driver.  The only thing you can guarantee is that you will get another
> write request for any page the user dirtied.
>
> Also note that glibc uses mmaping to handle file descriptors in linux,
> so almost every application can alter in-flight data depending on how it
> works.
>
> James

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 14:06       ` Naveen Burmi
@ 2003-12-03 15:09         ` James Bottomley
  2003-12-03 17:03           ` Clay Haapala
  0 siblings, 1 reply; 102+ messages in thread
From: James Bottomley @ 2003-12-03 15:09 UTC (permalink / raw)
  To: naveenb; +Cc: Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 2003-12-03 at 08:06, Naveen Burmi wrote:
> PREVENT_DATA_CORRUPTION was based on our assumption which is not correct and 
> Linux SCSI subsystem is architected for zero copy, therefore we are removing 
> PREVENT_DATA_CORRUPTION.

Just as a matter of interest, *all* unix like operating systems have
used a zero copy architecture (well, in SCSI; networking came a bit
later) for at least the last decade, so you might want to check this
assumption in other ports of your iscsi initiator.

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 15:09         ` James Bottomley
@ 2003-12-03 17:03           ` Clay Haapala
  2003-12-03 17:32             ` James Bottomley
  2003-12-03 17:54             ` Mike Anderson
  0 siblings, 2 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-03 17:03 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On 03 Dec 2003, James Bottomley stated:
> On Wed, 2003-12-03 at 08:06, Naveen Burmi wrote:
>> PREVENT_DATA_CORRUPTION was based on our assumption which is not
>> correct and Linux SCSI subsystem is architected for zero copy,
>> therefore we are removing PREVENT_DATA_CORRUPTION.
> 
> Just as a matter of interest, *all* unix like operating systems have
> used a zero copy architecture (well, in SCSI; networking came a bit
> later) for at least the last decade, so you might want to check this
> assumption in other ports of your iscsi initiator.

And in non-ix OS's, too.

Naveen, is it intended that the iSCSI driver retry when the 0B/4705 is
detected, or will that sense code and COMMAND ABORT be passed up to the
SCSI layers for retry?

James, et al, do the SCSI layers retry on COMMAND ABORT?

I don't believe the SCSI layers retry on HARDWARE ERROR -- this is a
difference between how Solaris and Linux operate, for example, unless
I am misunderstanding or have obsolete information.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
   Active dissent has been part of American government since the
         Boston Tea Party, and will be until its end.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 17:03           ` Clay Haapala
@ 2003-12-03 17:32             ` James Bottomley
  2003-12-03 17:54             ` Mike Anderson
  1 sibling, 0 replies; 102+ messages in thread
From: James Bottomley @ 2003-12-03 17:32 UTC (permalink / raw)
  To: Clay Haapala; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 2003-12-03 at 11:03, Clay Haapala wrote:
> James, et al, do the SCSI layers retry on COMMAND ABORT?

Not at the moment.  We don't have a complex sense processing engine.
What we use is in scsi_error.c:scsi_check_sense(), but it's geared
towards codes usually returned from everyday devices.

If I were you, I'd have the driver return DID_PARITY (or, more simply
DID_ERROR, which will cause a (counted) retry.

> I don't believe the SCSI layers retry on HARDWARE ERROR -- this is a
> difference between how Solaris and Linux operate, for example, unless
> I am misunderstanding or have obsolete information.

That depends on how the error is signalled, but if we think a retry to
be futile, it won't be retried (although there isn't really a DID_ for
hardware error that we don't retry).

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 17:03           ` Clay Haapala
  2003-12-03 17:32             ` James Bottomley
@ 2003-12-03 17:54             ` Mike Anderson
  2003-12-03 20:31               ` Clay Haapala
  2003-12-03 20:45               ` Clay Haapala
  1 sibling, 2 replies; 102+ messages in thread
From: Mike Anderson @ 2003-12-03 17:54 UTC (permalink / raw)
  To: Clay Haapala
  Cc: James Bottomley, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

Clay Haapala [chaapala@cisco.com] wrote:
> 
> Naveen, is it intended that the iSCSI driver retry when the 0B/4705 is
> detected, or will that sense code and COMMAND ABORT be passed up to the
> SCSI layers for retry?
> 
> James, et al, do the SCSI layers retry on COMMAND ABORT?
> 
> I don't believe the SCSI layers retry on HARDWARE ERROR -- this is a
> difference between how Solaris and Linux operate, for example, unless
> I am misunderstanding or have obsolete information.
> -- 

scsi_check_sense will return NEEDS_RETRY for:
	- ABORTED_COMMAND

	- NOT_READY and UNIT_ATTENTION under the expecting_cc_ua and
	becoming ready case.

	- MEDIUM_ERROR

If the allowed count has not been exceeded the command will be retried.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 17:54             ` Mike Anderson
@ 2003-12-03 20:31               ` Clay Haapala
  2003-12-03 21:14                 ` James Bottomley
  2003-12-03 22:57                 ` Scott M. Ferris
  2003-12-03 20:45               ` Clay Haapala
  1 sibling, 2 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-03 20:31 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 3 Dec 2003, Mike Anderson outgrape:
> Clay Haapala [chaapala@cisco.com] wrote:
>> 
>> Naveen, is it intended that the iSCSI driver retry when the 0B/4705
>> is detected, or will that sense code and COMMAND ABORT be passed up
>> to the SCSI layers for retry?
>> 
>> James, et al, do the SCSI layers retry on COMMAND ABORT?
>> 
>> I don't believe the SCSI layers retry on HARDWARE ERROR -- this is
>> a difference between how Solaris and Linux operate, for example,
>> unless I am misunderstanding or have obsolete information.  --
> 
> scsi_check_sense will return NEEDS_RETRY for:
> 	- ABORTED_COMMAND
> 
> 	- NOT_READY and UNIT_ATTENTION under the expecting_cc_ua and
> 	becoming ready case.
> 
> 	- MEDIUM_ERROR
> 
> If the allowed count has not been exceeded the command will be
> retried.
> 
> -andmike

In discussions around here it was pointed out that the driver should
*not* retry on COMMAND ABORT situations, rather let the upper layers
handle it.  A retry at the driver level might be outright incorrect in
the case of tape devices, for instance.
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
   Active dissent has been part of American government since the
         Boston Tea Party, and will be until its end.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 17:54             ` Mike Anderson
  2003-12-03 20:31               ` Clay Haapala
@ 2003-12-03 20:45               ` Clay Haapala
  2003-12-03 21:19                 ` James Bottomley
                                   ` (2 more replies)
  1 sibling, 3 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-03 20:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 3 Dec 2003, Mike Anderson outgrape:

> Clay Haapala [chaapala@cisco.com] wrote:
>> 
>> Naveen, is it intended that the iSCSI driver retry when the 0B/4705
>> is detected, or will that sense code and COMMAND ABORT be passed up
>> to the SCSI layers for retry?
>> 
>> James, et al, do the SCSI layers retry on COMMAND ABORT?
>> 
>> I don't believe the SCSI layers retry on HARDWARE ERROR -- this is
>> a difference between how Solaris and Linux operate, for example,
>> unless I am misunderstanding or have obsolete information.  --
> 
> scsi_check_sense will return NEEDS_RETRY for:
> 	- ABORTED_COMMAND
> 
> 	- NOT_READY and UNIT_ATTENTION under the expecting_cc_ua and
> 	becoming ready case.
> 
> 	- MEDIUM_ERROR
> 
> If the allowed count has not been exceeded the command will be
> retried.
> 
> -andmike

A couple of related SCSI (not necessarily iSCSI) questions:

"If the allowed count has not been exceeded" - The first two commands
after a connection has been estabilished will fail with
Unit_Attention. What if that first command then also gets a crc error
(Aborted_command)?  Will the retry count be large enough?

Does the retry also apply to tape?  Without proper tape error
recovery, retrying the command could result in duplicate records
written to tape.

-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
   Active dissent has been part of American government since the
         Boston Tea Party, and will be until its end.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 20:31               ` Clay Haapala
@ 2003-12-03 21:14                 ` James Bottomley
  2003-12-03 21:53                   ` Scott M. Ferris
  2003-12-03 22:57                 ` Scott M. Ferris
  1 sibling, 1 reply; 102+ messages in thread
From: James Bottomley @ 2003-12-03 21:14 UTC (permalink / raw)
  To: Clay Haapala; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 2003-12-03 at 14:31, Clay Haapala wrote:
> In discussions around here it was pointed out that the driver should
> *not* retry on COMMAND ABORT situations, rather let the upper layers
> handle it.  A retry at the driver level might be outright incorrect in
> the case of tape devices, for instance.

I agree.  I didn't mean handle the command retry internally, but
translate the sense internally to a DID_ code (rather than relying on us
to modify the sense parser).  We could actually give you one which would
do an immediate retry without decrementing the retries count (for crc
conditions you new shouldn't contribute to the retry processing), say
DID_IMMEDIATE_RETRY.

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 20:45               ` Clay Haapala
@ 2003-12-03 21:19                 ` James Bottomley
  2003-12-11 11:21                   ` Naveen Burmi
  2003-12-03 22:15                 ` Scott M. Ferris
  2003-12-03 23:24                 ` Mike Anderson
  2 siblings, 1 reply; 102+ messages in thread
From: James Bottomley @ 2003-12-03 21:19 UTC (permalink / raw)
  To: Clay Haapala; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers

On Wed, 2003-12-03 at 14:45, Clay Haapala wrote:
> A couple of related SCSI (not necessarily iSCSI) questions:
> 
> "If the allowed count has not been exceeded" - The first two commands
> after a connection has been estabilished will fail with
> Unit_Attention. What if that first command then also gets a crc error
> (Aborted_command)?  Will the retry count be large enough?

Yes and it may not be, that's why I think we give you a DID_ code that
doesn't affect the retry count.

> Does the retry also apply to tape?  Without proper tape error
> recovery, retrying the command could result in duplicate records
> written to tape.

Yes.  That's another good reason for not adding too much to the generic
sense processor.  At the moment, the only retry that tape might not be
happy about is MEDIUM_ERROR.

James



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 21:14                 ` James Bottomley
@ 2003-12-03 21:53                   ` Scott M. Ferris
  0 siblings, 0 replies; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-03 21:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: Clay Haapala, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

James Bottomley wrote:
> On Wed, 2003-12-03 at 14:31, Clay Haapala wrote:
> > In discussions around here it was pointed out that the driver should
> > *not* retry on COMMAND ABORT situations, rather let the upper layers
> > handle it.  A retry at the driver level might be outright incorrect in
> > the case of tape devices, for instance.
> 
> I agree.  I didn't mean handle the command retry internally, but
> translate the sense internally to a DID_ code (rather than relying on us
> to modify the sense parser).  We could actually give you one which would
> do an immediate retry without decrementing the retries count (for crc
> conditions you new shouldn't contribute to the retry processing), say
> DID_IMMEDIATE_RETRY.

DID_SOFT_ERROR used to be an unconditional retry in 2.2 and early 2.4
kernels.  Someone "fixed" it to check the allowed count.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 20:45               ` Clay Haapala
  2003-12-03 21:19                 ` James Bottomley
@ 2003-12-03 22:15                 ` Scott M. Ferris
  2003-12-03 22:32                   ` Clay Haapala
  2003-12-03 23:24                 ` Mike Anderson
  2 siblings, 1 reply; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-03 22:15 UTC (permalink / raw)
  To: Clay Haapala
  Cc: James Bottomley, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

Clay Haapala wrote:
> 
> A couple of related SCSI (not necessarily iSCSI) questions:
> 
> "If the allowed count has not been exceeded" - The first two commands
> after a connection has been estabilished will fail with
> Unit_Attention. What if that first command then also gets a crc error
> (Aborted_command)?  Will the retry count be large enough?

The high-level driver picks the allowed count for each command, which
indicates how many times that command will be issued to the low-level
driver before being failed back to the high-level driver.  For disks,
that's been 5 at least as far back as 2.2 kernels.

> Does the retry also apply to tape?  Without proper tape error
> recovery, retrying the command could result in duplicate records
> written to tape.

The tape driver sets the allowed count to 0 for commands that change
the media, in order to disable any retries that would cause tape
positioning problems.  Only commands like test unit ready and read
block limits get retries for tape.

It would be a serious bug for the iSCSI driver to retry a command with
an allowed count of zero.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 22:15                 ` Scott M. Ferris
@ 2003-12-03 22:32                   ` Clay Haapala
  0 siblings, 0 replies; 102+ messages in thread
From: Clay Haapala @ 2003-12-03 22:32 UTC (permalink / raw)
  To: Scott M. Ferris
  Cc: James Bottomley, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

Thanks, Scott!

On Wed, 3 Dec 2003, Scott M. Ferris uttered the following:
> Clay Haapala wrote:
>> 
>> A couple of related SCSI (not necessarily iSCSI) questions:
>> 
>> "If the allowed count has not been exceeded" - The first two
>> commands after a connection has been estabilished will fail with
>> Unit_Attention. What if that first command then also gets a crc
>> error (Aborted_command)?  Will the retry count be large enough?
> 
> The high-level driver picks the allowed count for each command,
> which indicates how many times that command will be issued to the
> low-level driver before being failed back to the high-level driver.
> For disks, that's been 5 at least as far back as 2.2 kernels.
> 
>> Does the retry also apply to tape?  Without proper tape error
>> recovery, retrying the command could result in duplicate records
>> written to tape.
> 
> The tape driver sets the allowed count to 0 for commands that change
> the media, in order to disable any retries that would cause tape
> positioning problems.  Only commands like test unit ready and read
> block limits get retries for tape.
> 
> It would be a serious bug for the iSCSI driver to retry a command
> with an allowed count of zero.

-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
   Active dissent has been part of American government since the
         Boston Tea Party, and will be until its end.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 20:31               ` Clay Haapala
  2003-12-03 21:14                 ` James Bottomley
@ 2003-12-03 22:57                 ` Scott M. Ferris
  1 sibling, 0 replies; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-03 22:57 UTC (permalink / raw)
  To: Clay Haapala
  Cc: James Bottomley, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

Clay Haapala wrote:
> 
> In discussions around here it was pointed out that the driver should
> *not* retry on COMMAND ABORT situations, rather let the upper layers
> handle it.  A retry at the driver level might be outright incorrect in
> the case of tape devices, for instance.

Note that if you let the upper layers handle it, you'll get data loss
from the page cache if you ever reach the point where a buffer write
from the cache runs out of retries.  On a write failure, the block I/O
layer just marks the unwritten buffer !uptodate, making it look like
the disk has the most recent data for that buffer, when it fact that
memory page contains the most recent data which was never written out
to disk.

Somebody really needs to come up with a cache design that doesn't
silently discard unwritten data when block writes start failing.  All
of that gross retry logic in the iSCSI driver came about because it
was so easy to get hundreds of megabytes of data lost forever, and at
the time Cisco wanted a driver work-around instead of a kernel fix.

Linus argued there was nothing else the cache could do, and that the
layers below the cache should do more retries.  I'm still not
convinced that the cache design is good enough, since it can block
badly when the lower layers are changed to do lots of retries, but in
any case, the layers below the cache still aren't doing very many
retries.  This is especially problematic on network SCSI transports
like Fibre Channel and iSCSI, where network problems can cause all
sorts of failures not commonly seen with parallel SCSI, USB, or any
other direct-attached storage.  NFS gets around these issues by being
a filesystem instead of a block device, and having it's own caching
logic.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 20:45               ` Clay Haapala
  2003-12-03 21:19                 ` James Bottomley
  2003-12-03 22:15                 ` Scott M. Ferris
@ 2003-12-03 23:24                 ` Mike Anderson
  2 siblings, 0 replies; 102+ messages in thread
From: Mike Anderson @ 2003-12-03 23:24 UTC (permalink / raw)
  To: Clay Haapala
  Cc: James Bottomley, naveenb, Roman Zippel, hch, SCSI Mailing List,
	davmyers

Clay Haapala [chaapala@cisco.com] wrote:
> On Wed, 3 Dec 2003, Mike Anderson outgrape:
> 
> > Clay Haapala [chaapala@cisco.com] wrote:
> >> 
> >> Naveen, is it intended that the iSCSI driver retry when the 0B/4705
> >> is detected, or will that sense code and COMMAND ABORT be passed up
> >> to the SCSI layers for retry?
> >> 
> >> James, et al, do the SCSI layers retry on COMMAND ABORT?
> >> 
> >> I don't believe the SCSI layers retry on HARDWARE ERROR -- this is
> >> a difference between how Solaris and Linux operate, for example,
> >> unless I am misunderstanding or have obsolete information.  --
> > 
> > scsi_check_sense will return NEEDS_RETRY for:
> > 	- ABORTED_COMMAND
> > 
> > 	- NOT_READY and UNIT_ATTENTION under the expecting_cc_ua and
> > 	becoming ready case.
> > 
> > 	- MEDIUM_ERROR
> > 
> > If the allowed count has not been exceeded the command will be
> > retried.
> > 
> > -andmike
> 
> A couple of related SCSI (not necessarily iSCSI) questions:
> 
> "If the allowed count has not been exceeded" - The first two commands
> after a connection has been estabilished will fail with
> Unit_Attention. What if that first command then also gets a crc error
> (Aborted_command)?  Will the retry count be large enough?
> 

The sd upper level driver sets the allowed value to SD_MAX_RETRIES
(5). In your above example it would be retried. In theory a scsi command
could have been through the retry path SD_MAX_RETRIES-1 times prior and 
would not be retried.

> Does the retry also apply to tape?  Without proper tape error
> recovery, retrying the command could result in duplicate records
> written to tape.
> 

The tape upper level drivers set there retries to 0 for read and write
operations.

andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-01 15:22         ` James Bottomley
@ 2003-12-04 12:30           ` N.C.Krishna Murthy
  2003-12-05 15:33             ` James Bottomley
  0 siblings, 1 reply; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-04 12:30 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List, davmyers

Hi,
	Please find my replies/quieries interspersed in your reply.
On Monday 01 Dec 2003 8:52 pm, James Bottomley wrote:
> On Mon, 2003-12-01 at 06:10, Krishna Murthy wrote:
> > Hi,
> > 	We could use probing code from mid layer provided
> > scsi_scan_host_selected is exported.
> >
> > 	iscsi driver does probe for devices at two places
> > a)Once after session establishment.
> > b)Whenever an iSCSI async message indicates a change in
> > "REPORTED LUNS DATA". (addition/deletion of luns on the target)
>
> This sounds like an async event which could be handled by the hotplug
> system.  Would there be a problem with simply transmitting this to a
> user level utility and having it trigger the rescan through the exported
> sysfs interfaces?
Having user level utility trigger the rescan thru sysfs interfaces will call
scsi_scan() which will inturn call scsi_scan_host_selected().
Instead wouldn't it be preferable to call scsi_scan_host_selected()
from the iSCSI driver itself?


>
> We could do with a generic infrastructure within SCSI for generating
> these events, anyway, if you want to go that route...
>
I am not much familiar with SCSI hotplug. 

If the generic infrastructure which you mentioned does not exist now and in 
case there is a plan to do one such thing is there any framework for that?
Wouldn't that require a kernel API which is capable of both 
"adding new luns" and "deleting luns which no longer exist"?

Please let us know
Thanx
N.C.Krishna Murthy

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-04 12:30           ` N.C.Krishna Murthy
@ 2003-12-05 15:33             ` James Bottomley
  2003-12-05 17:03               ` Brian King
                                 ` (2 more replies)
  0 siblings, 3 replies; 102+ messages in thread
From: James Bottomley @ 2003-12-05 15:33 UTC (permalink / raw)
  To: N.C.Krishna Murthy; +Cc: Christoph Hellwig, SCSI Mailing List, davmyers

On Thu, 2003-12-04 at 07:30, N.C.Krishna Murthy wrote:
> Having user level utility trigger the rescan thru sysfs interfaces will call
> scsi_scan() which will inturn call scsi_scan_host_selected().
> Instead wouldn't it be preferable to call scsi_scan_host_selected()
> from the iSCSI driver itself?

Well, consider all the ramifications: what happens if the
reconfiguration is deleting an existing LUN with a mounted filesystem? 
What happens if the user wants to do something with the new LUN as soon
as it appears (like fold it into a VM volume, or format it).  The
hotplug system is designed to propagate events that others might be
interested in, which it strikes me this is.  So, if you have to trigger
a hotplug event anyway, you may as well go the whole hog and do the
reconfiguration as part of the hotplug subsystem.

> > We could do with a generic infrastructure within SCSI for generating
> > these events, anyway, if you want to go that route...
> >
> I am not much familiar with SCSI hotplug. 
> 
> If the generic infrastructure which you mentioned does not exist now and in 
> case there is a plan to do one such thing is there any framework for that?
> Wouldn't that require a kernel API which is capable of both 
> "adding new luns" and "deleting luns which no longer exist"?

Well, a hotplug event feeds certain information into the environment
when it triggers the call.  All we really need is a wrapper in SCSI to
add the SCSI information to the environment and trigger the event in a
generic fashion.

James




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-05 15:33             ` James Bottomley
@ 2003-12-05 17:03               ` Brian King
  2003-12-08 15:06               ` N.C.Krishna Murthy
  2003-12-11 14:48               ` N.C.Krishna Murthy
  2 siblings, 0 replies; 102+ messages in thread
From: Brian King @ 2003-12-05 17:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: N.C.Krishna Murthy, Christoph Hellwig, SCSI Mailing List,
	davmyers

James Bottomley wrote:
>>If the generic infrastructure which you mentioned does not exist now and in 
>>case there is a plan to do one such thing is there any framework for that?
>>Wouldn't that require a kernel API which is capable of both 
>>"adding new luns" and "deleting luns which no longer exist"?
> 
> 
> Well, a hotplug event feeds certain information into the environment
> when it triggers the call.  All we really need is a wrapper in SCSI to
> add the SCSI information to the environment and trigger the event in a
> generic fashion.
> 


This would be wonderful. I am currently writing a LLD and was going to 
use the scsi_add_device and scsi_remove_device interfaces to add/remove 
devices as the devices show up and disappear at runtime, but this was 
going to require me to go through a bit of complexity using a workqueue 
to do it at task level. If a LLD had interfaces to do this that could be 
called at interrupt context these interfaces would be much easier to use.



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-02 16:37     ` James Bottomley
                         ` (2 preceding siblings ...)
  2003-12-03 14:06       ` Naveen Burmi
@ 2003-12-06 19:37       ` Andre Hedrick
  2003-12-07  0:37         ` Roman Zippel
  3 siblings, 1 reply; 102+ messages in thread
From: Andre Hedrick @ 2003-12-06 19:37 UTC (permalink / raw)
  To: James Bottomley; +Cc: naveenb, Roman Zippel, hch, SCSI Mailing List


James:

Thank you for explaining to Roman and company the obvious nature of
BLOCK on top of SCSI.  This is only a point of proof that I was correct in
my statements earlier.

iSCSI(MC) will corrupt randomly.

iSCSI(ZC) will fail data digests and invoke the retry.

Also iSCSI ERL = 0 is in the class of functional (waste) transport,
period.

Retrying entire commands because one PDU data digest fails is lame.

If a single command had for example 20 PDU's, and one failed a data
digest, iSCSI(ERL = 0) will have to dump all PDU's on the floor and retry
all of the command (oh and hope there are no other failure, or will retry
again).  Thus network transaction overhead of the original 20 PDU's is now
doubled to 40 PDU's.  This is just laughable and a joke if you are serious
about your data.

Same example with iSCSI(ERL = 1), the original 19 good PDU's are buffered
and an in-connection recovery is executed for the one failed.

Translation:

iSCSI(ERL = 0) 20 * (m + 1), m = I(all postive integers)

iSCSI(ERL = 1) 20 + n, n = I(all postive integers)

m == the number of entire commands retired.
n == the number of PDU's retried.

For ease, make m == n = 1;

iSCSI(ERL = 0) == 40 PDU's

iSCSI(ERL = 1) == 21 PDU's

The only case where they are equal is if all PDU's in iSCSI(ERL = 1) had
to be retried.

The goal of the exercise is to show that both Roman's and Cisco's
iSCSI(ERL = 0) initaitors are not in a data integrity class acceptable for
anyone.

Should either of these parties have an iSCSI(ERL = 1) product then it
becomes more useful.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On 2 Dec 2003, James Bottomley wrote:

> On Tue, 2003-12-02 at 05:56, Naveen Burmi wrote:
> > Our assumption so far was that if a buffer is given to SCSI HBA driver, then 
> > nobody can touch the buffer until the HBA says that he is done with the 
> > buffer. It seems that this assumption isn't true. Can you give an instance 
> > where somebody (probably buffer cache) will modify the buffer which is handed
> > over to an HBA driver?
> 
> This is an incorrect assumption.  The Linux SCSI subsystem is
> architected for zero copy, meaning that if the user maps a copy of the
> data, they can alter it at will, even if it is in flight within the
> driver.  The only thing you can guarantee is that you will get another
> write request for any page the user dirtied.
> 
> Also note that glibc uses mmaping to handle file descriptors in linux,
> so almost every application can alter in-flight data depending on how it
> works.
> 
> James
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-06 19:37       ` Andre Hedrick
@ 2003-12-07  0:37         ` Roman Zippel
  0 siblings, 0 replies; 102+ messages in thread
From: Roman Zippel @ 2003-12-07  0:37 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: James Bottomley, naveenb, hch, SCSI Mailing List

Hi,

On Sat, 6 Dec 2003, Andre Hedrick wrote:

> James:
>
> Thank you for explaining to Roman and company the obvious nature of
> BLOCK on top of SCSI.  This is only a point of proof that I was correct in
> my statements earlier.

In case you missed it, but James was not the only one stating that the
send data is not static and so far it's rather unclear what you're trying
to proof here.

> iSCSI(MC) will corrupt randomly.
>
> iSCSI(ZC) will fail data digests and invoke the retry.

Again, how should the data in a local buffer become corrupted? Where is
that "random corruption" which you don't have to deal with zero copy as
well?

> Same example with iSCSI(ERL = 1), the original 19 good PDU's are buffered
> and an in-connection recovery is executed for the one failed.
>
> Translation:
>
> iSCSI(ERL = 0) 20 * (m + 1), m = I(all postive integers)
>
> iSCSI(ERL = 1) 20 + n, n = I(all postive integers)
>
> m == the number of entire commands retired.
> n == the number of PDU's retried.
>
> For ease, make m == n = 1;
>
> iSCSI(ERL = 0) == 40 PDU's
>
> iSCSI(ERL = 1) == 21 PDU's
>
> The only case where they are equal is if all PDU's in iSCSI(ERL = 1) had
> to be retried.

Andre, maybe you can scare little girls with this, but otherwise this
highly unimpressive, how about you come up with some real data?
It's easily possible to limit the transfer size and so to reduce the
amount of the data which has to be transfered again, which protocol is
used to request a retry of the transfer, is not that important. It would
be a lot more interesting if you could actually show a real perfomance
difference with some real numbers.
Of course ERL1 allows to solve this problem more elegant, but so far you
failed to show any proof that ERL0 is not usable.

> The goal of the exercise is to show that both Roman's and Cisco's
> iSCSI(ERL = 0) initaitors are not in a data integrity class acceptable for
> anyone.

iSCSI works quite fine also without an additional checksum and if one
knows the ip transport is reliable, one can easily live without it.

bye, Roman

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-05 15:33             ` James Bottomley
  2003-12-05 17:03               ` Brian King
@ 2003-12-08 15:06               ` N.C.Krishna Murthy
  2003-12-08 15:46                 ` Scott M. Ferris
  2003-12-11 14:48               ` N.C.Krishna Murthy
  2 siblings, 1 reply; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-08 15:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List, davmyers

Hi,
	We agree to using the wrapper when we are notified about 
"reported lun data change". 

	However when a session is initially established to an iSCSI target,
luns on the target need to be configured. As mentioned in one of my previous
mails , we need to use scsi_scan_host_selected(). 
Will the function be exported in the next release of 2.6 or do I need to 
generate a patch for the same?

Please let us know.

Thanx
N.C.Krishna Murthy
	
On Friday 05 Dec 2003 9:03 pm, James Bottomley wrote:
> On Thu, 2003-12-04 at 07:30, N.C.Krishna Murthy wrote:
> > Having user level utility trigger the rescan thru sysfs interfaces will
> > call scsi_scan() which will inturn call scsi_scan_host_selected().
> > Instead wouldn't it be preferable to call scsi_scan_host_selected()
> > from the iSCSI driver itself?
>
> Well, consider all the ramifications: what happens if the
> reconfiguration is deleting an existing LUN with a mounted filesystem?
> What happens if the user wants to do something with the new LUN as soon
> as it appears (like fold it into a VM volume, or format it).  The
> hotplug system is designed to propagate events that others might be
> interested in, which it strikes me this is.  So, if you have to trigger
> a hotplug event anyway, you may as well go the whole hog and do the
> reconfiguration as part of the hotplug subsystem.
>
> > > We could do with a generic infrastructure within SCSI for generating
> > > these events, anyway, if you want to go that route...
> >
> > I am not much familiar with SCSI hotplug.
> >
> > If the generic infrastructure which you mentioned does not exist now and
> > in case there is a plan to do one such thing is there any framework for
> > that? Wouldn't that require a kernel API which is capable of both
> > "adding new luns" and "deleting luns which no longer exist"?
>
> Well, a hotplug event feeds certain information into the environment
> when it triggers the call.  All we really need is a wrapper in SCSI to
> add the SCSI information to the environment and trigger the event in a
> generic fashion.
>
> James

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-08 15:06               ` N.C.Krishna Murthy
@ 2003-12-08 15:46                 ` Scott M. Ferris
  2003-12-10 15:01                   ` N.C.Krishna Murthy
  0 siblings, 1 reply; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-08 15:46 UTC (permalink / raw)
  To: N.C.Krishna Murthy
  Cc: James Bottomley, Christoph Hellwig, SCSI Mailing List, davmyers

N.C.Krishna Murthy wrote:
> Hi,
> 	We agree to using the wrapper when we are notified about 
> "reported lun data change". 
> 
> 	However when a session is initially established to an iSCSI target,
> luns on the target need to be configured. As mentioned in one of my previous
> mails , we need to use scsi_scan_host_selected(). 
> Will the function be exported in the next release of 2.6 or do I need to 
> generate a patch for the same?

You'd be better off letting the OS decide which LUNs to use, and not
masking anything in the low-level driver. If LUN masking is needed,
adding a general mechanism to the kernel to provide LUN masking for
any particular target would be a better approach than having each
low-level driver implement its own way of masking LUNs.

The main motivation for LUN masking is the limited number of SCSI disk
devices Linux can support.  Now that Linux has 32-bit device numbers,
the SCSI disk driver could be changed to support many more LUNs per
target, and the need for LUN masking would be greatly reduced.  This
would probably be less work than adding a general LUN masking
mechanism, and would be useful to a larger number of users than LUN
masking would.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-08 15:46                 ` Scott M. Ferris
@ 2003-12-10 15:01                   ` N.C.Krishna Murthy
  2003-12-10 16:50                     ` Scott M. Ferris
  0 siblings, 1 reply; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-10 15:01 UTC (permalink / raw)
  To: Scott M. Ferris
  Cc: James Bottomley, Christoph Hellwig, SCSI Mailing List, davmyers

Hi,
	We do not need scsi_scan_host_selected()  for lun masking.

In the current code, after establishing a session with an iSCSI target we 
a)issue report luns to find out what luns exist on the target.
b)Configure each lun using scsi_add_device.
To accomplish a),  we use iscsi_do_cmnd(). 
Linux scsi folks had suggested that we could use the "report lun based"
probing infrastructure that exists in the 2.6 kernel instead of duplicating 
the same in our driver. We need scsi_scan_host_selected() function to utilize 
the same. As of now (2.6.0-test11) the function is not exported.

Thanx
N.C.Krishna Murthy




On Monday 08 Dec 2003 9:16 pm, Scott M. Ferris wrote:
> N.C.Krishna Murthy wrote:
> > Hi,
> > 	We agree to using the wrapper when we are notified about
> > "reported lun data change".
> >
> > 	However when a session is initially established to an iSCSI target,
> > luns on the target need to be configured. As mentioned in one of my
> > previous mails , we need to use scsi_scan_host_selected().
> > Will the function be exported in the next release of 2.6 or do I need to
> > generate a patch for the same?
>
> You'd be better off letting the OS decide which LUNs to use, and not
> masking anything in the low-level driver. If LUN masking is needed,
> adding a general mechanism to the kernel to provide LUN masking for
> any particular target would be a better approach than having each
> low-level driver implement its own way of masking LUNs.
>
> The main motivation for LUN masking is the limited number of SCSI disk
> devices Linux can support.  Now that Linux has 32-bit device numbers,
> the SCSI disk driver could be changed to support many more LUNs per
> target, and the need for LUN masking would be greatly reduced.  This
> would probably be less work than adding a general LUN masking
> mechanism, and would be useful to a larger number of users than LUN
> masking would.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-10 15:01                   ` N.C.Krishna Murthy
@ 2003-12-10 16:50                     ` Scott M. Ferris
  0 siblings, 0 replies; 102+ messages in thread
From: Scott M. Ferris @ 2003-12-10 16:50 UTC (permalink / raw)
  To: N.C.Krishna Murthy
  Cc: Scott M. Ferris, James Bottomley, Christoph Hellwig,
	SCSI Mailing List, davmyers

N.C.Krishna Murthy wrote:
>
> Hi,
> 	We do not need scsi_scan_host_selected()  for lun masking.
> 
> In the current code, after establishing a session with an iSCSI target we 
> a)issue report luns to find out what luns exist on the target.
> b)Configure each lun using scsi_add_device.
> To accomplish a),  we use iscsi_do_cmnd(). 
> Linux scsi folks had suggested that we could use the "report lun based"
> probing infrastructure that exists in the 2.6 kernel instead of duplicating 
> the same in our driver. We need scsi_scan_host_selected() function to utilize 
> the same. As of now (2.6.0-test11) the function is not exported.
> 

The point people are trying to make is that the low-level driver
should not be doing either a or b.  When an iSCSI target is added,
tell hotplug, and let it initiate a scan of that target through sysfs.

-- 
Scott M. Ferris,
sferris@acm.org 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
@ 2003-12-11  0:12 Pat LaVarre
  0 siblings, 0 replies; 102+ messages in thread
From: Pat LaVarre @ 2003-12-11  0:12 UTC (permalink / raw)
  To: linux-scsi; +Cc: dougg, patmans

> Date: 2003-11-07 8:55:01
> From: Doug Gilbert ...
> ....
> ** There is also the Power Condition mode
> page that all SCSI devices may optionally
> support. Trouble is with my SCSI-3 disks they
> either don't support it or the one that does
> gives ASC/ASCQ=04/02 when the disk is spun
> down (violating SPC-3).

We have a specific spc 3 rev and clause in mind?

> Date: 2003-11-10 17:43:22
> From: Patrick Mansfield
> ...
> whoever spun down the drive should spin it up.
> ...
> Whatever caused the spin down should spin it back up.

Auto spin up doesn't work with much direct-attached Windows.

For example, people say Win ME/9X in particular limit ATA/PI read to
7.5+0.5s.  For a device whose spinup exceeds 7s, the only scheme I've
ever seen work there is to notify the host e.g. via SK ASC ASCQ x 2 04
02 that some other thread spun down the drive.  I hear XP/ 2K/ NT need
ASCQ x02 whereas ME/ 9X properly understood that x00 in SCSI ASCQ by
definition means x00..FF and thus ME/ 9X do match x00 with x02.

> Date: 2003-12-03 22:57:58
> From: Scott M. Ferris ...
> ...
> Somebody really needs to come up with a cache
> design that doesn't silently discard unwritten
> data when block writes start failing ...

Aye.

> especially problematic on network SCSI
> transports like Fibre Channel and iSCSI, where
> network problems can cause all sorts of
> failures not commonly seen with parallel
> SCSI, USB, or any other direct-attached storage.

Loss of write-behind also occurs perceptibly often in direct-attached
storage.

dvd+rw/ cd-rw in particular often round up 2 KiB write requests to 32 or
64 KiB read-modify-write, and thus destroy blocks not even addressed
(e.g. parts of other udf.ko files allotted in 2 KiB blocks) when writes
fail.

Pat LaVarre



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-03 21:19                 ` James Bottomley
@ 2003-12-11 11:21                   ` Naveen Burmi
  0 siblings, 0 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-11 11:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: naveenb, Roman Zippel, hch, SCSI Mailing List, davmyers, chaapala


Hi James,

On Thursday 04 December 2003 02:49 am, James Bottomley wrote:
> On Wed, 2003-12-03 at 14:45, Clay Haapala wrote:
> > A couple of related SCSI (not necessarily iSCSI) questions:
> >
> > "If the allowed count has not been exceeded" - The first two commands
> > after a connection has been estabilished will fail with
> > Unit_Attention. What if that first command then also gets a crc error
> > (Aborted_command)?  Will the retry count be large enough?
>
> Yes and it may not be, that's why I think we give you a DID_ code that
> doesn't affect the retry count.
>

Please provide DID_ code(say, DID_IMMEDIATE_RETRY) so that LLD can use it to 
initiate the retry of scsi command without affecting the retry count.

Thanks,
Naveen. 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-11-24  7:48     ` 'Christoph Hellwig'
  2003-11-24 20:45       ` Patrick Mansfield
@ 2003-12-11 12:31       ` Naveen Burmi
  1 sibling, 0 replies; 102+ messages in thread
From: Naveen Burmi @ 2003-12-11 12:31 UTC (permalink / raw)
  To: Surekha.PC; +Cc: 'Christoph Hellwig', linux-scsi



On Monday 24 November 2003 01:18 pm, 'Christoph Hellwig' wrote:
> On Mon, Nov 24, 2003 at 11:39:48AM +0530, Surekha.PC wrote:
> > Hi,
> >
> > >>  	- this comment indicates you don't want normal EH:
> > >>
> > >> * check condition with no sense.  We need to avoid this,
> > >> * since the Linux SCSI code could put the command in
> > >> * SCSI_STATE_FAILED, which it's error recovery doesn't appear
> > >> * to handle correctly, and even if it does, we're trying to
> > >> * bypass all of the Linux error recovery code
> > >> * to avoid blocking all I/O to the HBA.  Fake some sense
> > >> * data that gets a retry from Linux.
> > >
> > >	so don't use scsi_error.c and define your own
> >
> > eh_strategy_handler method.
> >
> > By implementing the iSCSI driver specific strategy_handler we will need
> > to duplicate most of the mid layer strategy handler logic only to
> > accommodate the above condition.
> >
> > Is that justified ? Or is it ok, to fix the above case in scsi_error.c ?
>
> I haven't followed the code closely, but the comment reads like
> scsi_error.c is mostly useless for you - if that were true I'd suggest
> using
> eh_strategy_handler, if on the other hand you can fix scsi_error.c easily
> it's better to use it.

James Bottomley agreed to add "DID_" code which ensures retry of scsi 
command without affecting the retry count.
Once that will be in place then instead of faking sense data in case of 
"check condition with no sense" iSCSI LLD will use it to make sure command 
will be retried.

Naveen.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-05 15:33             ` James Bottomley
  2003-12-05 17:03               ` Brian King
  2003-12-08 15:06               ` N.C.Krishna Murthy
@ 2003-12-11 14:48               ` N.C.Krishna Murthy
  2 siblings, 0 replies; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-11 14:48 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, hotplug, davmyers, Scott M. Ferris

Hi,
	Thanks for the clarification. As you pointed out, 
upon establishing a session with iSCSI target 
OR
upon getting notified of Lun(s) additio/deletion.
we will inform hotplug and let it initiate a scan(re) of that target thru 
sysfs.  Please provide the wrapper in SCSI layer.

Thanx
N.C.Krishna Murthy

On Friday 05 Dec 2003 9:03 pm, James Bottomley wrote:
>
> Well, a hotplug event feeds certain information into the environment
> when it triggers the call.  All we really need is a wrapper in SCSI to
> add the SCSI information to the environment and trigger the event in a
> generic fashion.
>
> James
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-11 15:47 N.C.Krishna Murthy
@ 2003-12-12 12:48 ` Matthew Wilcox
  2003-12-12 15:29   ` N.C.Krishna Murthy
  2003-12-13  2:46   ` Andre Hedrick
  0 siblings, 2 replies; 102+ messages in thread
From: Matthew Wilcox @ 2003-12-12 12:48 UTC (permalink / raw)
  To: N.C.Krishna Murthy; +Cc: Christoph Hellwig, SCSI -DEVEL, davmyers

On Thu, Dec 11, 2003 at 09:17:46PM +0530, N.C.Krishna Murthy wrote:
> Hi,
> 	One of your comments was
> "having multiple kmap() in the same process at the same time can deadlock(),
> you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
> for the tx path use ->sendpage to avoid a data copy and kmapping altogether".
> 
> I did try using sendpage in iscsi_xmit_data. 
> Whenever the scatterlist->length was 4096 sendpage did succeed.
> When it was 8192 I did see a panic 

could you post the code for that?

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-12 12:48 ` Request " Matthew Wilcox
@ 2003-12-12 15:29   ` N.C.Krishna Murthy
  2003-12-13  2:46   ` Andre Hedrick
  1 sibling, 0 replies; 102+ messages in thread
From: N.C.Krishna Murthy @ 2003-12-12 15:29 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, SCSI -DEVEL, davmyers

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

Hi,
	Please find code  attached.
Thanx
N.C.Krishna Murthy
On Friday 12 Dec 2003 6:18 pm, Matthew Wilcox wrote:
> On Thu, Dec 11, 2003 at 09:17:46PM +0530, N.C.Krishna Murthy wrote:
> > Hi,
> > 	One of your comments was
> > "having multiple kmap() in the same process at the same time can
> > deadlock(), you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data
> > not to do that. for the tx path use ->sendpage to avoid a data copy and
> > kmapping altogether".
> >
> > I did try using sendpage in iscsi_xmit_data.
> > Whenever the scatterlist->length was 4096 sendpage did succeed.
> > When it was 8192 I did see a panic
>
> could you post the code for that?

[-- Attachment #2: code-drop --]
[-- Type: text/x-csrc, Size: 15081 bytes --]


void
iscsi_xmit_data(struct iscsi_task *task, uint32_t ttt, uint32_t data_offset,
		uint32_t data_length)
{
	struct msghdr msg;
	struct IscsiDataHdr stdh;
	Scsi_Cmnd *sc = NULL;
	struct iscsi_session *session = task->session;
	struct scatterlist *sglist = NULL, *sg = NULL, *first_sg = NULL, *last_sg = NULL;
	int wlen, rc, iovn = 0, first_data_iovn = 0;
	unsigned int segment_offset = 0, index = 0;
	int remain, xfrlen;
	uint32_t data_sn = 0;
	int bytes_to_fill, bytes_from_segment;
	char padding[4];
	int pad_bytes;
	uint32_t header_crc32c;
	uint32_t data_crc32c;

	/* This is a TEST code intended to test sendpage.
	 *   I ENSURE THAT HeaderDigest and DataDigest are not used.
	 * I also ensure that padbytes are not required.
	 */
	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);

	printk("%s Entry for data length of %d \n",__FUNCTION__,data_length);

	sc = task->scsi_cmnd;
	/* make sure we have data to send when we expect to */
	if (sc && (iscsi_expected_data_length(sc) == 0)
	    && ((sc->request_bufflen == 0) || (sc->request_buffer == NULL))) {
		printk
		    ("iSCSI: xmit_data for itt %u, sc 0x%x, dlength %u, "
		     "expected %u, no data in buffer\n"
		     "       request_buffer %p len %u, buffer %p len %u\n",
		     task->itt, sc->cmnd[0], data_length,
		     iscsi_expected_data_length(sc), sc->request_buffer,
		     sc->request_bufflen, sc->buffer, sc->bufflen);
		print_cmnd(sc);
		return;
	}

	remain = data_length;
	if (sc == NULL)
		remain = 0;

	memset(&stdh, 0, sizeof (stdh));
	stdh.opcode = ISCSI_OP_SCSI_DATA;
	stdh.itt = htonl(task->itt);
	stdh.ttt = ttt;
	stdh.offset = htonl(data_offset);

	/* PDU header */
	session->tx_iov[0].iov_base = &stdh;
	session->tx_iov[0].iov_len = sizeof (stdh);


	DEBUG_FLOW("iSCSI: xmit_data for itt %u, credit %d @ %u\n"
		   "       request_buffer %p len %u, buffer %p len %u\n",
		   task->itt, remain, data_offset,
		   sc->request_buffer, sc->request_bufflen, sc->buffer,
		   sc->bufflen);

	/* Find the segment and offset within the segment to 
	 * start writing from.  
	 */
	if (sc && sc->use_sg) {
		sg = sglist = (struct scatterlist *) sc->request_buffer;

		segment_offset = data_offset;

		for (index = 0; index < sc->use_sg; index++) {
			if (segment_offset < sglist[index].length)
				break;
			else
				segment_offset -= sglist[index].length;
		}

		if (index >= sc->use_sg) {
			/* didn't find the offset, command will 
			 * eventually timeout 
			 */
			printk
			    ("iSCSI: session iscsi bus %d target id %d "
			     "xmit_data for itt %u couldn't find offset %u "
			     "in sglist %p, sc %p, bufflen %u, use_sg %u\n",
			     session->iscsi_bus, session->target_id, task->itt,
			     data_offset, sglist, sc, sc->request_bufflen,
			     sc->use_sg);
			print_cmnd(sc);
			ISCSI_TRACE(ISCSI_TRACE_OutOfData, sc, task, index,
				    sc->use_sg);
			return;
		}
	}

	ISCSI_TRACE(ISCSI_TRACE_TxData, sc, task, data_offset, data_length);

	do {
		if (signal_pending(current))
			break;

#if (INVALID_ORDERING_ASSUMPTIONS == 0)
		/* since this loop may take a while, check 
		 * for TIMEDOUT tasks and commands 
		 */
		/* Note: this means a task may have a non-zero 
		 * refcount during timeout processing 
		 */
		if (test_bit(SESSION_TASK_TIMEDOUT, &session->control_bits)) {
			process_timedout_tasks(session);
		}
		if (test_bit(SESSION_COMMAND_TIMEDOUT, &session->control_bits)) {
			process_timedout_commands(session);
		}

		/* also queue up command retries */
		if (test_and_clear_bit
		    (SESSION_RETRY_COMMANDS, &session->control_bits)) {
			/* try to queue up delayed commands for retries */
			iscsi_retry_commands(session);
		}

		/* if command PDUs are small (no immediate data),
		 * start commands as soon as possible, so that we can
		 * overlap the R2T latency with the time it takes to
		 * send data for commands already issued.  This increases 
		 * throughput without significantly increasing the completion 
		 * time of commands already issued.  Some broken targets
		 * such as the one by Intel Labs will choke if they receive
		 * another command before they get all of the data for preceding
		 * commands, so this can be conditionally compiled out.
		 */
		if (!session->ImmediateData) {
			DEBUG_FLOW
			    ("iSCSI: checking for new commands before "
			     "sending data to %s\n",
			     session->log_name);
			iscsi_xmit_queued_cmnds(session);
		}
#endif

		iovn = 1;
		wlen = sizeof (stdh);
		if (session->HeaderDigest == ISCSI_DIGEST_CRC32C) {
			/* we'll need to send a digest, 
			 * but can't compute it yet 
			 */
			session->tx_iov[1].iov_base = &header_crc32c;
			session->tx_iov[1].iov_len = sizeof (header_crc32c);
			iovn = 2;
			wlen += sizeof (header_crc32c);
		}

		first_data_iovn = iovn;

		stdh.datasn = htonl(data_sn++);
		stdh.offset = htonl(data_offset);
		stdh.expstatsn = htonl(session->ExpStatSn);

		if (session->MaxXmitDataSegmentLength
		    && (remain > session->MaxXmitDataSegmentLength)) {
			/* enforce the target's data segment limit */
			bytes_to_fill = session->MaxXmitDataSegmentLength;
		} else {
			/* final PDU of a data burst */
			bytes_to_fill = remain;
			stdh.flags = ISCSI_FLAG_FINAL;
		}

		/* check if we need to pad the PDU */
		if (bytes_to_fill % PAD_WORD_LEN) {
			pad_bytes =
			    PAD_WORD_LEN - (bytes_to_fill % PAD_WORD_LEN);
			memset(padding, 0x0, sizeof (padding));
		} else {
			pad_bytes = 0;
		}

		DEBUG_FLOW
		    ("iSCSI: remain %d, bytes_to_fill %d, "
		     "sc->use_sg %u, MaxRecvDataSegmentLength %d\n",
		     remain, bytes_to_fill, sc->use_sg,
		     session->MaxRecvDataSegmentLength);

		xfrlen = 0;

		if (sc) {
			/* find all the PDU data */
			if (sc->use_sg) {
				/* while there is more data and 
				 * we want to send more data 
				 */
				while (bytes_to_fill > 0) {

					if (index >= sc->use_sg) {
						printk
						    ("iSCSI: session iscsi bus "
						     "%d target id %d xmit_data"
						     " index %d exceeds "
						     "sc->use_sg %d, "
						     "bytes_to_fill %d, "
						     "out of buffers\n",
						     session->iscsi_bus,
						     session->target_id, index,
						     sc->use_sg, bytes_to_fill);
						/* the command will eventually 
						 * timeout 
						 */
						print_cmnd(sc);
						ISCSI_TRACE
						    (ISCSI_TRACE_OutOfData, sc,
						     task, index, sc->use_sg);
						goto done;
					}
					if (signal_pending(current)) {
						DEBUG_FLOW
						    ("iSCSI: session iscsi bus "
						     "%d target id %d signal "
						     "pending, returning from "
						     "xmit_data\n",
						     session->iscsi_bus,
						     session->target_id);
						goto done;
					}

					sg = &sglist[index];


					if (first_sg == NULL) {
						first_sg = sg;
					}
					last_sg = sg;

					/* sanity check the sglist 
					 * segment length 
					 */
					if (sg->length <= segment_offset) {
						/* the sglist is corrupt */
						printk
						    ("iSCSI: session iscsi bus "
						     "%d target id %d xmit_data"
						     " index %d, length %u too "
						     "small for offset %u, "
						     "bytes_to_fill %d, sglist "
						     "has been corrupted\n",
						     session->iscsi_bus,
						     session->target_id, index,
						     sg->length, segment_offset,
						     bytes_to_fill);
						/* the command will eventually 
						 * timeout 
						 */
						print_cmnd(sc);
						ISCSI_TRACE
						    (ISCSI_TRACE_BadTxSeg, sc,
						     task, sg->length,
						     segment_offset);
						goto done;
					}

					bytes_from_segment =
					    sg->length - segment_offset;
					if (bytes_from_segment > bytes_to_fill) {
						/* only need part of 
						 * this segment 
						 */
						session->tx_iov[iovn].iov_base =
						 (unsigned char *)  segment_offset;
						session->tx_iov[iovn].iov_len =
						    bytes_to_fill;
						xfrlen += bytes_to_fill;
						printk
						    ("iSCSI: session iscsi bus "
						     "%d target id %d xmit_data"
						     " xfrlen %d, to_fill %d, "
						     "from_segment %d, iov[%2d]"
						     " = partial sg[%2d]\n",
						     session->iscsi_bus,
						     session->target_id, xfrlen,
						     bytes_to_fill,
						     bytes_from_segment, iovn,
						     index);
						iovn++;
						segment_offset += bytes_to_fill;
						break;
					} else {
						/* need all of this segment, and
						 * possibly more from the next 
						 */
						session->tx_iov[iovn].iov_base =
						   (unsigned char *) segment_offset;
						session->tx_iov[iovn].iov_len =
						    bytes_from_segment;
		
						xfrlen += bytes_from_segment;
						printk
						    ("iSCSI: session iscsi bus "
						     "%d target id %d xmit_data"
						     " xfrlen %d, to_fill %d, "
						     "from_segment %d, iov[%2d]"
						     " = sg[%2d]\n",
						     session->iscsi_bus,
						     session->target_id, xfrlen,
						     bytes_to_fill,
						     bytes_from_segment, iovn,
						     index);
						bytes_to_fill -=
						    bytes_from_segment;
						iovn++;
						/* any remaining data starts at 						 * offset 0 of the next segment
						 */
						index++;
						segment_offset = 0;
					}
				}

				if (xfrlen <= 0) {
					printk
					    ("iSCSI: session iscsi bus %d "
					     "target id %d xmit_data picked "
					     "xfrlen of 0, sc->use_sg %d, "
					     "bytes_to_fill %d\n",
					     session->iscsi_bus,
					     session->target_id, sc->use_sg,
					     bytes_to_fill);
					iscsi_drop_session(session);
					goto done;
				}
			} else {
				/* no scatter-gather */
				if ((sc->request_buffer + data_offset +
				     bytes_to_fill) <=
				    (sc->request_buffer +
				     sc->request_bufflen)) {
					/* send all the data */
					session->tx_iov[iovn].iov_base =
					    sc->request_buffer + data_offset;
					session->tx_iov[iovn].iov_len = xfrlen =
					    bytes_to_fill;
					iovn++;
				} else if ((sc->request_buffer + data_offset) <
					   (sc->request_buffer +
					    sc->request_bufflen)) {
					/* send some data, but can't send all 
					 * requested 
					 */
					xfrlen =
					    sc->request_bufflen - data_offset;
					printk
					    ("iSCSI: xmit_data ran out of data,"
					     " buffer %p len %u but offset %d "
					     "length %d, sending final %d "
					     "bytes\n",
					     sc->request_buffer,
					     sc->request_bufflen, data_offset,
					     bytes_to_fill, xfrlen);
					session->tx_iov[iovn].iov_base =
					    sc->request_buffer + data_offset;
					session->tx_iov[iovn].iov_len = xfrlen;
					iovn++;
					stdh.flags = ISCSI_FLAG_FINAL;
					remain = xfrlen;
				} else {
					/* can't send any data */
					printk
					    ("iSCSI: xmit_data ran out of data,"
					     " buffer %p len %u but offset %d "
					     "length %d, sending no more "
					     "data\n",
					     sc->request_buffer,
					     sc->request_bufflen, data_offset,
					     bytes_to_fill);
					goto done;
				}
			}
			if (pad_bytes) {
				session->tx_iov[iovn].iov_base = padding;
				session->tx_iov[iovn].iov_len = pad_bytes;
				iovn++;
				wlen += pad_bytes;
			}
		}

		/* put the data length in the PDU header */
		hton24(stdh.dlength, xfrlen);
		wlen += xfrlen;

		/* header complete, we can finally calculate the HeaderDigest */
		if (session->HeaderDigest == ISCSI_DIGEST_CRC32C)
			header_crc32c = iscsi_crc32c(&stdh, sizeof (stdh));

		/* DataDigest */
		if (xfrlen && (session->DataDigest == ISCSI_DIGEST_CRC32C)) {
			int i;

			data_crc32c =
			    iscsi_crc32c(session->tx_iov[first_data_iovn].
					 iov_base,
					 session->tx_iov[first_data_iovn].
					 iov_len);
			for (i = first_data_iovn + 1; i < iovn; i++) {
				data_crc32c =
				    iscsi_crc32c_continued(session->tx_iov[i].
							   iov_base,
							   session->tx_iov[i].
							   iov_len,
							   data_crc32c);
			}

			/* FIXME: this may not be SMP safe, but it's only for 
			 * testing anyway, so it probably doesn't need to be 
			 */
			if (session->fake_write_data_mismatch > 0) {
				session->fake_write_data_mismatch--;
				smp_mb();
				printk
				    ("iSCSI: session iscsi bus %d target id %d "
				     "faking DataDigest mismatch for itt %u\n",
				     session->iscsi_bus, session->target_id,
				     task->itt);
				data_crc32c = 0x01020304;
			}

			session->tx_iov[iovn].iov_base = &data_crc32c;
			session->tx_iov[iovn].iov_len = sizeof (data_crc32c);
			iovn++;
			wlen += sizeof (data_crc32c);
		}

		if (xfrlen && (session->DataDigest == ISCSI_DIGEST_CRC32C)) {
			memset(&msg, 0, sizeof (msg));
			msg.msg_iov = &session->tx_iov[0];
			msg.msg_iovlen = iovn;

			ISCSI_TRACE(ISCSI_TRACE_TxDataPDU, sc, task,
				    data_offset, xfrlen);

			rc = iscsi_sendmsg(session, &msg, wlen);
			if (rc != wlen) {
				printk
				    ("iSCSI: session iscsi bus %d target id %d "
					     "xmit_data failed to send "
					     "%d bytes, rc %d\n",
				     session->iscsi_bus, session->target_id,
				     wlen, rc);
				iscsi_drop_session(session);
				goto done;
			}
		} else {

	/* Send header*/
			memset(&msg, 0, sizeof (msg));
			msg.msg_iov = &session->tx_iov[0];
			msg.msg_iovlen = 1;
                        rc = iscsi_sendmsg(session, &msg, 48);
			/* I have ensured that header and data digest are not
			 *  set. So Send data
			 */
			
			if (first_sg == NULL) {
				memset(&msg, 0, sizeof (msg));
				msg.msg_iov = &session->tx_iov[1];
				msg.msg_iovlen = iovn-1;
			ISCSI_TRACE(ISCSI_TRACE_TxDataPDU, sc, task,
				    data_offset, xfrlen);
				printk("No SG about to use iscsi_sendmsg\n");
				rc = iscsi_sendmsg(session, &msg, 
							wlen - session->tx_iov[0].iov_len);
				if (rc != (wlen - session->tx_iov[0].iov_len)) {
					printk
				    ("iSCSI: session iscsi_bus %d target id %d "
				     "xmit_data failed to send %d bytes, rc "
				     "%d\n", session->iscsi_bus,
				     session->target_id, wlen, rc);
					iscsi_drop_session(session);
					goto done;
				}
			} else {
				iovn = 1;
				for (sg = first_sg; sg <= last_sg ;sg++) {
					printk("About to send page %p,offset %d len %d \n",sg->page,sg->offset+(int)(session->tx_iov[iovn].iov_base),session->tx_iov[iovn].iov_len);
#if 0
					if (session->tx_iov[iovn].iov_len >
					    4096) {
			                        sendpage = sock_no_sendpage;
					}
					else
#endif
						sendpage = session->socket->ops->sendpage;
					rc = sendpage(session->socket,sg->page,sg->offset+(int)(session->tx_iov[iovn].iov_base),session->tx_iov[iovn].iov_len,0);
					printk("return value of sendpage = %d\n",rc);
					iovn++;
				}
			}
		}

		remain -= xfrlen;

		printk
		    ("iSCSI: xmit_data sent %d @ %u for itt %u, "
		     "remaining %d, final %d\n",
		     xfrlen, data_offset, task->itt, remain,
		     stdh.flags & ISCSI_FLAG_FINAL);

		data_offset += xfrlen;
		if (first_sg) {
			first_sg = last_sg = NULL;
		}

	} while (remain);

      done:
}

Log message:
---------------------------------------------------------------------
About to send page c11d0e48,offset 0 len 8192
Dec 12 08:49:24 linux-3 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Request for review of Linux iSCSI driver version 4.0.0.1
  2003-12-12 12:48 ` Request " Matthew Wilcox
  2003-12-12 15:29   ` N.C.Krishna Murthy
@ 2003-12-13  2:46   ` Andre Hedrick
  1 sibling, 0 replies; 102+ messages in thread
From: Andre Hedrick @ 2003-12-13  2:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: N.C.Krishna Murthy, Christoph Hellwig, SCSI -DEVEL, davmyers


Matthew,

This happens with any host.  It is a result of scsi_merge failing to
perforn a bounds check as it is attempting to create contigious
scatterlist nents.

Instead of creating a new nent base on page boundaries, because
scatterlists are bound to struct page, it does sane and normal sg list
merging.

There are cases were the length of the nent exceeds the lenght of the page
because of merging.  I know the problem and working on a fix for SATA II
rules, but something is fundementally broken in design.  Trying to tie
scatterlist nents to struct page is wrong unless one breaks on page
lengths.

I know the above is worded goofy, and I will try to explain it again after
a little more thought.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Fri, 12 Dec 2003, Matthew Wilcox wrote:

> On Thu, Dec 11, 2003 at 09:17:46PM +0530, N.C.Krishna Murthy wrote:
> > Hi,
> > 	One of your comments was
> > "having multiple kmap() in the same process at the same time can deadlock(),
> > you redesign iscsi_xmit_task/iscsi_xmit_data/iscsi_recv_data not to do that.
> > for the tx path use ->sendpage to avoid a data copy and kmapping altogether".
> > 
> > I did try using sendpage in iscsi_xmit_data. 
> > Whenever the scatterlist->length was 4096 sendpage did succeed.
> > When it was 8192 I did see a panic 
> 
> could you post the code for that?
> 
> -- 
> "Next the statesmen will invent cheap lies, putting the blame upon 
> the nation that is attacked, and every man will be glad of those
> conscience-soothing falsities, and will diligently study them, and refuse
> to examine any refutations of them; and thus he will by and by convince 
> himself that the war is just, and will thank God for the better sleep 
> he enjoys after this process of grotesque self-deception." -- Mark Twain
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2003-12-13  2:53 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-23 12:04 Request for review of Linux iSCSI driver version 4.0.0.1 Krishna Murthy
2003-10-27 15:39 ` Christoph Hellwig
2003-10-29 13:23   ` Surekha.PC
2003-10-29 13:45     ` 'Christoph Hellwig'
2003-10-29 17:28       ` Mike Christie
2003-10-29 18:45         ` Clay Haapala
2003-10-29 19:01           ` Mike Christie
2003-10-29 19:17             ` Clay Haapala
2003-10-29 19:33               ` Mike Christie
2003-10-30 23:42                 ` Andre Hedrick
2003-10-30 13:34         ` jd
2003-11-11 11:56       ` Naveen Burmi
2003-11-11 17:36         ` 'Christoph Hellwig'
2003-11-19 12:40           ` Krishna Murthy
2003-11-19 12:49             ` Matthew Wilcox
2003-11-19 13:38             ` 'Christoph Hellwig'
2003-11-11 17:40         ` James Bottomley
2003-11-06  9:42   ` Sachin Mhatre (smhatre)
2003-11-06 10:09     ` 'Christoph Hellwig'
2003-11-07  8:55       ` Douglas Gilbert
2003-11-07  9:30         ` 'Christoph Hellwig'
2003-11-10 17:43         ` Patrick Mansfield
2003-11-06 11:10     ` Andre Hedrick
2003-11-06 11:14       ` 'Christoph Hellwig'
2003-11-13 14:30   ` Sachin Mhatre (smhatre)
2003-11-13 14:54     ` Matthew Wilcox
2003-11-19 13:04   ` Sachin Mhatre (smhatre)
2003-11-19 13:10     ` 'Christoph Hellwig'
2003-11-19 14:48   ` Naveen Burmi
2003-11-19 14:48     ` Christoph Hellwig
2003-11-19 15:19       ` Naveen Burmi
2003-11-19 15:20         ` Christoph Hellwig
2003-12-01 12:10       ` Krishna Murthy
2003-12-01 15:22         ` James Bottomley
2003-12-04 12:30           ` N.C.Krishna Murthy
2003-12-05 15:33             ` James Bottomley
2003-12-05 17:03               ` Brian King
2003-12-08 15:06               ` N.C.Krishna Murthy
2003-12-08 15:46                 ` Scott M. Ferris
2003-12-10 15:01                   ` N.C.Krishna Murthy
2003-12-10 16:50                     ` Scott M. Ferris
2003-12-11 14:48               ` N.C.Krishna Murthy
2003-12-01 15:27         ` Christoph Hellwig
2003-12-02  5:52           ` N.C.Krishna Murthy
2003-12-02 16:59             ` Patrick Mansfield
2003-11-19 17:17     ` Patrick Mansfield
2003-11-20 13:32       ` Naveen Burmi
2003-11-20 13:34         ` Christoph Hellwig
2003-11-20 14:53           ` Naveen Burmi
2003-11-22  8:16       ` How to generate ILI condtion on a tape device Kallol Biswas
2003-11-24  8:31         ` Josef Möllers
2003-11-25  7:58           ` Kallol Biswas
2003-11-21 16:42   ` Request for review of Linux iSCSI driver version 4.0.0.1 Clay Haapala
2003-11-21 17:32     ` Matthew Wilcox
2003-11-21 18:18       ` Clay Haapala
2003-11-26 13:41         ` Christoph Hellwig
2003-11-24  6:09   ` Surekha.PC
2003-11-24  7:48     ` 'Christoph Hellwig'
2003-11-24 20:45       ` Patrick Mansfield
2003-11-26 13:45         ` 'Christoph Hellwig'
2003-12-11 12:31       ` Naveen Burmi
  -- strict thread matches above, loose matches on Subject: below --
2003-11-21 11:40 Shashi Kiran T.R
2003-11-21 17:56 ` Patrick Mansfield
2003-12-01 12:30 Naveen Burmi
2003-12-01 14:08 ` Naveen Burmi
2003-12-01 18:48   ` Andre Hedrick
2003-12-01 19:23   ` Andre Hedrick
2003-12-01 16:20 ` Roman Zippel
2003-12-01 17:19   ` Scott M. Ferris
2003-12-01 20:06     ` Clay Haapala
2003-12-01 20:31       ` Andre Hedrick
2003-12-01 20:58         ` Clay Haapala
2003-12-02  3:46   ` Andre Hedrick
2003-12-02 12:02     ` Naveen Burmi
2003-12-02 13:57     ` Roman Zippel
2003-12-02 11:56   ` Naveen Burmi
2003-12-02 14:11     ` Roman Zippel
2003-12-02 16:37     ` James Bottomley
2003-12-02 17:42       ` Mike Anderson
2003-12-02 23:55         ` James Bottomley
2003-12-02 23:41       ` Clay Haapala
2003-12-03 14:06       ` Naveen Burmi
2003-12-03 15:09         ` James Bottomley
2003-12-03 17:03           ` Clay Haapala
2003-12-03 17:32             ` James Bottomley
2003-12-03 17:54             ` Mike Anderson
2003-12-03 20:31               ` Clay Haapala
2003-12-03 21:14                 ` James Bottomley
2003-12-03 21:53                   ` Scott M. Ferris
2003-12-03 22:57                 ` Scott M. Ferris
2003-12-03 20:45               ` Clay Haapala
2003-12-03 21:19                 ` James Bottomley
2003-12-11 11:21                   ` Naveen Burmi
2003-12-03 22:15                 ` Scott M. Ferris
2003-12-03 22:32                   ` Clay Haapala
2003-12-03 23:24                 ` Mike Anderson
2003-12-06 19:37       ` Andre Hedrick
2003-12-07  0:37         ` Roman Zippel
2003-12-11  0:12 Pat LaVarre
2003-12-11 15:47 N.C.Krishna Murthy
2003-12-12 12:48 ` Request " Matthew Wilcox
2003-12-12 15:29   ` N.C.Krishna Murthy
2003-12-13  2:46   ` Andre Hedrick

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).