From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pat LaVarre
Subject: Re: [usb-storage] mode sense blacklist how
Date: 17 Nov 2003 13:14:13 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <1069100053.2324.118.camel@patrh9>
References:
<1068767049.2851.166.camel@patrh9>
<1068768796.3fb41e1c8d075@webmail.netregistry.net>
<1068775834.2851.321.camel@patrh9>
<20031113181945.I30194@one-eyed-alien.net>
<1068777510.2851.359.camel@patrh9>
<1068779468.3fb447ccc6e60@webmail.netregistry.net>
<1068838908.2852.34.camel@patrh9> <20031114153607.A7207@beaverton.ibm.com>
<20031116121039.A13224@beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from email-out1.iomega.com ([147.178.1.82]:14759 "EHLO
email.iomega.com") by vger.kernel.org with ESMTP id S263675AbTKQUOp
(ORCPT );
Mon, 17 Nov 2003 15:14:45 -0500
In-Reply-To: <20031116121039.A13224@beaverton.ibm.com>
List-Id: linux-scsi@vger.kernel.org
To: patmans@us.ibm.com
Cc: usb-storage@one-eyed-alien.net, linux-scsi@vger.kernel.org, dmitrik@users.sourceforge.net, mdharm-scsi@one-eyed-alien.net, stern@rowland.harvard.edu, james.bottomley@steeleye.com, ronald@kuetemeier.com, idan@idanso.dyndns.org
> [Resending, this never showed up on linux-scsi.]
us.ibm.co vs. us.ibm.com ambiguities may have entered this thread from
here, sorry, ouch ouch ouch. These new-fangled protocols for mixing human
names into mailto addresses don't quite reach me everywhere I connect.
> I am not really sure what happened, I think it is waiting for a 30 hr
> timeout to complete, you ought to lower the timeout a bit ;-)
Just before we redesign this plscsi default for a different audience ...
Please tell me why Ctrl+C aka SIGINT doesn't work?
As a client of SG_IO, am I supposed to work to catch SIGINT explicitly
so I can pass that SIGINT thru sg somehow? (And is that answer
documented anywhere?)
> I am not really sure what happened, I think it is waiting for a 30 hr
> timeout to complete, you ought to lower the timeout a bit ;-)
Thank you for saying. You are the second host developer who has somehow
found the time to tell me this. An override syntax that might work is:
export PLSCSI="/dev/sda -X time 5 0"
I'll race you all to try and report back how accurately I'm remembering
that plscsi syntax.
The issue of a long timeout arises because personally I've lived mostly
as a developer of device firmware, rather than host software.
A long timeout was the only effective way I found to persuade the host
to let me stop a test at first weirdness. With short timeouts, the host
would automagically attack the device with resets, destroying all
evidence of what went wrong to begin with.
To my eye, polite hosts delay the automagic attack until the next
command. That way, I can sit at the top of the stack and ask to halt at
first weirdness. So long as I prevent any other commands from going
down thru the stack, then so likewise do I prevent the automagic
attack. But only if I have a host designed to be polite by default.
> The following did nothing, I am running on 2.6 current bk + only my earlier
> send oversize MODE SENSE buffer patch.
>
> Is that expected without -v?
>
> > sudo plscsi -p -i xC0 -x "5A 00 1C:00:00:00 00 00:C0 00 00:00" // Mode Sense (10)
Yes, I now agree I should have mentioned that we add -v to see a trace
of what was tried when. Sorry I did not.
> Full results:
Great fun, thank you.
> + PLPROG=/home/patman/src/plscsi/plscsi
> + DEV=/dev/sda
> + /home/patman/src/plscsi/plscsi -i xC0 -x '1A 00:1C:00 C0 00' /dev/sda
> // x 5 20 sense // xC0 (192) residue
> // -x0102 = -258 = plscsi.main exit int
> + /home/patman/src/plscsi/plscsi -i xC0 -x '1A 00:3F:00 C0 00' /dev/sda
> // x 5 20 sense // xC0 (192) residue
> // -x0102 = -258 = plscsi.main exit int
> + /home/patman/src/plscsi/plscsi -i x0C -x '1A 00:00:00 0C 00' /dev/sda
> // x 5 20 sense // xC (12) residue
> // -x0102 = -258 = plscsi.main exit int
> + /home/patman/src/plscsi/plscsi -i x0C -x '1A 00:3F:00 0C 00' /dev/sda
> // x 5 20 sense // xC (12) residue
> // -x0102 = -258 = plscsi.main exit int
Cool.
Here we repeatedly see nonzero residue equal to the data byte count
asked to copy in. That suggests we may be getting accurate reporting of
residue from the connection between you and the drive.
SK ASC = x 5 20 is sense key x5 ILLEGAL REQUEST with additional sense
code x20 INVALID COMMAND OPERATION CODE. Every version of the t10.org
English that I think I remember says the same thing: op not known, i.e.
cdb[0] not known. So apparently this device correctly reports that it
does not support op x1A Mode Sense (6).
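
For anyone following along, here is a minimal Python sketch of how those
trace lines can be read. The function name read_trace and the two tiny
lookup tables are illustration only, not part of plscsi. The tables hold
just the values discussed here, and the line format is assumed from the
quoted output above.

```python
import re

# Only the values relevant to this thread; not a full t10.org table.
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASCS = {0x20: "INVALID COMMAND OPERATION CODE"}

# Matches trace lines like "// x 5 20 sense // xC0 (192) residue".
TRACE = re.compile(
    r"//\s*x\s*([0-9A-Fa-f]+)\s+([0-9A-Fa-f]+)\s+sense\s*"
    r"//\s*x([0-9A-Fa-f]+)\s*\((\d+)\)\s*residue"
)

def read_trace(line, asked):
    """Decode sense key, ASC and residue; 'asked' is the -i byte count."""
    m = TRACE.search(line)
    if m is None:
        raise ValueError("unrecognised trace line: %r" % line)
    sk = int(m.group(1), 16)
    asc = int(m.group(2), 16)
    residue = int(m.group(3), 16)
    if residue != int(m.group(4)):
        raise ValueError("hex and decimal residue disagree")
    return {
        "sense_key": SENSE_KEYS.get(sk, "unknown"),
        "asc": ASCS.get(asc, "unknown"),
        "copied": asked - residue,  # bytes that actually came in
    }

print(read_trace("// x 5 20 sense // xC0 (192) residue", 0xC0))
```

With residue equal to the count asked for, "copied" comes out as 0: the
device rejected the op and sent no data.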
> + /home/patman/src/plscsi/plscsi -p -i xC0 -x '5A 00 1C:00:00:00 00 00:C0 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i xC0 -x '5A 00 3F:00:00:00 00 00:C0 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i x0C -x '5A 00 00:00:00:00 00 00:0C 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i x0C -x '5A 00 3F:00:00:00 00 00:0C 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i xC4 -x '5A 00 1C:00:00:00 00 00:C4 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i xC4 -x '5A 00 3F:00:00:00 00 00:C4 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i x10 -x '5A 00 00:00:00:00 00 00:10 00 00:00' /dev/sda
> + /home/patman/src/plscsi/plscsi -p -i x10 -x '5A 00 3F:00:00:00 00 00:10 00 00:00' /dev/sda
I guess here we have been reading something much like a tty log of:
#!/bin/bash -x
By that guess, this quote says all those commands completed without
error.
So from that guess we conclude: if we try talking mode sense like Windows
does, we here have found one more device supporting us.
> Following is the "breaking" command that caused the original "--babble"
> and problem.
>
> ./plscsi /dev/sda -v -i x4 -x "5A 00 3F:00:00:00 00 00:08 00"
Ouch that's gross.
That 00:08 in the bytes[7:8] of the -x "$cdb" tells the device to copy
up to 8 bytes in, but the -i x4 tells the device to never copy more than
4 bytes in. We throw paradox at the device, and we get back "babble".
Shame on the device, but shame on us too.
Mind you, since here we have CBI/CB protocol rather than BBB protocol,
the device never sees the -i x4, the device sees only the 00:08. So in
a bus trace we see no babble: only the host that tries to reconcile -i
x4 with 8 bytes in from the device sees babble.
Since the device can't see CBI/CB break this way, the device can't
help. With CBI/CB, by design, only the host can avoid this kind of
trouble, only by more accurately guessing in advance out-of-band how the
device will actually interpret a particular cdb.
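
A minimal Python sketch of that out-of-band guess, for the two ops seen
in this thread only. The helper names parse_cdb, allocation_length and
fits are hypothetical, not plscsi code. The byte offsets are the usual
ones: byte 4 for Mode Sense (6), bytes 7..8 big-endian for Mode Sense
(10).

```python
def parse_cdb(text):
    """Turn plscsi-style hex such as '5A 00 3F:00' into bytes."""
    return bytes(int(tok, 16) for tok in text.replace(":", " ").split())

def allocation_length(cdb):
    """Max bytes the device believes it may copy in, per the cdb alone."""
    if cdb[0] == 0x1A and len(cdb) == 6:   # Mode Sense (6): byte 4
        return cdb[4]
    if cdb[0] == 0x5A and len(cdb) >= 9:   # Mode Sense (10): bytes 7..8
        return (cdb[7] << 8) | cdb[8]
    raise ValueError("op x%02X not handled in this sketch" % cdb[0])

def fits(cdb_text, host_max_in):
    """True if the cdb never asks for more than the host's -i count."""
    return allocation_length(parse_cdb(cdb_text)) <= host_max_in

# The "breaking" command: cdb says 8 bytes, host says -i x4.
print(fits("5A 00 3F:00:00:00 00 00:08 00", 0x4))
# One of the commands that completed: cdb says xC0, host says -i xC0.
print(fits("5A 00 1C:00:00:00 00 00:C0 00 00:00", 0xC0))
```

A host-side check like this before sending is the only place the
paradox can be caught with CBI/CB, since the device sees the cdb and
nothing else.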
> Luckily unplugging the device completes, and I can plug it back in and
> everything is ok! That is quite nice! Thanks for all the nice hotplug and
> ref-counting code everybody.
Brilliant, aye, kudos everyone.
> Following is the usb storage debug logs for the above command, including
> "--babble" output.
>
> I added some messages via "logger -p kern.debug" to separate things.
Ooooh. I've been looking to discover that `logger` command since the
day I discovered `dmesg`. Thank you for naming it explicitly.
> Weird that we tried to send another MODE SENSE *after* I unplugged the
> device, maybe an abort or cancel of the commands caused it to retry before
> the device was actually removed, more likely usb aborted outstanding
> commands prior to a call to scsi_remove_host (that is OK).
>
> -----------------------------------------------------------------------------
>
> Nov 14 14:51:34 laptop patman: NOTE pre post too short command sent
> ...
> Nov 14 14:52:39 laptop patman: NOTE unplugging device NOTE
> ...
> Nov 14 14:52:44 laptop kernel: usb-storage: -- usb_stor_release_resources finished
> ...
> Nov 14 14:52:52 laptop patman: NOTE device unplugged NOTE
I read that trace with interest, thank you.
Pat LaVarre