linux-scsi.vger.kernel.org archive mirror
* [AIC7xxx] tree things to report
@ 2007-05-23 21:35 Emmanuel Fusté
  0 siblings, 0 replies; 6+ messages in thread
From: Emmanuel Fusté @ 2007-05-23 21:35 UTC (permalink / raw)
  To: hare; +Cc: linux-scsi

Hello,

After a year of rest, I resurrected my old computer, installed a
2.6.21 kernel and updated my Debian distro.

Three things to report:

First, a cosmetic thing: I have two SCSI sync devices and two
async devices. For the first two, domain validation returns
the negotiated speed and mode. For the other two, domain
validation returns nothing. I expect it is just a 'missing
feature' and that all went OK. Am I right?

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        <Adaptec 2940 Ultra2 SCSI adapter>
        aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi 0:0:0:0: Direct-Access     IBM      DMVS18V         02B0 PQ: 0 ANSI: 3
scsi0:A:0:0: Tagged Queuing enabled.  Depth 8
 target0:0:0: Beginning Domain Validation
 target0:0:0: wide asynchronous
 target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25ns, offset 31)
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
scsi 0:0:3:0: CD-ROM            YAMAHA   CRW6416S         1.0d PQ: 0 ANSI: 2
 target0:0:3: Beginning Domain Validation
 target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns,offset 15)
 target0:0:3: Domain Validation skipping write tests
 target0:0:3: Ending Domain Validation
scsi 0:0:4:0: CD-ROM            TOSHIBA  CD-ROM XM-3501TA 1875 PQ: 0 ANSI: 2
 target0:0:4: Beginning Domain Validation
 target0:0:4: Domain Validation skipping write tests
 target0:0:4: Ending Domain Validation
scsi 0:0:6:0: Sequential-Access WANGTEK  5525ES SCSI REV7 0W PQ: 0 ANSI: 1
 target0:0:6: Beginning Domain Validation
 target0:0:6: Ending Domain Validation
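
As a cross-check of the rates reported in the log above (an editorial
sketch, not driver code): the burst rate follows from the synchronous
transfer period and the bus width. A 25 ns period means 40 million
transfers per second, and a wide (16-bit) bus moves 2 bytes per
transfer, which gives 80 MB/s; a 100 ns period on a narrow (8-bit)
bus gives 10 MB/s. The function names below are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: peak synchronous SCSI burst rate in MB/s.
 * period_ns   - negotiated transfer period in nanoseconds (e.g. 25)
 * width_bytes - 1 for a narrow (8-bit) bus, 2 for a wide (16-bit) bus
 * 1000 / period_ns is the rate in millions of transfers per second. */
double sync_rate_mb_per_s(double period_ns, int width_bytes)
{
    return (1000.0 / period_ns) * width_bytes;
}

/* Print the computed rate for one target. */
void print_rate(const char *target, double period_ns, int width_bytes)
{
    printf("%s: %.1f MB/s (%g ns, %d-bit)\n", target,
           sync_rate_mb_per_s(period_ns, width_bytes),
           period_ns, width_bytes * 8);
}
```

Calling print_rate("target0:0:0", 25, 2) and
print_rate("target0:0:3", 100, 1) gives the same 80.0 and 10.0 MB/s
figures that domain validation printed for the two sync devices.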

Secondly, something seems to be doing weird things with my old
CD-ROM reader (XM-3501TA). At irregular intervals, I get this
in my logs:
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete CDB sent to device.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol violation in Message-in phase.  Attempting to abort.
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort Completed.
And sometimes (but this seems related to problems with my cable):
May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)

There is no SCSI bus freeze, and the device works perfectly
without generating other errors. A DV problem? A bad interaction
with the hal daemon? A defect in the driver triggered by bad hal
daemon behavior?

Last thing, a problem that is now two years old:
cdrwtool -d /dev/sr0 -q still instantly crashes the
SCSI bus/CD writer, and the driver never recovers.
I do not have a new log because of the complete bus crash.
Do you have any new ideas about this problem?
I will try:
- to get a log on a USB key
- to port the patch from Bugzilla bug 5921 to the current kernel.
With the previous ones, the driver recovered (I was experiencing
FS corruption, but it seems it was not related).
- to identify exactly what cdrwtool sends to the kernel/driver
that causes the crash.
If any SCSI experts have a clue, I am all ears.

Thank you all,
Best regards,
Emmanuel.
 
---

Create your e-mail address prenom.nom@laposte.net
1 GB of storage space, built-in anti-spam and anti-virus.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [AIC7xxx] tree things to report
@ 2007-05-24 22:42 Emmanuel Fusté
  0 siblings, 0 replies; 6+ messages in thread
From: Emmanuel Fusté @ 2007-05-24 22:42 UTC (permalink / raw)
  To: linux-netdev; +Cc: hare, linux-scsi

Hello,
While trying to obtain SCSI logs to debug a driver problem, I
tried to use netconsole on an old SMP system with a 10 Mbit/s
pcnet32 Ethernet device.
But a few seconds after enabling netconsole, before launching my
SCSI test but after a little console activity, the computer
froze hard.
Is this a known or expected problem? (2.6.21 kernel, pcnet32:
PCnet/PCI 79C970) Do you have a solution or a patch to try?
I will get my soldering iron back out and make a serial cable for now.

Thanks,
Emmanuel.

> Hi Emmanuel,
> 
> Emmanuel Fusté wrote:
> > Hello,
> > 
> > After one year of rest, I resurrect my old computer, install a
> > 2.6.21 kernel and updated my Debian distro.
> > 
> > Tree things to repport:
> > 
> > First, a cosmetic thing: I have two scsi sync devices and two
> > async devices. For the first ones, domain validation return
> > the negociated speed and mode. For the second ones, domain
> > validation return nothing. I expect it is just a 'missing
> > feature' but that all went ok. I am right ?
> > 
> > scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
> >         <Adaptec 2940 Ultra2 SCSI adapter>
> >         aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
> > 
> > scsi 0:0:0:0: Direct-Access     IBM      DMVS18V         02B0
> > PQ: 0 ANSI: 3
> > scsi0:A:0:0: Tagged Queuing enabled.  Depth 8 
> >  target0:0:0: Beginning Domain Validation
> >  target0:0:0: wide asynchronous
> >  target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25ns, offset 31)
> >  target0:0:0: Domain Validation skipping write tests
> >  target0:0:0: Ending Domain Validation
> > scsi 0:0:3:0: CD-ROM            YAMAHA   CRW6416S         1.0d
> > PQ: 0 ANSI: 2
> >  target0:0:3: Beginning Domain Validation
> >  target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns,offset 15)
> >  target0:0:3: Domain Validation skipping write tests
> >  target0:0:3: Ending Domain Validation
> > scsi 0:0:4:0: CD-ROM            TOSHIBA  CD-ROM XM-3501TA 1875
> > PQ: 0 ANSI: 2
> >  target0:0:4: Beginning Domain Validation
> >  target0:0:4: Domain Validation skipping write tests
> >  target0:0:4: Ending Domain Validation
> > scsi 0:0:6:0: Sequential-Access WANGTEK  5525ES SCSI REV7 0W 
> >  PQ: 0 ANSI: 1
> >  target0:0:6: Beginning Domain Validation
> >  target0:0:6: Ending Domain Validation
> > 
> Hmm. Have to have a look at it. It should at least report
> the result ...
> 
> > Secondly, It seems that something is doing weird things with
> > my old CD-ROM reader (XM-3501TA). At some point in time (not
> > really regular), I get this in my logs:
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): No or incomplete
> > CDB sent to device.
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Protocol
> > violation in Message-in phase.  Attempting to abort.
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): Abort Message Sent
> > May 23 00:45:44 rafale kernel: (scsi0:A:4:0): SCB 11 - Abort
> > Completed.
> > And sometimes (but seem related to problems with my cable):
> > May 23 04:32:49 rafale kernel: (scsi0:A:4:0): parity error
> > detected in Status phase. SEQADDR(0xad) SCSIRATE(0x0)
> > May 23 05:13:03 rafale kernel: (scsi0:A:4:0): parity error
> > detected in Status phase. SEQADDR(0xac) SCSIRATE(0x0)
> > 
> Yes, this looks like a cable problem.
> 
> > There is no scsi bus freeze, and the device work perfectly
> > without generating other errors. DV problem ? Bad hal daemon
> > interaction ? Defect in the driver trigged by bad hal daemon
> > behavior ? 
> > 
> Ach, yes, it could at least be triggered by hal. Not all devices like to
> be polled by hal, especially if they're in the middle of an operation.
> CD-RW eg. Kay claimed to have it solved, but I still end up disabling
> hal :-)
> 
> > Last thing, a now two years problem:
> > cdrwtools -d /dev/sr0 -q still instantly crash the
> > scsibus/cdwriter and the driver never recover.
> > I did not have a new log because of the complete bus crash.
> > Have you new ideas about this problem ??
> No, not yet. But it looks as if I finally got some time to look deeper
> into this problem.
> Bugzilla's still assigned to me, so it's a constant reminder that
> something's amiss ...
> 
> > I will try:
> > - to get a log on a usb key
> > - to port patch from Bugzilla Bug 5921 to current kernel. With
> > the previous ones, the driver recover. (but i was experiencing
> > FS corruption but it seems it was not related).
> > - to identify exactly what cdrwtools send to the kernel/driver
> > which cause the crash.
> What you should do here is:
> 
> - hook up a serial cable and re-route console messages to that
> - Switch off syslog (as this might block if the SCSI bus frozen)
> - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
> sufficient) and start cdrwtools.
> - Send me the log from the serial console.
> 
> This will give me at least a starting point what's going wrong.
> 
> Thanks for your patience.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)
> 


* Re: [AIC7xxx] tree things to report
@ 2007-05-26 22:28 Emmanuel Fusté
  2007-05-29  7:19 ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Emmanuel Fusté @ 2007-05-26 22:28 UTC (permalink / raw)
  To: hare; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

Hello,

> What you should do here is:
>
> - hook up a serial cable and re-route console messages to that
> - Switch off syslog (as this might block if the SCSI bus frozen)
> - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
> sufficient) and start cdrwtools.
> - Send me the log from the serial console.
>
Ok, I've got logs with netconsole after swapping my Ethernet
card for another one.
First log:
cdrwtool -d /dev/sr0 -q
interrupted with ctrl-c, and the driver recovered.
Re-ran the command: the driver recovered but cdrwtool crashed.

Second log:
re-ran the command: the driver never recovered, kernel locked up.

Reboot

Third log:
cdrwtool -d /dev/sr0 -q
cdrwtool crashed, driver recovered.

Fourth log:
cdrwtool crashed, driver recovered.

Fifth log:
the driver never recovered and the machine locked up solid.

I think that the cleanest log to begin with is No. 3.

Best regards,
Emmanuel.
PS: the second problem was a cable problem and is now solved.

[-- Attachment #2: log.rafale-1.gz --]
[-- Type: application/gzip, Size: 6387 bytes --]

[-- Attachment #3: log.rafale-2.gz --]
[-- Type: application/gzip, Size: 3895 bytes --]

[-- Attachment #4: log.rafale-3.gz --]
[-- Type: application/gzip, Size: 2849 bytes --]

[-- Attachment #5: log.rafale-4.gz --]
[-- Type: application/gzip, Size: 4829 bytes --]

[-- Attachment #6: log.rafale-5.gz --]
[-- Type: application/gzip, Size: 3270 bytes --]


* Re: [AIC7xxx] tree things to report
  2007-05-26 22:28 [AIC7xxx] tree things to report Emmanuel Fusté
@ 2007-05-29  7:19 ` Hannes Reinecke
  0 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2007-05-29  7:19 UTC (permalink / raw)
  To: Emmanuel Fusté; +Cc: linux-scsi, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]

Emmanuel Fusté wrote:
> Hello,
> 
>> What you should do here is:
>>
>> - hook up a serial cable and re-route console messages to that
>> - Switch off syslog (as this might block if the SCSI bus frozen)
>> - Enable scsi debugging (Error, Timeout, Scan, and Midlayer is
>> sufficient) and start cdrwtools.
>> - Send me the log from the serial console.
>>
> Ok, I've got logs with netconsole after swapping my Ethernet
> card  with another one.

Grand. Well done, son.
The logs have been very instructive.

Again we're hitting this 'two commands per lun' problem.
For historic reasons the aic7xxx and aic79xx drivers accepted two
commands per LUN, as they implemented their own internal queueing and
could hold the second command on that queue.
With later versions I've removed this internal queueing and relied on
the block layer for this.
But this also means we can only accept one command per LUN.
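
To illustrate the effect of that limit, here is a toy model of per-LUN
admission control. It is only a sketch under stated assumptions, not
the midlayer's or the driver's actual code; every name except
cmd_per_lun is made up for the example.

```c
#include <stdbool.h>

/* Toy model, NOT kernel code: per-LUN admission control.
 * 'cmd_per_lun' mirrors the host template field discussed here;
 * all other names are hypothetical. */
struct toy_lun {
    int cmd_per_lun;   /* max commands the driver accepts for this LUN */
    int in_flight;     /* commands currently handed to the driver */
};

/* The upper layer asks whether it may dispatch one more command.
 * Returns true and counts the command if the limit allows it;
 * otherwise the command stays queued in the upper layer. */
bool toy_try_dispatch(struct toy_lun *lun)
{
    if (lun->in_flight >= lun->cmd_per_lun)
        return false;
    lun->in_flight++;
    return true;
}

/* Called when the device completes a command. */
void toy_complete(struct toy_lun *lun)
{
    if (lun->in_flight > 0)
        lun->in_flight--;
}
```

With cmd_per_lun set to 2, a second toy_try_dispatch() succeeds while
the first command is still outstanding, so the driver has to park it
somewhere. With cmd_per_lun set to 1, the second attempt is refused
until toy_complete() has run, and the command waits in the upper layer.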

Please try the attached patch and see if it helps.

James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
might be utilized for this sort of thing. But I didn't really figure out
how this thing is supposed to work nor how we could utilize it.
So I figured that the added complexity is not really worth it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: aic7xxx-allow-only-single-commands --]
[-- Type: text/plain, Size: 1322 bytes --]

Allow only a single command per lun for aic7xxx

With the conversion to use transport classes we also removed the
internal queueing from the driver. Hence the existing hack of
accepting two commands per lun and just holding this other one
internally is no longer valid.

Signed-off-by: Hannes Reinecke <hare@suse.de>

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 6054881..df8a3b2 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -775,7 +775,7 @@ struct scsi_host_template aic79xx_driver
 	.can_queue		= AHD_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahd_linux_slave_alloc,
 	.slave_configure	= ahd_linux_slave_configure,
diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index 660f26e..e6b87b9 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -755,7 +755,7 @@ struct scsi_host_template aic7xxx_driver
 	.can_queue		= AHC_MAX_QUEUE,
 	.this_id		= -1,
 	.max_sectors		= 8192,
-	.cmd_per_lun		= 2,
+	.cmd_per_lun		= 1,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.slave_alloc		= ahc_linux_slave_alloc,
 	.slave_configure	= ahc_linux_slave_configure,


* Re: [AIC7xxx] tree things to report
@ 2007-05-29 21:16 Emmanuel Fusté
  0 siblings, 0 replies; 6+ messages in thread
From: Emmanuel Fusté @ 2007-05-29 21:16 UTC (permalink / raw)
  To: hare; +Cc: linux-scsi, James.Bottomley

[-- Attachment #1: Type: text/plain, Size: 3339 bytes --]

> Grand. Well done, son.
> The logs have been very instructive.
>
> Again we're hitting this 'two commands per lun' problem.
> For historic reasons the aic7xxx and aic79xx driver accepted two
> commands per luns, as they implemented their internal queueing and could
> hold the second command on the queue.
> With later versions I've removed this internal queueing and relied on
> the block-layer for this.
> But this also means we can only accept one command per lun.
>
> Please try the attached patch and see if it helps.
>
> James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
> might be utilized for this sort of thing. But I didn't really figure out
> how this thing is supposed to work nor how we could utilize it.
> So I figured that the added complexity is not really worth it.
>
Hi,
Some news: I tried the patch and I still get this sort of
instant bus freeze with difficult recovery.
But there are some interesting new things too:

First log: standard boot, netconsole start, then
echo 32767 > /proc/sys/dev/scsi/scsi ; cdrwtool -t4 -d/dev/sr0 -q
===> SCSI bus crash -> lots of logs -> kernel BUG.

Second log: standard boot, init 1 to go into single-user mode, then
echo 32767 > /proc/sys/dev/scsi/scsi ; cdrwtool -t4 -d/dev/sr0 -q
Bus crash, recovery, the cdrwtool command crashes, I get the shell back.
Remounted the root FS read-only to completely suppress sd 0:0:0:0
activity.
cdrwtool -t4 -d/dev/sr0 -q
Lots of recovery logs and ... blah blah "your cd will be
completely erased" blah blah "press y to continue"!
Y, Enter -> the writer LED starts to blink, formatting is running!
But ~30 s later, driver recovery or a SCSI timeout or a midlayer
timeout (I don't know which) kicks in and the device is reset,
stopping the disc formatting. Sniff.
cdrwtool reports UDF filesystem structure initialization, but
all of it is discarded by the driver or the midlayer. cdrwtool exits.

Last log: without reboot: cdrwtool -t4 -d/dev/sr0 -q
bus freeze, lots of logs, the cdrwtool command crashes ...
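
The value 32767 written before each run is consistent with the
facilities suggested earlier in the thread (Error, Timeout, Scan and
Midlayer), if one assumes the usual layout of the SCSI logging word:
3 bits per facility, starting at bit 0 with error, then timeout, scan,
midlayer queue and midlayer complete. That layout is recalled from the
kernel's scsi_logging.h and should be checked against the kernel in
use; the enum and function names below are hypothetical.

```c
/* Assumed layout (recalled from the kernel's scsi_logging.h, to be
 * verified): each facility owns a 3-bit field, so levels run 0..7. */
enum scsi_log_shift {
    LOG_ERROR_SHIFT      = 0,
    LOG_TIMEOUT_SHIFT    = 3,
    LOG_SCAN_SHIFT       = 6,
    LOG_MLQUEUE_SHIFT    = 9,
    LOG_MLCOMPLETE_SHIFT = 12
};

/* Place one facility's level (0..7) into its 3-bit field. */
unsigned int log_field(unsigned int level, unsigned int shift)
{
    return (level & 7u) << shift;
}

/* Mask with error, timeout, scan and both midlayer facilities
 * set to the same level. */
unsigned int log_mask_error_to_midlayer(unsigned int level)
{
    return log_field(level, LOG_ERROR_SHIFT)
         | log_field(level, LOG_TIMEOUT_SHIFT)
         | log_field(level, LOG_SCAN_SHIFT)
         | log_field(level, LOG_MLQUEUE_SHIFT)
         | log_field(level, LOG_MLCOMPLETE_SHIFT);
}
```

Under that assumption, log_mask_error_to_midlayer(7) evaluates to
7 + 56 + 448 + 3584 + 28672 = 32767 (0x7fff), i.e. maximum verbosity
for those five facilities and nothing for the lower-level ones.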

All that I can say, with my limited understanding of the big
picture and from what I saw previously:
- The aic7xxx recovery path is still very fragile and unable
to recover from problems while there is SCSI bus activity. Perhaps
porting your previous work on this path from aic79xx could help.
- It seems that the commands sent by cdrwtool that confuse the
driver are the commands that "sense" the properties of the inserted
media, not the formatting command itself.
- The first part (commands that crash the SCSI bus before the
media format begins) did not happen before the big
driver change (before 2.6.14).
- The second part (formatting interrupted because driver
recovery kicked in) was already happening with 2.6.13, and the
recovery already failed and never recovered.
- Perhaps to make all of this worse, cdrwtool is very crude with
my old Yamaha CD writer and helps to trigger a chain of worst-case
events that exposes lots of bugs/unhandled cases
(cdrwtool bugs/writer firmware bugs/aic7xxx bugs ...).

Hope this helps.
The positive thing is that now, with the help of Francois
Romieu, I can use my old pcnet32 card to get the logs ;-)

Best regards,
Emmanuel.

[-- Attachment #2: log.rafale-new-1.gz --]
[-- Type: application/gzip, Size: 5576 bytes --]

[-- Attachment #3: log.rafale-new-2.gz --]
[-- Type: application/gzip, Size: 9187 bytes --]

[-- Attachment #4: log.rafale-new-3.gz --]
[-- Type: application/gzip, Size: 3372 bytes --]


* Re: [AIC7xxx] tree things to report
@ 2007-06-01 11:24 Emmanuel Fusté
  0 siblings, 0 replies; 6+ messages in thread
From: Emmanuel Fusté @ 2007-06-01 11:24 UTC (permalink / raw)
  To: hare; +Cc: linux-scsi, James.Bottomley

 
> Please try the attached patch and see if it helps.
> 
> James, I know that the aic7xxx has some 'next_queued_hscb' pointer which
> might be utilized for this sort of thing. But I didn't really figure out
> how this thing is supposed to work nor how we could utilize it.
> So I figured that the added complexity is not really worth it.
> 
Did you have time to look at the new logs with the patch applied? Do you need anything else?

Cheers,
Emmanuel.