linux-scsi.vger.kernel.org archive mirror
* i/o errors
From: Bernd Schubert @ 2004-06-23  9:42 UTC
  To: SCSI Mailing List

Hello,

we are having trouble with our transtec 5008 IDE/SCSI RAID array: sometimes
the SCSI driver reports I/O errors. For the reasons described below, I'm not
sure whether those I/O errors are really caused by the RAID array, so I want
to ask whether the Linux SCSI subsystem might cause or report spurious errors.

Description:
The main server (with the transtec RAID array) is connected via gigabit
ethernet to a failover server. Some partitions on the main server are
mirrored to this failover server via drbd. During a full resync from the
failover server, at write speeds of about 50-60 MB/s, the RAID array
sometimes stops responding and the SCSI driver resets the bus. That is not
nice, but by itself it doesn't cause any harm. What is much worse is that
after such a reset the SCSI driver sometimes reports an I/O error for the
device, in which case drbd stops the system immediately.

At first I thought it was a bug in the SCSI driver. After Justin T. Gibbs
told me that it can't be a driver bug, I contacted transtec and we got the
RAID array replaced with a new one. Unfortunately this didn't solve our
problems. To be sure it is not a controller or cable problem, we connected
the RAID array with completely different cables to the failover server,
which has an MPT SCSI controller instead of the Adaptec controller of the
main server, but the same problems remained.
So that probably means that the transtec array has a general bug?

However, when some people recently reported problems with their USB sticks, I
reconsidered the situation, and now I'm also considering that there might be
a general Linux SCSI bug.

USB stick problem:
- Suddenly the affected systems don't like the sticks any more and report
I/O errors when they access the USB stick. Removing and re-inserting the
stick doesn't help; rebooting the system, however, fixes the issue.
- That might be a USB problem, but those sticks use the sg driver and so
the SCSI subsystem. Could this somehow be related to our RAID I/O errors?

Also, transtec told me that the RAID array should write an error to its logs
when an I/O error happens, but there is no error message at all :(


All those problems happened with 2.4.26. We have now also tried 2.6.7, and
the problem doesn't occur with this kernel. Unfortunately, with 2.6.7 we
never reach a resync speed above 45 MB/s; it is usually about 30 MB/s. When
we reduced the resync speed under 2.4.26 to those values, we also never had
the problem, so the test with 2.6.7 doesn't help much in this case.


So, does anyone here have an idea whether this is a bug in the transtec array
or in the Linux SCSI subsystem?


Thanks in advance,
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: i/o errors
From: Fabien Salvi @ 2004-06-23 14:00 UTC
  To: Bernd Schubert; +Cc: SCSI Mailing List

Bernd Schubert wrote:
> Hello,

Hello,

> we are having trouble with our transtec 5008 IDE/SCSI RAID array: sometimes
> the SCSI driver reports I/O errors. For the reasons described below, I'm not
> sure whether those I/O errors are really caused by the RAID array, so I want
> to ask whether the Linux SCSI subsystem might cause or report spurious errors.
> 
> Description:

> [...]

> Also, transtec told me that the RAID array should write an error to its logs
> when an I/O error happens, but there is no error message at all :(
>
>
> All those problems happened with 2.4.26. We have now also tried 2.6.7, and
> the problem doesn't occur with this kernel. Unfortunately, with 2.6.7 we
> never reach a resync speed above 45 MB/s; it is usually about 30 MB/s. When
> we reduced the resync speed under 2.4.26 to those values, we also never had
> the problem, so the test with 2.6.7 doesn't help much in this case.
>
>
> So, does anyone here have an idea whether this is a bug in the transtec array
> or in the Linux SCSI subsystem?

I wouldn't be surprised if it's a hardware-related problem.
Do you know who the real manufacturer of the RAID controller and
firmware is? I don't think Transtec makes its own systems...

IMHO, you should run a heavy benchmark without DRBD, using an I/O benchmark
tool and also plain dd to generate large parallel transfers, and check
whether you can reproduce the bug. If you can, it would be interesting to
try other Linux kernel revisions and also other OSes...
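
As a rough illustration of the parallel-transfer test suggested above, here is
a minimal Python sketch that writes several large sequential streams at once
and reports the aggregate rate together with any I/O errors. It is not taken
from the thread: the function names, the block size, and the target paths are
assumptions chosen for illustration, and it stands in for running several dd
processes side by side.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def write_stream(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path; return (bytes_written, error_or_None)."""
    block = bytes(block_size)
    written = 0
    try:
        # buffering=0 skips Python's own buffer; the kernel page cache still applies.
        with open(path, "wb", buffering=0) as f:
            while written < total_bytes:
                chunk = block[: min(block_size, total_bytes - written)]
                written += f.write(chunk)
            # Force the data out to the device so a late EIO surfaces here.
            os.fsync(f.fileno())
    except OSError as exc:
        return written, exc
    return written, None


def parallel_write_test(paths, bytes_per_stream):
    """Run one writer thread per path; return (aggregate MB/s, list of (path, error))."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        results = list(pool.map(lambda p: write_stream(p, bytes_per_stream), paths))
    elapsed = max(time.monotonic() - start, 1e-9)
    total = sum(written for written, _ in results)
    errors = [(p, exc) for p, (_, exc) in zip(paths, results) if exc is not None]
    return total / elapsed / 1e6, errors


if __name__ == "__main__":
    # Hypothetical target files on the array under test; adjust before running.
    targets = [f"/mnt/raid/stress{i}.bin" for i in range(4)]
    rate, errors = parallel_write_test(targets, 1024 ** 3)
    print(f"aggregate rate: {rate:.1f} MB/s")
    for path, exc in errors:
        print(f"I/O error on {path}: {exc}")
```

Pointing the paths at a raw block device instead of files would take the
filesystem out of the picture, but it overwrites whatever is on that device,
so that variant only makes sense on a scratch partition.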

Good luck!

-- 
Fabien SALVI


* Re: i/o errors
From: Bernd Schubert @ 2004-06-23 15:08 UTC
  To: Fabien Salvi; +Cc: SCSI Mailing List

> I wouldn't be surprised if it's a hardware-related problem.
> Do you know who the real manufacturer of the RAID controller and
> firmware is? I don't think Transtec makes its own systems...

As far as I know it's an Infortrend device; at least the manual and usage
information are similar to those of an Infortrend device. If you are
interested, I could try to find out which of the Infortrend devices it is.

>
> IMHO, you should run a heavy benchmark without DRBD, using an I/O benchmark
> tool and also plain dd to generate large parallel transfers, and check
> whether you can reproduce the bug. If you can, it would be interesting to
> try other Linux kernel revisions and also other OSes...

Of course, I have already run such benchmarks, but only on the filesystem,
and I could never reproduce the errors. Tomorrow afternoon I will try what
happens without the filesystem.
Maybe the slowdown from the filesystem layer is enough to prevent the bug.
When the problem first occurred and I asked Justin about it, he told me to
use his newer driver versions. Then I really thought that it was a driver
bug, because it got worse with every driver revision. Finally Justin told me
that every revision became slightly faster - this slight speed increase was
enough to reliably trigger this bug :/

We have been trying to fix this bug for more than four weeks now, and we
would finally like to use our new storage server. However, I'm really worried
that this problem will occur in real usage, though my tests showed that it
shouldn't happen in real life.


I would really prefer not to use another OS, since I have no recent
experience with any.

>
> Good luck!

Thanks a lot!


Cheers,
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de

