* i/o errors
From: Bernd Schubert @ 2004-06-23 9:42 UTC (permalink / raw)
To: SCSI Mailing List
Hello,
we have trouble with our transtec 5008 IDE/SCSI raid array: sometimes the
scsi driver reports i/o errors. For the reasons described below, I'm not
sure whether those i/o errors are really caused by the raid array. So I want
to ask whether the linux scsi system might cause or report spurious errors.
Description:
The main server (with the transtec raid array) is connected via gigabit
ethernet to a failover server. Some partitions on the main server
are mirrored via drbd to this failover server. During a full resync from
the failover server, at write speeds of about 50-60MB/s, it happens that the
raid array stops responding and so the scsi driver resets the bus.
Well, though that is not nice, it still doesn't cause any harm. It's much
worse that sometimes after such a reset the scsi driver reports an I/O
error for the device; in that case drbd will stop the system immediately.
First I thought it was a bug in the scsi driver. After Justin T. Gibbs told
me that it can't be a driver bug, I contacted transtec and we got the
raid array replaced with a new one. Unfortunately this didn't solve our
problems. To be sure it's not a controller, cable, etc. problem, we connected
the raid array with completely different cables to the failover server, which
also has an MPT scsi controller instead of the Adaptec controller of the main
server, but the same problems remained.
So that probably means that the transtec array has a general bug?
However, when some people recently reported problems with their usb sticks, I
reconsidered the situation and now I'm also considering that there might be a
general linux scsi bug.
USB-stick-problem:
- Suddenly the problematic systems don't like the sticks any more and report
i/o errors when they access the usb stick. Removing and re-inserting
the stick doesn't help; however, rebooting the system fixes this issue.
- Well, that might be a USB problem, but those sticks use the sg-driver and
so the scsi-system; could this somehow be related to our raid i/o errors?
Also, transtec told me that the raid array should report an error to its logs
when an i/o error happens, but there is no error message at all :(
All those problems happened with 2.4.26, but we have now also tried 2.6.7 and
the problem doesn't occur with this kernel. Unfortunately, with 2.6.7 we never
reach a resync speed of >45 MB/s; it's usually about 30MB/s. If we reduce the
resync speed in 2.4.26 to those values, we also never see the problem, so the
test with 2.6.7 doesn't help much in this case.
So, does anyone here have an idea whether this is a bug in the transtec array
or in the linux scsi system?
Thanks in advance,
Bernd
--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: i/o errors
From: Fabien Salvi @ 2004-06-23 14:00 UTC (permalink / raw)
To: Bernd Schubert; +Cc: SCSI Mailing List
Bernd Schubert wrote:
> Hello,
Hello,
> we have trouble with our transtec 5008 IDE/SCSI raid array: sometimes the
> scsi driver reports i/o errors. For the reasons described below, I'm not
> sure whether those i/o errors are really caused by the raid array. So I want
> to ask whether the linux scsi system might cause or report spurious errors.
>
> Description:
> [...]
> Also, transtec told me that the raid array should report an error to its logs
> when an i/o error happens, but there is no error message at all :(
>
>
> All those problems happened with 2.4.26, but we have now also tried 2.6.7 and
> the problem doesn't occur with this kernel. Unfortunately, with 2.6.7 we never
> reach a resync speed of >45 MB/s; it's usually about 30MB/s. If we reduce the
> resync speed in 2.4.26 to those values, we also never see the problem, so the
> test with 2.6.7 doesn't help much in this case.
>
>
> So, does anyone here have an idea whether this is a bug in the transtec array
> or in the linux scsi system?
I wouldn't be surprised if it's a hardware-related problem.
Do you know who the real manufacturer of the RAID controller and
firmware is? I don't think Transtec makes its own systems...
IMHO, you should run a heavy benchmark without DRBD, using an I/O benchmark
tool and also plain dd to make big parallel transfers, and check whether you
can reproduce the bug. If you do get the bug, it would be interesting to try
other linux kernel revisions and also other OSes...
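The parallel dd test suggested above could be sketched roughly as follows. This is only an illustration: the function name parallel_dd_write, its arguments and the ddtest.N file names are assumptions and do not come from this thread. It writes ordinary files, so to get past the page cache the files would probably need to be considerably larger than RAM.

```shell
#!/bin/sh
# Sketch: start several dd writers in parallel and wait for all of them.
# Usage: parallel_dd_write DIR JOBS MEGABYTES
#   DIR        directory on the storage under test (hypothetical argument)
#   JOBS       number of concurrent writers
#   MEGABYTES  size of each test file in MiB
parallel_dd_write() {
    dir=$1; jobs=$2; mb=$3
    i=1
    while [ "$i" -le "$jobs" ]; do
        # Each writer streams zeros into its own file, in the background.
        dd if=/dev/zero of="$dir/ddtest.$i" bs=1048576 count="$mb" 2>/dev/null &
        i=$((i + 1))
    done
    wait    # block until every background dd has finished
}

# Example (illustrative path): 4 writers, 8 GiB each
#   parallel_dd_write /mnt/raidtest 4 8192
```

Watching the kernel log while this runs should show whether bus resets or I/O errors appear without drbd in the picture.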
Good luck!
--
Fabien SALVI
* Re: i/o errors
From: Bernd Schubert @ 2004-06-23 15:08 UTC (permalink / raw)
To: Fabien Salvi; +Cc: SCSI Mailing List
> I wouldn't be surprised if it's a hardware-related problem.
> Do you know who the real manufacturer of the RAID controller and
> firmware is? I don't think Transtec makes its own systems...
As far as I know it's an Infortrend device; at least the manual and usage
information are similar to those of an Infortrend device. If you are
interested, I could try to find out which of the Infortrend devices it is.
>
> IMHO, you should run a heavy benchmark without DRBD, using an I/O benchmark
> tool and also plain dd to make big parallel transfers, and check whether you
> can reproduce the bug. If you do get the bug, it would be interesting to try
> other linux kernel revisions and also other OSes...
Of course, I already performed those benchmarks, however only on the
filesystem, and I never could reproduce those bugs. Tomorrow afternoon I will
try what happens without the filesystem.
Maybe the slowdown from the filesystem layer is sufficient to prevent the bug.
When the problem first occurred and I asked Justin about it, he told me to use
his newer driver versions. Then I really thought that it was a driver bug,
because it got worse with every driver revision. Finally Justin told me that
every revision became slightly faster - this slight speed increase was enough
to reliably trigger this bug :/
We have been trying to fix this bug for more than four weeks now, and we
would finally like to use our new storage server. However, I'm really worried
that this problem will occur during real usage, though my tests showed that it
shouldn't happen in real life.
I really would prefer not to use another OS, since I have no recent
experience with them.
>
> Good luck!
Thanks a lot!
Cheers,
Bernd
--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de
* I/O Errors
From: f00bar @ 2004-09-24 0:38 UTC (permalink / raw)
To: reiserfs-list
I have been having problems with my reiser4
partitions. I believed the cause was how I had my machine
configured, but I now doubt this assumption.
I successfully created a reiser4 fs on an LVM2 volume
group. After mounting it and executing a few basic i/o
things (like cp, mv, cat, ls, rm) all is okay.
After some emerging (I use Gentoo), I "lose" the fs. I
have even tried mkfs.reiser4 and I still cannot use
the fs.
# ls -l /var/tmp/portage
ls: reading directory /var/tmp/portage: Input/output error
If I try to cp a file, I get no error and no file, but the
superblock does become corrupt.
I have 2 volume groups on one LVM partition spanning
the entire scsi drive. Half is an ext3 fs and the
other half is reiser4 fs. I mount one or the other as
/var/tmp/portage (where source code is compiled using
Gentoo) for testing purposes.
I am using reiser4progs and libaal 1.0.2_pre1.
# uname -a
Linux yin-yang 2.6.9-rc2-mm1-tao #1 SMP Sun Sep 19 03:14:18 CDT 2004 i686 AMD Athlon(tm) MP 2400+ AuthenticAMD GNU/Linux
# debugfs.reiser4 -p /dev/vgsdc1/reiser4
debugfs.reiser4 1.0.2-pre1
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser,
licensing governed by reiser4progs/COPYING.
Default profiles:
format: "format40" (id:0x0 type:0x8)
journal: "journal40" (id:0x0 type:0xf)
oid: "oid40" (id:0x0 type:0x9)
alloc: "alloc40" (id:0x0 type:0xe)
key: "key_large" (id:0x1 type:0x10)
node: "node40" (id:0x0 type:0x2)
statdata: "stat40" (id:0x0 type:0x1)
nodeptr: "nodeptr40" (id:0x3 type:0x1)
direntry: "cde40" (id:0x2 type:0x1)
tail: "tail40" (id:0x6 type:0x1)
extent: "extent40" (id:0x5 type:0x1)
acl: "absent (id:0x4 type:0x1)"
permission: "absent (id:0x0 type:0x6)"
regular: "reg40" (id:0x0 type:0x0)
directory: "dir40" (id:0x1 type:0x0)
symlink: "sym40" (id:0x2 type:0x0)
special: "spl40" (id:0x3 type:0x0)
hash: "r5_hash" (id:0x1 type:0x3)
fibration: "ext_1_fibre" (id:0x2 type:0x4)
formatting: "smart" (id:0x2 type:0x5)
# debugfs.reiser4 -s /dev/vgsdc1/reiser4
debugfs.reiser4 1.0.2-pre1
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser,
licensing governed by reiser4progs/COPYING.
Master super block (16):
magic: ReIsEr4
blksize: 4096
format: 0x0 (format40)
uuid: a529df4c-09bb-4803-b5c1-33938d5bbc68
label: <none>
Format super block (17):
plugin: format40
description: Disk-format for reiser4, ver. 1.0.2-pre1
magic: ReIsEr40FoRmAt
flushes: 0
mkfs id: 0x3bd87cf1
blocks: 553984
free blocks: 553937
root block: 23
tail policy: 0x2 (smart)
next oid: 0x10000
file count: 0
tree height: 2
key policy: LARGE
FS status block (21):
FS marked consistent
* Re: I/O Errors
From: Alex Zarochentsev @ 2004-09-27 15:53 UTC (permalink / raw)
To: f00bar; +Cc: reiserfs-list
Hi,
On Thu, Sep 23, 2004 at 05:38:01PM -0700, f00bar wrote:
> I have been having problems with my reiser4
> partitions. I believed the cause was how I had my machine
> configured, but I now doubt this assumption.
>
> I successfully created a reiser4 fs on an LVM2 volume
> group. After mounting it and executing a few basic i/o
> things (like cp, mv, cat, ls, rm) all is okay.
>
> After some emerging (I use Gentoo), I "lose" the fs. I
> have even tried mkfs.reiser4 and I still cannot use
> the fs.
>
> # ls -l /var/tmp/portage
> ls: reading directory /var/tmp/portage: Input/output error
Anything in the logs?
>
> If I try to cp a file, I get no error and no file, but the
> superblock does become corrupt.
What does "fsck.reiser4 /dev/vgsdc1/reiser4" say?
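One quick way to look for relevant log entries is to filter the kernel log. The following is a minimal sketch, assuming the kernel messages are readable via dmesg; the function name filter_fs_errors and the search patterns are illustrative assumptions, not something prescribed by reiser4.

```shell
#!/bin/sh
# Sketch: read log text on stdin and print only the lines that mention
# reiser4 or an I/O error (case-insensitive).
filter_fs_errors() {
    # "|| true" keeps the function from failing when nothing matches.
    grep -i -E 'reiser4|i/o error' || true
}

# Typical use (may need root to read the kernel log):
#   dmesg | filter_fs_errors
```

An empty result would suggest that the error is returned to the caller without the kernel logging anything about it.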
> [...]
--
Alex.
* Re: I/O Errors
From: f00bar @ 2004-09-28 0:26 UTC (permalink / raw)
To: Alex Zarochentsev; +Cc: reiserfs-list
Hi Alex,
Thanks for your assistance.
When I mount the partition (actually a logical
volume), the following is written to syslog.
Sep 27 19:17:32 yin-yang ef_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang z_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang z_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang j_hash_table: 16384 buckets
Sep 27 19:17:32 yin-yang loading reiser4 bitmap......done (148 jiffies)
Sep 27 19:17:32 yin-yang d_cursor_hash_table: 256 buckets
When I perform an "ls", the following is written to the
console (I assume stderr), but nothing is written to
syslog.
ls: reading directory /var/tmp/portage: Input/output error
total 0
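Whether the message really arrives on stderr can be checked by capturing the two streams separately. A minimal sketch; the function name split_ls_output and the ls.stdout / ls.stderr file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: run "ls -l" on a directory, keep stdout and stderr in separate
# files under OUTDIR, and report the exit status of ls.
# Usage: split_ls_output DIR OUTDIR
split_ls_output() {
    dir=$1; out=$2
    status=0
    # ">" captures stdout, "2>" captures stderr; a failure is recorded
    # in $status instead of aborting the script.
    ls -l "$dir" >"$out/ls.stdout" 2>"$out/ls.stderr" || status=$?
    echo "exit status: $status"
}

# Example (illustrative output directory):
#   split_ls_output /var/tmp/portage /tmp
```

If the "Input/output error" line ends up in ls.stderr while "total 0" ends up in ls.stdout, the assumption above holds, and a non-zero exit status confirms that ls itself saw the failure.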
The following is the output from fsck:
yin-yang log # fsck.reiser4 /dev/vgsdc1/reiser4
*******************************************************************
This is an EXPERIMENTAL version of fsck.reiser4. Read README first.
*******************************************************************
Fscking the /dev/vgsdc1/reiser4 block device.
Will check the consistency of the Reiser4 SuperBlock.
Will check the consistency of the Reiser4 FileSystem.
Continue?
(Yes/No): yes
***** fsck.reiser4 started at Mon Sep 27 19:23:14 2004
Reiser4 journal (journal40) on /dev/vgsdc1/reiser4: 0 transactions replayed of the total 0 blocks.
Reiser4 fs was detected on /dev/vgsdc1/reiser4.
Master super block (16):
magic: ReIsEr4
blksize: 4096
format: 0x0 (format40)
uuid: 3581757c-9576-4ed6-ba21-59863d2950b1
label: <none>
Format super block (17):
plugin: format40
description: Disk-format for reiser4, ver. 1.0.2-pre1
magic: ReIsEr40FoRmAt
flushes: 0
mkfs id: 0x4e9bfc27
blocks: 553984
free blocks: 553937
root block: 23
tail policy: 0x2 (smart)
next oid: 0x10000
file count: 0
tree height: 2
key policy: LARGE
CHECKING STORAGE TREE
Read nodes 2
Nodes left in the tree 2
Leaves of them 1, Twigs of them 1
Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
CHECKING EXTENT REGIONS.
Read twigs 1
Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
CHECKING SEMANTIC TREE
Found 1 objects.
Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
***** fsck.reiser4 finished at Mon Sep 27 19:23:14 2004
Closing fs...done
FS is consistent.
Regards,
John Baxter
* Re: I/O Errors
From: E.Gryaznova @ 2004-09-28 15:02 UTC (permalink / raw)
To: f00bar; +Cc: Alex Zarochentsev, reiserfs-list
Hello.
Would you please send us the #pvscan, #pvdisplay, #vgdisplay,
#lvdisplay and #lvs --version outputs?
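The requested outputs could be gathered into a single file with something like the sketch below. The function name collect_outputs and the report file name are assumptions made for illustration; the LVM commands themselves normally need root.

```shell
#!/bin/sh
# Sketch: run each given command and append its output (stdout and stderr)
# to OUTFILE under a "### command" heading.
# Usage: collect_outputs OUTFILE CMD...
collect_outputs() {
    outfile=$1; shift
    : >"$outfile"                      # start with an empty report
    for cmd in "$@"; do
        echo "### $cmd" >>"$outfile"
        # "${cmd%% *}" is the first word, i.e. the program name.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so "lvs --version" is
            # split into program and argument.
            $cmd >>"$outfile" 2>&1 || echo "(exit status $?)" >>"$outfile"
        else
            echo "(command not found)" >>"$outfile"
        fi
    done
}

# Example (run as root):
#   collect_outputs lvm-report.txt pvscan pvdisplay vgdisplay lvdisplay "lvs --version"
```

The resulting file can then be attached to the reply in one piece.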
Thanks,
Lena.