* i/o errors
From: Bernd Schubert @ 2004-06-23  9:42 UTC (permalink / raw)
  To: SCSI Mailing List

Hello,

We are having trouble with our transtec 5008 IDE/SCSI RAID array: sometimes
the SCSI driver reports I/O errors. For the reasons described below, I'm not
sure that these I/O errors are really caused by the RAID array, so I want to
ask whether the Linux SCSI subsystem might cause or report spurious errors.

Description:
The main server (with the transtec RAID array) is connected via Gigabit
Ethernet to a failover server. Some partitions on the main server are
mirrored to this failover server via DRBD. During a full resync from the
failover server, at write speeds of about 50-60 MB/s, the RAID array
sometimes stops responding, so the SCSI driver resets the bus. Although that
is not nice, it doesn't cause any harm by itself. Much worse is that after
such a reset the SCSI driver sometimes reports an I/O error on the device;
in that case DRBD stops the system immediately.

At first I thought it was a bug in the SCSI driver. After Justin T. Gibbs
told me that it can't be a driver bug, I contacted transtec and we got the
RAID array replaced with a new one. Unfortunately, this didn't solve our
problems. To rule out a controller, cable, or similar fault, we connected
the RAID array with completely different cables to the failover server,
which has an MPT SCSI controller instead of the main server's Adaptec
controller, but the same problems remained.
So does that mean the transtec array has a general bug?

However, when some people recently reported problems with their USB sticks,
I reconsidered the situation, and now I'm also considering that there might
be a general Linux SCSI bug.

USB stick problem:
- Suddenly the affected systems stop accepting the sticks and report I/O
errors when accessing them. Removing and re-inserting the stick doesn't
help; however, rebooting the system fixes the issue.
- That might be a USB problem, but those sticks use the sg driver and
therefore the SCSI subsystem. Could this somehow be related to our RAID I/O
errors?

Also, transtec told me that the RAID array should write an error to its log
when an I/O error happens, but there is no error message at all :(


All these problems happened with 2.4.26. We have now also tried 2.6.7, and
the problem doesn't occur with that kernel. Unfortunately, with 2.6.7 we
never reach a resync speed above 45 MB/s; it is usually about 30 MB/s. When
we reduced the resync speed in 2.4.26 to those values, we never had the
problem either, so the 2.6.7 test doesn't tell us much in this case.


So, does anyone here have an idea whether this is a bug in the transtec
array or in the Linux SCSI subsystem?


Thanks in advance,
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: i/o errors
From: Fabien Salvi @ 2004-06-23 14:00 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: SCSI Mailing List

Bernd Schubert wrote:
> Hello,

Hello,

> We are having trouble with our transtec 5008 IDE/SCSI RAID array:
> sometimes the SCSI driver reports I/O errors. For the reasons described
> below, I'm not sure that these I/O errors are really caused by the RAID
> array, so I want to ask whether the Linux SCSI subsystem might cause or
> report spurious errors.
> 
> Description:

> [...]

> Also, transtec told me that the RAID array should write an error to its
> log when an I/O error happens, but there is no error message at all :(
> 
> 
> All these problems happened with 2.4.26. We have now also tried 2.6.7,
> and the problem doesn't occur with that kernel. Unfortunately, with 2.6.7
> we never reach a resync speed above 45 MB/s; it is usually about 30 MB/s.
> When we reduced the resync speed in 2.4.26 to those values, we never had
> the problem either, so the 2.6.7 test doesn't tell us much in this case.
> 
> 
> So, does anyone here have an idea whether this is a bug in the transtec
> array or in the Linux SCSI subsystem?

I wouldn't be surprised if it's a hardware-related problem.
Do you know who the real manufacturer of the RAID controller and firmware
is? I don't think Transtec makes its own systems...

IMHO, you should run big benchmarks without DRBD, using an I/O benchmark
tool and also plain dd to make big parallel transfers, and check whether you
can reproduce the bug. If you do get the bug, it would be interesting to try
other Linux kernel revisions and also other OSes...
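A minimal sketch of the "plain dd, big parallel transfers" test, assuming a POSIX shell and a scratch target whose contents may be destroyed. The function name `parallel_dd` and its arguments are hypothetical. Each writer fills its own region of the target, so pointing it at a raw device keeps both DRBD and the filesystem out of the I/O path.

```shell
#!/bin/sh
# parallel_dd TARGET NJOBS MB
# Hypothetical helper: starts NJOBS concurrent dd writers, each writing MB
# mebibytes of zeros into its own region of TARGET.
# WARNING: every byte in those regions is overwritten, so TARGET must be a
# scratch block device or file.
parallel_dd() {
    target=$1
    njobs=$2
    mb=$3
    i=0
    while [ "$i" -lt "$njobs" ]; do
        # seek= counts output blocks of size bs, so writer i starts at i*MB MiB.
        # conv=notrunc stops one writer from truncating another's data when
        # TARGET is a regular file.
        dd if=/dev/zero of="$target" bs=1048576 count="$mb" \
            seek=$((i * mb)) conv=notrunc 2>/dev/null &
        i=$((i + 1))
    done
    wait  # block until every background writer has finished
}
```

For example, `time parallel_dd /dev/sdX 4 2048` (with `/dev/sdX` standing in for a scratch device) would push 8 GiB through four concurrent writers. Dividing the amount written by the elapsed time gives a rough throughput figure you can compare with the 50-60 MB/s at which the bus resets were reported.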

Good luck!

-- 
Fabien SALVI

* Re: i/o errors
From: Bernd Schubert @ 2004-06-23 15:08 UTC (permalink / raw)
  To: Fabien Salvi; +Cc: SCSI Mailing List

> I wouldn't be surprised if it's a hardware-related problem.
> Do you know who the real manufacturer of the RAID controller and firmware
> is? I don't think Transtec makes its own systems...

As far as I know it's an Infortrend device; at least the manual and usage
information are similar to those of an Infortrend device. If you are
interested, I could try to find out which Infortrend model it is.

>
> IMHO, you should run big benchmarks without DRBD, using an I/O benchmark
> tool and also plain dd to make big parallel transfers, and check whether
> you can reproduce the bug. If you do get the bug, it would be interesting
> to try other Linux kernel revisions and also other OSes...

Of course I already ran those benchmarks, but only on the filesystem, and I
could never reproduce the bug. Tomorrow afternoon I will try what happens
without the filesystem.
Maybe the slowdown introduced by the filesystem layer is enough to prevent
the bug. When the problem first occurred and I asked Justin about it, he
told me to use his newer driver versions. Then I really thought it was a
driver bug, because it got worse with every driver revision. Finally Justin
told me that every revision had become slightly faster, and this slight
speed increase was enough to reliably trigger the bug :/

We have been trying to fix this bug for more than four weeks now, and we
would finally like to start using our new storage server. However, I'm
really worried that the problem will occur in production, even though my
tests showed that it shouldn't happen under real-life load.


I would really prefer not to use another OS, since I have no recent
experience with any.

>
> Good luck!

Thanks a lot!


Cheers,
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de

* I/O Errors
From: f00bar @ 2004-09-24  0:38 UTC (permalink / raw)
  To: reiserfs-list

I have been having problems with my reiser4 partitions. I believed it was
due to how I had configured my machine, but I now doubt that assumption.

I successfully created a reiser4 fs on an LVM2 volume group. After mounting
it and running a few basic I/O operations (cp, mv, cat, ls, rm), everything
is okay.

After some emerging (I use Gentoo), I "lose" the fs. I have even tried
re-running mkfs.reiser4, and I still cannot use the fs.

# ls -l /var/tmp/portage
ls: reading directory /var/tmp/portage: Input/output error

If I try to cp a file, there is no error and no file, but the superblock
does become corrupt.
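One quick, read-only way to check whether the superblock really changed is to look at the master superblock bytes directly. This sketch assumes, based on the `Master super block (16)` and `blksize: 4096` lines in the debugfs output below, that the master superblock sits at byte 16 × 4096 = 65536 and begins with the `ReIsEr4` magic. `check_reiser4_magic` is a hypothetical helper; debugfs.reiser4 and fsck.reiser4 remain the authoritative tools.

```shell
#!/bin/sh
# check_reiser4_magic DEVICE
# Hypothetical helper. Assumption: the reiser4 master superblock is block 16
# of 4096-byte blocks (byte offset 65536) and starts with "ReIsEr4".
# It only reads from DEVICE.
check_reiser4_magic() {
    dev=$1
    # Read the 7 magic bytes; tr drops NUL bytes so a zeroed area compares
    # cleanly instead of producing shell warnings.
    magic=$(dd if="$dev" bs=1 skip=65536 count=7 2>/dev/null | tr -d '\000')
    if [ "$magic" = "ReIsEr4" ]; then
        echo "$dev: reiser4 master superblock magic present"
    else
        echo "$dev: reiser4 magic not found at byte 65536" >&2
        return 1
    fi
}
```

Running it, for example as `check_reiser4_magic /dev/vgsdc1/reiser4`, before and after the failing cp would show whether the start of the master superblock itself is being overwritten.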

I have 2 volume groups on one LVM partition spanning the entire SCSI drive.
Half is an ext3 fs and the other half is a reiser4 fs. I mount one or the
other as /var/tmp/portage (where source code is compiled under Gentoo) for
testing purposes.

I am using reiser4progs and libaal 1.0.2_pre1.

# uname -a
Linux yin-yang 2.6.9-rc2-mm1-tao #1 SMP Sun Sep 19 03:14:18 CDT 2004 i686 AMD Athlon(tm) MP 2400+ AuthenticAMD GNU/Linux


# debugfs.reiser4 -p /dev/vgsdc1/reiser4
debugfs.reiser4 1.0.2-pre1
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser,
licensing governed by reiser4progs/COPYING.

Default profiles:
format:     "format40" (id:0x0 type:0x8)
journal:    "journal40" (id:0x0 type:0xf)
oid:        "oid40" (id:0x0 type:0x9)
alloc:      "alloc40" (id:0x0 type:0xe)
key:        "key_large" (id:0x1 type:0x10)
node:       "node40" (id:0x0 type:0x2)
statdata:   "stat40" (id:0x0 type:0x1)
nodeptr:    "nodeptr40" (id:0x3 type:0x1)
direntry:   "cde40" (id:0x2 type:0x1)
tail:       "tail40" (id:0x6 type:0x1)
extent:     "extent40" (id:0x5 type:0x1)
acl:        "absent (id:0x4 type:0x1)"
permission: "absent (id:0x0 type:0x6)"
regular:    "reg40" (id:0x0 type:0x0)
directory:  "dir40" (id:0x1 type:0x0)
symlink:    "sym40" (id:0x2 type:0x0)
special:    "spl40" (id:0x3 type:0x0)
hash:       "r5_hash" (id:0x1 type:0x3)
fibration:  "ext_1_fibre" (id:0x2 type:0x4)
formatting: "smart" (id:0x2 type:0x5)


# debugfs.reiser4 -s /dev/vgsdc1/reiser4
debugfs.reiser4 1.0.2-pre1
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser,
licensing governed by reiser4progs/COPYING.

Master super block (16):
magic:          ReIsEr4
blksize:        4096
format:         0x0 (format40)
uuid:           a529df4c-09bb-4803-b5c1-33938d5bbc68
label:          <none>

Format super block (17):
plugin:         format40
description:    Disk-format for reiser4, ver. 1.0.2-pre1
magic:          ReIsEr40FoRmAt
flushes:        0
mkfs id:        0x3bd87cf1
blocks:         553984
free blocks:    553937
root block:     23
tail policy:    0x2 (smart)
next oid:       0x10000
file count:     0
tree height:    2
key policy:     LARGE

FS status block (21):
FS marked consistent


* Re: I/O Errors
From: Alex Zarochentsev @ 2004-09-27 15:53 UTC (permalink / raw)
  To: f00bar; +Cc: reiserfs-list

Hi,

On Thu, Sep 23, 2004 at 05:38:01PM -0700, f00bar wrote:
> I have been having problems with my reiser4 partitions. I believed it was
> due to how I had configured my machine, but I now doubt that assumption.
> 
> I successfully created a reiser4 fs on an LVM2 volume group. After
> mounting it and running a few basic I/O operations (cp, mv, cat, ls, rm),
> everything is okay.
> 
> After some emerging (I use Gentoo), I "lose" the fs. I have even tried
> re-running mkfs.reiser4, and I still cannot use the fs.
> 
> # ls -l /var/tmp/portage
> ls: reading directory /var/tmp/portage: Input/output error

Anything in the logs?

> 
> If I try to cp a file, there is no error and no file, but the superblock
> does become corrupt.

What does "fsck.reiser4 /dev/vgsdc1/reiser4" say?

> [...]

-- 
Alex.

* Re: I/O Errors
From: f00bar @ 2004-09-28  0:26 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list

Hi Alex,

Thanks for your assistance.

When I mount the partition (actually a logical volume), the following is
written to syslog:

Sep 27 19:17:32 yin-yang ef_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang z_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang z_hash_table: 8192 buckets
Sep 27 19:17:32 yin-yang j_hash_table: 16384 buckets
Sep 27 19:17:32 yin-yang loading reiser4 bitmap......done (148 jiffies)
Sep 27 19:17:32 yin-yang d_cursor_hash_table: 256 buckets


When I perform an "ls", the following is written to the console (I assume
to stderr), but nothing is written to syslog:

ls: reading directory /var/tmp/portage: Input/output error
total 0



The following is the output from fsck:

yin-yang log # fsck.reiser4 /dev/vgsdc1/reiser4
*******************************************************************
This is an EXPERIMENTAL version of fsck.reiser4. Read README first.
*******************************************************************

Fscking the /dev/vgsdc1/reiser4 block device.
Will check the consistency of the Reiser4 SuperBlock.
Will check the consistency of the Reiser4 FileSystem.
Continue?
(Yes/No): yes
***** fsck.reiser4 started at Mon Sep 27 19:23:14 2004
Reiser4 journal (journal40) on /dev/vgsdc1/reiser4: 0 transactions replayed of the total 0 blocks.
Reiser4 fs was detected on /dev/vgsdc1/reiser4.
Master super block (16):
magic:          ReIsEr4
blksize:        4096
format:         0x0 (format40)
uuid:           3581757c-9576-4ed6-ba21-59863d2950b1
label:          <none>

Format super block (17):
plugin:         format40
description:    Disk-format for reiser4, ver. 1.0.2-pre1
magic:          ReIsEr40FoRmAt
flushes:        0
mkfs id:        0x4e9bfc27
blocks:         553984
free blocks:    553937
root block:     23
tail policy:    0x2 (smart)
next oid:       0x10000
file count:     0
tree height:    2
key policy:     LARGE


CHECKING STORAGE TREE
        Read nodes 2
        Nodes left in the tree 2
                Leaves of them 1, Twigs of them 1
        Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
CHECKING EXTENT REGIONS.
        Read twigs 1
        Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
CHECKING SEMANTIC TREE
        Found 1 objects.
        Time interval: Mon Sep 27 19:23:14 2004 - Mon Sep 27 19:23:14 2004
***** fsck.reiser4 finished at Mon Sep 27 19:23:14 2004
Closing fs...done

FS is consistent.

Regards,
John Baxter


* Re: I/O Errors
From: E.Gryaznova @ 2004-09-28 15:02 UTC (permalink / raw)
  To: f00bar; +Cc: Alex Zarochentsev, reiserfs-list

Hello.

Would you please send us the output of pvscan, pvdisplay, vgdisplay,
lvdisplay and lvs --version?
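A small sketch for gathering all of the requested outputs in one go, assuming a POSIX shell; `collect_lvm_info` is a hypothetical name. The LVM tools generally need root privileges to see every device, so it is best run as root.

```shell
#!/bin/sh
# collect_lvm_info
# Hypothetical helper: runs each requested LVM command and prints a header
# line before its output, so everything can be pasted into one reply.
collect_lvm_info() {
    for cmd in "pvscan" "pvdisplay" "vgdisplay" "lvdisplay" "lvs --version"; do
        echo "# $cmd"
        # $cmd is deliberately unquoted so "lvs --version" splits into the
        # command and its flag; stderr is merged so warnings are captured too.
        $cmd 2>&1 || echo "(exit status $?)"
        echo
    done
}
```

For example, `collect_lvm_info > lvm-info.txt` produces a single file that can be attached to or pasted into the reply.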

Thanks,
Lena.

f00bar wrote:

> [...]