From: Oleg Drokin <green@namesys.com>
To: Kevin <coggy@redefine.org>
Cc: reiserfs-list@namesys.com
Subject: Re: access beyond end of device again
Date: Tue, 25 Jun 2002 12:13:53 +0400
Message-ID: <20020625121353.A3027@namesys.com>
In-Reply-To: <712080311.20020627010847@redefine.org>
[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]
Hello!
On Thu, Jun 27, 2002 at 01:08:47AM -0700, Kevin wrote:
> > Well, you have some corrupted leaves in conjunction with corrupted unformatted
> > pointers. So reiserfsck --rebuild-tree seems to be needed.
> > Are you sure there is no data corruption when writing to disk?
> > There were reports that VIA chipsets have problems with more than 3 IDE
> > channels being in use simultaneously.
> > I am sure there is even a test suite that triggers these bugs reliably.
> > (You have not told us anything about your motherboard/system, so maybe
> > you do not use a VIA chipset, of course, but finding one of the tools
> > that exercise several HDDs simultaneously is advisable.)
> It is a 2x400 celeron on an Abit BP6. There are 5 hdd's total, spread
The Abit BP6 is a particularly bad motherboard, you know.
And running Celerons in SMP mode is not supported by Intel.
> across 3 controllers. All the drives are the master on their channel,
> with no slaves. As far as the testing, do you know of any such tools?
> It's worth a try. However, when the file that triggered the error was
> written, the system was not under any stress at all.
E.g. http://www.bit-net.com/~rmiller/dt.html
Also take a look at the two messages from lkml that I have attached.
Bye,
Oleg
[-- Attachment #2: m1 --]
[-- Type: text/plain, Size: 6085 bytes --]
From linux-kernel-owner+green=40namesys.com@vger.kernel.org Wed May 8 05:48:03 2002
Return-Path: <linux-kernel-owner+green=40namesys.com@vger.kernel.org>
Delivered-To: green@localhost.namesys.com
Received: from localhost (localhost [127.0.0.1])
by angband.namesys.com (Postfix on SuSE Linux 7.3 (i386)) with ESMTP id CEAAC41907
for <green@localhost>; Wed, 8 May 2002 05:48:03 +0400 (MSD)
Delivered-To: green@namesys.com
Received: from thebsh.namesys.com [212.16.7.65]
by localhost with POP3 (fetchmail-5.9.0)
for green@localhost (single-drop); Wed, 08 May 2002 05:48:03 +0400 (MSD)
Received: (qmail 29959 invoked from network); 8 May 2002 01:46:15 -0000
Received: from vger.kernel.org (209.116.70.75)
by thebsh.namesys.com with SMTP; 8 May 2002 01:46:15 -0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
id <S315478AbSEHBoK>; Tue, 7 May 2002 21:44:10 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
id <S315479AbSEHBoJ>; Tue, 7 May 2002 21:44:09 -0400
Received: from pop.gmx.net ([213.165.64.20]:62967 "HELO mail.gmx.net")
by vger.kernel.org with SMTP id <S315478AbSEHBoI> convert rfc822-to-8bit;
Tue, 7 May 2002 21:44:08 -0400
Received: (qmail 32229 invoked by uid 0); 8 May 2002 01:44:01 -0000
Received: from adsl-162-85.adsl-pool.axelero.hu (HELO lead) (62.201.85.162)
by mail.gmx.net (mp001-rz3) with SMTP; 8 May 2002 01:44:01 -0000
Reply-To: <bPObject@axelero.hu>
From: "P. Breuer" <bPObject@gmx.ch>
To: <andre@linux-ide.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: PROBLEM: silent data corruption using HPT370 on an ABIT VP6
Date: Wed, 8 May 2002 03:43:59 +0200
Message-ID: <EGEOJJNFHLHGOKNADENLOEGCCFAA.bPObject@gmx.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Importance: Normal
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Status: RO
Content-Length: 3968
Lines: 83
1. Silent disk corruption using HPT370 on an ABIT VP6
2. I have tracked down a crooked bug somewhere in the IDE driver
leading to a slow and silent data corruption, which is a most alarming threat
for the incautious. The case is simple: "cp file1 file2; diff file1 file2"
shows differences under certain conditions.
3. Keywords: kernel, driver, ide, data corruption, i386
4. Kernel versions: 2.4.16 or 2.4.18 (error reproducible in both versions)
5. Hardware environment (details see below):
ABIT VP6 motherboard including: dual Pentium III, VIA APOLLO PRO chipset
VIA onboard EIDE controller,
HPT370 "raid" UDMA/100 controller, integrated on board
Promise TX2 (PDC) UDMA/100 PCI controller card
Hard disks (all masters):
2 x 6GB Quantum Fireball EX6.4A on VIA,
2 x 40GB Quantum FireballP AS40.0 on PDC,
2 x 40GB Quantum FireballP AS40.0 on HPT
6. Software environment:
IDE driver (kernel-integrated)
raidtools-0.90-5 (optional)
General: four 40GB disks of identical geometry have three partitions each,
same partitioning, identified by /dev/hd[e,g,i,k][1-3],
/dev/md[0-2] are three RAID-5 arrays defined on the four disks accordingly
each out of three raid partitions are formatted ext3 with internal journal
7. ERROR description:
Let "file1" be a "large" data file, e.g. 1GB, on a RAID array described above.
Then "cp file1 file2; cmp -l file1 file2" shows (subtle) differences.
There are random differences on several random spots between the files.
The "spots" usually occur as blocks of a few bytes in succession. The difference
is up to several dozen bytes in a 1GB file copy.
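
A minimal sketch of how the number of differing bytes in such a copy could be counted. The helper name count_diff_bytes is hypothetical and not part of the original report; it relies only on "cmp -l" printing one line per differing byte.

```shell
#!/bin/sh
# count_diff_bytes FILE1 FILE2   (hypothetical helper name)
# Prints how many byte positions differ between the two files.
# "cmp -l" emits one line per differing byte: offset plus both octal values.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Usage, following the report's example:
#   cp file1 file2
#   count_diff_bytes file1 file2
```

On a healthy system the count for a fresh copy would be expected to be 0.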
8. Tracking down the error:
I have conducted over 100 test cases: the error is consistent, though random.
First I excluded an error in the raid software:
umount /dev/md[0-2]; raidstop /dev/md[0-2].
I used a script to read all four raw disks concurrently:
for d in e g i k; do \
  (for i in 1 2 3 4 5; do \
     dd if=/dev/hd"$d"1 count=2500000 \
       2> /dev/null | md5sum; done \
  ) >> trc"$d".md5sum & done
I found NO differences in trce.md5sum and trcg.md5sum (both disks are on the
Promise controller), but significant differences in trci.md5sum and trck.md5sum,
displaying 3 and 5 different read results out of 5 identical reads, resp.
(both disks are on the HPT370 controller).
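
The repeated-read check above can be condensed into a small helper that reports how many distinct checksums a series of identical reads produced. This is only a sketch: the function name distinct_read_sums is hypothetical, and it reads the whole path rather than using the count=2500000 limit from the script above. The reads also have to reach the disk rather than a cache for the result to mean anything.

```shell
#!/bin/sh
# distinct_read_sums PATH RUNS   (hypothetical helper name)
# Reads PATH RUNS times and prints the number of distinct MD5 sums seen.
# A result of 1 means every read returned identical data.
distinct_read_sums() {
    path=$1
    runs=$2
    i=0
    while [ "$i" -lt "$runs" ]; do
        dd if="$path" bs=512 2>/dev/null | md5sum
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Usage, with a device name taken from the report:
#   distinct_read_sums /dev/hdi1 5
```

Any result greater than 1 corresponds to the "different read results out of 5 identical reads" described above.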
Oops!!!
I stayed focused on the HPT370 controller and put together a small test environment with a
single-processor motherboard and an HPT370A PCI controller card, which, in addition, has
the same HPT BIOS version (1.0.3b1) as the integrated one. I found no problem using this
configuration, so the error might well be related only to the SMP architecture.
9. Solution or workaround?
I browsed through the HighPoint Software web pages and found a remarkable replacement
for the kernel IDE-driver. This is a SCSI IDE emulation module, called hpt37x2.o, that
can be built for "any" 2.4.x kernel. And IT WORKS, at least for me, and has for at least two days ;)
The only drawback is that it is not GPL-d and the complete source is not available.
The existence of a working driver strongly suggests that the kernel driver is in error!
10. Attachments:
I have saved several files out of /proc, boot log, etc. from the test period,
i.e. while using the faulty driver. They are available upon request. Because the
HPT driver is not a native IDE driver but a SCSI emulation, it is not possible to switch
between booting the old and new kernels very easily. For example, the raid arrays are not
recognised from the foreign configuration.
Peter Breuer [P.Breuer@freemail.hu]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[-- Attachment #3: m2 --]
[-- Type: text/plain, Size: 7189 bytes --]
From linux-kernel-owner+green=40namesys.com@vger.kernel.org Tue May 14 22:04:04 2002
Return-Path: <linux-kernel-owner+green=40namesys.com@vger.kernel.org>
Delivered-To: green@localhost.namesys.com
Received: from localhost (localhost [127.0.0.1])
by angband.namesys.com (Postfix on SuSE Linux 7.3 (i386)) with ESMTP id 0E193B17A1
for <green@localhost>; Tue, 14 May 2002 22:04:04 +0400 (MSD)
Delivered-To: green@namesys.com
Received: from thebsh.namesys.com [212.16.7.65]
by localhost with POP3 (fetchmail-5.9.0)
for green@localhost (single-drop); Tue, 14 May 2002 22:04:04 +0400 (MSD)
Received: (qmail 11573 invoked from network); 14 May 2002 18:03:31 -0000
Received: from vger.kernel.org (209.116.70.75)
by thebsh.namesys.com with SMTP; 14 May 2002 18:03:31 -0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
id <S315935AbSENR4J>; Tue, 14 May 2002 13:56:09 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
id <S315937AbSENR4I>; Tue, 14 May 2002 13:56:08 -0400
Received: from mail.netbeat.de ([62.208.140.19]:53266 "HELO mail.netbeat.de")
by vger.kernel.org with SMTP id <S315935AbSENRzk>;
Tue, 14 May 2002 13:55:40 -0400
Received: (qmail 2315 invoked from network); 14 May 2002 17:57:31 -0000
Received: from pd9542a05.dip.t-dialin.net (HELO qs2) (217.84.42.5)
by mail.netbeat.de with SMTP; 14 May 2002 17:57:31 -0000
Date: Tue, 14 May 2002 19:55:33 +0200
From: Henning Schroeder <hgs@anna-strasse.de>
X-Mailer: The Bat! (v1.53d)
Reply-To: Henning Schroeder <hgs@anna-strasse.de>
Organization: =?ISO-8859-1?B?QW5uYXN0cmFzc2UgV/xyemJ1cmc=?=
X-Priority: 3 (Normal)
Message-ID: <379487051.20020514195533@anna-strasse.de>
To: linux-kernel@vger.kernel.org
Subject: IDE *data corruption* VIA VT8367
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Status: RO
Content-Length: 5198
Lines: 128
Hello,
I'm not quite sure whether this is a kernel issue, but I can't think
of another evildoer :-)
ASUS A7V266-E Mainboard (VT8367 [KT266] Chipset, with VIA IDE and
Promise 20265 IDE Controller on board), 4x MAXTOR 6L020J1 (20GB
ATA-100) attached at the four ports (resulting in hda, hdc, hde, hdg).
Robin Miller's Data Test Program (dt) from
http://www.bit-net.com/~rmiller/dt.html reports data errors on (and
only on) hdg when tests are run in parallel. This is especially nasty
because I plan to use the drives in a RAID-0 fashion, which results in
data errors as well.
These combinations give errors: (hda hdc hde hdg), (hdc hde hdg)
These combinations run flawlessly: (hda hdc hde), (hde hdg), (hda hdc
hdg). I did not test more combinations because every test takes some
hours.
Attaching hdg as a slave drive to the first Promise port (which gives
me hdf instead and leaves the second Promise port empty) makes the array run
fine, but performance drops to a figure comparable to a single drive.
There are no error logs whatsoever (except for the dt output). Without
RAID-array and without heavy IDE access, the machine runs stable.
Kernels tested: 2.4.18, 2.4.19pre8
Has anybody seen this before? Any info would be appreciated. I would
be happy to provide more information.
Diagnostics attached below.
------- output from dt (this is actually output from testing the raid
array) ----------------
Command Line:
% dt.d/dt of=/data/test limit=1g min=512 max=32k align=rotate procs=15 log=dtlog runtime=12h
--> Date: June 2nd, Version: 14.10, Author: Robin T. Miller <--
[...]
dt (2150): Error number 1 occurred on Wed May 8 20:16:40 2002
dt (2150): Data compare error at byte 5116 in record number 36
dt (2150): Relative block number where the error occcured is 639 (offset 508)
dt (2150): Data expected = 0xde, data found = 0x33, byte count = 18432
dt (2150): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2150): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):
0x80b1688 *de c6 de c6
dt (2150): The incorrect data starts at address 0x80b33ff (marked by asterisk '*')
dt (2150): Dumping Data Buffer (base = 0x80b2003, offset = 5116, limit = 64 bytes):
0x80b33df de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6
0x80b33ef de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6
0x80b33ff *33 33 33 33 de c6 de c6 de c6 de c6 de c6 de c6
0x80b340f de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6
[...]
dt (2148): Error number 1 occurred on Wed May 8 20:16:42 2002
dt (2148): Data compare error at byte 2044 in record number 857
dt (2148): Relative block number where the error occcured is 27343 (offset 508)
dt (2148): Data expected = 0xff, data found = 0x26, byte count = 12800
dt (2148): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2148): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):
0x80b1688 *ff 00 ff 00
dt (2148): The incorrect data starts at address 0x80b27fc (marked by asterisk '*')
dt (2148): Dumping Data Buffer (base = 0x80b2000, offset = 2044, limit = 64 bytes):
0x80b27dc ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b27ec ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b27fc *26 33 67 66 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b280c ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
[...]
dt (2160): Error number 1 occurred on Wed May 8 20:16:46 2002
dt (2160): Data compare error at byte 24572 in record number 49
dt (2160): Relative block number where the error occcured is 1223 (offset 508)
dt (2160): Data expected = 0x39, data found = 0xff, byte count = 25088
dt (2160): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2160): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):
0x80b1688 *39 9c c3 39
dt (2160): The incorrect data starts at address 0x80b7ffc (marked by asterisk '*')
dt (2160): Dumping Data Buffer (base = 0x80b2000, offset = 24572, limit = 64 bytes):
0x80b7fdc 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b7fec 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b7ffc *ff 00 ff 00 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b800c 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
[.... ad nauseam]
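
Each of the three errors shown above is reported by dt at "offset 508". A small sketch that reproduces this from the reported byte positions, assuming 512-byte blocks (the block size is an assumption, not stated in the dt output; the function name sector_offset is hypothetical):

```shell
#!/bin/sh
# sector_offset BYTE_POSITION   (hypothetical helper name)
# Prints the position of a byte within its block, assuming 512-byte blocks.
sector_offset() {
    echo $(( $1 % 512 ))
}

# Byte positions taken from the three dt error reports above.
for pos in 5116 2044 24572; do
    sector_offset "$pos"
done
# prints 508 three times
```

In all three cases the corruption starts in the last four bytes of a 512-byte block and is four bytes long, which matches the 4-byte runs marked with an asterisk in the data buffer dumps.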
-------------- lspci output ------------
00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP]
00:06.0 Unknown mass storage controller: Promise Technology, Inc. 20265 (rev 02)
00:0c.0 VGA compatible unclassified device: S3 Inc. 86c864 [Vision 864 DRAM] vers 0
00:0e.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)
00:0f.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
--
Best regards,
Henning mailto:hgs@anna-strasse.de
Thread overview: 25+ messages
2002-06-24 14:35 access beyond end of device again Kevin
2002-06-24 14:35 ` Oleg Drokin
2002-06-24 14:45 ` Dirk Mueller
2002-06-24 14:49 ` Oleg Drokin
2002-06-24 16:46 ` Hans Reiser
2002-06-25 5:11 ` Oleg Drokin
2002-06-24 16:59 ` Dirk Mueller
2002-06-24 14:37 ` Robert Brockway
2002-06-24 14:48 ` Oleg Drokin
2002-06-24 14:49 ` Robert Brockway
2002-06-24 17:30 ` Kevin
2002-06-25 5:54 ` Oleg Drokin
2002-06-25 6:08 ` Kevin
2002-06-25 6:15 ` Oleg Drokin
2002-06-25 7:46 ` Kevin
2002-06-25 7:55 ` Oleg Drokin
2002-06-25 8:07 ` Kevin
2002-06-25 8:13 ` Oleg Drokin [this message]
2002-06-25 8:46 ` Hans Reiser
2002-06-25 8:58 ` Oleg Drokin
2002-06-25 9:06 ` Hans Reiser
2002-06-25 9:41 ` Oleg Drokin
[not found] ` <353485111.20020627013212@redefine.org>
2002-06-25 23:22 ` Kevin
2002-06-26 4:53 ` Oleg Drokin
2002-07-10 18:04 ` Tim Small