* Strange network related data corruption
@ 2007-10-07 16:47 Malte Schröder
2007-10-08 13:01 ` Denys Vlasenko
0 siblings, 1 reply; 5+ messages in thread
From: Malte Schröder @ 2007-10-07 16:47 UTC
To: linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 2592 bytes --]
Hello,
I am encountering some strange data corruption when transferring
data from one of my PCs that I use as a file-server.
on the server:
FILE=<large file>; | cut -d" " -f1 | nc -lp5000 -q0; while nc
-lp5000 -q0 < $FILE; do : ; done
on the client:
H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done
(output looks somewhat like this:
..............6dd5fb1ce29d270acdfbb02d00921bf75d141773 -
...
)
I would expect the sha1sum to be the same in every pass (assuming the
source file does not change). But every few passes (with no apparent
pattern) there is a different sum returned. I first noticed this when
transferring large files (backups) with SMB and NFS (v3 and v4) but
to rule that out I tried netcat in the way noted above.
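
The per-pass check described above can be sketched as a small shell
function. This is only a minimal sketch, not the exact loop used in the
test: the name check_passes and its arguments are made up for
illustration, and the fetch command is passed in as a string so the same
loop can be pointed at nc, a local cat, or anything else that writes the
file to stdout.

```shell
#!/bin/sh
# check_passes FETCH_CMD EXPECTED_SUM PASSES
# (hypothetical helper) Runs FETCH_CMD PASSES times, hashes its stdout
# with sha1sum and prints the number of passes whose sum did not match.
check_passes() {
    fetch_cmd=$1
    expected=$2
    passes=$3
    bad=0
    i=0
    while [ "$i" -lt "$passes" ]; do
        sum=$(sh -c "$fetch_cmd" | sha1sum | cut -d' ' -f1)
        if [ "$sum" != "$expected" ]; then
            bad=$((bad + 1))
            # Report the unexpected sum on stderr, keep stdout for the count.
            echo "pass $i: got $sum" >&2
        fi
        i=$((i + 1))
    done
    echo "$bad"
}
```

For example, check_passes "nc -q0 $H 5000" "$SUM" 100 would run 100
passes against a server loop listening on port 5000, and
check_passes "cat $FILE" "$SUM" 100 would run the same check purely
locally.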
When I have the server do the sha1sum of the file locally the problem
is not reproducible. When I do this with a small file that easily fits
into the cache the problem stays reproducible.
Another thing I did was to use dd to transfer data in 1GiB chunks from
/dev/zero and generate the sha1sum on the client. There I was not able
to reproduce the problem.
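
The /dev/zero control test can be sketched in the same way. This is a
minimal sketch assuming GNU dd and sha1sum; zero_sum is a hypothetical
helper name. It prints the SHA-1 of N MiB of zeros, so the same dd
output can be piped into nc on the server while the client compares
what it receives against the locally computed value.

```shell
#!/bin/sh
# zero_sum MIB (hypothetical helper)
# Print the SHA-1 of MIB mebibytes read from /dev/zero.
zero_sum() {
    # dd writes its transfer statistics to stderr; discard them.
    dd if=/dev/zero bs=1M count="$1" 2>/dev/null | sha1sum | cut -d' ' -f1
}

# Reference value for a 1 GiB chunk, as in the test described above:
# zero_sum 1024
```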
The server is an Athlon64 3400+ (good old Clawhammer) with 1GiB RAM. I
use 4 SATA drives in a software RAID5 configuration, attached to a
Promise TX4 300 SATA-II controller. The filesystem is ext3 without
special mount options. The distribution is Debian/Sid for AMD64 with a
self-compiled kernel 2.6.23-rc9 (.config attached).
The clients I tried are a Core2Duo 6600 with 3GiB of RAM, also
Debian/Sid AMD64 (kernel 2.6.23-rc9) and a Centrino notebook with
Pentium M and 1GiB of RAM (Debian/Sid i386, kernel 2.6.23-rc7).
All PCs mentioned have gigabit ethernet and are connected via a gigabit
switch.
I tried these tests between the clients and could not reproduce the
problem there.
I had the server run memtest86+ with 20 passes without problems.
I tried several kernel versions on the server (from .18 to .23-rc9), all
showed the problem. I suspect a hardware problem, but I cannot isolate
the part responsible. I tried another ethernet adapter (the 3Com 905C in
lspci output) and I also tried the onboard SATA controllers (2 ports
VIA and 2 ports Promise TX2).
I don't know if this is a kernel problem or just me and my setup, but
maybe someone on this list has an idea where I could look next.
Thanks and regards
Malte
--
---------------------------------------
Malte Schröder
MalteSch@gmx.de
ICQ# 68121508
---------------------------------------
[-- Attachment #1.2: config-2.6.23-rc9 --]
[-- Type: text/plain, Size: 243 bytes --]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
* Re: Strange network related data corruption
2007-10-07 16:47 Strange network related data corruption Malte Schröder
@ 2007-10-08 13:01 ` Denys Vlasenko
2007-10-09 11:25 ` Malte Schröder
0 siblings, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2007-10-08 13:01 UTC
To: Malte Schröder; +Cc: linux-kernel
On Sunday 07 October 2007 17:47, Malte Schröder wrote:
> Hello,
> I am encountering some strange data corruption when transferring
> data from one of my PCs that I use as a file-server.
>
> on the server:
> FILE=<large file>; | cut -d" " -f1 | nc -lp5000 -q0; while nc
> -lp5000 -q0 < $FILE; do : ; done
$ cat z
FILE=z; | cut -d" " -f1 | nc -lp5000 -q0; while nc -lp5000 -q0 < $FILE; do : ; done
$ sh z
z: line 1: syntax error near unexpected token `|'
z: line 1: `FILE=z; | cut -d" " -f1 | nc -lp5000 -q0; while nc -lp5000 -q0 < $FILE; do : ; done'
> on the client:
> H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
> sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done
>
> (output looks somewhat like this:
> ..............6dd5fb1ce29d270acdfbb02d00921bf75d141773 -
> ...
> )
>
> I would expect the sha1sum to be the same in every pass (assuming the
> source file does not change). But every few passes (with no apparent
> pattern) there is a different sum returned. I first noticed this when
> transferring large files (backups) with SMB and NFS (v3 and v4) but
> to rule that out I tried netcat in the way noted above.
Does it happen over loopback?
tcpdump / tcpflow may help seeing whether there is some corruption
(TCP checksumming should have caught that, but worth looking into).
Basically, you wait for "wrong" checksum to appear, then
you stop script and look into tcpdump/tcpflow logs.
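
Once a "wrong" pass has been saved to a file (for instance by
redirecting the nc output on the client), comparing it byte by byte
against a known-good copy shows how much was corrupted and where. A
minimal sketch, assuming a cmp that supports -l and a POSIX awk;
summarize_diff is a hypothetical helper name:

```shell
#!/bin/sh
# summarize_diff GOOD BAD (hypothetical helper)
# Print how many bytes differ between the two files and the offset of
# the first difference. cmp -l prints one line per differing byte,
# starting with its 1-based byte number.
summarize_diff() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk 'NR == 1 { first = $1 }
             END { printf "%d differing bytes, first at offset %s\n",
                          NR, (NR ? first : "-") }'
}
```

A handful of flipped bytes at scattered offsets would point in a
different direction than a whole misplaced block, so the count and the
first offset are worth noting before digging into the packet logs.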
--
vda
* Re: Strange network related data corruption
2007-10-08 13:01 ` Denys Vlasenko
@ 2007-10-09 11:25 ` Malte Schröder
2007-10-09 11:57 ` Denys Vlasenko
0 siblings, 1 reply; 5+ messages in thread
From: Malte Schröder @ 2007-10-09 11:25 UTC
To: Denys Vlasenko; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]
On Mon, 8 Oct 2007 14:01:32 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:
> On Sunday 07 October 2007 17:47, Malte Schröder wrote:
> > Hello,
> > I am encountering some strange data corruption when transferring
> > data from one of my PCs that I use as a file-server.
> >
> > on the server:
> > FILE=<large file>; | cut -d" " -f1 | nc -lp5000 -q0; while nc
> > -lp5000 -q0 < $FILE; do : ; done
>
> $ cat z
> FILE=z; | cut -d" " -f1 | nc -lp5000 -q0; while nc -lp5000 -q0 < $FILE; do : ; done
Sorry, there is a copy'n'paste error in my post, correct line would be:
FILE=z ; sha1sum $FILE | cut -d" " -f1 | nc -lp5000 -q0; while nc -lp5000 -q0 < $FILE; do : ;done
>
> $ sh z
> z: line 1: syntax error near unexpected token `|'
> z: line 1: `FILE=z; | cut -d" " -f1 | nc -lp5000 -q0; while nc -lp5000 -q0 < $FILE; do : ; done'
>
> > on the client:
> > H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
> > sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done
> >
> > (output looks somewhat like this:
> > ..............6dd5fb1ce29d270acdfbb02d00921bf75d141773 -
> > ...
> > )
> >
> > I would expect the sha1sum to be the same in every pass (assuming the
> > source file does not change). But every few passes (with no apparent
> > pattern) there is a different sum returned. I first noticed this when
> > transferring large files (backups) with SMB and NFS (v3 and v4) but
> > to rule that out I tried netcat in the way noted above.
>
> Does it happen over loopback?
I just tried a few times and yes, it also happens on loopback, but
much less frequently. Now I am really confused ...
>
> tcpdump / tcpflow may help seeing whether there is some corruption
> (TCP checksumming should have caught that, but worth looking into).
> Basically, you wait for "wrong" checksum to appear, then
> you stop script and look into tcpdump/tcpflow logs.
> --
> vda
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: Strange network related data corruption
2007-10-09 11:25 ` Malte Schröder
@ 2007-10-09 11:57 ` Denys Vlasenko
2007-10-13 7:55 ` Malte Schröder
0 siblings, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2007-10-09 11:57 UTC
To: Malte Schröder; +Cc: linux-kernel
On Tuesday 09 October 2007 12:25, Malte Schröder wrote:
> > Does it happen over loopback?
>
> I just tried a few times and yes, it also happens on loopback, but
> much less frequently. Now I am really confused ...
Actually, that eliminates a lot of cases.
Run memtest86 overnight ("bad hardware" theory),
try older kernel versions ("kernel bug" theory).
--
vda
* Re: Strange network related data corruption
2007-10-09 11:57 ` Denys Vlasenko
@ 2007-10-13 7:55 ` Malte Schröder
0 siblings, 0 replies; 5+ messages in thread
From: Malte Schröder @ 2007-10-13 7:55 UTC
To: Denys Vlasenko; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 917 bytes --]
On Tue, 9 Oct 2007 12:57:20 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:
> On Tuesday 09 October 2007 12:25, Malte Schröder wrote:
> > > Does it happen over loopback?
> >
> > I just tried a few times and yes, it also happens on loopback, but
> > much less frequently. Now I am really confused ...
>
> Actually, that eliminates a lot of cases.
>
> Run memtest86 overnight ("bad hardware" theory),
Well, that did not show problems. But I took the PC apart, removed dust
and so on.
Now it doesn't even boot anymore (i.e. no BIOS, not even a PC speaker
beep when I boot without RAM). So I declare the mainboard dead.
> try older kernel versions ("kernel bug" theory).
Done. But it is the hardware.
Thanks for the advice :)
> --
> vda
>
--
---------------------------------------
Malte Schröder
MalteSch@gmx.de
ICQ# 68121508
---------------------------------------
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]