public inbox for linux-kernel@vger.kernel.org
From: Christoph Anton Mitterer <calestyo@scientia.net>
To: Karsten Weiss <knweiss@gmx.de>
Cc: linux-kernel@vger.kernel.org, ak@suse.de, andersen@codepoet.org,
	cw@f00f.org
Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!
Date: Wed, 13 Dec 2006 19:52:59 +0100
Message-ID: <45804C0B.4030109@scientia.net>
In-Reply-To: <Pine.LNX.4.64.0612021048200.2981@addx.localnet>

Karsten Weiss wrote:
> Here's a diff of a corrupted and a good file written during our
> testcase:
>
> ("-" == corrupted file, "+" == good file)
> ...
>   009f2ff0  67 2a 4c c4 6d 9d 34 44  ad e6 3c 45 05 9a 4d c4  |g*L.m.4D..<E..M.|
> -009f3000  39 60 e6 44 20 ab 46 44  56 aa 46 44 c2 35 e6 44  |9`.D .FDV.FD.5.D|
> ....
> +009f3ff0  f3 55 92 44 c1 10 6c 45  5e 12 a0 c3 60 31 93 44  |.U.D..lE^...`1.D|
>   009f4000  88 cd 6b 45 c1 6d cd c3  00 a5 8b 44 f2 ac 6b 45  |..kE.m.....D..kE|
>   
Well, as I said in my mails to the list, my experience was that not
all bytes of the corrupted area are invalid, but only some, while it
seems that in your diff ALL the bytes are wrong, right?


> Please notice:
>
> a) the corruption begins at a page boundary
> b) the corrupted byte range is a single memory page and
> c) almost every fourth byte is set to 0x44 in the corrupted case
>     (but the other bytes changed, too)
>
> To me this looks as if a wrong memory page got written into the
> file.
>   
Hmm, and do you have any idea what the reason for all this is? A defect
in the nForce chipset? Or even in the CPU (the Opterons do have
integrated memory controllers)?
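
To check points a) to c) on other good/corrupted file pairs, something
like the following could be used. This is only a rough sketch and not
the testcase from this thread: the 4 KiB page size, the function names
and the choice to look for 0x44 at every fourth byte (offset 3 within
each group of four, as in the dump above) are all assumptions made for
illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def compare_pages(good, bad, page_size=PAGE_SIZE):
    """Compare two byte strings page by page.

    Returns a list of (offset, differing_bytes, page_length) tuples,
    one for every page-sized block whose contents differ.
    """
    n = min(len(good), len(bad))
    report = []
    for off in range(0, n, page_size):
        g = good[off:off + page_size]
        b = bad[off:off + page_size]
        if g != b:
            ndiff = sum(1 for x, y in zip(g, b) if x != y)
            report.append((off, ndiff, len(g)))
    return report


def fraction_with_value(page, value=0x44, stride=4, phase=3):
    """Fraction of the bytes at positions phase, phase+stride, ...
    that are equal to `value` (hypothetical helper for point c)."""
    picked = page[phase::stride]
    if not picked:
        return 0.0
    return sum(1 for x in picked if x == value) / len(picked)
```

Reading both files with open(path, "rb").read() and passing the results
to compare_pages() shows at a glance whether the differing blocks start
on page boundaries, and whether a whole page differs (as in the diff
above) or only some bytes of it (as in my case).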


> From our testing I can also tell that the data corruption does
> *not* appear at all when we are booting the nodes with mem=2G.
> However, when we are using all the 4GB the data corruption
> shows up - but not every time and thus not on all nodes.
> Sometimes a node runs for hours without any problem. That's why
> we are testing on 32 nodes in parallel most of the time. I have
> the impression that it has something to do with physical memory
> layout of the running processes.
>   
Hmm, maybe, but I have absolutely no idea ;)


> Please also notice that this is a silent data corruption. I.e.
> there are no error or warning messages in the kernel log or the
> mce log at all.
>   
Yes, I can confirm that.
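
Since nothing shows up in the logs, the only way to notice the problem
is to verify the data itself. A minimal sketch of such a check, assuming
a write/re-read/compare-digest loop is acceptable (the function names
and the use of SHA-256 are my choice here, not anything from the
testcases in this thread):

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_and_verify(directory, size):
    """Write `size` random bytes to a temporary file in `directory`,
    read the file back and compare digests. Returns True on a match.

    Caveat: the re-read may be served from the page cache, so this
    does not necessarily exercise the path to the disk. Files larger
    than RAM (or dropping the caches in between) would be needed for
    that; this is left out of the sketch.
    """
    data = os.urandom(size)
    expected = hashlib.sha256(data).hexdigest()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        return sha256_of(path) == expected
    finally:
        os.remove(path)
```

Running write_and_verify() in a loop on the affected filesystem and
counting the False results gives at least a rough failure rate, since
the corruption does not show up on every run.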


> Christoph, I will carefully re-read your entire posting and the
> included links on Monday and will also try the memory hole
> setting.
>   
And did you find out anything new?
