From: Christoph Anton Mitterer <calestyo@scientia.net>
To: Karsten Weiss <knweiss@gmx.de>
Cc: linux-kernel@vger.kernel.org, ak@suse.de, andersen@codepoet.org,
	cw@f00f.org
Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!
Date: Wed, 13 Dec 2006 19:52:59 +0100
Message-ID: <45804C0B.4030109@scientia.net>
In-Reply-To: <Pine.LNX.4.64.0612021048200.2981@addx.localnet>

Karsten Weiss wrote:
> Here's a diff of a corrupted and a good file written during our
> testcase:
>
> ("-" == corrupted file, "+" == good file)
> ...
>   009f2ff0  67 2a 4c c4 6d 9d 34 44  ad e6 3c 45 05 9a 4d c4  |g*L.m.4D..<E..M.|
> -009f3000  39 60 e6 44 20 ab 46 44  56 aa 46 44 c2 35 e6 44  |9`.D .FDV.FD.5.D|
> ....
> +009f3ff0  f3 55 92 44 c1 10 6c 45  5e 12 a0 c3 60 31 93 44  |.U.D..lE^...`1.D|
>   009f4000  88 cd 6b 45 c1 6d cd c3  00 a5 8b 44 f2 ac 6b 45  |..kE.m.....D..kE|
>   
Well, as I wrote in my mails to the list, in my experience not all
bytes of the corrupted area are invalid, only some of them, while in
your diff it seems that ALL the bytes are wrong, right?


> Please notice:
>
> a) the corruption begins at a page boundary
> b) the corrupted byte range is a single memory page and
> c) almost every fourth byte is set to 0x44 in the corrupted case
>     (but the other bytes changed, too)
>
> To me this looks as if a wrong memory page got written into the
> file.
>   
Hmm, and do you have any idea what the reason for all this is? A defect
in the nForce chipset? Or even in the CPU (the Opterons do have
integrated memory controllers)?
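
To compare our two observations (damage confined to whole pages versus
only some bytes inside the damaged area being wrong), something like the
following could be run on a good/corrupted file pair. This is only a
minimal sketch, not the testcase either of us used: the 4096-byte page
size and the function names are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def corrupted_pages(good: bytes, bad: bytes, page_size: int = PAGE_SIZE):
    """Return (page_offset, differing_byte_count) for each page that differs."""
    result = []
    length = min(len(good), len(bad))
    for off in range(0, length, page_size):
        g = good[off:off + page_size]
        b = bad[off:off + page_size]
        if g != b:
            # Count how many bytes in this page actually differ.
            ndiff = sum(1 for x, y in zip(g, b) if x != y)
            result.append((off, ndiff))
    return result


def fraction_every_fourth(page: bytes, value: int = 0x44, phase: int = 3) -> float:
    """Fraction of the bytes at offsets phase, phase+4, ... that equal value."""
    selected = page[phase::4]
    if not selected:
        return 0.0
    return sum(1 for x in selected if x == value) / len(selected)
```

A count equal to the page size would match the "whole page replaced"
picture, while a small count would match what I have seen. The second
helper checks the "every fourth byte is 0x44" pattern on a single page.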


> From our testing I can also tell that the data corruption does
> *not* appear at all when we are booting the nodes with mem=2G.
> However, when we are using all the 4GB the data corruption
> shows up - but not every time and thus not on all nodes.
> Sometimes a node runs for hours without any problem. That's why
> we are testing on 32 nodes in parallel most of the time. I have
> the impression that it has something to do with physical memory
> layout of the running processes.
>   
Hmm, maybe, but I have absolutely no idea ;)


> Please also notice that this is a silent data corruption. I.e.
> there are no error or warning messages in the kernel log or the
> mce log at all.
>   
Yes, I can confirm that.


> Christoph, I will carefully re-read your entire posting and the
> included links on Monday and will also try the memory hole
> setting.
>   
And did you find out anything new?
