* Re: help with next generation bkbits please
2004-09-21 1:22 help with next generation bkbits please Larry McVoy
@ 2004-09-21 1:25 ` Larry McVoy
2004-09-21 8:13 ` Denis Vlasenko
1 sibling, 0 replies; 3+ messages in thread
From: Larry McVoy @ 2004-09-21 1:25 UTC
To: linux-kernel
Forgot the memory scrubber; here it is.
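#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>

#define	unless(x)	if (!(x))
#define	u32		unsigned int
/* Emit a one-character progress marker; do/while keeps it one statement. */
#define	status(s)	do { write(1, s, 1); newline = 1; } while (0)

int	newline;	/* set while progress markers are pending on the line */

void	doit(u32 bs);

int
main(int ac, char **av)
{
	if (ac != 2) {
		printf("Usage: %s amount_in_MB\n", av[0]);
		exit(0);
	}
	/*
	 * Cast before shifting: atoi() returns int, and shifting 2048 or
	 * more left by 20 overflows a signed int. 4096 and up still wraps.
	 */
	doit((u32)atoi(av[1]) << 20);
	return (0);
}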

/* Report one mismatching word: expected and actual value, offset, address. */
void
bad(u32 *start, u32 *p, u32 want)
{
	u32	off = (char*)p - (char*)start;
	u32	got = *p;

	if (newline) printf("\n");
	newline = 0;
	printf("WANT=0x%08x GOT=0x%08x DIFF=0x%08x OFF=0x%08x ADDR=%p\n",
	    want, got, want - got, off, (void*)p);
}

void
doit(u32 bs)
{
	u32	*buf = malloc(bs);
	u32	*end;
	u32	*p;
	u32	off;

	fprintf(stderr, "Scrub %u bytes\n", bs);
	unless (buf) {
		perror("malloc");
		exit(1);
	}
	end = (u32*)((char*)buf + bs);
	bzero(buf, bs);
	unless (sizeof(int) == 4) {
		fprintf(stderr, "Expected 4 byte ints\n");
		exit(1);
	}
	/* Seed: each word holds its own byte offset into the buffer. */
	for (off = 0, p = buf; p < end; *p++ = off, off += 4);
	/* Each pass verifies the previous pattern, then writes the next. */
	for (;;) {
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != off) bad(buf, p, off);
			*p++ = 0xdeadbeef;
		}
		status("d");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0xdeadbeef) bad(buf, p, 0xdeadbeef);
			*p++ = 0x50505050;
		}
		status("5");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0x50505050) bad(buf, p, 0x50505050);
			*p++ = 0x0a0a0a0a;
		}
		status("a");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0x0a0a0a0a) bad(buf, p, 0x0a0a0a0a);
			*p++ = 0x55555555;
		}
		status("-");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0x55555555) bad(buf, p, 0x55555555);
			*p++ = 0xaaaaaaaa;
		}
		status("A");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0xaaaaaaaa) bad(buf, p, 0xaaaaaaaa);
			*p++ = 0x0;
		}
		status("0");
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0x0) bad(buf, p, 0);
			*p++ = 0xffffffff;
		}
		status("f");
		/* Last pass restores the offset pattern for the next round. */
		for (off = 0, p = buf; p < end; off += 4) {
			if (*p != 0xffffffff) bad(buf, p, 0xffffffff);
			*p++ = off;
		}
		status("o");
	}
}
* Re: help with next generation bkbits please
From: Denis Vlasenko @ 2004-09-21 8:13 UTC
To: Larry McVoy, linux-kernel
On Tuesday 21 September 2004 04:22, Larry McVoy wrote:
> Hi,
>
> we're trying to upgrade bkbits.net for your pushing&pulling pleasure
> and we're having some problems.
>
> We wanted to throw lots of memory at the problem so we went with an
> ASUS SK8V motherboard, Opteron 148, 4 x 1GB registered / ECC DIMMs.
> We thought we would be careful so we bought DIMMs that ASUS claims work.
>
> We can't get the system to stabilize and we're looking for either
> a) information on how to do that or
> b) a suggestion for a machine which will support 4GB or more
>
> What we are currently seeing looks like a cache writeback problem.
> I have a simple memory scrubber, see below, which cycles through a
> series of patterns: each pass verifies the previous pattern and writes
> the next one, and when the pattern list is exhausted it starts over.
> The patterns are each word's byte offset into the array, 0xdeadbeef,
> 0x50505050, 0x0a0a0a0a, 0x55555555, 0xaaaaaaaa, 0, 0xffffffff.
>
> What we see is that 16 consecutive 4-byte words fail, and the value we
> read back is the previous pattern. In other words, we just went through
Please show the scrubber's output. Does it happen at random addresses or
always at the same address? With which buffer sizes does it happen
(small enough for L1, for L2, or only when spilling to main RAM)?
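One way to gather the data asked for here is to have each pass report how many words failed and where the first failure was, so runs at different buffer sizes can be compared. The following is only a sketch: verify_and_fill() and its parameters are hypothetical names, not part of the posted scrubber, and the volatile qualifier is an added assumption to keep the compiler from eliding the re-read.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the posted scrubber.
 * One pass: check that every word holds `want`, then overwrite it with
 * `next`. Returns the number of mismatching words; if any were found,
 * *first_bad is set to the index of the first one.
 */
static size_t
verify_and_fill(volatile uint32_t *buf, size_t nwords,
    uint32_t want, uint32_t next, size_t *first_bad)
{
	size_t	i, nbad = 0;

	for (i = 0; i < nwords; i++) {
		if (buf[i] != want) {
			if (nbad == 0) *first_bad = i;
			nbad++;
		}
		buf[i] = next;
	}
	return (nbad);
}
```

Calling this on buffers of a few different sizes and logging the returned count and first index per pass would show whether failures repeat at one address or move around.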
> a loop that verified that all the data is 0xdeadbeef and then set it
> to 0x50505050, and then in the next loop 16 values will be 0xdeadbeef.
> In other words, it looks like the cache writeback didn't work, it's as
> if the dirty bits were cleared for some reason.
64 bytes is the cache line size on Opteron, and 16 x 4 bytes is exactly
64 bytes. You may have a faulty CPU.
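The arithmetic behind that observation can be checked against the ADDR= values the scrubber prints. A minimal sketch, assuming 64-byte lines as stated above; is_whole_cacheline() and CACHELINE_BYTES are hypothetical names introduced only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, per the reply above. */
#define CACHELINE_BYTES 64u

/*
 * Hypothetical helper: return 1 if the failing range
 * [addr, addr + nbytes) is exactly one whole cache line, meaning it
 * starts on a line boundary and is one line long; otherwise return 0.
 */
static int
is_whole_cacheline(uintptr_t addr, size_t nbytes)
{
	return (addr % CACHELINE_BYTES == 0 && nbytes == CACHELINE_BYTES);
}
```

If every reported run of 16 bad words starts at an address that is a multiple of 64, that is consistent with whole lines being lost rather than individual words.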
--
vda