* weird memory access problem running on dom0
@ 2013-10-22 15:41 Alice Wan
2013-10-22 16:28 ` Andrew Cooper
0 siblings, 1 reply; 6+ messages in thread
From: Alice Wan @ 2013-10-22 15:41 UTC (permalink / raw)
To: jbeulich, keir, Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com
Hi all,
We recently hit a very strange memory problem in dom0. The test case
is simple; the code is as follows:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096
#define IO_PATTERN 0xab

int main(void)
{
    void *buf;
    char cmp_buf[BUF_SIZE];

    buf = malloc(BUF_SIZE);
    if (!buf) {
        /* malloc sets errno (ENOMEM) on failure */
        fprintf(stderr, "error %s during %s\n",
                strerror(errno), "malloc");
        return 1;
    }
    /* fill the heap buffer and the stack buffer with the same pattern */
    memset(buf, IO_PATTERN, BUF_SIZE);
    memset(cmp_buf, IO_PATTERN, BUF_SIZE);

    if (memcmp(buf, cmp_buf, BUF_SIZE)) {
        unsigned long long *ubuf = (unsigned long long *)buf;
        size_t i;

        /* dump the heap buffer as 64-bit words */
        for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
            printf("%zu: 0x%llx\n", i, ubuf[i]);
        return 2;
    }
    free(buf);
    return 0;
}
The memcmp failure shows up when the test runs on 500 machines under
Xen, about a billion times on each.
The error logs show two kinds of result. In one, the dump is all 0x0,
which means buf is zero. In the other, the dump is 0xabab...abab, which
means buf is correct and cmp_buf is not 0xabab...ab.
In both cases the affected buffer, either buf or cmp_buf, is wrong in
its entirety.
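The test only dumps buf when memcmp fails, so the state of cmp_buf has to be inferred from the log. A minimal sketch of checking each buffer against the pattern on its own, so that a failure can be attributed directly; count_bad_bytes and classify are hypothetical helper names and are not part of the original test:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes in p that differ from pattern. */
static size_t count_bad_bytes(const void *p, size_t len, unsigned char pattern)
{
    const unsigned char *b = p;
    size_t i, bad = 0;

    assert(p != NULL);
    for (i = 0; i < len; i++)
        if (b[i] != pattern)
            bad++;
    return bad;
}

/*
 * Hypothetical helper: bit 0 is set if buf deviates from the pattern,
 * bit 1 is set if cmp_buf deviates. 0 means both buffers are intact.
 */
static int classify(const void *buf, const void *cmp_buf, size_t len,
                    unsigned char pattern)
{
    int verdict = 0;

    if (count_bad_bytes(buf, len, pattern))
        verdict |= 1;
    if (count_bad_bytes(cmp_buf, len, pattern))
        verdict |= 2;
    return verdict;
}
```

Calling classify(buf, cmp_buf, BUF_SIZE, IO_PATTERN) in the failure path and printing the two counts would show whether the whole page or only part of it is wrong, without relying on memcmp.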
However, the test passes on a native Linux kernel (2.6.32) without
Xen.
We suspect it is related to the pvops behaviour of dom0.
We were not sure whether this is a bug already fixed in newer kernel
and Xen versions, so we tried several combinations of Xen and dom0
kernel: Xen 4.0.1 + kernel 2.6.32/3.0/3.11, and Xen 4.2 + kernel 2.6.32.
Unfortunately, all of them fail.
We found that PAT behaves differently between native Linux and Xen, so
we added nopat to the kernel 3.11 command line, and it still failed.
We are blocked now and really need some help.
Any advice will be appreciated.
Thanks in advance.
Regards,
wanjia
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: weird memory access problem running on dom0
2013-10-22 15:41 weird memory access problem running on dom0 Alice Wan
@ 2013-10-22 16:28 ` Andrew Cooper
2013-10-23 4:36 ` Alice Wan
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2013-10-22 16:28 UTC (permalink / raw)
To: Alice Wan; +Cc: xen-devel@lists.xensource.com, keir, jbeulich
Picking randomly at some ideas:
Do you have ballooning enabled?
At the time of a failure, is there anything interesting in the Linux or
Xen dmesg?
Are you running a debug version of Linux or Xen?
~Andrew
* Re: weird memory access problem running on dom0
2013-10-22 16:28 ` Andrew Cooper
@ 2013-10-23 4:36 ` Alice Wan
2013-10-24 9:12 ` Ian Campbell
2013-10-31 14:54 ` George Dunlap
0 siblings, 2 replies; 6+ messages in thread
From: Alice Wan @ 2013-10-23 4:36 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel@lists.xensource.com, keir, jbeulich
No ballooning; the command line has dom0_mem=
Nothing useful in dmesg or xm dmesg.
The kernel is not built with DEBUG enabled, and if we enable DEBUG the
problem may no longer be reproducible.
Does anyone have ideas about pte_flags?
Some MTRR info:
reg00: base=0x0ffc00000 ( 4092MB), size= 4MB, count=1: write-protect
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 512MB, count=1: uncachable
reg03: base=0x0e0000000 ( 3584MB), size= 256MB, count=1: uncachable
reg04: base=0x0f0000000 ( 3840MB), size= 128MB, count=1: uncachable
reg05: base=0x0f8000000 ( 3968MB), size= 64MB, count=1: uncachable
reg06: base=0x0fc000000 ( 4032MB), size= 32MB, count=1: uncachable
reg07: base=0x0fec00000 ( 4076MB), size= 4MB, count=1: uncachable
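As a reading aid for the listing above, here is a small sketch that transcribes the eight entries and checks whether a physical address falls inside an uncachable range. The struct and the helpers range_end and is_uncachable are hypothetical names introduced for illustration. The sketch looks only at these variable-range entries; it assumes nothing about the default memory type, the fixed-range MTRRs, or PAT, none of which the listing shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One variable-range entry as printed in the listing. */
struct mtrr_range {
    uint64_t base;      /* physical base address */
    uint64_t size_mb;   /* size in MB */
    int uncachable;     /* 1 = uncachable, 0 = other type (write-protect) */
};

/* Values transcribed from reg00..reg07 above. */
static const struct mtrr_range mtrr[] = {
    { 0x0ffc00000ULL,    4, 0 },
    { 0x080000000ULL, 1024, 1 },
    { 0x0c0000000ULL,  512, 1 },
    { 0x0e0000000ULL,  256, 1 },
    { 0x0f0000000ULL,  128, 1 },
    { 0x0f8000000ULL,   64, 1 },
    { 0x0fc000000ULL,   32, 1 },
    { 0x0fec00000ULL,    4, 1 },
};

#define MTRR_COUNT (sizeof(mtrr) / sizeof(mtrr[0]))

/* Hypothetical helper: first address past the end of a range. */
static uint64_t range_end(const struct mtrr_range *r)
{
    assert(r != NULL);
    return r->base + (r->size_mb << 20);
}

/* Hypothetical helper: 1 if phys lies inside any uncachable range. */
static int is_uncachable(uint64_t phys)
{
    size_t i;

    for (i = 0; i < MTRR_COUNT; i++)
        if (mtrr[i].uncachable &&
            phys >= mtrr[i].base && phys < range_end(&mtrr[i]))
            return 1;
    return 0;
}
```

Each of reg01 to reg06 starts where the previous one ends, so together they cover 2048MB up to 4064MB (0x80000000 to 0xfe000000), and reg07 adds 4076MB to 4080MB. Addresses below 2048MB are outside every uncachable range in this listing.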
regards,
wanjia
* Re: weird memory access problem running on dom0
2013-10-23 4:36 ` Alice Wan
@ 2013-10-24 9:12 ` Ian Campbell
2013-10-31 14:54 ` George Dunlap
1 sibling, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2013-10-24 9:12 UTC (permalink / raw)
To: Alice Wan; +Cc: Andrew Cooper, xen-devel@lists.xensource.com, keir, jbeulich
On Wed, 2013-10-23 at 12:36 +0800, Alice Wan wrote:
> The kernel is not built with DEBUG enabled, and if we enable DEBUG the
> problem may no longer be reproducible.
I think it would be worth trying it in order to confirm or deny this
rather than just supposing it might cause problems.
Ian.
* Re: weird memory access problem running on dom0
2013-10-23 4:36 ` Alice Wan
2013-10-24 9:12 ` Ian Campbell
@ 2013-10-31 14:54 ` George Dunlap
2013-12-01 14:32 ` Alice Wan
1 sibling, 1 reply; 6+ messages in thread
From: George Dunlap @ 2013-10-31 14:54 UTC (permalink / raw)
To: Alice Wan
Cc: Andrew Cooper, xen-devel@lists.xensource.com, Keir Fraser,
Jan Beulich
Any updates on this?
-George
* Re: weird memory access problem running on dom0
2013-10-31 14:54 ` George Dunlap
@ 2013-12-01 14:32 ` Alice Wan
0 siblings, 0 replies; 6+ messages in thread
From: Alice Wan @ 2013-12-01 14:32 UTC (permalink / raw)
To: George Dunlap
Cc: Andrew Cooper, xen-devel@lists.xensource.com, Keir Fraser,
Jan Beulich
Yes, we finally found that it is specific to the glibc memset
implementation, which is optimized with SSE instructions.
The detailed reason is described here:
http://lists.xenproject.org/archives/html/xen-devel/2013-11/msg00600.html
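For anyone who wants to check whether a failure like this follows the optimized memset, one option is to fill a second buffer with a plain byte loop and compare the two. This is only a sketch of that idea and not the analysis from the linked message; bytewise_fill and fills_agree are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill dst one byte at a time. The volatile
 * pointer keeps the compiler from replacing the loop with a memset
 * call or from combining the stores into vector instructions.
 */
static void bytewise_fill(void *dst, unsigned char pattern, size_t len)
{
    volatile unsigned char *p = dst;
    size_t i;

    assert(dst != NULL);
    for (i = 0; i < len; i++)
        p[i] = pattern;
}

/*
 * Hypothetical helper: fill a with libc memset and b with the byte
 * loop, then return 1 if the two buffers are identical.
 */
static int fills_agree(unsigned char *a, unsigned char *b,
                       unsigned char pattern, size_t len)
{
    memset(a, pattern, len);
    bytewise_fill(b, pattern, len);
    return memcmp(a, b, len) == 0;
}
```

If failures appear only in the buffer written by memset, that points at the optimized routine or the register state it depends on. The libc memcmp may itself be vectorised, so a byte loop could be used for the comparison as well.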
Thank you all for your advice.
regards,
wanjia