* Dom0 crashing on x86_64
@ 2005-07-12 18:09 David F Barrera
2005-07-12 22:44 ` Vincent Hanquez
0 siblings, 1 reply; 8+ messages in thread
From: David F Barrera @ 2005-07-12 18:09 UTC (permalink / raw)
To: xen-devel
I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
DomU. I've done some more testing, and it appears that this problem is
somehow related to networking. Dom0 crashes as soon as the networking
services are started when DomU is coming up. As an experiment, I
brought up DomU without networking, and it stayed up. As soon as I
started DomU with networking enabled, however, Dom0 crashed. Below is
the trace:
Unable to handle kernel paging request at ffffc20000036000 RIP:
<ffffffff802afff9>{net_rx_action+1209}
PGD 13e4067 PUD 13e3067 PMD 13e2067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 2712, comm: sshd Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff802afff9>] <ffffffff802afff9>{net_rx_action+1209}
RSP: e02b:ffff88000290d7f8 EFLAGS: 00010202
RAX: ffffc20000035ff0 RBX: ffff88000de9bb60 RCX: 00000000000000ff
RDX: 0000000000000001 RSI: ffffc20000036000 RDI: 000000000000000e
RBP: ffff88000b5f7c80 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000010c1a06e
R13: ffffffff804df7c0 R14: 0000000000000072 R15: ffffffff804e8800
FS: 00002aaaac231040(0000) GS:ffffffff8050ae80(0000)
knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process sshd (pid: 2712, threadinfo ffff88000290c000, task
ffff88000c02ef30)
Stack: ffff88000de9bb60 0000000080397db8 0000000000000001
ffff88000290d840
ffff88000a93d380 ffffffff8014bd98 0000000000000000
ffff88000c02ef30
ffffffff8013000e ffffffff80355f6c
Call Trace:<ffffffff8014bd98>{mempool_alloc+152}
<ffffffff8013000e>{proc_opensys+30}
<ffffffff80355f6c>{nf_iterate+92}
<ffffffff80397a50>{br_nf_pre_routing_finish+0}
<ffffffff80356b4d>{nf_hook_slow+125}
<ffffffff80397a50>{br_nf_pre_routing_finish+0}
<ffffffff803984d1>{br_nf_pre_routing+1793}
<ffffffff8014bceb>{mempool_free+171}
<ffffffff80355f6c>{nf_iterate+92}
<ffffffff80393850>{br_handle_fra[...]
--
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM
"The wisest men follow their own direction. "
Euripides
* Re: Dom0 crashing on x86_64
2005-07-12 18:09 David F Barrera
@ 2005-07-12 22:44 ` Vincent Hanquez
2005-07-13 14:07 ` David F Barrera
0 siblings, 1 reply; 8+ messages in thread
From: Vincent Hanquez @ 2005-07-12 22:44 UTC (permalink / raw)
To: David F Barrera; +Cc: xen-devel
On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> DomU. I've done some more testing, and it appears that this problem is
> somehow related to networking. Dom0 crashes as soon as the networking
> services are started when DomU is coming up. As an experiment, I
> brought up DomU without networking, and it stayed up. As soon as I
> started DomU with networking enabled, however, Dom0 crashed. Below is
> the trace:
Hi David,
I'm quite confused by other reports.
Your latest "Daily Xen build" report and Paul Larson's reply suggest that this
bug was fixed.
Also, is this on a SLES9 userspace?
Cheers,
--
Vincent Hanquez
* Re: Dom0 crashing on x86_64
2005-07-12 22:44 ` Vincent Hanquez
@ 2005-07-13 14:07 ` David F Barrera
0 siblings, 0 replies; 8+ messages in thread
From: David F Barrera @ 2005-07-13 14:07 UTC (permalink / raw)
To: Vincent Hanquez; +Cc: xen-devel
On Wed, 2005-07-13 at 00:44 +0200, Vincent Hanquez wrote:
> On Tue, Jul 12, 2005 at 01:09:09PM -0500, David F Barrera wrote:
> > I am seeing a problem with Dom0 crashing on x86_64 whenever I create a
> > DomU. I've done some more testing, and it appears that this problem is
> > somehow related to networking. Dom0 crashes as soon as the networking
> > services are started when DomU is coming up. As an experiment, I
> > brought up DomU without networking, and it stayed up. As soon as I
> > started DomU with networking enabled, however, Dom0 crashed. Below is
> > the trace:
>
> Hi David,
>
> I'm quite confused by other reports.
> Your latest "Daily Xen build" report and Paul Larson's reply suggest that this
> bug was fixed.
Vincent,
I understand. My report did suggest that the problem was fixed; however,
it was incorrect, as I later found out. It turns out that the DomU that
I had created did not have networking set up properly, so the VM only
seemed functional. When I corrected the networking setup and started a DomU,
Dom0 crashed. By the way, I tried it again today, and the same thing is
happening: Dom0 is crashing. This is the trace that I see on the serial
console:
Unable to handle kernel NULL pointer dereference at 0000000000000c20
RIP:
<ffffffff80118aba>{do_page_fault+426}
PGD d313067 PUD d312067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: thermal processor fan button battery ac
Pid: 0, comm: swapper Not tainted 2.6.12-xen0
RIP: e030:[<ffffffff80118aba>] <ffffffff80118aba>{do_page_fault+426}
RSP: e02b:ffffffff8054ba00 EFLAGS: 00010202
RAX: 00000000013e4067 RBX: 0000000000000c20 RCX: 0000000000000000
RDX: 0000000000000067 RSI: 00000000093e4067 RDI: ffff800000000000
RBP: 0000000000000c20 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: ffffc20000036000 R14: 0000000000000000 R15: ffffffff8054bb00
FS: 0000000000000000(0000) GS:ffffffff80537b80(0000)
knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff8054a000, task
ffffffff80435680)
Stack: ffff88000f414000 fff[...]
>
> Also, is this on a SLES9 userspace?
>
> Cheers,
--
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM
"The wisest men follow their own direction. "
Euripides
* RE: Dom0 crashing on x86_64
@ 2005-07-13 14:39 Li, Xin B
0 siblings, 0 replies; 8+ messages in thread
From: Li, Xin B @ 2005-07-13 14:39 UTC (permalink / raw)
To: David F Barrera, Vincent Hanquez; +Cc: xen-devel
David F Barrera wrote:
> This is the trace that I see on the serial console:
>
> Unable to handle kernel NULL pointer dereference at
> 0000000000000c20 RIP:
> <ffffffff80118aba>{do_page_fault+426}
> PGD d313067 PUD d312067 PMD 0
> Oops: 0000 [1]
> CPU 0
> Modules linked in: thermal processor fan button battery ac
> Pid: 0, comm: swapper Not tainted 2.6.12-xen0
> RIP: e030:[<ffffffff80118aba>]
> <ffffffff80118aba>{do_page_fault+426} RSP:
> e02b:ffffffff8054ba00 EFLAGS: 00010202
> RAX: 00000000013e4067 RBX: 0000000000000c20 RCX:
> 0000000000000000
> RDX: 0000000000000067 RSI: 00000000093e4067 RDI:
> ffff800000000000
> RBP: 0000000000000c20 R08: 00000000000000ff R09:
> 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12:
> 0000000000000000
> R13: ffffc20000036000 R14: 0000000000000000 R15:
> ffffffff8054bb00
> FS: 0000000000000000(0000) GS:ffffffff80537b80(0000)
> knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000
> Process swapper (pid: 0, threadinfo ffffffff8054a000, task
> ffffffff80435680)
> Stack: ffff88000f414000 fff
It is caused by the check-in of changeset 5648: "Remove non-ISO attributes
from public headers"
( http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6 ).
Actually, on x86_64 xenlinux, only the change to
xen/include/public/io/netif.h causes this issue; the other parts of this
changeset are OK. After reverting the changes to this file, the issue is
gone, but we still need a clean patch for it.
We also found that, on i386 xenlinux, mmap001 of LTP crashes domU; I
suspect that is also introduced by this changeset.
-Xin
* RE: Dom0 crashing on x86_64
@ 2005-07-15 8:07 Li, Xin B
2005-07-15 8:20 ` Keir Fraser
0 siblings, 1 reply; 8+ messages in thread
From: Li, Xin B @ 2005-07-15 8:07 UTC (permalink / raw)
To: Li, Xin B, David F Barrera, Vincent Hanquez; +Cc: xen-devel
This bug is caused by the size of netif_tx_request_t/netif_rx_response_t
on x86_64, which uses 8-byte alignment. When PACKED was removed by
changeset 5648, their sizes changed from 12 to 16 bytes, so
netif_tx_interface_t/netif_rx_interface_t now overflow a page.
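The size change described above can be reproduced with a small sketch. It is only an illustration, not the real header: the struct names, `memory_t_sketch`, and the helper are hypothetical, the two 16-bit fields stand in for the real members, and it assumes an x86_64 build with GCC or Clang, a 4096-byte page and a 256-entry ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for memory_t, assumed to be 8 bytes on x86_64. */
typedef uint64_t memory_t_sketch;

/* Natural alignment: 12 bytes of fields, padded up to a multiple of 8. */
struct tx_request_natural {
    memory_t_sketch addr;
    uint16_t id;
    uint16_t size;
};

/* Packed layout (GCC/Clang extension): no tail padding is added. */
struct tx_request_packed {
    memory_t_sketch addr;
    uint16_t id;
    uint16_t size;
} __attribute__((packed));

#define SKETCH_PAGE_SIZE 4096L
#define SKETCH_RING_SIZE 256L

/* Bytes left in one page after a 256-entry ring of the given entry size. */
long page_bytes_left(size_t entry_size)
{
    return SKETCH_PAGE_SIZE - SKETCH_RING_SIZE * (long)entry_size;
}
```

With 12-byte entries the ring leaves 1024 bytes of the page free. With 16-byte entries the ring alone consumes the whole page, so anything else stored next to it (for example ring indices, which is an assumption about the interface layout) no longer fits.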
We have 3 ways to resolve this bug:
1. add back __attribute__((packed)) to the definition of the two
structures.
2. add #pragma pack(4) to netif.h as:
diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h Fri Jul 15 19:17:52 2005
@@ -8,6 +8,10 @@
#ifndef __XEN_PUBLIC_IO_NETIF_H__
#define __XEN_PUBLIC_IO_NETIF_H__
+
+#ifdef __x86_64__
+#pragma pack(4)
+#endif
typedef struct netif_tx_request {
memory_t addr; /* Machine address of packet. */
3. define a smaller value on x86_64 for
NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?
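To compare the pack(4) and smaller-ring options, here is a rough sketch under the same assumptions (x86_64, GCC or Clang, 8-byte address field). The struct names, the `extra` field and `sketch_ring_bytes` are hypothetical. Unlike the diff above, the sketch wraps the pragma in push/pop so that the reduced alignment does not leak into definitions that follow the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* pragma pack option: cap member alignment at 4 bytes for this definition. */
#pragma pack(push, 4)
struct rx_response_pack4 {
    uint64_t addr;   /* stand-in for the 8-byte memory_t */
    uint16_t id;
    uint16_t extra;  /* hypothetical 2-byte field */
};
#pragma pack(pop)

/* Smaller-ring option: keep natural alignment, halve the ring instead. */
struct rx_response_natural {
    uint64_t addr;
    uint16_t id;
    uint16_t extra;  /* hypothetical 2-byte field */
};

enum { RING_DEFAULT = 256, RING_X86_64 = 128 };

/* Total bytes used by the ring entries alone. */
size_t sketch_ring_bytes(size_t entries, size_t entry_size)
{
    return entries * entry_size;
}
```

Packing keeps 256 entries at 12 bytes each (3072 bytes), while the smaller ring keeps 16-byte entries but only 128 of them (2048 bytes). Either way the entries stay well under a 4096-byte page.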
Keir, which one do you prefer?
-Xin
Li, Xin B wrote:
> David F Barrera wrote:
>> This is the trace that I see on the serial console:
>>
>> Unable to handle kernel NULL pointer dereference at
>> 0000000000000c20 RIP:
>> <ffffffff80118aba>{do_page_fault+426}
>> PGD d313067 PUD d312067 PMD 0
>> Oops: 0000 [1]
>> CPU 0
>> Modules linked in: thermal processor fan button battery
>> ac Pid: 0, comm: swapper Not tainted 2.6.12-xen0
>> RIP: e030:[<ffffffff80118aba>]
>> <ffffffff80118aba>{do_page_fault+426} RSP:
>> e02b:ffffffff8054ba00 EFLAGS: 00010202
>> RAX: 00000000013e4067 RBX: 0000000000000c20 RCX:
>> 0000000000000000
>> RDX: 0000000000000067 RSI: 00000000093e4067 RDI:
>> ffff800000000000
>> RBP: 0000000000000c20 R08: 00000000000000ff R09:
>> 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000206 R12:
>> 0000000000000000
>> R13: ffffc20000036000 R14: 0000000000000000 R15:
>> ffffffff8054bb00
>> FS: 0000000000000000(0000) GS:ffffffff80537b80(0000)
>> knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000
>> Process swapper (pid: 0, threadinfo ffffffff8054a000,
>> task ffffffff80435680) Stack: ffff88000f414000 fff
>
> It is caused by the check-in of changeset 5648: "Remove non-ISO
> attributes from public headers"
> ( http://xenbits.xensource.com/xen-unstable.hg?cmd=changeset;node=2b6c1a8098078f7e53de7cf72227fddf01f0b2b6 ).
> Actually, on x86_64 xenlinux, only the change to
> xen/include/public/io/netif.h causes this issue; the other
> parts of this changeset are OK. After reverting the
> changes to this file, the issue is gone, but we still need a
> clean patch for it. We also found that, on
> i386 xenlinux, mmap001 of LTP crashes domU; I
> suspect that is also introduced by this changeset.
>
> -Xin
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
* Re: Dom0 crashing on x86_64
2005-07-15 8:07 Li, Xin B
@ 2005-07-15 8:20 ` Keir Fraser
0 siblings, 0 replies; 8+ messages in thread
From: Keir Fraser @ 2005-07-15 8:20 UTC (permalink / raw)
To: Li, Xin B; +Cc: David F Barrera, xen-devel, Vincent Hanquez
On 15 Jul 2005, at 09:07, Li, Xin B wrote:
> 3. define a smaller value on x86_64 for
> NETIF_TX_RING_SIZE/NETIF_RX_RING_SIZE, 128?
>
> Keir, which one do you prefer?
This one is fine for now. I'll add that in with a comment that it can
be removed when we switch to grant tables for netfront/netback. That
will get rid of the 8-byte memory_t out of the structures and relax
natural alignment restrictions.
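As a rough illustration of why dropping the 8-byte field helps: in the sketch below a 32-bit reference replaces the address, so the structure only needs 4-byte alignment. The struct names and the `gref` field are hypothetical and not taken from any actual grant-table interface; the 16-byte figure assumes an x86_64 build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current shape: the 8-byte address forces 8-byte alignment on x86_64. */
struct sketch_req_with_addr {
    uint64_t addr;
    uint16_t id;
    uint16_t size;
};

/* Hypothetical grant-table shape: a 32-bit reference replaces the address. */
struct sketch_req_with_gref {
    uint32_t gref;
    uint16_t id;
    uint16_t size;
};

/* 4 + 2 + 2 bytes with no padding needed, so no packing attribute either. */
static_assert(sizeof(struct sketch_req_with_gref) == 8,
              "entry without an 8-byte member should need no padding");
```

At 8 bytes per entry, a 256-entry ring would take 2048 bytes, so the ring size would not need to be reduced on x86_64.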
-- Keir
* RE: Dom0 crashing on x86_64
@ 2005-07-15 8:26 Li, Xin B
2005-07-15 19:11 ` Jerone Young
0 siblings, 1 reply; 8+ messages in thread
From: Li, Xin B @ 2005-07-15 8:26 UTC (permalink / raw)
To: Keir Fraser; +Cc: David F Barrera, xen-devel, Vincent Hanquez
This patch fixes the dom0 crash triggered by x86_64 domU networking.
This bug is caused by the size of netif_tx_request_t/netif_rx_response_t
on x86_64, which uses 8-byte alignment. When PACKED was removed by
changeset 5648, their sizes changed from 12 to 16 bytes, so
netif_tx_interface_t/netif_rx_interface_t now overflow a page.
Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>
-Xin
diff -r 1d026c7023d2 xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h Thu Jul 14 23:48:06 2005
+++ b/xen/include/public/io/netif.h Fri Jul 15 19:55:23 2005
@@ -21,11 +21,11 @@
s8 status;
} netif_tx_response_t;
-typedef struct {
+typedef struct netif_rx_request {
u16 id; /* Echoed in response message. */
} netif_rx_request_t;
-typedef struct {
+typedef struct netif_rx_response {
memory_t addr; /* Machine address of packet. */
u16 csum_valid:1; /* Protocol checksum is validated? */
u16 id:15;
@@ -46,8 +46,13 @@
#define MASK_NETIF_RX_IDX(_i) ((_i)&(NETIF_RX_RING_SIZE-1))
#define MASK_NETIF_TX_IDX(_i) ((_i)&(NETIF_TX_RING_SIZE-1))
+#ifdef __x86_64__
+#define NETIF_TX_RING_SIZE 128
+#define NETIF_RX_RING_SIZE 128
+#else
#define NETIF_TX_RING_SIZE 256
#define NETIF_RX_RING_SIZE 256
+#endif
/* This structure must fit in a memory page. */
typedef struct netif_tx_interface {
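The "must fit in a memory page" requirement in that header comment could also be enforced at compile time, so that a layout change like this one fails the build instead of crashing dom0. The following is a minimal sketch, assuming C11 and a 4096-byte page. The interface layout (two 32-bit indices followed by the ring) and all names are hypothetical, and the size comparison for the 256-entry variant assumes an x86_64 build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* 16 bytes on x86_64 once natural alignment applies. */
struct sketch_entry {
    uint64_t addr;
    uint16_t id;
    uint16_t size;
};

/* Hypothetical shared interface with the reduced x86_64 ring size. */
struct sketch_interface_128 {
    uint32_t req_prod;
    uint32_t resp_prod;
    struct sketch_entry ring[128];
};

/* Same layout with the original ring size, kept only for comparison. */
struct sketch_interface_256 {
    uint32_t req_prod;
    uint32_t resp_prod;
    struct sketch_entry ring[256];
};

/* Build fails here if the shared structure ever outgrows one page. */
static_assert(sizeof(struct sketch_interface_128) <= SKETCH_PAGE_SIZE,
              "shared ring interface must fit in a memory page");
```

Applying the same assertion to `struct sketch_interface_256` would stop the build on x86_64, which is the situation the patch avoids by lowering the ring size to 128.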
* RE: Dom0 crashing on x86_64
2005-07-15 8:26 Dom0 crashing on x86_64 Li, Xin B
@ 2005-07-15 19:11 ` Jerone Young
0 siblings, 0 replies; 8+ messages in thread
From: Jerone Young @ 2005-07-15 19:11 UTC (permalink / raw)
To: Li, Xin B; +Cc: xen-devel, David F Barrera, Vincent Hanquez
Awesome! Patch works... can now do networking in x86-64 domU without
crashing Xen.
On Fri, 2005-07-15 at 16:26 +0800, Li, Xin B wrote:
> This patch fixes the dom0 crash triggered by x86_64 domU networking.
>
> This bug is caused by the size of netif_tx_request_t/netif_rx_response_t
> on x86_64, which uses 8-byte alignment. When PACKED was removed by
> changeset 5648, their sizes changed from 12 to 16 bytes, so
> netif_tx_interface_t/netif_rx_interface_t now overflow a page.
>
> Signed-off-by: Xin Li <xin.b.li@intel.com>
> Signed-off-by: Xiaofeng Ling <xiaofeng.lingi@intel.com>
>
> -Xin
>
> diff -r 1d026c7023d2 xen/include/public/io/netif.h
> --- a/xen/include/public/io/netif.h Thu Jul 14 23:48:06 2005
> +++ b/xen/include/public/io/netif.h Fri Jul 15 19:55:23 2005
> @@ -21,11 +21,11 @@
> s8 status;
> } netif_tx_response_t;
>
> -typedef struct {
> +typedef struct netif_rx_request {
> u16 id; /* Echoed in response message. */
> } netif_rx_request_t;
>
> -typedef struct {
> +typedef struct netif_rx_response {
> memory_t addr; /* Machine address of packet. */
> u16 csum_valid:1; /* Protocol checksum is validated? */
> u16 id:15;
> @@ -46,8 +46,13 @@
> #define MASK_NETIF_RX_IDX(_i) ((_i)&(NETIF_RX_RING_SIZE-1))
> #define MASK_NETIF_TX_IDX(_i) ((_i)&(NETIF_TX_RING_SIZE-1))
>
> +#ifdef __x86_64__
> +#define NETIF_TX_RING_SIZE 128
> +#define NETIF_RX_RING_SIZE 128
> +#else
> #define NETIF_TX_RING_SIZE 256
> #define NETIF_RX_RING_SIZE 256
> +#endif
>
> /* This structure must fit in a memory page. */
> typedef struct netif_tx_interface {
>
--
Jerone Young
IBM Linux Technology Center
jyoung5@us.ibm.com
512-838-1157 (T/L: 678-1157)