xen-devel.lists.xenproject.org archive mirror
* dom0 alignment check panic due to EFLAGS.AC been set
From: Ma JieYue @ 2013-06-01  9:27 UTC
  To: xen-devel

Hi, Mr Ian Campbell and other gurus,


We hit an alignment check panic in a Xen dom0 during testing, while
restarting some processes. Here is the call stack:


alignment check: 0000 [#1] SMP
last sysfs file: /sys/hypervisor/properties/capabilities
CPU 2
Modules linked in: xt_iprange xt_mac arptable_filter arp_tables
xt_physdev 8021q garp xt_state iptable_filter ip_tables autofs4
ipmi_devintf ipmi_si ipmi_msghandler ebtable_filter ebtable_nat
ebtable_broute bridge stp llc ebtables lockd sunrpc bonding ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xenfs dm_multipath fuse
xen_netback xen_blkback blktap blkback_pagemap loop nbd video output
sbs sbshc parport_pc lp parport joydev ses enclosure snd_seq_dummy
serio_raw bnx2 snd_seq_oss snd_seq_midi_event snd_seq dcdbas
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
soundcore snd_page_alloc pcspkr iTCO_wdt iTCO_vendor_support shpchp
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid10 raid1 raid0 cciss
Pid: 8601, comm: connector Not tainted 2.6.32.36xen #1 PowerEdge R710
RIP: e030:[<ffffffffa02ce51a>] [<ffffffffa02ce51a>]
bond_3ad_get_active_agg_info+0x61/0x74 [bonding]
RSP: e02b:ffff88009222b800 EFLAGS: 00050202
RAX: 0000000000000001 RBX: ffff88009222b838 RCX: ffff880250875580
RDX: ffff88024dc76c50 RSI: ffff88009222b838 RDI: ffff88024dc77200
RBP: ffff88009222b808 R08: ffff880246a72f50 R09: ffffffff816fb2a0
R10: ffff8800af2c10e8 R11: ffffffff813cca10 R12: ffff880250875000
R13: ffff8800af2c10e8 R14: ffff880250875580 R15: ffff88024dc1ae80
FS: 00007fd130d61740(0000) GS:ffff880028072000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff1cb42c40 CR3: 00000001f8a5f000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process td_connector (pid: 8601, threadinfo ffff88009222a000, task
ffff88008adcc470)
Stack:
0000000000000002 ffff88009222b878 ffffffffa02cf3db ffff8800af2c10e8
<0> ffff8802508755ac 4f52505f00704550 0000000200000003 0001001100010002
<0> 0000001472655356 0000000000000000 0000000000000002 ffff880250875580
Call Trace:
[<ffffffffa02cf3db>] bond_3ad_xmit_xor+0x70/0x17f [bonding]
[<ffffffffa02ccd1d>] bond_start_xmit+0x391/0x3ea [bonding]
[<ffffffffa0241422>] ? ipv4_confirm+0x179/0x195 [nf_conntrack_ipv4]
[<ffffffff813a3657>] dev_hard_start_xmit+0x1b9/0x27e
[<ffffffff813a644a>] dev_queue_xmit+0x267/0x30e
[<ffffffff813ce523>] ip_finish_output2+0x1a9/0x1ed
[<ffffffff813ce5c9>] ip_finish_output+0x62/0x67
[<ffffffff813ce67c>] ip_output+0xae/0xb5
[<ffffffff813cca20>] dst_output+0x10/0x12
[<ffffffff813ce0d9>] ip_local_out+0x23/0x28
[<ffffffff813cf0fa>] ip_queue_xmit+0x2ce/0x32a
[<ffffffff810acb19>] ? call_rcu_sched+0x15/0x17
[<ffffffff810acb29>] ? call_rcu+0xe/0x10
[<ffffffff8121e3c6>] ? radix_tree_node_free+0x14/0x16
[<ffffffff813dfd6f>] tcp_transmit_skb+0x62d/0x66d
[<ffffffff8100f175>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8100f8d2>] ? check_events+0x12/0x20
[<ffffffff81120369>] ? __d_free+0x50/0x55
[<ffffffff813e118c>] tcp_write_xmit+0x6d8/0x7be
[<ffffffff813e12d7>] __tcp_push_pending_frames+0x2f/0x62
[<ffffffff813e12d7>] __tcp_push_pending_frames+0x2f/0x62
[<ffffffff813e19e3>] tcp_send_fin+0x102/0x10a
[<ffffffff813d59e2>] tcp_close+0x138/0x388
[<ffffffff813f1e0e>] inet_release+0x5d/0x64
[<ffffffff8139361f>] sock_release+0x1f/0x71
[<ffffffff81393af2>] sock_close+0x27/0x2b
[<ffffffff8110f063>] __fput+0x112/0x1b6
[<ffffffff8110f520>] fput+0x1a/0x1c
[<ffffffff8110a5a9>] filp_close+0x6c/0x77
[<ffffffff81058c8b>] put_files_struct+0x7c/0xd0
[<ffffffff81058d18>] exit_files+0x39/0x3e
[<ffffffff8105a059>] do_exit+0x247/0x677
[<ffffffff810673d8>] ? freezing+0x13/0x15
[<ffffffff8105a528>] sys_exit_group+0x0/0x1b
[<ffffffff8106a843>] get_signal_to_deliver+0x300/0x324
[<ffffffff810121da>] do_notify_resume+0x90/0x6d6
[<ffffffff8100c412>] ? xen_mc_flush+0x173/0x195
[<ffffffff8102f82d>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff8100b459>] ? xen_end_context_switch+0x1e/0x22
[<ffffffff81049a5b>] ? finish_task_switch+0x51/0xa9
[<ffffffff8101303e>] int_signal+0x12/0x17
Code: fc ff ff 48 85 c0 75 e3 83 c8 ff eb 2e 66 8b 42 06 66 89 03 66
8b 42 32 66 89 43 02 8b 42 0c 66 89 43 04 66 8b 42 16 66 89 43 06 <8b>
42 0e 89 43 08 66 8b 42 12 66 89 43 0c 31 c0 5b c9 c3 55 48
RIP [<ffffffffa02ce51a>] bond_3ad_get_active_agg_info+0x61/0x74 [bonding]
RSP <ffff88009222b800>
---[ end trace d269ed1e3064b31a ]---
Kernel panic - not syncing: Fatal exception in interrupt


We guess it is caused by the EFLAGS.AC bit being set to 1, which enables
CPU alignment checking. Since the kernel contains many unaligned memory
accesses, dom0 could panic anywhere. But we have no idea what set the
AC flag.
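
As a cross-check of the guess above, the register dump can be decoded by
hand. The sketch below is not part of the original report. It assumes the
architectural EFLAGS bit positions and uses only values printed in the oops:
EFLAGS 00050202, RDX, and the marked bytes <8b> 42 0e in the Code: line,
which decode to the 4-byte load mov 0xe(%rdx),%eax. The helper names are
hypothetical.

```python
# Architectural EFLAGS bit positions; only a subset is listed here.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
}

def decode_eflags(value):
    """Return the names of the listed flags that are set in value."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if (value >> bit) & 1]

def is_misaligned(address, size):
    """True if an access of `size` bytes at `address` is not naturally aligned."""
    return address % size != 0

# Values copied from the oops above.
eflags = 0x00050202
rdx = 0xFFFF88024DC76C50
fault_addr = rdx + 0x0E          # memory operand of the faulting load: 0xe(%rdx)

print(decode_eflags(eflags))     # ['IF', 'RF', 'AC']
print(hex(fault_addr), is_misaligned(fault_addr, 4))
```

With these inputs AC is reported as set, and the address of the 4-byte load
ends in 0x5e, which is 2 bytes past a 4-byte boundary. That is consistent
with an alignment check fault on this instruction, but it does not say what
set AC in the first place.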


We found some mails that may be related to this problem:

http://lists.xen.org/archives/html/xen-devel/2013-01/msg02285.html
http://old-list-archives.xen.org/archives/html/xen-devel/2011-11/msg00827.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=660425

but all these posts report a domU panic (maybe a PV domU), while mine
happens in dom0.


The Xen version is 4.0.1 and the dom0 kernel comes from Jeremy's git tree:

http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a

It is the xen-2.6.32.36 version of Jeremy's dom0 git tree, so I guess
it is too old to be related to the CPU SMAP feature.



Any help is appreciated, thanks.


Best regards,

jerry


* Re: dom0 alignment check panic due to EFLAGS.AC been set
From: Pasi Kärkkäinen @ 2013-06-01 10:59 UTC
  To: Ma JieYue; +Cc: xen-devel

On Sat, Jun 01, 2013 at 05:27:27PM +0800, Ma JieYue wrote:
> 
> We found some mail may be related to this problem,
> 
> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02285.html
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-11/msg00827.html
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=660425
> 
> but all these posts reported a domU panic (maybe PV domU) , while mine
> is related to dom0
> 
> 
> The Xen version is 4.0.1 and dom0 kernel comes from jeremy's git tree
> 

I suggest upgrading your Xen hypervisor. 4.0.1 is very old,
and not even the latest release on the 4.0.x branch.

Currently Xen 4.2.2 is the latest stable release.

> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a
> 
> It is xen-2.6.32.36 version of jeremy's dom0 git tree, so I guess
> maybe it is too old to be related with CPU SMAP feature
> 

Jeremy's xen.git is no longer maintained, so it lacks the latest
Xen-related fixes and features as well as security fixes,
and I don't recommend using it anymore.

You should switch to a mainline Linux 3.x kernel, which should be better in every way.

> 
> 
> Any help is appreciated, thanks.
> 
> 
> Best regards,
> 
> jerry
> 


-- Pasi


* Re: dom0 alignment check panic due to EFLAGS.AC been set
From: Ma JieYue @ 2013-06-07  8:57 UTC
  To: Pasi Kärkkäinen; +Cc: xen-devel

Thank you for your reply.

I admit Xen 4.0.1 is old, but judging from other bug reports on xen-devel,

> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02285.html
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-11/msg00827.html
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=660425

I tend to believe the problem still exists, and from
http://lists.xen.org/archives/html/xen-devel/2013-01/msg02285.html, I
think there may not have been any specific patch to fix this EFLAGS.AC
problem.

This EFLAGS.AC panic evidently requires three conditions:

1. The AC bit in the CPU EFLAGS register is set, and I don't know why
2. The AM bit in CR0 is set, which allows alignment checks and is the default
3. The current CPL is 3, which is where dom0 runs
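
The three conditions can be written as a single predicate. This is only an
illustrative sketch (the function name is hypothetical), fed with the EFLAGS,
CR0 and CS values printed in the oops from the first mail. It assumes the CPL
can be read from the low two bits of the CS selector.

```python
X86_EFLAGS_AC = 1 << 18   # alignment check flag in EFLAGS
X86_CR0_AM = 1 << 18      # alignment mask bit in CR0

def alignment_check_armed(eflags, cr0, cs):
    """True when a misaligned access would raise an alignment check fault:
    EFLAGS.AC set, CR0.AM set, and CPL == 3."""
    cpl = cs & 3          # privilege level encoded in the CS selector
    return bool(eflags & X86_EFLAGS_AC) and bool(cr0 & X86_CR0_AM) and cpl == 3

# EFLAGS: 00050202, CR0: 000000008005003b, CS: e033 (from the oops)
print(alignment_check_armed(0x00050202, 0x8005003B, 0xE033))

# Same state with only the AC bit cleared
print(alignment_check_armed(0x00050202 & ~X86_EFLAGS_AC, 0x8005003B, 0xE033))
```

Clearing any one of the three inputs makes the predicate false, which is why
the question of what sets AC matters: the other two conditions always hold
for this dom0.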

I tried to study arch/x86/x86_64/entry.S. I guess create_bounce_frame
is called when Xen switches to dom0, and it does clear the AC bit in
the saved EFLAGS:

create_bounce_frame:
...
.Lft13: movq  %rax,(%rsi)               # RCX
        /* Rewrite our stack frame and return to guest-OS mode. */
        /* IA32 Ref. Vol. 3: TF, VM, RF and NT flags are cleared on trap. */
        /* Also clear AC: alignment checks shouldn't trigger in kernel mode. */
        movl  $TRAP_syscall,UREGS_entry_vector+8(%rsp)
        andl  $~(X86_EFLAGS_AC|X86_EFLAGS_VM|X86_EFLAGS_RF|\
                 X86_EFLAGS_NT|X86_EFLAGS_TF),UREGS_eflags+8(%rsp)
...

Alignment checks also cannot trigger while running in Xen itself, where the CPL is 0.
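
To make the effect of that andl concrete, here is a small sketch that applies
the same mask to the EFLAGS value from the oops. The constants are the
architectural bit values for these flags, and the Python names merely mirror
the macros in the excerpt. The function name is hypothetical.

```python
# Architectural EFLAGS bit values, named after the macros in the excerpt.
X86_EFLAGS_TF = 0x00000100
X86_EFLAGS_NT = 0x00004000
X86_EFLAGS_RF = 0x00010000
X86_EFLAGS_VM = 0x00020000
X86_EFLAGS_AC = 0x00040000

CLEARED_ON_BOUNCE = (X86_EFLAGS_AC | X86_EFLAGS_VM | X86_EFLAGS_RF |
                     X86_EFLAGS_NT | X86_EFLAGS_TF)

def bounce_eflags(eflags):
    """Apply the create_bounce_frame mask to a 32-bit EFLAGS value."""
    return eflags & ~CLEARED_ON_BOUNCE & 0xFFFFFFFF

print(hex(bounce_eflags(0x00050202)))   # 0x202
```

A frame built through this path would therefore not carry AC into the guest
kernel. If AC is still set in the dump, it presumably got there some other
way, which this sketch does not identify.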

Someone also reported on the mailing list that a 2.6.24 PV kernel never
panicked on an alignment check, but after he changed to a 2.6.32 PV kernel
it happened often. So I guess it is a dom0 kernel bug, isn't it?

Jeremy, Konrad, could you take a look at this?


Best regards,
jerry

