From: annie li <annie.li@oracle.com>
To: Vasily Evseenko <svpcom@gmail.com>
Cc: xen-devel@lists.xen.org
Subject: Re: domU crash with kernel BUG at drivers/net/xen-netfront.c:305
Date: Thu, 02 Jan 2014 13:09:35 +0800
Message-ID: <52C4F48F.5090003@oracle.com>
In-Reply-To: <52BD5FDD.6060009@gmail.com>


On 2013/12/27 19:09, Vasily Evseenko wrote:
> Hi,
>
> I've got domU crash (~ every 1-2 days under high network (tcp) load)
> with message:
>
> -----
> [2013-12-26 03:53:18] kernel BUG at drivers/net/xen-netfront.c:305!
> [2013-12-26 03:53:18] invalid opcode: 0000 [#1] SMP
> [2013-12-26 03:53:18] Modules linked in: ipt_REJECT iptable_filter
> xt_set xt_REDIRECT iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat
> ip_tables ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ip6t_REJECT
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
> ip6_table
> s ipv6 ext3 jbd xen_netfront coretemp hwmon crc32_pclmul crc32c_intel
> ghash_clmulni_intel microcode pcspkr ext4 jbd2 mbcache aesni_intel
> ablk_helper c
> ryptd lrw gf128mul glue_helper aes_x86_64 xen_blkfront dm_mirror
> dm_region_hash dm_log dm_mod
> [2013-12-26 03:53:18] CPU: 0 PID: 15126 Comm: python Not tainted
> 3.10.25-11.x86_64 #1
> [2013-12-26 03:53:18] task: ffff8801e5d68ac0 ti: ffff8801e7392000
> task.ti: ffff8801e7392000
> [2013-12-26 03:53:18] RIP: e030:[<ffffffffa015d637>]
> [<ffffffffa015d637>] xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
> [2013-12-26 03:53:18] RSP: e02b:ffff8801f2e03ce0  EFLAGS: 00010282
> [2013-12-26 03:53:18] RAX: 00000000000001d4 RBX: ffff8801e5438800 RCX:
> 0000000000000001
> [2013-12-26 03:53:18] RDX: 000000000000002a RSI: 0000000000000000 RDI:
> 0000000000002200
> [2013-12-26 03:53:18] RBP: ffff8801f2e03d40 R08: 0000000000000000 R09:
> 0000000000001000
> [2013-12-26 03:53:18] R10: ffff8801000083c0 R11: dead000000200200 R12:
> 0000000000000220
> [2013-12-26 03:53:18] R13: ffff8801e6eec0c0 R14: 000000000000002a R15:
> 000000000239642a
> [2013-12-26 03:53:18] FS:  00007f4cf48d57e0(0000)
> GS:ffff8801f2e00000(0000) knlGS:0000000000000000
> [2013-12-26 03:53:18] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2013-12-26 03:53:18] CR2: ffffffffff600400 CR3: 00000001e0db3000 CR4:
> 0000000000042660
> [2013-12-26 03:53:18] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [2013-12-26 03:53:18] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [2013-12-26 03:53:18] Stack:
> [2013-12-26 03:53:18]  ffff8801f2e03df0 02396417e5438000
> ffff8801e5439d58 ffff8801e54394f0
> [2013-12-26 03:53:18]  ffff8801e5438000 002affff00000013
> ffff8801f2e03d40 ffff8801f2e03db0
> [2013-12-26 03:53:18]  0000000000000010 ffff8800655e6ac0
> ffff8801e5438800 ffff8801e511a000
> [2013-12-26 03:53:18] Call Trace:
> [2013-12-26 03:53:18]  <IRQ>
> [2013-12-26 03:53:18]  [<ffffffffa015dc44>] xennet_poll+0x2f4/0x630
> [xen_netfront]
> [2013-12-26 03:53:18]  [<ffffffff810640a9>] ? raise_softirq_irqoff+0x9/0x50
> [2013-12-26 03:53:18]  [<ffffffff8152050c>] ? dev_kfree_skb_irq+0x5c/0x70
> [2013-12-26 03:53:18]  [<ffffffff810e4fb9>] ?
> handle_irq_event_percpu+0xc9/0x210
> [2013-12-26 03:53:18]  [<ffffffff81528022>] net_rx_action+0x112/0x290
> [2013-12-26 03:53:18]  [<ffffffff810e514d>] ? handle_irq_event+0x4d/0x70
> [2013-12-26 03:53:18]  [<ffffffff81063c97>] __do_softirq+0xf7/0x270
> [2013-12-26 03:53:18]  [<ffffffff81600edc>] call_softirq+0x1c/0x30
> [2013-12-26 03:53:18]  [<ffffffff81014505>] do_softirq+0x65/0xa0
> [2013-12-26 03:53:18]  [<ffffffff810639c5>] irq_exit+0xc5/0xd0
> [2013-12-26 03:53:18]  [<ffffffff81351e45>] xen_evtchn_do_upcall+0x35/0x50
> [2013-12-26 03:53:18]  [<ffffffff81600f3e>]
> xen_do_hypervisor_callback+0x1e/0x30
> [2013-12-26 03:53:18]  <EOI>
> [2013-12-26 03:53:18] Code: 8b 35 ee f9 bb e1 48 8d bb 08 0d 00 00 48 83
> c6 64 e8 2e f2 f0 e0 8b 83 ec 0c 00 00 31 d2 89 c1 d1 e9 39 d1 76 9e e9
> 5a ff ff ff <0f> 0b eb fe 0f 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f
> 84 00
> [2013-12-26 03:53:18] RIP  [<ffffffffa015d637>]
> xennet_alloc_rx_buffers+0x347/0x360 [xen_netfront]
> [2013-12-26 03:53:18]  RSP <ffff8801f2e03ce0>
> ------------
>
> dom0 and domU kernels are vanilla 3.10.25
> host server has 4 cores x 2 threads with mapping: 4 - dom0, 2 - domU, 2
> - domU
> i've tried xen versions: 4.2.3 and 4.3.1
> also i've tried to disable offloading on domU:  ethtool -K eth0 tx off
> tso off gso off   ----  no effect
>
> domU's are under high TCP load (a lot of small tcp connections (web server))
> sometimes  i've got on dom0:
> ---
> [2013-12-26 00:16:30] (XEN) grant_table.c:289:d0 Increased maptrack size
> to 2 frames
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
> 99221507
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
> 43646979
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
> 43646979
> [2013-12-26 03:53:18] (XEN) grant_table.c:1858:d0 Bad grant reference
> 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
> 43646979
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
> 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
> 99221507
> [2013-12-26 06:15:14] (XEN) grant_table.c:1858:d0 Bad grant reference
> 99221507
>
> ---
>
> It seems the root of the problem is in the dom0 messages above. Is it a HW
> failure or an overflow of some internal kernel structure?
From the stack trace, this is very likely the same issue as one that has
already been fixed. Netback miscounts slots, so responses overlap
requests in the ring; grant copy then picks up a wrong grant reference
and grant_table.c reports the error. See
http://lists.xen.org/archives/html/xen-devel/2013-09/msg01143.html
There was some back-and-forth work on this issue, but the fix seems to
have been in place since v3.12-rc4. Would you like to try a newer
kernel version?
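
As a rough sanity check of the overlap theory, the two bad grant
references from the dom0 log quoted above can be reinterpreted as if
they were the tail of an rx response rather than a request. This is
only a sketch: it assumes the netif ring layouts as I recall them (rx
request = 16-bit id, padding, 32-bit gref; rx response = 16-bit id,
offset, flags, signed status), little-endian byte order, and a helper
name that is made up for illustration.

```python
import struct

# Bad grant references copied from the dom0 log quoted above.
bad_grefs = [99221507, 43646979]

def reinterpret_as_rx_response(gref):
    """Split a 32-bit gref into the two 16-bit fields that would occupy
    the same bytes in an rx response (assumed layout, little-endian):
    unsigned flags first, then signed status."""
    raw = struct.pack("<I", gref)
    flags, status = struct.unpack("<Hh", raw)
    return flags, status

decoded = {g: reinterpret_as_rx_response(g) for g in bad_grefs}
for g, (flags, status) in decoded.items():
    print(f"gref {g} = {g:#010x} -> flags={flags:#06x} status={status}")
```

Under those assumptions both values decode to flags 0x0003, with status
1514 and 666. A positive status is the received packet size, and 1514
bytes is a full Ethernet frame at MTU 1500, so the numbers are at least
consistent with a response having been written over a request slot. It
is not proof.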

Thanks
Annie
