From: Michael Collins <mike@ark-net.org>
To: Alex Braunegg <alex.braunegg@gmail.com>,
'Juergen Gross' <jgross@suse.com>,
xen-devel@lists.xenproject.org
Cc: 'Paul Durrant' <paul.durrant@citrix.com>,
'Wei Liu' <wei.liu2@citrix.com>
Subject: Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
Date: Thu, 28 Dec 2017 13:05:22 -0500
Message-ID: <23eb0f3f-590d-f03a-f971-65e86c52fbe4@ark-net.org>
In-Reply-To: <5a3d6c93.886f620a.2522d.c2fb@mx.google.com>
Alex,
I saw this same issue when running kernel 4.13+; after switching
back to 4.11 the problem has not resurfaced. I would like to
understand the root cause of this issue.
Mike
On 12/22/2017 3:35 PM, Alex Braunegg wrote:
> Hi all,
>
> Another crash this morning:
>
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P OE 4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
> task: ffff880059e255c0 task.stack: ffffc90001f64000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS: 00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
> Call Trace:
> ? _raw_spin_unlock_irqrestore+0x11/0x20
> ? error_exit+0x5/0x20
> ? __update_load_avg_cfs_rq+0x176/0x180
> ? xen_mc_flush+0x87/0x120
> ? xen_load_sp0+0x84/0xa0
> ? __switch_to+0x1c1/0x360
> ? finish_task_switch+0x78/0x240
> ? __schedule+0x192/0x496
> ? _raw_spin_lock_irqsave+0x1a/0x3c
> ? _raw_spin_lock_irqsave+0x1a/0x3c
> ? _raw_spin_unlock_irqrestore+0x11/0x20
> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
> ? do_wait_intr+0x80/0x80
> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
> kthread+0x106/0x140
> ? kthread_destroy_worker+0x60/0x60
> ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
> ---[ end trace 130de0b7e39d0eea ]---
>
> Best regards,
>
> Alex
>
>
>
> -----Original Message-----
> From: Juergen Gross [mailto:jgross@suse.com]
> Sent: Friday, 22 December 2017 5:47 PM
> To: Alex Braunegg; xen-devel@lists.xenproject.org
> Cc: Wei Liu; Paul Durrant
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
>
> On 22/12/17 07:40, Alex Braunegg wrote:
>> Hi all,
>>
>> Experienced the same issue again today:
> Ccing the maintainers.
>
>
> Juergen
>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P OE 4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
>> task: ffff880062518000 task.stack: ffffc90004f88000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS: 00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
>> Call Trace:
>> ? error_exit+0x5/0x20
>> ? __update_load_avg_cfs_rq+0x176/0x180
>> ? xen_mc_flush+0x87/0x120
>> ? xen_load_sp0+0x84/0xa0
>> ? __switch_to+0x1c1/0x360
>> ? finish_task_switch+0x78/0x240
>> ? __schedule+0x192/0x496
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_unlock_irqrestore+0x11/0x20
>> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>> ? do_wait_intr+0x80/0x80
>> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>> kthread+0x106/0x140
>> ? kthread_destroy_worker+0x60/0x60
>> ? kthread_destroy_worker+0x60/0x60
>> ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90004f8bc68
>> ---[ end trace 010682c76619a1bd ]---
>>
>> =====================================================================================
>>
>> Best regards,
>>
>> Alex
>>
>> -----Original Message-----
>> From: Alex Braunegg [mailto:alex.braunegg@gmail.com]
>> Sent: Thursday, 21 December 2017 8:04 AM
>> To: 'xen-devel@lists.xenproject.org'
>> Subject: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
>>
>> Hi all,
>>
>> I experienced the following bug while running Xen VMs. This morning a
>> single Xen VM suddenly terminated without apparent cause, and the following
>> was logged in dmesg.
>>
>> Only one of the two running VMs was affected; the other remained up and
>> fully functional until I attempted to restart the crashed VM, which
>> triggered the kernel bug.
>>
>> Kernel: 4.14.6
>> Xen: 4.8.2
>>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P OE 4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
>> task: ffff8800595cc980 task.stack: ffffc900028e0000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS: 00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
>> Call Trace:
>> ? error_exit+0x5/0x20
>> ? __update_load_avg_cfs_rq+0x176/0x180
>> ? xen_mc_flush+0x87/0x120
>> ? xen_load_sp0+0x84/0xa0
>> ? __switch_to+0x1c1/0x360
>> ? finish_task_switch+0x78/0x240
>> ? __schedule+0x192/0x496
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_unlock_irqrestore+0x11/0x20
>> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>> ? do_wait_intr+0x80/0x80
>> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>> kthread+0x106/0x140
>> ? kthread_destroy_worker+0x60/0x60
>> ? kthread_destroy_worker+0x60/0x60
>> ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc900028e3c68
>> ---[ end trace 7d827dae67002ffc ]---
>>
>> =====================================================================================
>>
>> The section of relevant kernel code is:
>>
>> =====================================================================================
>>
>> static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
>>                                              u16 pending_idx)
>> {
>>         if (unlikely(queue->grant_tx_handle[pending_idx] ==
>>                      NETBACK_INVALID_HANDLE)) {
>>                 netdev_err(queue->vif->dev,
>>                            "Trying to unmap invalid handle! pending_idx: 0x%x\n",
>>                            pending_idx);
>>                 BUG();
>>         }
>>         queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
>> }
>>
>> =====================================================================================
>>
>> In an attempt to recover from this situation, I restarted/destroyed the VM
>> (xl restart <vmname> / xl destroy <vmname>) to recover its state, and the
>> following error messages were logged at the console:
>>
>> =====================================================================================
>>
>> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation fault
>> libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove device with path /local/domain/0/backend/vif/2/0
>> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed for 2
>>
>> =====================================================================================
>>
>> After that the physical system hung, then restarted with nothing else
>> logged, and everything came back up and operational, including the VM that
>> crashed.
>>
>> Further details (xl dmesg, xl info) attached.
>>
>> Best regards,
>>
>> Alex Braunegg
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>
>
Thread overview: 17+ messages
2017-12-22 6:40 [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430! Alex Braunegg
2017-12-22 6:47 ` Juergen Gross
2017-12-22 20:35 ` Alex Braunegg
2017-12-28 18:05 ` Michael Collins [this message]
2017-12-28 19:31 ` Alex Braunegg
2018-01-03 13:54 ` Paul Durrant
2018-01-03 18:43 ` Alex Braunegg
2018-01-03 20:33 ` Christoph Moench-Tegeder
2018-01-04 10:29 ` Paul Durrant
2018-01-07 22:19 ` 'Christoph Moench-Tegeder'
2018-01-08 9:35 ` Paul Durrant
2018-01-09 9:44 ` Paul Durrant
2018-01-10 12:52 ` Paul Durrant
2018-01-10 13:58 ` Paul Durrant
2018-01-10 17:53 ` 'Christoph Moench-Tegeder'
2018-01-10 19:55 ` Alex Braunegg
-- strict thread matches above, loose matches on Subject: below --
2017-12-20 21:03 Alex Braunegg