Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

From: Michael Collins <mike@ark-net.org>
To: Alex Braunegg <alex.braunegg@gmail.com>,
	'Juergen Gross' <jgross@suse.com>,
	xen-devel@lists.xenproject.org
Cc: 'Paul Durrant' <paul.durrant@citrix.com>,
	'Wei Liu' <wei.liu2@citrix.com>
Subject: Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
Date: Thu, 28 Dec 2017 13:05:22 -0500
Message-ID: <23eb0f3f-590d-f03a-f971-65e86c52fbe4@ark-net.org>
In-Reply-To: <5a3d6c93.886f620a.2522d.c2fb@mx.google.com>

Alex,

I saw this same issue when running kernel 4.13 and later. After switching
back to 4.11, the problem has not resurfaced. I would like to
understand the root cause of this issue.

Mike

On 12/22/2017 3:35 PM, Alex Braunegg wrote:
> Hi all,
>
> Another crash this morning:
>
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff880059e255c0 task.stack: ffffc90001f64000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
> Call Trace:
>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>   ? error_exit+0x5/0x20
>   ? __update_load_avg_cfs_rq+0x176/0x180
>   ? xen_mc_flush+0x87/0x120
>   ? xen_load_sp0+0x84/0xa0
>   ? __switch_to+0x1c1/0x360
>   ? finish_task_switch+0x78/0x240
>   ? __schedule+0x192/0x496
>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>   ? do_wait_intr+0x80/0x80
>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>   kthread+0x106/0x140
>   ? kthread_destroy_worker+0x60/0x60
>   ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
> ---[ end trace 130de0b7e39d0eea ]---
>
> Best regards,
>
> Alex
>
>
>
> -----Original Message-----
> From: Juergen Gross [mailto:jgross@suse.com]
> Sent: Friday, 22 December 2017 5:47 PM
> To: Alex Braunegg; xen-devel@lists.xenproject.org
> Cc: Wei Liu; Paul Durrant
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
>
> On 22/12/17 07:40, Alex Braunegg wrote:
>> Hi all,
>>
>> Experienced the same issue again today:
> Ccing the maintainers.
>
>
> Juergen
>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
>> task: ffff880062518000 task.stack: ffffc90004f88000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS:  00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
>> Call Trace:
>>   ? error_exit+0x5/0x20
>>   ? __update_load_avg_cfs_rq+0x176/0x180
>>   ? xen_mc_flush+0x87/0x120
>>   ? xen_load_sp0+0x84/0xa0
>>   ? __switch_to+0x1c1/0x360
>>   ? finish_task_switch+0x78/0x240
>>   ? __schedule+0x192/0x496
>>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>>   ? do_wait_intr+0x80/0x80
>>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>>   kthread+0x106/0x140
>>   ? kthread_destroy_worker+0x60/0x60
>>   ? kthread_destroy_worker+0x60/0x60
>>   ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90004f8bc68
>> ---[ end trace 010682c76619a1bd ]---
>>
>> =====================================================================================
>>
>> Best regards,
>>
>> Alex
>>
>> -----Original Message-----
>> From: Alex Braunegg [mailto:alex.braunegg@gmail.com]
>> Sent: Thursday, 21 December 2017 8:04 AM
>> To: 'xen-devel@lists.xenproject.org'
>> Subject: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
>>
>> Hi all,
>>
>> I experienced the following bug whilst using a Xen VM. What happened was
>> that this morning a single Xen VM suddenly terminated without cause with the
>> following being logged in dmesg.
>>
>> Only 1 VM experienced an issue (out of 2 which were running), the other
>> remained up and fully functional until I attempted to restart the crashed VM
>> which triggered the kernel bug.
>>
>> Kernel:	4.14.6
>> Xen:		4.8.2
>>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
>> task: ffff8800595cc980 task.stack: ffffc900028e0000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS:  00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
>> Call Trace:
>>   ? error_exit+0x5/0x20
>>   ? __update_load_avg_cfs_rq+0x176/0x180
>>   ? xen_mc_flush+0x87/0x120
>>   ? xen_load_sp0+0x84/0xa0
>>   ? __switch_to+0x1c1/0x360
>>   ? finish_task_switch+0x78/0x240
>>   ? __schedule+0x192/0x496
>>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>>   ? do_wait_intr+0x80/0x80
>>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>>   kthread+0x106/0x140
>>   ? kthread_destroy_worker+0x60/0x60
>>   ? kthread_destroy_worker+0x60/0x60
>>   ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc900028e3c68
>> ---[ end trace 7d827dae67002ffc ]---
>>
>> =====================================================================================
>>
>> The section of relevant kernel code is:
>>
>> =====================================================================================
>>
>> static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
>>                                               u16 pending_idx)
>> {
>>          if (unlikely(queue->grant_tx_handle[pending_idx] ==
>>                       NETBACK_INVALID_HANDLE)) {
>>                  netdev_err(queue->vif->dev,
>>                             "Trying to unmap invalid handle! pending_idx: 0x%x\n",
>>                             pending_idx);
>>                  BUG();
>>          }
>>          queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
>> }
>>
>> ============================================================================
>> =========
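
To make the quoted check concrete, here is a minimal userspace sketch of the
bookkeeping it guards. It is only a model, not the netback code: the model_*
names, the table size of 256 and the use of UINT32_MAX as the invalid marker
are my assumptions, and it returns -1 where the kernel calls BUG(). It shows
that the check trips when the same pending_idx is released twice, or released
without ever having been mapped. It does not say which of those happened here.

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_PENDING 256u          /* assumption: size of the model table */
#define MODEL_INVALID     UINT32_MAX    /* stand-in for NETBACK_INVALID_HANDLE */

/* Hypothetical stand-in for the per-queue handle table. */
struct model_queue {
    uint32_t grant_tx_handle[MODEL_MAX_PENDING];
};

/* Every slot starts out unmapped. */
static void model_queue_init(struct model_queue *q)
{
    for (unsigned int i = 0; i < MODEL_MAX_PENDING; i++)
        q->grant_tx_handle[i] = MODEL_INVALID;
}

/* Map path of the model: remember the handle for this slot.
 * Returns -1 if the index is out of range or the slot is already mapped. */
static int model_handle_set(struct model_queue *q, uint16_t idx, uint32_t handle)
{
    if (idx >= MODEL_MAX_PENDING || q->grant_tx_handle[idx] != MODEL_INVALID)
        return -1;
    q->grant_tx_handle[idx] = handle;
    return 0;
}

/* Unmap path of the model: same test as the quoted function, but it
 * reports failure with -1 instead of calling BUG(). */
static int model_handle_reset(struct model_queue *q, uint16_t idx)
{
    if (idx >= MODEL_MAX_PENDING)
        return -1;
    if (q->grant_tx_handle[idx] == MODEL_INVALID) {
        fprintf(stderr,
                "Trying to unmap invalid handle! pending_idx: 0x%x\n",
                (unsigned int)idx);
        return -1;
    }
    q->grant_tx_handle[idx] = MODEL_INVALID;
    return 0;
}
```

In this model, one model_handle_set() followed by two model_handle_reset()
calls on the same index makes the second reset fail, which is the condition
the log line above reports.
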
>>
>> In an attempt to recover from this situation I restarted / destroyed (xl
>> restart <vmname> / xl destroy <vmname>) the VM to recover its state and the
>> following error messages were logged at the console:
>>
>> =====================================================================================
>>
>> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation fault
>> libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove device with path /local/domain/0/backend/vif/2/0
>> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed for 2
>>
>> =====================================================================================
>>
>> After that the physical system hung and then restarted with nothing else
>> logged. Everything came back OK and operational, including the VM that
>> crashed.
>>
>> Further details (xl dmesg, xl info) attached.
>>
>> Best regards,
>>
>> Alex Braunegg
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>
>
