Subject: DomU unresponsive after "Trying to unmap invalid handle" - Kernel 4.14.8
From: Christoph Moench-Tegeder
Date: 2017-12-20 23:04 UTC
To: xen-devel

Hi,

my system is:
- amd64
- xen from Debian 9 (stable), package version 4.8.2+xsa245-0+deb9u1
- home-built kernel, Linux 4.14.8 (I believe I saw this with something
  like 4.14.1 as well, but I didn't capture the logs at that point and
  had no time to investigate, so I just reverted to 4.13.16).
- all DomUs have 'builder="hvm"' (a representative config sketch follows below).
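
For reference, the guests are configured roughly like this (an
illustrative sketch, not one of my actual config files; name, bridge
and disk path are made up):

builder = "hvm"
name    = "guest1"
memory  = 1024
vcpus   = 1
vif     = [ 'bridge=br0' ]
disk    = [ 'phy:/dev/vg0/guest1,xvda,w' ]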

When running a few (<=10) DomUs (all Linux), I got the following BUG:

[  515.099852] vif vif-1-0 vif1.0: Trying to unmap invalid handle! pending_idx: 0xde
[  515.099911] ------------[ cut here ]------------
[  515.099911] kernel BUG at drivers/net/xen-netback/netback.c:430!
[  515.099961] invalid opcode: 0000 [#1] SMP
[  515.099981] Modules linked in: xt_physdev br_netfilter iptable_filter xen_netback tun xen_blkback bridge stp llc binfmt_misc x86_pkg_temp_thermal coretemp ghash_clmulni_intel pcbc snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hwdep crypto_simd snd_hda_core glue_helper dcdbas cryptd pcspkr snd_pcm evdev input_leds snd_timer sg snd soundcore battery button iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi usbip_host usbip_core sunrpc loop ip_tables x_tables autofs4 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx hid_cherry usbhid hid raid6_pq raid1 raid0 linear raid10 dm_mod dax md_mod crc32c_intel sd_mod i2c_i801 i2c_core e1000e xhci_pci ehci_pci ptp ehci_hcd xhci_hcd pps_core thermal
[  515.100140] CPU: 1 PID: 3106 Comm: vif1.0-q0-deall Not tainted 4.14.8 #1
[  515.100163] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A03 11/25/2013
[  515.100188] task: ffff88011bbf50c0 task.stack: ffffc90048224000
[  515.100212] RIP: e030:xenvif_dealloc_kthread+0x2bc/0x560 [xen_netback]
[  515.100236] RSP: e02b:ffffc90048227cd0 EFLAGS: 00010286
[  515.100257] RAX: ffffffff8195d48a RBX: ffffc90048209000 RCX: 0000000000000000
[  515.100281] RDX: ffffc90048227c60 RSI: ffff88012468db98 RDI: ffff88012468db98
[  515.100317] RBP: ffffc90048227f00 R08: 0000000000000000 R09: 0000000000000392
[  515.100353] R10: 00000000000000e6 R11: 0000000081bf0801 R12: ffffc90048212760
[  515.100390] R13: 0000160000000000 R14: ffffc90048209000 R15: aaaaaaaaaaaaaaab
[  515.100429] FS:  0000000000000000(0000) GS:ffff880124680000(0000) knlGS:0000000000000000
[  515.100479] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.100513] CR2: 00007f3237350b40 CR3: 00000000ab4da000 CR4: 0000000000042660
[  515.100550] Call Trace:
[  515.100583]  ? wait_woken+0x80/0x80
[  515.100615]  ? radix_tree_lookup+0xd/0x10
[  515.100647]  ? irq_to_desc+0x12/0x20
[  515.100678]  ? irq_get_irq_data+0x9/0x20
[  515.100711]  ? notify_remote_via_irq+0x22/0x40
[  515.100743]  ? xen_send_IPI_one+0x2d/0x70
[  515.100776]  ? xen_smp_send_reschedule+0xb/0x10
[  515.100809]  ? resched_curr+0x4e/0x60
[  515.100840]  ? check_preempt_curr+0x53/0x90
[  515.100872]  ? __update_load_avg_se.isra.3+0x14b/0x150
[  515.100906]  ? __update_load_avg_se.isra.3+0x14b/0x150
[  515.100940]  ? xen_mc_flush+0x101/0x130
[  515.100971]  ? xen_load_sp0+0x6a/0x80
[  515.101002]  ? finish_task_switch+0x8c/0x1c0
[  515.101035]  ? __schedule+0x1e7/0x590
[  515.101068]  kthread+0xfe/0x130
[  515.101099]  ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
[  515.101134]  ? kthread_create_on_node+0x40/0x40
[  515.101167]  ret_from_fork+0x25/0x30
[  515.101217] Code: 03 48 0f af ce 48 81 f9 ff 00 00 00 0f 86 44 ff ff ff 0f 0b 48 8b 43 20 48 c7 c6 48 bb 36 a0 48 8b b8 20 03 00 00 e8 14 b7 1c e1 <0f> 0b 44 89 d0 8b 93 38 bf 00 00 44 39 d2 0f 84 d5 00 00 00 41 
[  515.101326] RIP: xenvif_dealloc_kthread+0x2bc/0x560 [xen_netback] RSP: ffffc90048227cd0
[  515.101395] ---[ end trace 133a3c8923a8054d ]---
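
As far as I can tell, the BUG() at netback.c:430 is the sanity check in
xenvif_grant_handle_reset(), which gets inlined into
xenvif_dealloc_kthread() via xenvif_tx_dealloc_action() - that would
match the RIP above. Quoting approximately from the 4.14 source (exact
line numbers may differ):

/* drivers/net/xen-netback/netback.c (approximately, 4.14) */
static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
                                             u16 pending_idx)
{
        if (unlikely(queue->grant_tx_handle[pending_idx] ==
                     NETBACK_INVALID_HANDLE)) {
                netdev_err(queue->vif->dev,
                           "Trying to unmap invalid handle! pending_idx: 0x%x\n",
                           pending_idx);
                BUG();
        }
        queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
}

So apparently the dealloc thread hit a pending_idx whose grant handle
was already NETBACK_INVALID_HANDLE, i.e. it was either never mapped or
is being unmapped twice.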


Shortly after, the DomU behind that interface became unresponsive via
network and virtual console (not immediately: at first I could still
ssh into that DomU, but nothing more).
Trying to kill that DomU (xl destroy) and a normal shutdown of all
other DomUs leaves them behind as zombie domains in xl list:
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4096     4     r-----     224.4
(null)                                       1     9     1     --psrd     179.7
(null)                                       2     8     1     --ps-d      30.3
(null)                                       3     8     1     --ps-d      28.5
(null)                                       4    13     1     --ps-d      27.4
(null)                                       5    10     1     --ps-d      31.9
(null)                                       6     8     1     --ps-d      21.9
(null)                                       7     0     1     --ps-d      74.9
(null)                                       8     6     1     --ps-d      31.6
(null)                                       9     0     2     --ps-d      37.1

This happened twice in a row this evening, so I don't think this is
just a fluke.

Searching the web didn't turn up anything about this BUG, and the most
recent discussion of "Trying to unmap invalid handle!" I could find
seems to be from 2013.
Is there anything else I can get you?

Regards,
Christoph

-- 
Spare Space

