xen-devel.lists.xenproject.org archive mirror
* [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
@ 2017-12-20 21:03 Alex Braunegg
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Braunegg @ 2017-12-20 21:03 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4719 bytes --]

Hi all,

I hit the following bug while running a Xen VM. This morning a single Xen
VM terminated suddenly, with no apparent cause, and the following was logged
in dmesg.

Only 1 of the 2 running VMs was affected; the other remained up and fully
functional until I attempted to restart the crashed VM, which triggered the
kernel bug.

Kernel:	4.14.6
Xen:		4.8.2

============================================================================
=========

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
------------[ cut here ]------------
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE)
spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E)
i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P           OE
4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
task: ffff8800595cc980 task.stack: ffffc900028e0000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
FS:  00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
Call Trace:
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48
8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
ffffc900028e3c68
---[ end trace 7d827dae67002ffc ]---

============================================================================
=========

The section of relevant kernel code is:

============================================================================
=========

static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
                                             u16 pending_idx)
{
        if (unlikely(queue->grant_tx_handle[pending_idx] ==
                     NETBACK_INVALID_HANDLE)) {
                netdev_err(queue->vif->dev,
                           "Trying to unmap invalid handle! pending_idx: 0x%x\n",
                           pending_idx);
                BUG();
        }
        queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
}

============================================================================
=========

In an attempt to recover from this situation I restarted / destroyed (xl
restart <vmname> / xl destroy <vmname>) the VM to recover its state, and the
following error messages were logged at the console:

============================================================================
=========

libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus:
/etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation
fault
libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove
device with path /local/domain/0/backend/vif/2/0
libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed
for 2

============================================================================
=========

After this the physical system hung and then restarted with nothing else
logged. Everything came back up and operational, including the VM that had
crashed.

Further details (xl dmesg, xl info) attached.

Best regards,

Alex Braunegg

[-- Attachment #2: xl-dmesg.txt --]
[-- Type: text/plain, Size: 4509 bytes --]

 Xen 4.8.2
(XEN) Xen version 4.8.2 (<redacted>) (gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)) debug=n  Sun Dec 17 14:32:09 EST 2017
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=2048M,max:2048M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN)  EDID info not retrieved because of reasons unknown
(XEN) Disc information:
(XEN)  Found 7 MBR signatures
(XEN)  Found 6 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e2000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000ddf90000 (usable)
(XEN)  00000000ddf9e000 - 00000000ddfa0000 type 9
(XEN)  00000000ddfa0000 - 00000000ddfaa600 (ACPI data)
(XEN)  00000000ddfaa600 - 00000000ddfe0000 (ACPI NVS)
(XEN)  00000000ddfe0000 - 00000000de000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000ffa00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000220000000 (usable)
(XEN) ACPI: RSDP 000F8F50, 0024 (r2 HP    )
(XEN) ACPI: XSDT DDFA0100, 007C (r1 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: FACP DDFA0290, 00F4 (r3 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: DSDT DDFA0620, 6868 (r1 HP     ProLiant        6 INTL 20051117)
(XEN) ACPI: FACS DDFAE000, 0040
(XEN) ACPI: APIC DDFA0390, 0072 (r1 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: MCFG DDFA0410, 003C (r1 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: SPMI DDFA0450, 0041 (r5 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: OEMB DDFAE040, 0072 (r1 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: HPET DDFAB4E0, 0038 (r1 HP     ProLiant 20131001 HP         97)
(XEN) ACPI: EINJ DDFAB520, 0130 (r1  AMIER AMI_EINJ 20131001 HP         97)
(XEN) ACPI: BERT DDFAB6B0, 0030 (r1  AMIER AMI_BERT 20131001 HP         97)
(XEN) ACPI: ERST DDFAB6E0, 01B0 (r1  AMIER AMI_ERST 20131001 HP         97)
(XEN) ACPI: HEST DDFAB890, 00A8 (r1  AMIER ABC_HEST 20131001 HP         97)
(XEN) ACPI: SSDT DDFAB940, 052A (r1 HP     ProLiant        1 AMD         1)
(XEN) System RAM: 8159MB (8354996kB)
(XEN) Domain heap initialised
(XEN) IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Failed to get Error Log Address Range.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2196.371 MHz processor.
(XEN) Initing memory sharing.
(XEN) AMD-Vi: IOMMU not found!
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) Allocated console ring of 16 KiB.
(XEN) HVM: ASIDs enabled.
(XEN) SVM: Supported advanced features:
(XEN)  - Nested Page Tables (NPT)
(XEN)  - Last Branch Record (LBR) Virtualisation
(XEN)  - Next-RIP Saved on #VMEXIT
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 2 CPUs
(XEN) Xenoprofile: AMD IBS detected (0x1f)
(XEN) Dom0 has maximum 216 PIRQs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1ff4000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000210000000->0000000214000000 (497047 pages to be allocated)
(XEN)  Init. ramdisk: 000000021d597000->000000021ffff800
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff81ff4000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008000400000
(XEN)  Start info:    ffffffff81ff4000->ffffffff81ff44b4
(XEN)  Page tables:   ffffffff81ff5000->ffffffff8200a000
(XEN)  Boot stack:    ffffffff8200a000->ffffffff8200b000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d01180
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) ..................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 316kB init memory

[-- Attachment #3: xl-info.txt --]
[-- Type: text/plain, Size: 1404 bytes --]

host                   : <redacted>
release                : 4.14.6-1.el6.x86_64
version                : #1 SMP Sun Dec 17 09:56:11 EST 2017
machine                : x86_64
nr_cpus                : 2
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2196
hw_caps                : 178bf3ff:80802001:efd3fbff:000837ff:00000000:00000000:00000000:00000100
virt_caps              : hvm
total_memory           : 8159
free_memory            : 2921
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 8
xen_extra              : .2
xen_version            : 4.8.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 
xen_commandline        : dom0_mem=2048M,max:2048M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
cc_compiler            : gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)
cc_compile_by          : mockbuild
cc_compile_domain      : <redacted>
cc_compile_date        : Sun Dec 17 14:32:09 EST 2017
build_id               : 83b9fac55c85d3ae6f228e672157a37347d25677
xend_config_format     : 4

[-- Attachment #4: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
@ 2017-12-22  6:40 Alex Braunegg
  2017-12-22  6:47 ` Juergen Gross
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Braunegg @ 2017-12-22  6:40 UTC (permalink / raw)
  To: xen-devel

Hi all,

Experienced the same issue again today:

============================================================================
=========

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
------------[ cut here ]------------
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P           OE
4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
task: ffff880062518000 task.stack: ffffc90004f88000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
FS:  00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
Call Trace:
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48
8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
ffffc90004f8bc68
---[ end trace 010682c76619a1bd ]---

============================================================================
=========

Best regards,

Alex



* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2017-12-22  6:40 Alex Braunegg
@ 2017-12-22  6:47 ` Juergen Gross
  2017-12-22 20:35   ` Alex Braunegg
  0 siblings, 1 reply; 17+ messages in thread
From: Juergen Gross @ 2017-12-22  6:47 UTC (permalink / raw)
  To: Alex Braunegg, xen-devel; +Cc: Paul Durrant, Wei Liu

On 22/12/17 07:40, Alex Braunegg wrote:
> Hi all,
> 
> Experienced the same issue again today:

Ccing the maintainers.


Juergen



* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2017-12-22  6:47 ` Juergen Gross
@ 2017-12-22 20:35   ` Alex Braunegg
  2017-12-28 18:05     ` Michael Collins
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Braunegg @ 2017-12-22 20:35 UTC (permalink / raw)
  To: 'Juergen Gross', xen-devel
  Cc: 'Paul Durrant', 'Wei Liu'

Hi all,

Another crash this morning:

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
------------[ cut here ]------------
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
task: ffff880059e255c0 task.stack: ffffc90001f64000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
FS:  00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
Call Trace:
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
---[ end trace 130de0b7e39d0eea ]---

Best regards,

Alex



-----Original Message-----
From: Juergen Gross [mailto:jgross@suse.com] 
Sent: Friday, 22 December 2017 5:47 PM
To: Alex Braunegg; xen-devel@lists.xenproject.org
Cc: Wei Liu; Paul Durrant
Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

On 22/12/17 07:40, Alex Braunegg wrote:
> Hi all,
> 
> Experienced the same issue again today:

Ccing the maintainers.


Juergen

> 
> ============================================================================
> =========
> 
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
> icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff880062518000 task.stack: ffffc90004f88000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
> Call Trace:
>  ? error_exit+0x5/0x20
>  ? __update_load_avg_cfs_rq+0x176/0x180
>  ? xen_mc_flush+0x87/0x120
>  ? xen_load_sp0+0x84/0xa0
>  ? __switch_to+0x1c1/0x360
>  ? finish_task_switch+0x78/0x240
>  ? __schedule+0x192/0x496
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_unlock_irqrestore+0x11/0x20
>  xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>  ? do_wait_intr+0x80/0x80
>  ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>  kthread+0x106/0x140
>  ? kthread_destroy_worker+0x60/0x60
>  ? kthread_destroy_worker+0x60/0x60
>  ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90004f8bc68
> ---[ end trace 010682c76619a1bd ]---
> 
> ============================================================================
> =========
> 
> Best regards,
> 
> Alex
> 
> -----Original Message-----
> From: Alex Braunegg [mailto:alex.braunegg@gmail.com] 
> Sent: Thursday, 21 December 2017 8:04 AM
> To: 'xen-devel@lists.xenproject.org'
> Subject: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
> 
> Hi all,
> 
> I experienced the following bug whilst using a Xen VM. What happened was
> that this morning a single Xen VM suddenly terminated without cause with the
> following being logged in dmesg. 
> 
> Only 1 VM experienced an issue (out of 2 which were running), the other
> remained up and fully functional until I attempted to restart the crashed VM
> which triggered the kernel bug.
> 
> Kernel:	4.14.6
> Xen:		4.8.2
> 
> ============================================================================
> =========
> 
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE)
> spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E)
> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff8800595cc980 task.stack: ffffc900028e0000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
> Call Trace:
>  ? error_exit+0x5/0x20
>  ? __update_load_avg_cfs_rq+0x176/0x180
>  ? xen_mc_flush+0x87/0x120
>  ? xen_load_sp0+0x84/0xa0
>  ? __switch_to+0x1c1/0x360
>  ? finish_task_switch+0x78/0x240
>  ? __schedule+0x192/0x496
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_lock_irqsave+0x1a/0x3c
>  ? _raw_spin_unlock_irqrestore+0x11/0x20
>  xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>  ? do_wait_intr+0x80/0x80
>  ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>  kthread+0x106/0x140
>  ? kthread_destroy_worker+0x60/0x60
>  ? kthread_destroy_worker+0x60/0x60
>  ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc900028e3c68
> ---[ end trace 7d827dae67002ffc ]---
> 
> ============================================================================
> =========
> 
> The section of relevant kernel code is:
> 
> ============================================================================
> =========
> 
> static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
>                                              u16 pending_idx)
> {
>         if (unlikely(queue->grant_tx_handle[pending_idx] ==
>                      NETBACK_INVALID_HANDLE)) {
>                 netdev_err(queue->vif->dev,
>                            "Trying to unmap invalid handle! pending_idx: 0x%x\n",
>                            pending_idx);
>                 BUG();
>         }
>         queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
> }
> 
> ============================================================================
> =========
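
For readers following the thread, the check quoted above can be modelled in user space. This is only a rough sketch and not the netback code: the `model_*` names, the table size of 256 and the `0xFFFFFFFF` sentinel are assumptions made for illustration, and the functions return `false` where the kernel would call BUG(). It shows the invariant that is being violated in the traces: a `pending_idx` slot may only be reset after a handle has been stored in it and before it has been reset again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel definitions (values are assumptions). */
#define MODEL_MAX_PENDING    256
#define MODEL_INVALID_HANDLE 0xFFFFFFFFu

struct model_queue {
	uint32_t grant_tx_handle[MODEL_MAX_PENDING];
};

/* Mark every slot as unmapped, as a freshly set up queue would be. */
void model_queue_init(struct model_queue *q)
{
	for (size_t i = 0; i < MODEL_MAX_PENDING; i++)
		q->grant_tx_handle[i] = MODEL_INVALID_HANDLE;
}

/* Record a mapping; returns false if the slot already holds a live handle. */
bool model_handle_set(struct model_queue *q, uint16_t pending_idx,
		      uint32_t handle)
{
	assert(pending_idx < MODEL_MAX_PENDING);
	if (q->grant_tx_handle[pending_idx] != MODEL_INVALID_HANDLE)
		return false;
	q->grant_tx_handle[pending_idx] = handle;
	return true;
}

/*
 * Mirrors the quoted check: returns false (where the kernel would BUG())
 * if the slot is already invalid, i.e. it was never mapped or has
 * already been reset once.
 */
bool model_handle_reset(struct model_queue *q, uint16_t pending_idx)
{
	assert(pending_idx < MODEL_MAX_PENDING);
	if (q->grant_tx_handle[pending_idx] == MODEL_INVALID_HANDLE) {
		fprintf(stderr,
			"Trying to unmap invalid handle! pending_idx: 0x%x\n",
			(unsigned int)pending_idx);
		return false;
	}
	q->grant_tx_handle[pending_idx] = MODEL_INVALID_HANDLE;
	return true;
}
```

In this model, calling `model_handle_reset()` twice for the same index after a single `model_handle_set()` fails on the second call. That suggests the logged message means the dealloc thread was handed a `pending_idx` whose slot was already invalid; the sketch says nothing about why that happened.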
> 
> In an attempt to recover from this situation I restarted / destroyed (xl
> restart <vmname> / xl destroy <vmname>) the VM to recover its state and the
> following error messages were logged at the console:
> 
> ============================================================================
> =========
> 
> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation fault
> libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove device with path /local/domain/0/backend/vif/2/0
> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed for 2
> 
> ============================================================================
> =========
> 
> After which the physical system hung, then the physical system restarted
> with nothing else logged and everything came back OK & operational including
> the VM that crashed.
> 
> Further details (xl dmesg, xl info) attached.
> 
> Best regards,
> 
> Alex Braunegg
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2017-12-22 20:35   ` Alex Braunegg
@ 2017-12-28 18:05     ` Michael Collins
  2017-12-28 19:31       ` Alex Braunegg
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Collins @ 2017-12-28 18:05 UTC (permalink / raw)
  To: Alex Braunegg, 'Juergen Gross', xen-devel
  Cc: 'Paul Durrant', 'Wei Liu'

Alex,

I saw this same issue when running kernel 4.13+; after switching back to
4.11 the problem has not resurfaced.  I would like to understand the root
cause of this issue.

Mike


On 12/22/2017 3:35 PM, Alex Braunegg wrote:
> Hi all,
>
> Another crash this morning:
>
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> task: ffff880059e255c0 task.stack: ffffc90001f64000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS:  00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
> Call Trace:
>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>   ? error_exit+0x5/0x20
>   ? __update_load_avg_cfs_rq+0x176/0x180
>   ? xen_mc_flush+0x87/0x120
>   ? xen_load_sp0+0x84/0xa0
>   ? __switch_to+0x1c1/0x360
>   ? finish_task_switch+0x78/0x240
>   ? __schedule+0x192/0x496
>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>   ? _raw_spin_lock_irqsave+0x1a/0x3c
>   ? _raw_spin_unlock_irqrestore+0x11/0x20
>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>   ? do_wait_intr+0x80/0x80
>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>   kthread+0x106/0x140
>   ? kthread_destroy_worker+0x60/0x60
>   ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
> ---[ end trace 130de0b7e39d0eea ]---
>
> Best regards,
>
> Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2017-12-28 18:05     ` Michael Collins
@ 2017-12-28 19:31       ` Alex Braunegg
  2018-01-03 13:54         ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Braunegg @ 2017-12-28 19:31 UTC (permalink / raw)
  To: 'Michael Collins', 'Juergen Gross', xen-devel
  Cc: 'Paul Durrant', 'Wei Liu'

Hi Mike,

Thanks for the confirmation on that. After the last crash I reported, the crashes recurred daily until I downgraded to kernel 4.4 and Xen 4.6, at which point stability returned. There have been zero crashes since 24 December.

@Paul, Wei,

Can we get this investigated? It appears that this is a stability blocker for Xen releases on newer kernels.

Best regards,

Alex

-----Original Message-----
From: Michael Collins [mailto:mike@ark-net.org] 
Sent: Friday, 29 December 2017 5:05 AM
To: Alex Braunegg; 'Juergen Gross'; xen-devel@lists.xenproject.org
Cc: 'Paul Durrant'; 'Wei Liu'
Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

Alex,

I saw this same issue when running kernel 4.13+; after switching back to
4.11 the problem has not resurfaced.  I would like to understand the root
cause of this issue.

Mike



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2017-12-28 19:31       ` Alex Braunegg
@ 2018-01-03 13:54         ` Paul Durrant
  2018-01-03 18:43           ` Alex Braunegg
  2018-01-03 20:33           ` Christoph Moench-Tegeder
  0 siblings, 2 replies; 17+ messages in thread
From: Paul Durrant @ 2018-01-03 13:54 UTC (permalink / raw)
  To: 'Alex Braunegg', 'Michael Collins',
	'Juergen Gross', xen-devel@lists.xenproject.org
  Cc: Wei Liu

> -----Original Message-----
> From: Alex Braunegg [mailto:alex.braunegg@gmail.com]
> Sent: 28 December 2017 19:32
> To: 'Michael Collins' <mike@ark-net.org>; 'Juergen Gross'
> <jgross@suse.com>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <Paul.Durrant@citrix.com>; Wei Liu <wei.liu2@citrix.com>
> Subject: RE: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> Hi Mike,
> 
> Thanks for the confirmation on that. Since the last crash I was having them
> daily until I downgraded back to kernel 4.4 and Xen 4.6 where stability
> resumed. Zero crashes since 24th December.
> 
> @Paul, Wei,
> 
> Can we get this investigated? It appears that this is a stability blocker for Xen
> releases on newer kernels.

The only mildly suspicious thing I can see in netback is:

commit cc8737a5fe9051b7fa052b08c57ddb9f539c389a
Author: Willem de Bruijn <willemb@google.com>
Date:   Fri Aug 25 13:10:43 2017 -0400

    xen-netback: update ubuf_info initialization to anonymous union

    The xen driver initializes struct ubuf_info fields using designated
    initializers. I recently moved these fields inside a nested anonymous
    struct inside an anonymous union. I had missed this use case.

    This breaks compilation of xen-netback with older compilers.
    From kbuild bot with gcc-4.4.7:

       drivers/net//xen-netback/interface.c: In function
       'xenvif_init_queue':
       >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
       >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
          drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
       >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
       >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

    Add double braces around the designated initializers to match their
    nested position in the struct. After this, compilation succeeds again.

    Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
    Reported-by: kbuild bot <lpk@intel.com>
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

...and it's only mildly suspicious since netback uses the ubuf_info structure and stores the pending_idx value used by xenvif_grant_handle_reset() (which is the function calling BUG()) in the desc field; I can't spot anything wrong with the patch as such. It could be that the cause is external to netback.
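
To make the initializer change and the role of the desc field concrete, here is a minimal sketch. It is not the netback source: struct mock_ubuf_info and every mock_* name are hypothetical stand-ins, and the layout is only assumed to resemble the "nested anonymous struct inside an anonymous union" that the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ubuf_info: a callback
 * followed by an anonymous union that contains an anonymous struct. */
struct mock_ubuf_info {
	void (*callback)(struct mock_ubuf_info *ubuf, int zerocopy_success);
	union {
		struct {
			unsigned long desc;	/* netback stores pending_idx here */
			void *ctx;
		};
		unsigned long raw[2];		/* placeholder for the other union arm */
	};
};

/* Hypothetical callback; the real one would queue the slot for unmapping. */
static void mock_zerocopy_callback(struct mock_ubuf_info *ubuf,
				   int zerocopy_success)
{
	(void)ubuf;
	(void)zerocopy_success;
}

/* Initialize one slot. The two extra pairs of braces match the union and
 * the struct nested inside it, which is what the commit says it added. */
static struct mock_ubuf_info mock_init_slot(unsigned short pending_idx)
{
	struct mock_ubuf_info u = {
		.callback = mock_zerocopy_callback,
		{ { .ctx = NULL,
		    .desc = pending_idx } }
	};
	return u;
}

/* Recover the slot index from the structure handed to the callback. */
static unsigned short mock_ubuf_to_pending_idx(const struct mock_ubuf_info *ubuf)
{
	return (unsigned short)ubuf->desc;
}
```

If desc were ever initialized to the wrong value, the index recovered in the callback would point at the wrong slot, which is why the patch drew attention at all.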

How easy is it to trigger this? I'm assuming, from the original description, that I can probably trigger it by forcibly terminating a running domain and then trying to restart it.

  Paul

> 
> Best regards,
> 
> Alex
> 
> -----Original Message-----
> From: Michael Collins [mailto:mike@ark-net.org]
> Sent: Friday, 29 December 2017 5:05 AM
> To: Alex Braunegg; 'Juergen Gross'; xen-devel@lists.xenproject.org
> Cc: 'Paul Durrant'; 'Wei Liu'
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> Alex,
> 
>           I saw this same issue when running a kernel 4.13+, switched
> back to 4.11 and the problem has not resurfaced.  I would like to
> understand the root cause of this issue.
> 
> Mike
> 
> 
> On 12/22/2017 3:35 PM, Alex Braunegg wrote:
> > Hi all,
> >
> > Another crash this morning:
> >
> > vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
> > ------------[ cut here ]------------
> > kernel BUG at drivers/net/xen-netback/netback.c:430!
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE)
> znvpair(POE) icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E)
> sp5100_tco(E) i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E)
> pps_core(E) sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E)
> pata_atiixp(E) ahci(E) libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E)
> dm_mod(E) dax(E)
> > CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P           OE   4.14.6-
> 1.el6.x86_64 #1
> > Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> > task: ffff880059e255c0 task.stack: ffffc90001f64000
> > RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> > RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
> > RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
> > RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> > RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
> > R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
> > R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> > FS:  00007f92865d29a0(0000) GS:ffff88007f400000(0000)
> knlGS:0000000000000000
> > CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
> > Call Trace:
> >   ? _raw_spin_unlock_irqrestore+0x11/0x20
> >   ? error_exit+0x5/0x20
> >   ? __update_load_avg_cfs_rq+0x176/0x180
> >   ? xen_mc_flush+0x87/0x120
> >   ? xen_load_sp0+0x84/0xa0
> >   ? __switch_to+0x1c1/0x360
> >   ? finish_task_switch+0x78/0x240
> >   ? __schedule+0x192/0x496
> >   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >   ? _raw_spin_unlock_irqrestore+0x11/0x20
> >   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
> >   ? do_wait_intr+0x80/0x80
> >   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
> >   kthread+0x106/0x140
> >   ? kthread_destroy_worker+0x60/0x60
> >   ret_from_fork+0x25/0x30
> > Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 c6
> 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 20
> 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31
> > RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
> ffffc90001f67c68
> > ---[ end trace 130de0b7e39d0eea ]---
> >
> > Best regards,
> >
> > Alex
> >
> >
> >
> > -----Original Message-----
> > From: Juergen Gross [mailto:jgross@suse.com]
> > Sent: Friday, 22 December 2017 5:47 PM
> > To: Alex Braunegg; xen-devel@lists.xenproject.org
> > Cc: Wei Liu; Paul Durrant
> > Subject: Re: [Xen-devel] [BUG] kernel bug encountered at
> drivers/net/xen-netback/netback.c:430!
> >
> > On 22/12/17 07:40, Alex Braunegg wrote:
> >> Hi all,
> >>
> >> Experienced the same issue again today:
> > Ccing the maintainers.
> >
> >
> > Juergen
> >
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/net/xen-netback/netback.c:430!
> >> invalid opcode: 0000 [#1] SMP
> >> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> >> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E)
> sunrpc(E)
> >> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE)
> znvpair(POE)
> >> icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
> >> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> >> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E)
> ahci(E)
> >> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> >> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P           OE
> >> 4.14.6-1.el6.x86_64 #1
> >> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> >> task: ffff880062518000 task.stack: ffffc90004f88000
> >> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> >> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
> >> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
> >> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> >> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
> >> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
> >> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> >> FS:  00007f40c63639a0(0000) GS:ffff88007f400000(0000)
> knlGS:0000000000000000
> >> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
> >> Call Trace:
> >>   ? error_exit+0x5/0x20
> >>   ? __update_load_avg_cfs_rq+0x176/0x180
> >>   ? xen_mc_flush+0x87/0x120
> >>   ? xen_load_sp0+0x84/0xa0
> >>   ? __switch_to+0x1c1/0x360
> >>   ? finish_task_switch+0x78/0x240
> >>   ? __schedule+0x192/0x496
> >>   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >>   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >>   ? _raw_spin_unlock_irqrestore+0x11/0x20
> >>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
> >>   ? do_wait_intr+0x80/0x80
> >>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
> >>   kthread+0x106/0x140
> >>   ? kthread_destroy_worker+0x60/0x60
> >>   ? kthread_destroy_worker+0x60/0x60
> >>   ret_from_fork+0x25/0x30
> >> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
> >> c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48
> >> 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31
> >> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
> >> ffffc90004f8bc68
> >> ---[ end trace 010682c76619a1bd ]---
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> Best regards,
> >>
> >> Alex
> >>
> >> -----Original Message-----
> >> From: Alex Braunegg [mailto:alex.braunegg@gmail.com]
> >> Sent: Thursday, 21 December 2017 8:04 AM
> >> To: 'xen-devel@lists.xenproject.org'
> >> Subject: [BUG] kernel bug encountered at
> >> drivers/net/xen-netback/netback.c:430!
> >>
> >> Hi all,
> >>
> >> I experienced the following bug whilst using a Xen VM. What happened
> was
> >> that this morning a single Xen VM suddenly terminated without cause
> with the
> >> following being logged in dmesg.
> >>
> >> Only 1 VM experienced an issue (out of 2 which were running), the other
> >> remained up and fully functional until I attempted to restart the crashed
> VM
> >> which triggered the kernel bug.
> >>
> >> Kernel:	4.14.6
> >> Xen:		4.8.2
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/net/xen-netback/netback.c:430!
> >> invalid opcode: 0000 [#1] SMP
> >> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> >> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E)
> sunrpc(E)
> >> ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE)
> icp(POE)
> >> spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E)
> sp5100_tco(E)
> >> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> >> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E)
> ahci(E)
> >> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> >> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P           OE
> >> 4.14.6-1.el6.x86_64 #1
> >> Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
> >> task: ffff8800595cc980 task.stack: ffffc900028e0000
> >> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> >> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
> >> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
> >> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> >> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
> >> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
> >> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> >> FS:  00007fee260ff9a0(0000) GS:ffff88007f400000(0000)
> knlGS:0000000000000000
> >> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
> >> Call Trace:
> >>   ? error_exit+0x5/0x20
> >>   ? __update_load_avg_cfs_rq+0x176/0x180
> >>   ? xen_mc_flush+0x87/0x120
> >>   ? xen_load_sp0+0x84/0xa0
> >>   ? __switch_to+0x1c1/0x360
> >>   ? finish_task_switch+0x78/0x240
> >>   ? __schedule+0x192/0x496
> >>   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >>   ? _raw_spin_lock_irqsave+0x1a/0x3c
> >>   ? _raw_spin_unlock_irqrestore+0x11/0x20
> >>   xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
> >>   ? do_wait_intr+0x80/0x80
> >>   ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
> >>   kthread+0x106/0x140
> >>   ? kthread_destroy_worker+0x60/0x60
> >>   ? kthread_destroy_worker+0x60/0x60
> >>   ret_from_fork+0x25/0x30
> >> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
> >> c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48
> >> 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
> >> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
> >> ffffc900028e3c68
> >> ---[ end trace 7d827dae67002ffc ]---
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> The section of relevant kernel code is:
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> static inline void xenvif_grant_handle_reset(struct xenvif_queue
> *queue,
> >>                                               u16 pending_idx)
> >> {
> >>          if (unlikely(queue->grant_tx_handle[pending_idx] ==
> >>                       NETBACK_INVALID_HANDLE)) {
> >>                  netdev_err(queue->vif->dev,
> >>                             "Trying to unmap invalid handle! pending_idx:
> >> 0x%x\n",
> >>                             pending_idx);
> >>                  BUG();
> >>          }
> >>          queue->grant_tx_handle[pending_idx] =
> NETBACK_INVALID_HANDLE;
> >> }
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> In an attempt to recover from this situation I restarted / destroyed (xl
> >> restart <vmname> / xl destroy <vmname>) the VM to recover it's state
> and the
> >> following error messages were logged at the console:
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus:
> >> /etc/xen/scripts/block remove [25271] died due to fatal signal
> Segmentation
> >> fault
> >> libxl: error: libxl_device.c:1080:device_backend_callback: unable to
> remove
> >> device with path /local/domain/0/backend/vif/2/0
> >> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed
> >> for 2
> >>
> >>
> ==========================================================
> ==================
> >> =========
> >>
> >> After which the physical system hung, then the physical system restarted
> >> with nothing else logged and everything came back OK & operational
> including
> >> the VM that crashed.
> >>
> >> Further details (xl dmesg, xl info) attached.
> >>
> >> Best regards,
> >>
> >> Alex Braunegg
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xenproject.org
> >> https://lists.xenproject.org/mailman/listinfo/xen-devel
> >>
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xenproject.org
> > https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-03 13:54         ` Paul Durrant
@ 2018-01-03 18:43           ` Alex Braunegg
  2018-01-03 20:33           ` Christoph Moench-Tegeder
  1 sibling, 0 replies; 17+ messages in thread
From: Alex Braunegg @ 2018-01-03 18:43 UTC (permalink / raw)
  To: 'Paul Durrant', 'Michael Collins',
	'Juergen Gross', xen-devel
  Cc: 'Wei Liu'

> How easy is it to trigger this? I'm assuming, from the original description, that I can probably trigger it by forcibly terminating a running domain and then trying to restart it.

For me the trigger was just having 2 VMs running; within 24 hours one would crash, with the debug data sent to the console / dmesg. I didn't have to do anything special to trigger it, nor did I attempt to.

It was only when attempting to restart the crashed VM (using xl) that I got the additional xl messages and the server rebooted.

> This breaks compilation of xen-netback with older compilers.
>    From kbuild bot with gcc-4.4.7:

My Xen version (and all other packages, including the kernel) is built / rebuilt using gcc 4.6.2, so I don't think I am hitting the gcc issue that the patch fixed.

Best regards,

Alex




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-03 13:54         ` Paul Durrant
  2018-01-03 18:43           ` Alex Braunegg
@ 2018-01-03 20:33           ` Christoph Moench-Tegeder
  2018-01-04 10:29             ` Paul Durrant
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Moench-Tegeder @ 2018-01-03 20:33 UTC (permalink / raw)
  To: Paul Durrant
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

## Paul Durrant (Paul.Durrant@citrix.com):

> How easy is it to trigger this? I'm assuming, from the original
> description, that I can probably trigger it by forcibly terminating
> a running domain and then trying to restart it.

As Alex said: in the "common cases" (like his and mine) it seems to
be enough to run a few DomUs and just wait a little (no special
load required) - with my 10 domains, the bug triggers in a few minutes
( https://lists.xenproject.org/archives/html/xen-devel/2017-12/msg01516.html
is my report of the issue - I didn't spot Alex' report).
The order of events here is:
- boot Dom0
- xl create a few DomUs (all recent Linux, all builder=hvm in my setup,
  each VM has exactly one virtual network interface, all bridged onto
  the one ethernet interface on the Dom0 which carries all traffic
  to the Dom0 and the DomUs)
- after a few minutes, the Dom0 kernel logs the BUG() in question
- shortly after (not immediately! - it may even take a few more minutes)
  the DomU behind the vif reported in the BUG becomes unresponsive:
  no network traffic, no reaction on the virtual console, no message
  in syslog.
- trying to xl destroy the unresponsive domain (or trying to do a
  normal shutdown on one of the other domains) results in the corrupted
  state documented in my earlier report (see link).

In my case this "cannot" be an issue with an old gcc - Debian 9 ships
with "gcc (Debian 6.3.0-18) 6.3.0 20170516" (but beware of new bugs,
who knows?).

I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
report back (just to rule that out - like you, I don't really believe
that this is the cause).
For the record, I'm still running 4.13.16 on the Dom0 (that's the last
working Dom0 kernel).

Regards,
Christoph

-- 
Spare Space

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-03 20:33           ` Christoph Moench-Tegeder
@ 2018-01-04 10:29             ` Paul Durrant
  2018-01-07 22:19               ` 'Christoph Moench-Tegeder'
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2018-01-04 10:29 UTC (permalink / raw)
  To: 'Christoph Moench-Tegeder'
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Christoph Moench-Tegeder [mailto:cmt@burggraben.net]
> Sent: 03 January 2018 20:34
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: 'Alex Braunegg' <alex.braunegg@gmail.com>; 'Michael Collins'
> <mike@ark-net.org>; 'Juergen Gross' <jgross@suse.com>; xen-
> devel@lists.xenproject.org; Wei Liu <wei.liu2@citrix.com>
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> ## Paul Durrant (Paul.Durrant@citrix.com):
> 
> > How easy is it to trigger this? I'm assuming, from the original
> > description, that I can probably trigger it by forcibly terminating
> > a running domain and then trying to restart it.
> 
> As Alex said: in the "common cases" (like his and mine) it seems to
> be enough to run a few DomUs and just wait a little (no special
> load required) - with my 10 domains, the bug triggers in a few minutes
> ( https://lists.xenproject.org/archives/html/xen-devel/2017-
> 12/msg01516.html
> is my report of the issue - I didn't spot Alex' report).
> The order of events here is:
> - boot Dom0
> - xl create a few DomUs (all recent Linux, all builder=hvm in my setup,
>   each VM has exactly one virtual network interface, all bridged onto
>   the one ethernet interface on the Dom0 which carries all traffic
>   to the Dom0 and the DomUs)
> - after a few minutes, the Dom0 kernel logs the BUG() in question
> - shortly after (not immediately! - may take even some more minutes)
>   the DomU behind the vif reported in the BUG becomes unresponsive:
>   no network traffic, no reaction on the virtual console, no message
>   in syslog.
> - trying to xl destroy the unresponsive domain (or trying to do a
>   normal shutdown on one of the other domains) results in the corrupted
>   state documented in my earlier report (see link).
> 
> In my case this "cannot" be an issue with an old gcc - Debian 9 ships
> with "gcc (Debian 6.3.0-18) 6.3.0 20170516" (but beware of new bugs,
> who knows?).
> 
> I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
> report back (just to rule that out - like you, I don't really believe
> that this is the cause).
> For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> working Dom0 kernel).

Thanks. Well, that's the only netback commit that's in master but not in 4.13.16 so it would be useful to conclusively rule that out as a cause.

  Cheers,

    Paul

> 
> Regards,
> Christoph
> 
> --
> Spare Space

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-04 10:29             ` Paul Durrant
@ 2018-01-07 22:19               ` 'Christoph Moench-Tegeder'
  2018-01-08  9:35                 ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: 'Christoph Moench-Tegeder' @ 2018-01-07 22:19 UTC (permalink / raw)
  To: Paul Durrant
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

## Paul Durrant (Paul.Durrant@citrix.com):

> > I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> > cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
> > report back (just to rule that out - like you, I don't really believe
> > that this is the cause).
> > For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> > working Dom0 kernel).
> 
> Thanks. Well, that's the only netback commit that's in master but not in
> 4.13.16 so it would be useful to conclusively rule that out as a cause.

Funny thing: with that commit reverted, I'm running 4.14.12 on my Dom0.
That's holding much longer than any 4.14 kernel on that host before.
That's interesting, as the crashing code looks more correct (at least
to me and some compilers...), and the change is rather small.

Regards,
Christoph

-- 
Spare Space

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-07 22:19               ` 'Christoph Moench-Tegeder'
@ 2018-01-08  9:35                 ` Paul Durrant
  2018-01-09  9:44                   ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2018-01-08  9:35 UTC (permalink / raw)
  To: 'Christoph Moench-Tegeder'
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

> -----Original Message-----
> From: 'Christoph Moench-Tegeder' [mailto:cmt@burggraben.net]
> Sent: 07 January 2018 22:19
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: 'Michael Collins' <mike@ark-net.org>; 'Juergen Gross'
> <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>; 'Alex Braunegg'
> <alex.braunegg@gmail.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> ## Paul Durrant (Paul.Durrant@citrix.com):
> 
> > > I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> > > cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend
> and
> > > report back (just to rule that out - like you, I don't really believe
> > > that this is the cause).
> > > For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> > > working Dom0 kernel).
> >
> > Thanks. Well, that's the only netback commit that's in master but not in
> > 4.13.16 so it would be useful to conclusively rule that out as a cause.
> 
> Funny thing: with that commit reverted, I'm running 4.14.12 on my Dom0.
> That's holding much longer than any 4.14 kernel on that host before.
> That's interesting, as the crashing code looks more correct (at least
> to me and some compilers...), and the change is rather small.
> 

Yes, that is very strange. Thanks for the info.

Cheers,

  Paul

> Regards,
> Christoph
> 
> --
> Spare Space

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-08  9:35                 ` Paul Durrant
@ 2018-01-09  9:44                   ` Paul Durrant
  2018-01-10 12:52                     ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2018-01-09  9:44 UTC (permalink / raw)
  To: Paul Durrant, 'Christoph Moench-Tegeder'
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

I finally have a reliable repro and it's trivial...

Just try to copy a large file out of a Windows VM to an SMB share (using PV drivers in the VM). Dom0 goes bang pretty much immediately. I get another BUG too on another CPU...

[ 1062.422497] ------------[ cut here ]------------
[ 1062.422510] kernel BUG at drivers/net/xen-netback/netback.c:1225!
[ 1062.422518] invalid opcode: 0000 [#2] SMP
[ 1062.422522] Modules linked in: xt_physdev br_netfilter iptable_filter tun nfsv3 nfs_acl rpcsec_gss_krbl
[ 1062.422618]  ahci libahci ehci_pci libata ehci_hcd tg3 megaraid_sas ptp usbcore pps_core scsi_mod libpy
[ 1062.422636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W       4.14.0-rc5+ #13
[ 1062.422642] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[ 1062.422649] task: ffffffff81c10480 task.stack: ffffffff81c00000
[ 1062.422659] RIP: 10000e030:xenvif_zerocopy_callback+0x7e/0xc0 [xen_netback]
[ 1062.422666] RSP: e02b:ffff88200e403d28 EFLAGS: 00010012
[ 1062.422672] RAX: 0000000000000240 RBX: ffffc90048a5a260 RCX: 0000000000000100
[ 1062.422678] RDX: 0000000000000540 RSI: ffffc90048a58420 RDI: 0000000000000039
[ 1062.422684] RBP: ffff88200e403d48 R08: 0000000000000000 R09: 0000000000000000
[ 1062.422691] R10: 0000000000000040 R11: ffff881feea1b268 R12: ffffc90048a63810
[ 1062.422697] R13: 0000000000000001 R14: ffffc90048a578e0 R15: ffff882002da4900
[ 1062.422714] FS:  0000000000000000(0000) GS:ffff88200e400000(0000) knlGS:0000000000000000
[ 1062.422721] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1062.422726] CR2: 00005615b0a0e000 CR3: 0000001fc96c4000 CR4: 0000000000042660
[ 1062.422734] Call Trace:
[ 1062.422738]  <IRQ>
[ 1062.422746]  skb_release_data+0xe4/0x110
[ 1062.422753]  skb_release_all+0x24/0x30
[ 1062.422758]  consume_skb+0x2c/0x90
[ 1062.422765]  __dev_kfree_skb_any+0x2f/0x40
[ 1062.422776]  tg3_poll_work+0x265/0xf20 [tg3]
[ 1062.422783]  ? xenvif_tx_action+0x758/0x8e0 [xen_netback]
[ 1062.422791]  ? __enqueue_entity+0x5c/0x60
[ 1062.422797]  ? enqueue_entity+0x113/0x7b0
[ 1062.422806]  ? tg3_msi_1shot+0x52/0x60 [tg3]
[ 1062.422814]  tg3_poll+0x7e/0x420 [tg3]
[ 1062.422821]  net_rx_action+0x268/0x3e0
[ 1062.422829]  __do_softirq+0x104/0x28f
[ 1062.422837]  irq_exit+0xb6/0xc0
[ 1062.422843]  xen_evtchn_do_upcall+0x30/0x40
[ 1062.422850]  xen_do_hypervisor_callback+0x29/0x40
[ 1062.422855]  </IRQ>

So, I can now start to investigate.

Cheers,

  Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-09  9:44                   ` Paul Durrant
@ 2018-01-10 12:52                     ` Paul Durrant
  2018-01-10 13:58                       ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2018-01-10 12:52 UTC (permalink / raw)
  To: 'Christoph Moench-Tegeder'
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

I have tracked the problem down to multiple calls to the zerocopy callback for the same ubuf_info. I am not sure exactly which patch introduced the issue, but my suspicion is that it was one of the MSG_ZEROCOPY series (see https://marc.info/?l=linux-netdev&m=149807997726733&w=2).
I have a candidate patch to netback to make use of the ubuf_info ref count to handle the multiple callbacks and that certainly fixes the issue for me. I'll post this shortly recommending a backport to stable.
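
To illustrate the idea only: the following is a minimal userspace sketch of a reference-counted completion callback, not the candidate netback patch and not kernel code. The names demo_ubuf, demo_ubuf_get and demo_zerocopy_callback are hypothetical, and a plain int stands in for the kernel's atomic reference count, so this is single-threaded by assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ubuf_info; NOT the real kernel layout. */
struct demo_ubuf {
    int refcnt;         /* number of outstanding users of the buffer */
    int release_count;  /* how many times the buffer was actually released */
};

/* Take one extra reference for each additional user sharing the buffer. */
static void demo_ubuf_get(struct demo_ubuf *u)
{
    u->refcnt++;
}

/*
 * Completion callback that tolerates being called once per user.
 * Only the call that drops the last reference releases the buffer.
 * Returns true if this call performed the release.
 */
static bool demo_zerocopy_callback(struct demo_ubuf *u)
{
    if (u->refcnt <= 0)
        return false;       /* defensive: ignore a call with no reference left */
    if (--u->refcnt > 0)
        return false;       /* other users still hold the buffer */
    u->release_count++;     /* last reference dropped: release exactly once */
    return true;
}
```

Starting from refcnt = 1 and taking one extra reference, the first callback returns false and the second returns true, so the release happens once however many users invoke the callback.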

  Paul

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-10 12:52                     ` Paul Durrant
@ 2018-01-10 13:58                       ` Paul Durrant
  2018-01-10 17:53                         ` 'Christoph Moench-Tegeder'
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2018-01-10 13:58 UTC (permalink / raw)
  To: Paul Durrant, 'Christoph Moench-Tegeder'
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Paul Durrant
> Sent: 10 January 2018 12:52
> To: 'Christoph Moench-Tegeder' <cmt@burggraben.net>
> Cc: 'Michael Collins' <mike@ark-net.org>; 'Juergen Gross'
> <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>; 'Alex Braunegg'
> <alex.braunegg@gmail.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> I have tracked down the problem to multiple calls to the zerocopy callback for
> the same ubuf_info. I am not sure exactly which patch introduced the issue
> but my suspicion is that it was one of the MSG_ZEROCOPY series (see
> https://marc.info/?l=linux-netdev&m=149807997726733&w=2).
> I have a candidate patch to netback to make use of the ubuf_info ref count
> to handle the multiple callbacks and that certainly fixes the issue for me. I'll
> post this shortly recommending a backport to stable.
> 

Actually no need... The underlying issue was really a bug and has been fixed in 4.14.11. See https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.13&id=17155ea827b2fd81330a442ed56d0edafd9969e1
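
For anyone wanting to check a host against that version, here is a small hedged sketch that compares a kernel release string (as printed by uname -r) with a given major.minor.patch. The helper name demo_release_at_least is hypothetical, and the comparison is only meaningful within the 4.14.y stable series discussed here; distribution kernels may carry backports that the version number does not reflect.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if the release string starts with a
 * version that is at least major.minor.patch. Trailing text such as
 * "-1.el6.x86_64" is ignored. A missing patch level is treated as 0.
 */
static bool demo_release_at_least(const char *release,
                                  int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    /* Require at least "major.minor" to be present. */
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return false;

    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}
```

With the release from the original report, demo_release_at_least("4.14.6-1.el6.x86_64", 4, 14, 11) is false, while "4.14.13" compares as true.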

  Paul

* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-10 13:58                       ` Paul Durrant
@ 2018-01-10 17:53                         ` 'Christoph Moench-Tegeder'
  2018-01-10 19:55                           ` Alex Braunegg
  0 siblings, 1 reply; 17+ messages in thread
From: 'Christoph Moench-Tegeder' @ 2018-01-10 17:53 UTC (permalink / raw)
  To: Paul Durrant
  Cc: 'Michael Collins', 'Juergen Gross', Wei Liu,
	'Alex Braunegg', xen-devel@lists.xenproject.org

## Paul Durrant (Paul.Durrant@citrix.com):

> Actually no need... The underlying issue was really a bug and has
> been fixed in 4.14.11.

Oh. That explains why reverting the other patch "fixed" the problem -
I had skipped 4.14.10 and 4.14.11 - and the problem has gone away
independently of that.
Cool, I'll try vanilla 4.14.13 really soon now (once I'm home...)

Thanks for the investigation,
Christoph

-- 
Spare Space.


* Re: [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!
  2018-01-10 17:53                         ` 'Christoph Moench-Tegeder'
@ 2018-01-10 19:55                           ` Alex Braunegg
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Braunegg @ 2018-01-10 19:55 UTC (permalink / raw)
  To: 'Christoph Moench-Tegeder', 'Paul Durrant'
  Cc: 'Michael Collins', 'Juergen Gross',
	'Wei Liu', xen-devel


> Actually no need... The underlying issue was really a bug and has
> been fixed in 4.14.11.

Thanks for tracking this down and spending time looking at this, Paul.

Best regards,

Alex





Thread overview: 17+ messages
2017-12-20 21:03 [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430! Alex Braunegg
  -- strict thread matches above, loose matches on Subject: below --
2017-12-22  6:40 Alex Braunegg
2017-12-22  6:47 ` Juergen Gross
2017-12-22 20:35   ` Alex Braunegg
2017-12-28 18:05     ` Michael Collins
2017-12-28 19:31       ` Alex Braunegg
2018-01-03 13:54         ` Paul Durrant
2018-01-03 18:43           ` Alex Braunegg
2018-01-03 20:33           ` Christoph Moench-Tegeder
2018-01-04 10:29             ` Paul Durrant
2018-01-07 22:19               ` 'Christoph Moench-Tegeder'
2018-01-08  9:35                 ` Paul Durrant
2018-01-09  9:44                   ` Paul Durrant
2018-01-10 12:52                     ` Paul Durrant
2018-01-10 13:58                       ` Paul Durrant
2018-01-10 17:53                         ` 'Christoph Moench-Tegeder'
2018-01-10 19:55                           ` Alex Braunegg
