mpt3sas bug with Debian jessie kernel only under Xen

xen-devel.lists.xenproject.org archive mirror

* mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"
@ 2016-12-04  8:32 Andy Smith
  2016-12-04 15:59 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Smith @ 2016-12-04  8:32 UTC
  To: xen-devel

Hi,

I have a Debian jessie server with an LSI SAS controller using the
mpt3sas driver.

Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
3.16.36-1+deb8u2) running under Xen, I cannot put the system's
storage under heavy load without receiving a bunch of "swiotlb
buffer is full" kernel error messages and severely degraded
performance. Sometimes the system panics and reboots itself.

These problems do not happen if booting the kernel on bare metal.
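
When comparing a Xen boot with a bare-metal boot, it can help to tally the
errors per device from a saved kernel log. The following is a minimal sketch;
the function name count_swiotlb_full is made up for this example, and the
message format is assumed to match the log lines in this thread.

```shell
#!/bin/sh
# Count "swiotlb buffer is full" messages per driver/PCI device.
# Reads the log file(s) named as arguments, or stdin when none are given.
# Prints one line per device: "<driver> <pci-address> <count>".
count_swiotlb_full() {
    awk '/swiotlb buffer is full/ {
             sub(/^.*\] /, "")        # drop syslog prefix and [timestamp]
             sub(/: swiotlb.*/, "")   # keep only "driver pci-address"
             n[$0]++
         }
         END { for (d in n) print d, n[d] }' "$@" | sort
}
```

For example, `count_swiotlb_full /var/log/kern.log` (the log path is an
assumption about where syslog writes kernel messages on the system).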

With a bit of searching I found someone having a similar issue with
the Debian jessie kernel (though 686 and several versions back) and
the tg3 driver:

    https://lists.debian.org/debian-kernel/2015/05/msg00307.html

They mention that suggestions on this list led them to compile a
kernel with NEED_DMA_MAP_STATE set.

I already seem to have that set:

$ grep NEED_DMA /boot/config-3.16.0-4-amd64 
CONFIG_NEED_DMA_MAP_STATE=y

Is there something similar that I could try?

The machine has two SSDs in an md RAID-10 and two spinning disks in
another RAID-10. I can induce the situation within a few seconds by
telling mdadm to check both of those arrays at the same time. i.e.:

# /usr/share/mdadm/checkarray /dev/md4 # Spinny disks
# /usr/share/mdadm/checkarray /dev/md5 # SSDs

I expect to see a checking rate of 200,000K/sec (my set maximum)
reported in /proc/mdstat for md5, and about 98,000K/sec for md4. That
is what I get on bare metal.

Under Xen, it starts off well but then the kernel errors appear
within a few seconds; md4's speed drops to ~90,000K/sec and md5's
drops right down to just ~100K/sec. If the machine doesn't do a
kernel panic and reset itself very soon, it becomes unusably slow
anyway.
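
A small sketch for pulling the per-array rates out of /proc/mdstat while both
checks run, assuming the usual progress line containing speed=<n>K/sec.
md_check_speeds is a hypothetical helper name, not part of mdadm.

```shell
#!/bin/sh
# Print "<array> <speed>" for every md array that currently reports a
# check/resync/recovery speed. Reads /proc/mdstat unless a file is given.
md_check_speeds() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }      # remember the current array name
        /speed=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^speed=/) {
                    sub(/^speed=/, "", $i)
                    print dev, $i
                }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run in a loop such as `while sleep 2; do md_check_speeds; done`, a collapse of
one array's rate relative to the other shows up within a few iterations.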

I can also trigger it with fio if I run jobs against filesystems on
both arrays at once.
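
The exact fio jobs are not given here, so the following is only a sketch of
what "jobs against both arrays at once" could look like. Every parameter
(block size, I/O pattern, runtime) and both mount points are assumptions, and
run_dual_fio is an illustrative name.

```shell
#!/bin/sh
# Run one fio job per array concurrently. $1 and $2 are directories on
# filesystems backed by the two arrays (hypothetical mount points).
# Set FIO=echo to print the fio arguments instead of running fio.
run_dual_fio() {
    ssd_dir=$1
    hdd_dir=$2
    ${FIO:-fio} --name=global --rw=randrw --bs=64k --size=1g \
        --ioengine=libaio --direct=1 --iodepth=16 \
        --runtime=60 --time_based --group_reporting \
        --name=ssd --directory="$ssd_dir" \
        --name=hdd --directory="$hdd_dir"
}
```

Options after --name=global apply to both jobs; fio starts the two named jobs
in parallel by default, which is what puts both arrays under load together.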

Some logs appended at the end of this email.

Would it be useful for me to show you a "dmesg" and "xl dmesg"?

Shall I try a kernel and/or hypervisor from testing?

Thanks,
Andy

Dec  4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 57344 bytes)
Dec  4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: swiotlb buffer is full
Dec  4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Dec  4 07:06:00 elephant kernel: [22019.376430] IP: [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
Dec  4 07:06:00 elephant kernel: [22019.377122] PGD 0
Dec  4 07:06:00 elephant kernel: [22019.377825] Oops: 0000 [#1] SMP
Dec  4 07:06:00 elephant kernel: [22019.378494] Modules linked in: binfmt_misc xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ipt_REJECT xt_LOG xt_limit xt_NFLOG nfnetlink_log nfnetlink xt_multiport xt_tcpudp iptable_filter ip_tables x_tables bonding joydev hid_generic usbhid hid x86_pkg_temp_thermal coretemp crc32_pclmul crc32c_intel iTCO_wdt iTCO_vendor_support evdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_i801 ast ttm drm_kms_helper xhci_hcd ehci_pci ehci_hcd drm lpc_ich mfd_core mei_me usbcore mei usb_common igb ptp pps_core dca sg i2c_algo_bit i2c_core shpchp tpm_tis tpm button wmi ipmi_si ipmi_msghandler processor thermal_sys acpi_power_meter fuse autofs4 ext4 crc16 mbcache jbd2 dm_mod raid10 raid1 md_mod sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci libata mpt3sas raid_class scsi_transport_sas scsi_mod
Dec  4 07:06:00 elephant kernel: [22019.384778] CPU: 0 PID: 29516 Comm: md5_resync Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
Dec  4 07:06:00 elephant kernel: [22019.385574] Hardware name: Supermicro Super Server/X10SRH-CLN4F, BIOS 2.0a 09/20/2016
Dec  4 07:06:00 elephant kernel: [22019.386400] task: ffff8800704ae2d0 ti: ffff88005c410000 task.ti: ffff88005c410000
Dec  4 07:06:00 elephant kernel: [22019.387204] RIP: e030:[<ffffffffa004e779>]  [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
Dec  4 07:06:00 elephant kernel: [22019.388054] RSP: e02b:ffff88005c413a00  EFLAGS: 00010282
Dec  4 07:06:00 elephant kernel: [22019.388855] RAX: 0000000000000010 RBX: ffff88006fb84070 RCX: ffff88006fb41be0
Dec  4 07:06:00 elephant kernel: [22019.389684] RDX: 0000000000000000 RSI: 00000000ffffff30 RDI: ffff88005c507300
Dec  4 07:06:00 elephant kernel: [22019.390572] RBP: 00000000ffffffff R08: ffff88006f90ae80 R09: 0000000000000000
Dec  4 07:06:00 elephant kernel: [22019.391395] R10: ffff880078eec000 R11: 0000000000000001 R12: ffff880071230720
Dec  4 07:06:00 elephant kernel: [22019.392235] R13: 00000000ffffffeb R14: 00000000fffffff3 R15: 0000000000000000
Dec  4 07:06:00 elephant kernel: [22019.393031] FS:  0000000000000000(0000) GS:ffff880078400000(0000) knlGS:0000000000000000
Dec  4 07:06:00 elephant kernel: [22019.393850] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 07:06:00 elephant kernel: [22019.394639] CR2: 0000000000000010 CR3: 000000006ece5000 CR4: 0000000000042660
Dec  4 07:06:00 elephant kernel: [22019.395434] Stack:
Dec  4 07:06:00 elephant kernel: [22019.396253]  ffff88006f90ae70 0000008068812a20 ffff88005a1a1500 ffff880071230000
Dec  4 07:06:00 elephant kernel: [22019.397065]  ffff88006f880800 ffff88006fcc2800 ffff88006fb84000 000000000000000a
Dec  4 07:06:00 elephant kernel: [22019.397888]  ffffffffa0058b9f ffff880071230720 0200000000000080 ffff88005a1a1500
Dec  4 07:06:00 elephant kernel: [22019.398727] Call Trace:
Dec  4 07:06:00 elephant kernel: [22019.399529]  [<ffffffffa0058b9f>] ? _scsih_qcmd+0x26f/0x3d0 [mpt3sas]
Dec  4 07:06:00 elephant kernel: [22019.400387]  [<ffffffffa00023e4>] ? scsi_dispatch_cmd+0xb4/0x2d0 [scsi_mod]
Dec  4 07:06:00 elephant kernel: [22019.401200]  [<ffffffffa000acbd>] ? scsi_request_fn+0x2fd/0x500 [scsi_mod]
Dec  4 07:06:00 elephant kernel: [22019.402002]  [<ffffffff8127fe7f>] ? __blk_run_queue+0x2f/0x40
Dec  4 07:06:00 elephant kernel: [22019.402755]  [<ffffffff8127ff39>] ? queue_unplugged+0x29/0xc0
Dec  4 07:06:00 elephant kernel: [22019.403509]  [<ffffffff81284277>] ? blk_flush_plug_list+0x1f7/0x230
Dec  4 07:06:00 elephant kernel: [22019.404286]  [<ffffffff812844ca>] ? blk_queue_bio+0x21a/0x3a0
Dec  4 07:06:00 elephant kernel: [22019.405022]  [<ffffffff8127fcb0>] ? generic_make_request+0xb0/0x100
Dec  4 07:06:00 elephant kernel: [22019.405757]  [<ffffffffa010908a>] ? sync_request+0x133a/0x18b0 [raid10]
Dec  4 07:06:00 elephant kernel: [22019.406526]  [<ffffffffa00dfdf4>] ? md_do_sync+0x944/0xd80 [md_mod]
Dec  4 07:06:00 elephant kernel: [22019.407258]  [<ffffffffa00dcb87>] ? md_thread+0x107/0x120 [md_mod]
Dec  4 07:06:00 elephant kernel: [22019.407971]  [<ffffffff81514951>] ? __schedule+0x2b1/0x6f0
Dec  4 07:06:00 elephant kernel: [22019.408715]  [<ffffffffa00dca80>] ? md_stop+0x40/0x40 [md_mod]
Dec  4 07:06:00 elephant kernel: [22019.409427]  [<ffffffff810894bd>] ? kthread+0xbd/0xe0
Dec  4 07:06:00 elephant kernel: [22019.410151]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
Dec  4 07:06:00 elephant kernel: [22019.410856]  [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90
Dec  4 07:06:00 elephant kernel: [22019.411539]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
Dec  4 07:06:00 elephant kernel: [22019.412257] Code: 24 ba 04 00 00 44 89 f6 0f af f0 45 85 f6 0f 85 5e ff ff ff 45 85 ed c6 43 0f 80 c6 43 0e 00 89 73 08 48 89 13 74 48 41 83 fd 01 <49> 8b 47 10 41 8b 57 18 0f 84 a7 00 00 00 41 c6 40 0f 00 41 c6
Dec  4 07:06:00 elephant kernel: [22019.413728] RIP  [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
Dec  4 07:06:00 elephant kernel: [22019.414469]  RSP <ffff88005c413a00>
Dec  4 07:06:00 elephant kernel: [22019.415174] CR2: 0000000000000010
Dec  4 07:06:00 elephant kernel: [22019.415857] ---[ end trace 3fa287bf370969b9 ]---
Dec  4 07:06:31 elephant kernel: [22049.728424] sd 0:0:1:0: attempting task abort! scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.729312] sd 0:0:1:0: [sdb] CDB:
Dec  4 07:06:31 elephant kernel: [22049.729962] Read(10): 28 00 00 04 44 00 00 04 00 00
Dec  4 07:06:31 elephant kernel: [22049.730616] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Dec  4 07:06:31 elephant kernel: [22049.731283] scsi target0:0:1: enclosure_logical_id(0x500304801cb30101), slot(1)
Dec  4 07:06:31 elephant kernel: [22049.760538] sd 0:0:1:0: task abort: FAILED scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.940030] sd 0:0:1:0: attempting device reset! scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.940032] sd 0:0:1:0: [sdb] CDB:
Dec  4 07:06:31 elephant kernel: [22049.940034] Read(10): 28 00 00 04 44 00 00 04 00 00
Dec  4 07:06:31 elephant kernel: [22049.940036] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Dec  4 07:06:31 elephant kernel: [22049.940037] scsi target0:0:1: enclosure_logical_id(0x500304801cb30101), slot(1)
Dec  4 07:06:31 elephant kernel: [22049.967889] sd 0:0:1:0: device reset: FAILED scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.967891] scsi target0:0:1: attempting target reset! scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.967891] sd 0:0:1:0: [sdb] CDB:
Dec  4 07:06:31 elephant kernel: [22049.967893] Read(10): 28 00 00 04 44 00 00 04 00 00
Dec  4 07:06:31 elephant kernel: [22049.967894] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Dec  4 07:06:31 elephant kernel: [22049.967894] scsi target0:0:1: enclosure_logical_id(0x500304801cb30101), slot(1)
Dec  4 07:06:31 elephant kernel: [22049.995732] scsi target0:0:1: target reset: FAILED scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.995733] mpt3sas0: attempting host reset! scmd(ffff88005a1a1500)
Dec  4 07:06:31 elephant kernel: [22049.995733] sd 0:0:1:0: [sdb] CDB:
Dec  4 07:06:31 elephant kernel: [22049.995735] Read(10): 28 00 00 04 44 00 00 04 00 00
Dec  4 07:06:41 elephant kernel: [22059.992654] mpt3sas0: sending diag reset !!
Dec  4 07:06:42 elephant kernel: [22061.013062] mpt3sas0: diag reset: SUCCESS
Dec  4 07:06:43 elephant kernel: [22061.122098] mpt3sas0: LSISAS3008: FWVersion(12.00.02.00), ChipRevision(0x02), BiosVersion(08.29.01.00)
Dec  4 07:06:43 elephant kernel: [22061.122099] mpt3sas0: Protocol=(
Dec  4 07:06:43 elephant kernel: [22061.122099] Initiator
Dec  4 07:06:43 elephant kernel: [22061.122099] ,Target
Dec  4 07:06:43 elephant kernel: [22061.122100] ),
Dec  4 07:06:43 elephant kernel: [22061.122100] Capabilities=(
Dec  4 07:06:43 elephant kernel: [22061.122100] TLR
Dec  4 07:06:43 elephant kernel: [22061.122100] ,EEDP
Dec  4 07:06:43 elephant kernel: [22061.122101] ,Snapshot Buffer
Dec  4 07:06:43 elephant kernel: [22061.122101] ,Diag Trace Buffer
Dec  4 07:06:43 elephant kernel: [22061.122101] ,Task Set Full
Dec  4 07:06:43 elephant kernel: [22061.122101] ,NCQ
Dec  4 07:06:43 elephant kernel: [22061.122101] )
Dec  4 07:06:43 elephant kernel: [22061.122153] mpt3sas0: sending port enable !!
Dec  4 07:06:50 elephant kernel: [22068.799886] mpt3sas0: port enable: SUCCESS
Dec  4 07:06:50 elephant kernel: [22068.800615] mpt3sas0: search for end-devices: start
Dec  4 07:06:50 elephant kernel: [22068.801606] scsi target0:0:0: handle(0x0009), sas_addr(0x4433221100000000), enclosure logical id(0x500304801cb30101), slot(0)
Dec  4 07:06:50 elephant kernel: [22068.802221] scsi target0:0:1: handle(0x000a), sas_addr(0x4433221101000000), enclosure logical id(0x500304801cb30101), slot(1)
Dec  4 07:06:50 elephant kernel: [22068.802875] scsi target0:0:2: handle(0x000b), sas_addr(0x4433221106000000), enclosure logical id(0x500304801cb30101), slot(6)
Dec  4 07:06:50 elephant kernel: [22068.803466] scsi target0:0:3: handle(0x000c), sas_addr(0x4433221107000000), enclosure logical id(0x500304801cb30101), slot(7)
Dec  4 07:06:50 elephant kernel: [22068.804055] mpt3sas0: search for end-devices: complete
Dec  4 07:06:50 elephant kernel: [22068.804588] mpt3sas0: search for expanders: start
Dec  4 07:06:50 elephant kernel: [22068.805143] mpt3sas0: search for expanders: complete
Dec  4 07:06:50 elephant kernel: [22068.805645] mpt3sas0: host reset: SUCCESS scmd(ffff88005a1a1500)
Dec  4 07:07:01 elephant kernel: [22079.849149] mpt3sas0: removing unresponding devices: start
Dec  4 07:07:01 elephant kernel: [22079.849927] mpt3sas0: removing unresponding devices: end-devices
Dec  4 07:07:01 elephant kernel: [22079.850454] mpt3sas0: removing unresponding devices: expanders
Dec  4 07:07:01 elephant kernel: [22079.850971] mpt3sas0: removing unresponding devices: complete
Dec  4 07:07:01 elephant kernel: [22079.851481] mpt3sas0: scan devices: start
Dec  4 07:07:01 elephant kernel: [22079.852242] mpt3sas0: 	scan devices: expanders start
Dec  4 07:07:01 elephant kernel: [22079.852785] mpt3sas0: 	break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
Dec  4 07:07:01 elephant kernel: [22079.853305] mpt3sas0: 	scan devices: expanders complete
Dec  4 07:07:01 elephant kernel: [22079.853809] mpt3sas0: 	scan devices: end devices start
Dec  4 07:07:01 elephant kernel: [22079.854930] mpt3sas0: 	break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
Dec  4 07:07:01 elephant kernel: [22079.855432] mpt3sas0: 	scan devices: end devices complete
Dec  4 07:07:01 elephant kernel: [22079.855931] mpt3sas0: scan devices: complete
Dec  4 07:15:20 elephant kernel: [22578.567343] BUG: unable to handle kernel NULL pointer dereference at           (null)
Dec  4 07:15:20 elephant kernel: [22578.568618] IP: [<ffffffff8108e6eb>] exit_creds+0x1b/0x60
Dec  4 07:15:20 elephant kernel: [22578.569809] PGD 0
Dec  4 07:15:20 elephant kernel: [22578.571006] Oops: 0002 [#2] SMP
Dec  4 07:15:20 elephant kernel: [22578.572188] Modules linked in: binfmt_misc xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ipt_REJECT xt_LOG xt_limit xt_NFLOG nfnetlink_log nfnetlink xt_multiport xt_tcpudp iptable_filter ip_tables x_tables bonding joydev hid_generic usbhid hid x86_pkg_temp_thermal coretemp crc32_pclmul crc32c_intel iTCO_wdt iTCO_vendor_support evdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_i801 ast ttm drm_kms_helper xhci_hcd ehci_pci ehci_hcd drm lpc_ich mfd_core mei_me usbcore mei usb_common igb ptp pps_core dca sg i2c_algo_bit i2c_core shpchp tpm_tis tpm button wmi ipmi_si ipmi_msghandler processor thermal_sys acpi_power_meter fuse autofs4 ext4 crc16 mbcache jbd2 dm_mod raid10 raid1 md_mod sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci libata mpt3sas raid_class scsi_transport_sas scsi_mod
Dec  4 07:15:20 elephant kernel: [22578.581851] CPU: 1 PID: 29917 Comm: checkarray Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
Dec  4 07:15:20 elephant kernel: [22578.583055] Hardware name: Supermicro Super Server/X10SRH-CLN4F, BIOS 2.0a 09/20/2016
Dec  4 07:15:20 elephant kernel: [22578.584230] task: ffff880065446c60 ti: ffff880058240000 task.ti: ffff880058240000
Dec  4 07:15:20 elephant kernel: [22578.585394] RIP: e030:[<ffffffff8108e6eb>]  [<ffffffff8108e6eb>] exit_creds+0x1b/0x60
Dec  4 07:15:20 elephant kernel: [22578.586554] RSP: e02b:ffff880058243e28  EFLAGS: 00010292
Dec  4 07:15:20 elephant kernel: [22578.587690] RAX: ffffffff812380d0 RBX: ffff8800704ae2d0 RCX: 0000000000000000
Dec  4 07:15:20 elephant kernel: [22578.588818] RDX: ffffffff81886ee0 RSI: ffff8800704ae2d0 RDI: 0000000000000000
Dec  4 07:15:20 elephant kernel: [22578.589924] RBP: ffff8800704ae2d0 R08: ffffffff818439c0 R09: 00007f7a067d0670
Dec  4 07:15:20 elephant kernel: [22578.591014] R10: 00007f7a06c157f0 R11: 0000000000000246 R12: 0000000000000000
Dec  4 07:15:20 elephant kernel: [22578.592091] R13: ffff880059b1ac00 R14: 0000000000000005 R15: 0000000000000002
Dec  4 07:15:20 elephant kernel: [22578.593170] FS:  00007f7a06c07700(0000) GS:ffff880078420000(0000) knlGS:0000000000000000
Dec  4 07:15:20 elephant kernel: [22578.594219] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 07:15:20 elephant kernel: [22578.595250] CR2: 0000000000000000 CR3: 0000000059f43000 CR4: 0000000000042660
Dec  4 07:15:20 elephant kernel: [22578.596274] Stack:
Dec  4 07:15:20 elephant kernel: [22578.597275]  ffff8800704ae2d0 ffffffff8106560d 0000000000000000 ffff8800704ae2d0
Dec  4 07:15:20 elephant kernel: [22578.598280]  ffffffff810897b8 ffff88006e418a40 ffff88006ee81150 0000000000000005
Dec  4 07:15:20 elephant kernel: [22578.599272]  ffffffffa00dcbe0 ffff88006ee81000 ffff88006ee811b8 ffffffffa00e4645
Dec  4 07:15:20 elephant kernel: [22578.600246] Call Trace:
Dec  4 07:15:20 elephant kernel: [22578.601222]  [<ffffffff8106560d>] ? __put_task_struct+0x4d/0x130
Dec  4 07:15:20 elephant kernel: [22578.602166]  [<ffffffff810897b8>] ? kthread_stop+0x108/0x110
Dec  4 07:15:20 elephant kernel: [22578.603115]  [<ffffffffa00dcbe0>] ? md_unregister_thread+0x40/0x80 [md_mod]
Dec  4 07:15:20 elephant kernel: [22578.604031]  [<ffffffffa00e4645>] ? md_reap_sync_thread+0x15/0x150 [md_mod]
Dec  4 07:15:20 elephant kernel: [22578.604929]  [<ffffffffa00e47f9>] ? action_store+0x79/0x230 [md_mod]
Dec  4 07:15:20 elephant kernel: [22578.605809]  [<ffffffffa00e06f4>] ? md_attr_store+0xb4/0x100 [md_mod]
Dec  4 07:15:20 elephant kernel: [22578.606672]  [<ffffffff8121aa0a>] ? kernfs_fop_write+0xda/0x150
Dec  4 07:15:20 elephant kernel: [22578.607515]  [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0
Dec  4 07:15:20 elephant kernel: [22578.608338]  [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0
Dec  4 07:15:20 elephant kernel: [22578.609139]  [<ffffffff8151a5a8>] ? page_fault+0x28/0x30
Dec  4 07:15:20 elephant kernel: [22578.609920]  [<ffffffff8151858d>] ? system_call_fast_compare_end+0x10/0x15
Dec  4 07:15:20 elephant kernel: [22578.610688] Code: ff 85 c0 0f 84 4f fe ff ff e9 26 fe ff ff 66 90 0f 1f 44 00 00 53 48 89 fb 48 8b bf e0 04 00 00 48 c7 83 e0 04 00 00 00 00 00 00 <f0> ff 0f 74 20 48 8b bb e8 04 00 00 48 c7 83 e8 04 00 00 00 00
Dec  4 07:15:20 elephant kernel: [22578.612291] RIP  [<ffffffff8108e6eb>] exit_creds+0x1b/0x60
Dec  4 07:15:20 elephant kernel: [22578.613040]  RSP <ffff880058243e28>
Dec  4 07:15:20 elephant kernel: [22578.613781] CR2: 0000000000000000
Dec  4 07:15:20 elephant kernel: [22578.614508] ---[ end trace 3fa287bf370969ba ]---

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"
  2016-12-04  8:32 mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full" Andy Smith
@ 2016-12-04 15:59 ` Andrew Cooper
  2016-12-04 21:04   ` Andy Smith
  2016-12-06  2:43   ` Andy Smith
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Cooper @ 2016-12-04 15:59 UTC
  To: Andy Smith, xen-devel

On 04/12/16 08:32, Andy Smith wrote:
> Hi,
>
> I have a Debian jessie server with an LSI SAS controller using the
> mpt3sas driver.
>
> Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
> 3.16.36-1+deb8u2) running under Xen, I cannot put the system's
> storage under heavy load without receiving a bunch of "swiotlb
> buffer is full" kernel error messages and severely degraded
> performance. Sometimes the system panics and reboots itself.
>
> These problems do not happen if booting the kernel on bare metal.
>
> With a bit of searching I found someone having a similar issue with
> the Debian jessie kernel (though 686 and several versions back) and
> the tg3 driver:
>
>     https://lists.debian.org/debian-kernel/2015/05/msg00307.html
>
> They mention that suggestions on this list led them to compile a
> kernel with NEED_DMA_MAP_STATE set.
>
> I already seem to have that set:
>
> $ grep NEED_DMA /boot/config-3.16.0-4-amd64 
> CONFIG_NEED_DMA_MAP_STATE=y
>
> Is there something similar that I could try?
>
> The machine has two SSDs in an md RAID-10 and two spinning disks in
> another RAID-10. I can induce the situation within a few seconds by
> telling mdadm to check both of those arrays at the same time. i.e.:
>
> # /usr/share/mdadm/checkarray /dev/md4 # Spinny disks
> # /usr/share/mdadm/checkarray /dev/md5 # SSDs
>
> I expect to see a checking rate of 200,000K/sec (my set maximum)
> reported in /proc/mdstat for md5, and about 98,000K/sec for md4. That
> is what I get on bare metal.
>
> Under Xen, it starts off well but then the kernel errors appear
> within a few seconds; md4's speed drops to ~90,000K/sec and md5's
> drops right down to just ~100K/sec. If the machine doesn't do a
> kernel panic and reset itself very soon, it becomes unusably slow
> anyway.
>
> I can also trigger it with fio if I run jobs against filesystems on
> both arrays at once.
>
> Some logs appended at the end of this email.
>
> Would it be useful for me to show you a "dmesg" and "xl dmesg"?
>
> Shall I try a kernel and/or hypervisor from testing?

Can you try these two patches from the XenServer Patch queue?
https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

There are bugs with some device drivers in choosing the correct DMA
mask, which cause them incorrectly to believe that they need
bounce-buffering.  Once you hit bounce buffering, everything grinds to a
halt.
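
To illustrate the failure mode with a toy model (this is not the kernel's
code, and the helper names are made up for the example): suppose the required
mask is derived from the highest address the kernel believes it has, and a
driver only enables 64-bit DMA when that mask exceeds 32 bits. A Xen PV domain
with a small pseudo-physical address space then ends up with a 32-bit mask,
even though the machine frames behind its pages can sit above 4 GiB, so those
mappings get bounced through swiotlb.

```shell
#!/bin/sh
# Smallest all-ones bit mask covering the address in $1 (decimal, < 2^63).
mask_covering() {
    addr=$1
    mask=0
    while [ "$mask" -lt "$addr" ]; do
        mask=$(( (mask << 1) | 1 ))
    done
    echo "$mask"
}

# Print 32 or 64: the DMA width a driver would choose if it switches to
# 64-bit addressing only when the required mask exceeds 32 bits.
# (Assumed driver policy, for illustration only.)
dma_width_for() {
    if [ "$(mask_covering "$1")" -gt 4294967295 ]; then
        echo 64
    else
        echo 32
    fi
}
```

With a highest address of 2 GiB - 1 (`dma_width_for 2147483647`) the model
picks 32-bit DMA; with 16 GiB - 1 (`dma_width_for 17179869183`) it picks
64-bit. The fixup patch later in this thread takes the other route under Xen:
its xen_swiotlb_get_required_mask() simply returns DMA_BIT_MASK(64).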

> Dec  4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 57344 bytes)
> Dec  4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: swiotlb buffer is full
> Dec  4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> Dec  4 07:06:00 elephant kernel: [22019.376430] IP: [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
> Dec  4 07:06:00 elephant kernel: [22019.377122] PGD 0

This alone is a clear error handling bug in the mpt3sas driver.  It
hasn't checked the DMA mapping call for a successful mapping before
following the NULL pointer it got given back.  It is collateral damage
from the swiotlb buffer being full, but a bug nonetheless.

~Andrew


* Re: mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"
  2016-12-04 15:59 ` Andrew Cooper
@ 2016-12-04 21:04   ` Andy Smith
  2016-12-06  2:43   ` Andy Smith
  1 sibling, 0 replies; 4+ messages in thread
From: Andy Smith @ 2016-12-04 21:04 UTC
  To: Andrew Cooper; +Cc: xen-devel

Hi Andrew,

On Sun, Dec 04, 2016 at 03:59:20PM +0000, Andrew Cooper wrote:
> Can you try these two patches from the XenServer Patch queue?
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

Thanks for getting back to me. I will try this in the next day or
two and get back to you.

Cheers,
Andy


* Re: mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"
  2016-12-04 15:59 ` Andrew Cooper
  2016-12-04 21:04   ` Andy Smith
@ 2016-12-06  2:43   ` Andy Smith
  1 sibling, 0 replies; 4+ messages in thread
From: Andy Smith @ 2016-12-06  2:43 UTC
  To: Andrew Cooper; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4336 bytes --]

Hi Andrew,

On Sun, Dec 04, 2016 at 03:59:20PM +0000, Andrew Cooper wrote:
> On 04/12/16 08:32, Andy Smith wrote:
> > Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
> > 3.16.36-1+deb8u2) running under Xen, I cannot put the system's
> > storage under heavy load without receiving a bunch of "swiotlb
> > buffer is full" kernel error messages and severely degraded
> > performance. Sometimes the system panics and reboots itself.

[…]

> Can you try these two patches from the XenServer Patch queue?
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

Looking good.

Using those patches I'm ~20 minutes into this now:

Every 2.0s: cat /proc/mdstat                                 Tue Dec  6 02:16:40 2016

Personalities : [raid1] [raid10]
md5 : active raid10 sdb[0] sda[1]
      1875243008 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [==>..................]  check = 11.5% (217058176/1875243008) finish=133.9min speed=206252K/sec
      bitmap: 0/14 pages [0KB], 65536KB chunk

md4 : active raid10 sdc[0] sdd[1]
      3906886656 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [>....................]  check =  2.6% (102650880/3906886656) finish=674.4min speed=94007K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

…where previously it would have given kernel errors within 5
seconds, so I think that fixes it. I will have to perform some more
strenuous testing.

Those two patches did not apply cleanly to the source of
linux-image-3.16.0-4-amd64 3.16.36-1+deb8u2. The last part of each
patch was rejected, so I removed those parts and put equivalent
changes into a separate patch file (0003-fixup.patch attached).

I have not done this process in a long time so just for the
archives, my process was as per:

    https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

# mkdir -p /data/debian
# chown andy: /data/debian
# apt-get install build-essential fakeroot
# apt-get build-dep linux
$ cd /data/debian
$ apt-get source linux
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0001-dma-add-dma_get_required_mask_from_max_pfn.patch
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch
$ # remove last parts of each patch file, create 0003-fixup.patch that performs equivalent changes
$ cd linux-3.16.36
$ # applying these patches is going to change symbols so changing the abiname
$ # is necessary.
$ # See https://kernel-handbook.alioth.debian.org/ch-versions.html#s-abi-name
$ sed -i -e 's/^abiname: 4/abiname: 4bf/' debian/config/defines
$ fakeroot debian/rules debian/control-real
$ bash debian/bin/test-patches -f amd64 ../0001-dma-add-dma_get_required_mask_from_max_pfn.patch ../0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch ../0003-fixup.patch
# dpkg -i ../linux-headers-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb ../linux-image-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb

Then boot into the new kernel under Xen:

$ uname -a
Linux elephant 3.16.0-4bf-amd64 #1 SMP Debian 3.16.36-1+deb8u2a~test (2016-12-05) x86_64 GNU/Linux

I think my next steps should be:

1. Do some more strenuous testing

2. Report bug against source package "linux" in Debian jessie with
   pointer to those two patches.

3. Check if those fixes are already applied in Debian backports
   and/or Debian testing linux package.
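
For step 3, one rough way to check an unpacked kernel source tree is to look
for the functions these patches introduce. This is only a sketch:
tree_has_symbol is a hypothetical helper, and finding the symbol suggests,
but does not prove, that an equivalent fix is present.

```shell
#!/bin/sh
# Succeed (exit 0) if the fixed string $2 occurs anywhere under directory $1.
tree_has_symbol() {
    grep -rqF -- "$2" "$1"
}
```

For example, from the directory above an unpacked source tree:
`tree_has_symbol linux-3.16.36 dma_get_required_mask_from_max_pfn && echo present`.
The same check with xen_swiotlb_get_required_mask covers the Xen-specific half.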

> > Dec  4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 57344 bytes)
> > Dec  4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: swiotlb buffer is full
> > Dec  4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > Dec  4 07:06:00 elephant kernel: [22019.376430] IP: [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
> > Dec  4 07:06:00 elephant kernel: [22019.377122] PGD 0
> 
> This alone is a clear error handling bug in the mpt3sas driver.  It
> hasn't checked the DMA mapping call for a successful mapping before
> following the NULL pointer it got given back.  It is collateral damage
> from the swiotlb buffer being full, but a bug nonetheless.

Does that require reporting as an upstream linux bug in mpt3sas
then?

Thanks for your help.

Cheers,
Andy

[-- Attachment #2: 0003-fixup.patch --]
[-- Type: text/x-diff, Size: 1393 bytes --]

diff -u a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
--- a/drivers/xen/swiotlb-xen.c	2016-06-15 20:29:36.000000000 +0000
+++ b/drivers/xen/swiotlb-xen.c	2016-12-05 07:05:13.009992832 +0000
@@ -673,6 +673,13 @@
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_dma_supported);
 
+u64
+xen_swiotlb_get_required_mask(struct device *dev)
+{
+	return DMA_BIT_MASK(64);
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
+
 int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
 {
diff -u a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/linux/dma-mapping.h	2016-12-05 07:03:13.992601404 +0000
@@ -127,6 +127,7 @@
 	return dma_set_mask_and_coherent(dev, mask);
 }
 
+extern u64 dma_get_required_mask_from_max_pfn(struct device *dev);
 extern u64 dma_get_required_mask(struct device *dev);
 
 #ifndef set_arch_dma_coherent_ops
diff -u a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
--- a/include/xen/swiotlb-xen.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/xen/swiotlb-xen.h	2016-12-05 07:06:01.084938801 +0000
@@ -56,6 +56,10 @@
 extern int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
+extern u64
+xen_swiotlb_get_required_mask(struct device *dev);
+
+
 extern int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
 #endif /* __LINUX_SWIOTLB_XEN_H */
