* "swiotlb buffer is full" problem with tg3 and kernel 3.16.0-4-686-pae on Xen 4.4.1
From: Marco Steinacher @ 2015-05-21 15:17 UTC
To: xen-devel; +Cc: Marco Steinacher
Hi,
After upgrading to Debian jessie, and consequently to the default Linux
kernel 3.16.0-4-686-pae and Xen hypervisor 4.4.1-amd64 in that
distribution, I'm having problems with the tg3 network driver under high
load. Unfortunately this affects a production system that I am
administrating. It usually happens when doing a DRBD sync. Here is one
such event:
[ 4765.528635] block drbd0: Began resync as SyncSource (will sync 886784 KB [221696 bits set])
[ 4765.528654] block drbd0: updated sync UUID 09891C136111799E:F7FD1C0A50225596:F7FC1C0A50225596:F7FB1C0A50225596
[ 4768.992280] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4769.400296] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4770.216360] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4771.852283] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.120286] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.776027] tg3 0000:02:00.0: swiotlb buffer is full (sz: 32768 bytes)
[ 4775.778814] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.780995] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.783345] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.785097] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.988290] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4776.396285] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4777.212295] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4778.848298] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4781.664292] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4782.120285] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4788.672288] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4793.776046] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 6
[ 4794.752314] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4799.776046] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 5
[ 4801.760290] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4805.776040] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 4
[ 4811.776040] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 3
[ 4817.776050] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 2
[ 4823.776079] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 1
[ 4827.936300] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4829.776069] drbd base-disk: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )
[ 4829.776088] block drbd0: drbd_send_block() failed
Sometimes I also see the message "swiotlb_tbl_map_single: 8 callbacks
suppressed" or similar between the "buffer full" messages.
Sometimes the sync finishes, sometimes it stalls and fails completely.
The problem only occurs when running Linux 3.16.0-4-686-pae under Xen
4.4.1. It does NOT occur when booting the same kernel without Xen, or
when booting the corresponding amd64 kernel (3.16.0-4-amd64) with or
without Xen. There was no problem in Debian wheezy before the upgrade
(kernel 3.2.0-4-686-pae and Xen Hypervisor 4.1.3-amd64). The problem
also occurs when only dom0 is running (all domU VMs shut down).
I found the thread "tg3 NIC driver bug in 3.14.x under Xen"
(http://www.spinics.net/lists/netdev/msg324124.html) which looks like a
similar issue, but I don't understand exactly what is going on there and
what I could do to fix or debug it further.
Shall I try to build a 3.16.0-4-686-pae kernel with
"CONFIG_NEED_DMA_MAP_STATE=y"?
Shall I try to set the 'iommu' and/or 'swiotlb' kernel parameters? To
what values?
Any help or hint how to fix or work around this issue is very much
appreciated. Also hints how to debug this further are welcome.
Thanks,
Marco
P.S. Here is some information that might help in figuring out what's going on:
-------------------------------------------------------------------
kepler:~# ethtool -S eth0 | grep -v ': 0$'
NIC statistics:
rx_octets: 42531865
rx_ucast_packets: 582596
rx_mcast_packets: 127
rx_bcast_packets: 1
tx_octets: 8692263469
tx_ucast_packets: 5755264
tx_mcast_packets: 10
-------------------------------------------------------------------
-------------------------------------------------------------------
kepler:~# ethtool -i eth0
driver: tg3
version: 3.137
firmware-version: 5722-v3.09, ASFIPMI v6.03
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
-------------------------------------------------------------------
-------------------------------------------------------------------
kepler:~# lspci -vvv -s 02:00.0
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
Subsystem: IBM IBM System x3350 (Machine type 4192)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 59
Region 0: Memory at e8200000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at <ignored> [disabled]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
Read-only fields:
[PN] Part number: BCM95722
[EC] Engineering changes: 106679-15
[SN] Serial number: 0123456789
[MN] Manufacture ID: 31 34 65 34
[RV] Reserved: checksum good, 28 byte(s) reserved
Read/write fields:
[YA] Asset tag: XYZ01234567
[RW] Read-write area: 107 byte(s) free
End
Capabilities: [58] Vendor Specific Information: Len=78 <?>
Capabilities: [e8] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0200c Data: 4121
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <4us, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [13c v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 00-21-5e-ff-fe-4d-2c-13
Capabilities: [16c v1] Power Budgeting <?>
Kernel driver in use: tg3
-------------------------------------------------------------------
-------------------------------------------------------------------
kepler:~# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbrext0       8000.00215e4d2c14       no              eth1
xenbrint0       8000.00215e4d2c13       no              eth0
kepler:~# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:21:5e:4d:2c:13
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:582865 errors:0 dropped:0 overruns:0 frame:0
TX packets:5755690 errors:0 dropped:1153 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:42557655 (40.5 MiB) TX bytes:8692339211 (8.0 GiB)
Interrupt:16
kepler:~# ifconfig xenbrint0
xenbrint0 Link encap:Ethernet HWaddr 00:21:5e:4d:2c:13
inet addr:192.168.2.100 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: 2001:1620:206b:1::2:1/64 Scope:Global
inet6 addr: fe80::221:5eff:fe4d:2c13/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:582461 errors:0 dropped:0 overruns:0 frame:0
TX packets:329904 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:32044143 (30.5 MiB) TX bytes:8330130321 (7.7 GiB)
-------------------------------------------------------------------
-------------------------------------------------------------------
kepler:~# cat /proc/version
Linux version 3.16.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24)
kepler:~# grep -e SWIOTLB -e CONFIG_NEED_DMA_MAP_STATE /boot/config-*
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB_XEN=y
/boot/config-3.16.0-4-amd64:CONFIG_NEED_DMA_MAP_STATE=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB_XEN=y
-------------------------------------------------------------------
-------------------------------------------------------------------
kepler:~# xen info
host : kepler
release : 3.16.0-4-686-pae
version : #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24)
machine : i686
nr_cpus : 2
max_cpu_id : 1
nr_nodes : 1
cores_per_socket : 2
threads_per_core : 1
cpu_mhz : 2400
hw_caps : bfebfbff:20100800:00000000:00000900:0000e39d:00000000:00000001:00000000
virt_caps :
total_memory : 8189
free_memory : 3999
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 4
xen_extra : .1
xen_version : 4.4.1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xff400000
xen_changeset :
xen_commandline : placeholder com1=115200,8n1 console=com1 dom0_mem=4096M,max:4096M
cc_compiler : gcc (Debian 4.9.2-10) 4.9.2
cc_compile_by : waldi
cc_compile_domain : debian.org
cc_compile_date : Mon Apr 6 19:49:18 UTC 2015
xend_config_format : 4
-------------------------------------------------------------------
* Re: "swiotlb buffer is full" problem with tg3 and kernel 3.16.0-4-686-pae on Xen 4.4.1
From: Ian Campbell @ 2015-05-21 18:11 UTC
To: Marco Steinacher; +Cc: xen-devel
On Thu, 2015-05-21 at 17:17 +0200, Marco Steinacher wrote:
> Shall I try to build a 3.16.0-4-686-pae kernel with
> "CONFIG_NEED_DMA_MAP_STATE=y"?
Yes, this is what I would recommend. It's not as simple as turning it
on, though: you actually need to patch the Kconfig as in this mail,
http://www.spinics.net/lists/netdev/msg325844.html, so that it gets
enabled automatically.
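Why the symbol matters, as paraphrased from the linked netdev discussion (treat the details as approximate): drivers such as tg3 store each DMA address with dma_unmap_addr_set() and read it back with dma_unmap_addr() when unmapping. Without CONFIG_NEED_DMA_MAP_STATE those helpers compile to a no-op and a constant 0, so the unmap call cannot find the bounce slot swiotlb handed out, and the slot is never freed. The user-space C sketch below is a simplified model, not kernel code: the macros are trimmed versions of the kernel helpers, and the tiny bounce pool (toy_map(), toy_unmap(), POOL_SLOTS, POOL_BASE), struct tx_buffer and send_packets() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* 1 mimics CONFIG_NEED_DMA_MAP_STATE=y, 0 mimics the unset 686-pae case. */
#ifndef NEED_DMA_MAP_STATE
#define NEED_DMA_MAP_STATE 0
#endif

typedef uint64_t dma_addr_t;

/* Simplified unmap-state helpers. Unlike the kernel versions,
 * DEFINE_DMA_UNMAP_ADDR carries its own ';' so the empty variant
 * leaves no stray semicolon inside the struct. */
#if NEED_DMA_MAP_STATE
#define DEFINE_DMA_UNMAP_ADDR(NAME)        dma_addr_t NAME;
#define dma_unmap_addr(PTR, NAME)          ((PTR)->NAME)
#define dma_unmap_addr_set(PTR, NAME, VAL) (((PTR)->NAME) = (VAL))
#else
#define DEFINE_DMA_UNMAP_ADDR(NAME)
#define dma_unmap_addr(PTR, NAME)          ((dma_addr_t)0)
#define dma_unmap_addr_set(PTR, NAME, VAL) do { } while (0)
#endif

#define POOL_SLOTS 8        /* hypothetical, deliberately tiny bounce pool */
#define SLOT_SIZE  0x800    /* 2 KiB per slot */
#define POOL_BASE  0x1000   /* hypothetical bus address of slot 0 */

static int slot_used[POOL_SLOTS];

/* Take the first free slot and return its bus address, or 0 when the pool
 * is full (the point where the kernel would log "swiotlb buffer is full"). */
static dma_addr_t toy_map(void)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return POOL_BASE + (dma_addr_t)i * SLOT_SIZE;
        }
    }
    return 0;
}

/* Free a slot only if addr lies inside the pool. Address 0 never does,
 * so an unmap with a forgotten address silently leaks the slot. */
static void toy_unmap(dma_addr_t addr)
{
    if (addr < POOL_BASE || addr >= POOL_BASE + (dma_addr_t)POOL_SLOTS * SLOT_SIZE)
        return;
    dma_addr_t idx = (addr - POOL_BASE) / SLOT_SIZE;
    assert(slot_used[idx]); /* catch double unmaps */
    slot_used[idx] = 0;
}

/* Per-packet bookkeeping, loosely modelled on a driver's TX ring entry. */
struct tx_buffer {
    void *skb;
    DEFINE_DMA_UNMAP_ADDR(mapping)
};

/* Map and then unmap n packets; return how many could be mapped at all. */
static int send_packets(int n)
{
    int mapped = 0;
    for (int i = 0; i < n; i++) {
        struct tx_buffer txb = { 0 };
        dma_addr_t addr = toy_map();
        if (addr == 0)
            continue; /* pool exhausted: this packet cannot be mapped */
        mapped++;
        dma_unmap_addr_set(&txb, mapping, addr);  /* no-op when state is off */
        toy_unmap(dma_unmap_addr(&txb, mapping)); /* gets 0 when state is off */
        (void)txb; /* txb is otherwise unused when the state is compiled out */
    }
    return mapped;
}
```

Built as is (state compiled out), send_packets(20) maps only POOL_SLOTS packets and every later call maps none. Built with -DNEED_DMA_MAP_STATE=1, each slot is returned after use and all 20 packets are mapped.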
> Shall I try to set the 'iommu' and/or 'swiotlb' kernel parameters? To
> what values?
I don't think so. Those were used in that thread to demonstrate that the
issue existed even on native, i.e. not under Xen.
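A rough calculation supports this: if slots leak, a larger swiotlb pool only delays the failure. The sketch assumes the usual swiotlb geometry of this kernel era (2 KiB slots, IO_TLB_SHIFT = 11) and uses 64 MiB as an example pool size, which is believed to be the Xen swiotlb default but was not read from this system. It ignores alignment and the limit on contiguous slots per mapping. The function names are hypothetical.

```c
#include <assert.h>

#define IO_TLB_SHIFT 11                   /* assumed: 2 KiB per swiotlb slot */
#define SLOT_BYTES   (1UL << IO_TLB_SHIFT)

/* Slots occupied by one mapping of `size` bytes, rounded up to whole slots. */
static unsigned long slots_for(unsigned long size)
{
    assert(size > 0); /* a zero-length mapping makes no sense here */
    return (size + SLOT_BYTES - 1) >> IO_TLB_SHIFT;
}

/* Total slots in a bounce pool of `pool_bytes` bytes. */
static unsigned long pool_slots(unsigned long pool_bytes)
{
    return pool_bytes >> IO_TLB_SHIFT;
}

/* Number of leaked mappings of `size` bytes that exhaust the pool. */
static unsigned long leaks_until_full(unsigned long pool_bytes, unsigned long size)
{
    return pool_slots(pool_bytes) / slots_for(size);
}
```

Each leaked 1448-byte mapping costs one slot, so a 64 MiB pool is exhausted after 32768 leaks and a 128 MiB pool after twice as many: the count scales linearly with the pool, which postpones the failure without preventing it.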
If you could file your results in the Debian BTS[0], e.g. by running
"reportbug <linux-image-pkg-name>", that would help get it fixed in a
future update.
Ian.
[0] http://bugs.debian.org
* Re: "swiotlb buffer is full" problem with tg3 and kernel 3.16.0-4-686-pae on Xen 4.4.1
From: Marco Steinacher @ 2015-05-22 17:44 UTC
To: Ian Campbell; +Cc: xen-devel
On 21.05.2015 at 20:11, Ian Campbell wrote:
> On Thu, 2015-05-21 at 17:17 +0200, Marco Steinacher wrote:
>> Shall I try to build a 3.16.0-4-686-pae kernel with
>> "CONFIG_NEED_DMA_MAP_STATE=y"?
>
> Yes, this is what I would recommend. It's not as simple as turning it
> on, though: you actually need to patch the Kconfig as in this mail,
> http://www.spinics.net/lists/netdev/msg325844.html, so that it gets
> enabled automatically.
Thanks for pointing me in the right direction. I compiled a custom
kernel following [0] with the following change:
--- linux-source-3.16.orig/arch/x86/Kconfig	2015-04-24 04:05:11.000000000 +0200
+++ linux-source-3.16/arch/x86/Kconfig	2015-05-22 06:14:17.393963697 +0200
@@ -164,7 +164,7 @@
 
 config NEED_DMA_MAP_STATE
 	def_bool y
-	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB
 
 config NEED_SG_DMA_LENGTH
 	def_bool y
And this solved the problem for me. No "swiotlb buffer is full" messages
anymore and the network interface works just fine again.
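The effect of the one-line patch can be sanity-checked by evaluating the old and new dependency of the def_bool y symbol for both Debian flavours. The inputs are assumptions inferred from the grep output in the first message: both configs have SWIOTLB=y, amd64 has X86_64=y, and because NEED_DMA_MAP_STATE is missing from config-3.16.0-4-686-pae, INTEL_IOMMU and DMA_API_DEBUG must be off there. For amd64 those two are unknown and set to off here, which does not change the result. struct kcfg and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The Kconfig symbols that matter for one kernel flavour (true = y). */
struct kcfg {
    bool x86_64;
    bool intel_iommu;
    bool dma_api_debug;
    bool swiotlb;
};

/* Original rule: depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG */
static bool need_dma_map_state_old(const struct kcfg *c)
{
    return c->x86_64 || c->intel_iommu || c->dma_api_debug;
}

/* Patched rule: the same, plus "|| SWIOTLB". */
static bool need_dma_map_state_new(const struct kcfg *c)
{
    bool result = need_dma_map_state_old(c) || c->swiotlb;
    /* Adding an alternative can only enable the symbol, never disable it. */
    assert(!need_dma_map_state_old(c) || result);
    return result;
}

/* Assumed inputs, inferred from the /boot/config-* grep output. */
static const struct kcfg flavour_686_pae = { false, false, false, true };
static const struct kcfg flavour_amd64   = { true,  false, false, true };
```

Under these assumptions the old rule leaves NEED_DMA_MAP_STATE off only for 686-pae, and the patched rule turns it on for both flavours, matching the observation that only the 686-pae kernel under Xen was affected.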
> If you could file your results in the Debian BTS, e.g. by running
> "reportbug <linux-image-pkg-name>", that would help get it fixed in a
> future update.
Done [1].
Thanks a lot,
Marco
[0] https://www.debian.org/releases/jessie/i386/ch08s06.html.en
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786551
--
OpenPGP Key ID: 0x62937F7F