From: "Nikita V. Shirokov" <tehnerd@tehnerd.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: netdev@vger.kernel.org, jeffrey.t.kirsher@intel.com
Subject: Re: ixgbe hangs when XDP_TX is enabled
Date: Tue, 21 Aug 2018 09:58:58 -0700
Message-ID: <20180821165858.GA1507@maindev>
In-Reply-To: <CAKgT0UcNsFcNNUycTqZ59b5=dX4V=Fk5mVUQ8pOYT_nz194rqQ@mail.gmail.com>

On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <tehnerd@tehnerd.com> wrote:
> >
> > we are getting such errors:
> >
> > [  408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang (XDP)
> >                  Tx Queue             <46>
> >                  TDH, TDT             <0>, <2>
> >                  next_to_use          <2>
> >                  next_to_clean        <0>
> >                tx_buffer_info[next_to_clean]
> >                  time_stamp           <0>
> >                  jiffies              <1000197c0>
> > [  408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on queue 46, resetting adapter
> > [  408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to tx timeout
> > [  408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > [  408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
> > [  409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > [  409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> >
> > while running XDP prog on ixgbe nic.
> > right now i'm seeing this on bpf-next kernel
> > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> >
> > looks like this is the same issue as reported by Brenden in
> > https://www.spinics.net/lists/netdev/msg439438.html
> >
> > --
> > Nikita V. Shirokov
> 
> Could you provide some additional information about your setup.
> Specifically useful would be "ethtool -i", "ethtool -l", and lspci
> -vvv info for your device. The total number of CPUs on the system
> would be useful to know as well. In addition could you try
> reproducing
sure:

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             0
TX:             0
Other:          1
Combined:       63
Current hardware settings:
RX:             0
TX:             0
Other:          1
Combined:       48

# ethtool -i eth0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x800006f1
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


# nproc
48

lspci:

03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Intel Corporation Device 000d
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 30
        NUMA node: 0
        Region 0: Memory at c7d00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 6000 [size=32]
        Region 4: Memory at c7e80000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at c7e00000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend+
                LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-b6-b2-60
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 128, stride: 2, Device ID: 10ed
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000c7c00000 (64-bit, prefetchable)
                Region 3: Memory at 00000000c7b00000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: ixgbe




workaround for now is the same one Brenden used in his original
report: make sure that combined + xdp queues < max_tx_queues
(e.g. w/ combined == 14 the issue goes away).
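
A minimal sketch of the check behind this workaround. The numbers are
assumptions for illustration and are not stated in this thread: one XDP
TX queue per CPU (48, from nproc) and a max_tx_queues of 64. The
function names are hypothetical and are not part of ixgbe or ethtool.

```python
def queues_fit(combined: int, xdp_queues: int, max_tx_queues: int) -> bool:
    """True when combined + xdp queues stays strictly below max_tx_queues."""
    return combined + xdp_queues < max_tx_queues


def largest_safe_combined(xdp_queues: int, max_tx_queues: int) -> int:
    """Largest combined count that still satisfies the strict inequality."""
    return max(max_tx_queues - xdp_queues - 1, 0)


if __name__ == "__main__":
    XDP_QUEUES = 48     # assumption: one XDP TX queue per CPU (nproc == 48)
    MAX_TX_QUEUES = 64  # assumption: driver limit, not confirmed in this thread

    for combined in (48, 14):
        ok = queues_fit(combined, XDP_QUEUES, MAX_TX_QUEUES)
        print(f"combined={combined}: fits={ok}")
    print("largest safe combined:", largest_safe_combined(XDP_QUEUES, MAX_TX_QUEUES))
```

Under these assumed values, combined == 48 fails the check and
combined == 14 passes, which is consistent with the behaviour described
above. The combined count itself can be changed with
"ethtool -L eth0 combined N".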

> the issue with one of the sample XDP programs provided with the kernel
> such as the xdp2 which I believe uses the XDP_TX function. We need to
> try and create a similar setup in our own environment for
> reproduction and debugging.

will try, but this could take a while, because i'm not sure that we have
an ixgbe in our test lab (and it would be hard to run such a test in prod)

> 
> Thanks.
> 
> - Alex


Thread overview: 6+ messages
2018-08-20 19:31 ixgbe hangs when XDP_TX is enabled Nikita V. Shirokov
2018-08-21 15:58 ` Alexander Duyck
2018-08-21 16:58   ` Nikita V. Shirokov [this message]
2018-08-21 18:13     ` Alexander Duyck
2018-08-22 16:22       ` Jeff Kirsher
2018-08-24 14:25         ` Jesper Dangaard Brouer
