* Re: ixgbe: ksoftirqd consumes 100% CPU w/ ~50 TCP conns
[not found] <CAEm7KtxXapo-9OevQtFAnuZo5h4FH8RQ-Nm3in4A0uJh0KqYKQ@mail.gmail.com>
@ 2016-05-24 16:40 ` Brandon Philips
2016-05-24 19:46 ` Alexander Duyck
0 siblings, 1 reply; 3+ messages in thread
From: Brandon Philips @ 2016-05-24 16:40 UTC (permalink / raw)
To: Jesse Brandeburg, John Fastabend, Jeff Kirsher, Mark Rustad
Cc: netdev, Matthew Garrett
Hello Everyone-
So we tracked it down to the IOMMU breaking CPU affinity[1].
Can we provide any further details, or is this a known issue?
Thank You,
Brandon
[1] https://github.com/coreos/bugs/issues/1275#issuecomment-219866601
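For what it's worth, one quick way to see the breakage is to check which
CPUs are allowed to service the NIC interrupts: compare the IRQ numbers
from /proc/interrupts against their /proc/irq/<N>/smp_affinity masks
while ksoftirqd spins. A small sketch; the /proc paths are the standard
kernel interfaces, but the helper function below is just illustrative:

```shell
# List the ixgbe IRQs and their allowed CPUs on a live host:
#   grep eno1 /proc/interrupts
#   cat /proc/irq/<N>/smp_affinity    # hex CPU mask, one file per IRQ
#
# Illustrative helper: expand an smp_affinity hex mask into CPU numbers.
mask_to_cpus() {
    mask=$((16#$1))    # hex string -> integer
    cpu=0
    cpus=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            cpus="$cpus $cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    printf '%s\n' "${cpus# }"
}
```

For example, a mask of `5` decodes to CPUs 0 and 2; if every queue's mask
covers all CPUs (or the wrong socket), the per-queue affinity is gone.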
On Tue, May 17, 2016 at 12:44 PM, Brandon Philips <brandon@ifup.co> wrote:
> Hello ixgbe team-
>
> With Linux v4.6 and the ixgbe driver (details below) a user is reporting
> ksoftirqd consuming 100% of the CPU on all cores after a moderate number
> (~20-50) of TCP connections. They are unable to reproduce this issue on
> Cisco hardware.
>
> With Kernel v3.19 they cannot reproduce[1] the issue. Disabling IOMMU
> (intel_iommu=off) does "fix" the issue[2].
>
> Thank You,
>
> Brandon
>
> [1] https://github.com/coreos/bugs/issues/1275#issuecomment-219157803
> [2] https://github.com/coreos/bugs/issues/1275#issuecomment-219819986
>
> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> ethtool -i eno1
> driver: ixgbe
> version: 4.0.1-k
> firmware-version: 0x800004e0
> bus-info: 0000:06:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
> CPU
> Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
* Re: ixgbe: ksoftirqd consumes 100% CPU w/ ~50 TCP conns
From: Alexander Duyck @ 2016-05-24 19:46 UTC (permalink / raw)
To: Brandon Philips
Cc: Jesse Brandeburg, John Fastabend, Jeff Kirsher, Mark Rustad,
Netdev, Matthew Garrett
I'm guessing the issue is lock contention on the IOMMU resource table.
I resolved most of that for the Rx side back when we implemented Rx
page reuse, but the Tx side still has to perform a DMA mapping for
each individual buffer. If the user still needs the IOMMU enabled for
something like KVM, one thing they may try is the kernel parameter
"iommu=pt". It allows host devices to access memory without the
penalty of allocating/freeing IOMMU resources, while still providing
guests with IOMMU isolation.
- Alex
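To make the suggestion concrete, here is a sketch of how one might check
the current IOMMU mode before changing it. `iommu=pt` and
`intel_iommu=off` are the real kernel parameters; the classifier
function itself is just illustrative:

```shell
# Classify the IOMMU mode from a kernel command line string.
# On a live host you would feed it the real one:
#   iommu_mode "$(cat /proc/cmdline)"
iommu_mode() {
    case " $1 " in
        *" intel_iommu=off "*) echo "disabled"    ;;
        *" iommu=pt "*)        echo "passthrough" ;;
        *)                     echo "translated"  ;;
    esac
}

# To switch to passthrough, add iommu=pt to the kernel command line
# (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub) and reboot.
```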
On Tue, May 24, 2016 at 9:40 AM, Brandon Philips <brandon@ifup.co> wrote:
> Hello Everyone-
>
> So we tracked it down to the IOMMU breaking CPU affinity[1].
> Can we provide any further details, or is this a known issue?
>
> Thank You,
>
> Brandon
>
> [1] https://github.com/coreos/bugs/issues/1275#issuecomment-219866601
>
> On Tue, May 17, 2016 at 12:44 PM, Brandon Philips <brandon@ifup.co> wrote:
>> Hello ixgbe team-
>>
>> With Linux v4.6 and the ixgbe driver (details below) a user is reporting
>> ksoftirqd consuming 100% of the CPU on all cores after a moderate number
>> (~20-50) of TCP connections. They are unable to reproduce this issue on
>> Cisco hardware.
>>
>> With Kernel v3.19 they cannot reproduce[1] the issue. Disabling IOMMU
>> (intel_iommu=off) does "fix" the issue[2].
>>
>> Thank You,
>>
>> Brandon
>>
>> [1] https://github.com/coreos/bugs/issues/1275#issuecomment-219157803
>> [2] https://github.com/coreos/bugs/issues/1275#issuecomment-219819986
>>
>> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> ethtool -i eno1
>> driver: ixgbe
>> version: 4.0.1-k
>> firmware-version: 0x800004e0
>> bus-info: 0000:06:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: no
>>
>> CPU
>> Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
* Re: ixgbe: ksoftirqd consumes 100% CPU w/ ~50 TCP conns
From: Jesper Dangaard Brouer @ 2016-05-24 20:35 UTC (permalink / raw)
To: Alexander Duyck
Cc: brouer, Brandon Philips, Jesse Brandeburg, John Fastabend,
Jeff Kirsher, Mark Rustad, Netdev, Matthew Garrett
On Tue, 24 May 2016 12:46:56 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:
> I'm guessing the issue is lock contention on the IOMMU resource table.
> I resolved most of that for the Rx side back when we implemented Rx
> page reuse, but the Tx side still has to perform a DMA mapping for
> each individual buffer. If the user still needs the IOMMU enabled for
> something like KVM, one thing they may try is the kernel parameter
> "iommu=pt". It allows host devices to access memory without the
> penalty of allocating/freeing IOMMU resources, while still providing
> guests with IOMMU isolation.
Listen to Alex, he knows what he is talking about.
My longer-term plan for getting rid of the dma_map/unmap overhead is to
_keep_ the pages DMA-mapped and recycle them back via a page-pool.
Details in my slides, see slide 5:
http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
Alex's RX recycle trick for the Intel drivers is described on
slide 14. It seems like, in your use case, the pages might be held "too"
long for the RX recycling trick to work.
If you want to understand the IOMMU problem in detail, I recommend
reading the article "True IOMMU Protection from DMA Attacks":
http://www.cs.technion.ac.il/~mad/publications/asplos2016-iommu.pdf
(My solution is different, but they describe the problem very well.)
--Jesper
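If it helps to confirm Alex's contention theory on the affected host, a
short system-wide perf profile will usually show spin-lock time under the
intel-iommu map/unmap paths. A rough sketch; `perf record`/`perf report`
are the real tools, but the exact symbol names and the grep-based helper
below are illustrative assumptions:

```shell
# Capture a short profile while the workload runs, then dump it:
#   perf record -a -g -- sleep 5
#   perf report --stdio > report.txt
#
# Crude check for the contention signature: IOMMU map/unmap symbols
# appearing together with heavy _raw_spin_lock time in the same report.
has_iommu_contention() {
    grep -q -i -E 'intel_(un)?map_page|iommu' "$1" &&
    grep -q '_raw_spin_lock' "$1"
}
```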
> On Tue, May 24, 2016 at 9:40 AM, Brandon Philips <brandon@ifup.co> wrote:
> > Hello Everyone-
> >
> > So we tracked it down to the IOMMU breaking CPU affinity[1].
> > Can we provide any further details, or is this a known issue?
> >
> > Thank You,
> >
> > Brandon
> >
> > [1] https://github.com/coreos/bugs/issues/1275#issuecomment-219866601
> >
> > On Tue, May 17, 2016 at 12:44 PM, Brandon Philips <brandon@ifup.co> wrote:
> >> Hello ixgbe team-
> >>
> >> With Linux v4.6 and the ixgbe driver (details below) a user is reporting
> >> ksoftirqd consuming 100% of the CPU on all cores after a moderate number
> >> (~20-50) of TCP connections. They are unable to reproduce this issue on
> >> Cisco hardware.
> >>
> >> With Kernel v3.19 they cannot reproduce[1] the issue. Disabling IOMMU
> >> (intel_iommu=off) does "fix" the issue[2].
> >>
> >> Thank You,
> >>
> >> Brandon
> >>
> >> [1] https://github.com/coreos/bugs/issues/1275#issuecomment-219157803
> >> [2] https://github.com/coreos/bugs/issues/1275#issuecomment-219819986
> >>
> >> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> ethtool -i eno1
> >> driver: ixgbe
> >> version: 4.0.1-k
> >> firmware-version: 0x800004e0
> >> bus-info: 0000:06:00.0
> >> supports-statistics: yes
> >> supports-test: yes
> >> supports-eeprom-access: yes
> >> supports-register-dump: yes
> >> supports-priv-flags: no
> >>
> >> CPU
> >> Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer