* CPU scheduler to TXQ binding? (ixgbe vs. igb)
@ 2014-09-17 13:26 Jesper Dangaard Brouer
2014-09-17 14:32 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-17 13:26 UTC (permalink / raw)
To: netdev@vger.kernel.org; +Cc: Tom Herbert, Eric Dumazet
The CPU to TXQ binding behavior of the ixgbe and igb NIC drivers is
somewhat different. Normally I set up NIC IRQ-to-CPU bindings 1-to-1
with the script set_irq_affinity [1].
To force use of a specific HW TXQ, I normally pin the process to a
CPU, either with "taskset" or with "netperf -T lcpu,rcpu".
This works fine with the ixgbe driver, but not with igb. That is,
with igb a program pinned to a specific CPU can still end up using
another TXQ. What am I missing?
I'm monitoring this with both:
1) watch -d sudo tc -s -d q ls dev ethXX
2) https://github.com/ffainelli/bqlmon
[1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
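For concreteness, the pinning and monitoring described above look roughly
like the following; the device name, CPU number and target host are
placeholders:
 # Pin the netperf sender to CPU 2 (198.18.0.2 stands in for the netserver host)
 taskset -c 2 netperf -H 198.18.0.2 -t TCP_STREAM -l 30 &
 #  ...or let netperf do the binding itself: netperf -T 2,2 -H 198.18.0.2 -t TCP_STREAM -l 30
 # Watch the per-TXQ qdisc counters to see which HW TX queue the flow hits
 watch -d "tc -s -d qdisc show dev eth1"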
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-17 13:26 CPU scheduler to TXQ binding? (ixgbe vs. igb) Jesper Dangaard Brouer
@ 2014-09-17 14:32 ` Eric Dumazet
2014-09-17 14:55 ` Jesper Dangaard Brouer
2014-09-17 14:59 ` Alexander Duyck
0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-09-17 14:32 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: netdev@vger.kernel.org, Tom Herbert
On Wed, 2014-09-17 at 15:26 +0200, Jesper Dangaard Brouer wrote:
> The CPU to TXQ binding behavior of ixgbe vs. igb NIC driver are
> somehow different. Normally I setup NIC IRQ-to-CPU bindings 1-to-1,
> with script set_irq_affinity [1].
>
> For forcing use of a specific HW TXQ, I normally force the CPU binding
> of the process, either with "taskset" or with "netperf -T lcpu,rcpu".
>
> This works fine with driver ixgbe, but not with driver igb. That is
> with igb, the program forced to specific CPU, can still use another
> TXQ. What am I missing?
>
>
> I'm monitoring this with both:
> 1) watch -d sudo tc -s -d q ls dev ethXX
> 2) https://github.com/ffainelli/bqlmon
>
> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
Have you setup XPS ?
echo 0001 >/sys/class/net/ethX/queues/tx-0/xps_cpus
echo 0002 >/sys/class/net/ethX/queues/tx-1/xps_cpus
echo 0004 >/sys/class/net/ethX/queues/tx-2/xps_cpus
echo 0008 >/sys/class/net/ethX/queues/tx-3/xps_cpus
echo 0010 >/sys/class/net/ethX/queues/tx-4/xps_cpus
echo 0020 >/sys/class/net/ethX/queues/tx-5/xps_cpus
echo 0040 >/sys/class/net/ethX/queues/tx-6/xps_cpus
echo 0080 >/sys/class/net/ethX/queues/tx-7/xps_cpus
Or something like that, depending on number of cpus and TX queues.
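(The values are hex CPU bitmasks: bit N selects CPU N, so for a 1:1
queue-to-CPU mapping the mask for queue N is simply 1 << N. A quick way to
print them, purely as an illustration:)
 for cpu in $(seq 0 7); do printf 'tx-%d -> xps_cpus mask %04x\n' $cpu $((1 << cpu)); done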
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-17 14:32 ` Eric Dumazet
@ 2014-09-17 14:55 ` Jesper Dangaard Brouer
2014-09-17 14:59 ` Alexander Duyck
1 sibling, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-17 14:55 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Tom Herbert
On Wed, 17 Sep 2014 07:32:39 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-09-17 at 15:26 +0200, Jesper Dangaard Brouer wrote:
> > The CPU to TXQ binding behavior of ixgbe vs. igb NIC driver are
> > somehow different. Normally I setup NIC IRQ-to-CPU bindings 1-to-1,
> > with script set_irq_affinity [1].
> >
> > For forcing use of a specific HW TXQ, I normally force the CPU binding
> > of the process, either with "taskset" or with "netperf -T lcpu,rcpu".
> >
> > This works fine with driver ixgbe, but not with driver igb. That is
> > with igb, the program forced to specific CPU, can still use another
> > TXQ. What am I missing?
> >
> >
> > I'm monitoring this with both:
> > 1) watch -d sudo tc -s -d q ls dev ethXX
> > 2) https://github.com/ffainelli/bqlmon
> >
> > [1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
>
> Have you setup XPS ?
>
> echo 0001 >/sys/class/net/ethX/queues/tx-0/xps_cpus
> echo 0002 >/sys/class/net/ethX/queues/tx-1/xps_cpus
> echo 0004 >/sys/class/net/ethX/queues/tx-2/xps_cpus
> echo 0008 >/sys/class/net/ethX/queues/tx-3/xps_cpus
> echo 0010 >/sys/class/net/ethX/queues/tx-4/xps_cpus
> echo 0020 >/sys/class/net/ethX/queues/tx-5/xps_cpus
> echo 0040 >/sys/class/net/ethX/queues/tx-6/xps_cpus
> echo 0080 >/sys/class/net/ethX/queues/tx-7/xps_cpus
>
> Or something like that, depending on number of cpus and TX queues.
Thanks, that worked! They were all set to "000" by default for igb, but
set correctly (as above) for ixgbe (strange).
Did:
$ export DEV=eth1 ; export NR_CPUS=11 ; \
for txq in `seq 0 $NR_CPUS` ; do \
file=/sys/class/net/${DEV}/queues/tx-${txq}/xps_cpus \
mask=`printf %X $((1<<$txq))`
test -e $file && sudo sh -c "echo $mask > $file" && \
grep . -H $file ;\
done
/sys/class/net/eth1/queues/tx-0/xps_cpus:001
/sys/class/net/eth1/queues/tx-1/xps_cpus:002
/sys/class/net/eth1/queues/tx-2/xps_cpus:004
/sys/class/net/eth1/queues/tx-3/xps_cpus:008
/sys/class/net/eth1/queues/tx-4/xps_cpus:010
/sys/class/net/eth1/queues/tx-5/xps_cpus:020
/sys/class/net/eth1/queues/tx-6/xps_cpus:040
/sys/class/net/eth1/queues/tx-7/xps_cpus:080
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-17 14:32 ` Eric Dumazet
2014-09-17 14:55 ` Jesper Dangaard Brouer
@ 2014-09-17 14:59 ` Alexander Duyck
2014-09-18 6:56 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 13+ messages in thread
From: Alexander Duyck @ 2014-09-17 14:59 UTC (permalink / raw)
To: Eric Dumazet, Jesper Dangaard Brouer; +Cc: netdev@vger.kernel.org, Tom Herbert
On 09/17/2014 07:32 AM, Eric Dumazet wrote:
> On Wed, 2014-09-17 at 15:26 +0200, Jesper Dangaard Brouer wrote:
>> The CPU to TXQ binding behavior of ixgbe vs. igb NIC driver are
>> somehow different. Normally I setup NIC IRQ-to-CPU bindings 1-to-1,
>> with script set_irq_affinity [1].
>>
>> For forcing use of a specific HW TXQ, I normally force the CPU binding
>> of the process, either with "taskset" or with "netperf -T lcpu,rcpu".
>>
>> This works fine with driver ixgbe, but not with driver igb. That is
>> with igb, the program forced to specific CPU, can still use another
>> TXQ. What am I missing?
>>
>>
>> I'm monitoring this with both:
>> 1) watch -d sudo tc -s -d q ls dev ethXX
>> 2) https://github.com/ffainelli/bqlmon
>>
>> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
>
> Have you setup XPS ?
>
> echo 0001 >/sys/class/net/ethX/queues/tx-0/xps_cpus
> echo 0002 >/sys/class/net/ethX/queues/tx-1/xps_cpus
> echo 0004 >/sys/class/net/ethX/queues/tx-2/xps_cpus
> echo 0008 >/sys/class/net/ethX/queues/tx-3/xps_cpus
> echo 0010 >/sys/class/net/ethX/queues/tx-4/xps_cpus
> echo 0020 >/sys/class/net/ethX/queues/tx-5/xps_cpus
> echo 0040 >/sys/class/net/ethX/queues/tx-6/xps_cpus
> echo 0080 >/sys/class/net/ethX/queues/tx-7/xps_cpus
>
> Or something like that, depending on number of cpus and TX queues.
>
That was what I was thinking as well.
ixgbe has ATR, which makes use of XPS to set up the transmit queues for a
1:1 mapping. The receive side of the flow is routed back to the same Rx
queue through flow director mappings.
In the case of igb, it only has RSS and doesn't set a default XPS
configuration. So you should probably set up XPS, and you might also want
to make use of RPS to steer receive packets, since the Rx queues won't
match the Tx queues.
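(The RPS knob mirrors the XPS one, per RX queue; a minimal sketch, with
the queue index and mask as placeholders:)
 # let CPU 2 (mask 0x4) process packets received on rx-2
 echo 4 > /sys/class/net/ethX/queues/rx-2/rps_cpus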
Thanks,
Alex
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-17 14:59 ` Alexander Duyck
@ 2014-09-18 6:56 ` Jesper Dangaard Brouer
2014-09-18 7:28 ` Jesper Dangaard Brouer
2014-09-18 13:33 ` Eric Dumazet
0 siblings, 2 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-18 6:56 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Eric Dumazet, netdev@vger.kernel.org, Tom Herbert
On Wed, 17 Sep 2014 07:59:51 -0700
Alexander Duyck <alexander.h.duyck@intel.com> wrote:
> On 09/17/2014 07:32 AM, Eric Dumazet wrote:
> > On Wed, 2014-09-17 at 15:26 +0200, Jesper Dangaard Brouer wrote:
> >> The CPU to TXQ binding behavior of ixgbe vs. igb NIC driver are
> >> somehow different. Normally I setup NIC IRQ-to-CPU bindings 1-to-1,
> >> with script set_irq_affinity [1].
> >>
> >> For forcing use of a specific HW TXQ, I normally force the CPU binding
> >> of the process, either with "taskset" or with "netperf -T lcpu,rcpu".
> >>
> >> This works fine with driver ixgbe, but not with driver igb. That is
> >> with igb, the program forced to specific CPU, can still use another
> >> TXQ. What am I missing?
> >>
> >>
> >> I'm monitoring this with both:
> >> 1) watch -d sudo tc -s -d q ls dev ethXX
> >> 2) https://github.com/ffainelli/bqlmon
> >>
> >> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
> >
> > Have you setup XPS ?
> >
> > echo 0001 >/sys/class/net/ethX/queues/tx-0/xps_cpus
> > echo 0002 >/sys/class/net/ethX/queues/tx-1/xps_cpus
> > echo 0004 >/sys/class/net/ethX/queues/tx-2/xps_cpus
> > echo 0008 >/sys/class/net/ethX/queues/tx-3/xps_cpus
> > echo 0010 >/sys/class/net/ethX/queues/tx-4/xps_cpus
> > echo 0020 >/sys/class/net/ethX/queues/tx-5/xps_cpus
> > echo 0040 >/sys/class/net/ethX/queues/tx-6/xps_cpus
> > echo 0080 >/sys/class/net/ethX/queues/tx-7/xps_cpus
> >
> > Or something like that, depending on number of cpus and TX queues.
> >
>
> That was what I was thinking as well.
>
> ixgbe has ATR which makes use of XPS to setup the transmit queues for a
> 1:1 mapping. The receive side of the flow is routed back to the same Rx
> queue through flow director mappings.
>
> In the case of igb it only has RSS and doesn't set a default XPS
> configuration. So you should probably setup XPS and you might also want
> to try and make use of RPS to try and steer receive packets since the Rx
> queues won't match the Tx queues.
After setting up XPS with a 1:1 CPU binding, it works most of the time.
Meaning, most of the traffic will go through the TXQ I've bound the
process to, BUT some packets can still choose another TXQ (observed by
monitoring the tc output and bqlmon).
Could this be related to the missing RPS setup?
Can I get some hints on setting up RPS?
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 6:56 ` Jesper Dangaard Brouer
@ 2014-09-18 7:28 ` Jesper Dangaard Brouer
2014-09-18 13:33 ` Eric Dumazet
1 sibling, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-18 7:28 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, Eric Dumazet, netdev@vger.kernel.org,
Tom Herbert
On Thu, 18 Sep 2014 08:56:40 +0200
Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> On Wed, 17 Sep 2014 07:59:51 -0700
> Alexander Duyck <alexander.h.duyck@intel.com> wrote:
>
> > On 09/17/2014 07:32 AM, Eric Dumazet wrote:
> > > On Wed, 2014-09-17 at 15:26 +0200, Jesper Dangaard Brouer wrote:
> > >> The CPU to TXQ binding behavior of ixgbe vs. igb NIC driver are
> > >> somehow different. Normally I setup NIC IRQ-to-CPU bindings 1-to-1,
> > >> with script set_irq_affinity [1].
> > >>
> > >> For forcing use of a specific HW TXQ, I normally force the CPU binding
> > >> of the process, either with "taskset" or with "netperf -T lcpu,rcpu".
> > >>
> > >> This works fine with driver ixgbe, but not with driver igb. That is
> > >> with igb, the program forced to specific CPU, can still use another
> > >> TXQ. What am I missing?
> > >>
> > >>
> > >> I'm monitoring this with both:
> > >> 1) watch -d sudo tc -s -d q ls dev ethXX
> > >> 2) https://github.com/ffainelli/bqlmon
> > >>
> > >> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/set_irq_affinity
> > >
> > > Have you setup XPS ?
> > >
> > > echo 0001 >/sys/class/net/ethX/queues/tx-0/xps_cpus
> > > echo 0002 >/sys/class/net/ethX/queues/tx-1/xps_cpus
> > > echo 0004 >/sys/class/net/ethX/queues/tx-2/xps_cpus
> > > echo 0008 >/sys/class/net/ethX/queues/tx-3/xps_cpus
> > > echo 0010 >/sys/class/net/ethX/queues/tx-4/xps_cpus
> > > echo 0020 >/sys/class/net/ethX/queues/tx-5/xps_cpus
> > > echo 0040 >/sys/class/net/ethX/queues/tx-6/xps_cpus
> > > echo 0080 >/sys/class/net/ethX/queues/tx-7/xps_cpus
> > >
> > > Or something like that, depending on number of cpus and TX queues.
> > >
> >
> > That was what I was thinking as well.
> >
> > ixgbe has ATR which makes use of XPS to setup the transmit queues for a
> > 1:1 mapping. The receive side of the flow is routed back to the same Rx
> > queue through flow director mappings.
> >
> > In the case of igb it only has RSS and doesn't set a default XPS
> > configuration. So you should probably setup XPS and you might also want
> > to try and make use of RPS to try and steer receive packets since the Rx
> > queues won't match the Tx queues.
>
> After setting up XPS to CPU 1:1 binding, it works most of the time.
> Meaning, most of the traffic will go through the TXQ I've bound the
> process to, BUT some packets can still choose another TXQ (observed
> monitoring tc output and blqmon).
>
> Could this be related to the missing RPS setup?
Setting up RPS helped, but not 100%. I still see short periods of packets
going out on other TXQs, and sometimes, as before, a heavy flow will
find its way to another TXQ.
> Can I get some hints setting up RPS?
My setup command now maps both XPS and RPS 1:1 to CPUs.
# Setup both RPS and XPS with a 1:1 binding to CPUs
export DEV=eth1 ; export NR_CPUS=11 ; \
for txq in `seq 0 $NR_CPUS` ; do \
file_xps=/sys/class/net/${DEV}/queues/tx-${txq}/xps_cpus \
file_rps=/sys/class/net/${DEV}/queues/rx-${txq}/rps_cpus \
mask=`printf %X $((1<<$txq))`
test -e $file_xps && sudo sh -c "echo $mask > $file_xps" && grep . -H $file_xps ;\
test -e $file_rps && sudo sh -c "echo $mask > $file_rps" && grep . -H $file_rps ;\
done
Output:
/sys/class/net/eth1/queues/tx-0/xps_cpus:001
/sys/class/net/eth1/queues/rx-0/rps_cpus:001
/sys/class/net/eth1/queues/tx-1/xps_cpus:002
/sys/class/net/eth1/queues/rx-1/rps_cpus:002
/sys/class/net/eth1/queues/tx-2/xps_cpus:004
/sys/class/net/eth1/queues/rx-2/rps_cpus:004
/sys/class/net/eth1/queues/tx-3/xps_cpus:008
/sys/class/net/eth1/queues/rx-3/rps_cpus:008
/sys/class/net/eth1/queues/tx-4/xps_cpus:010
/sys/class/net/eth1/queues/rx-4/rps_cpus:010
/sys/class/net/eth1/queues/tx-5/xps_cpus:020
/sys/class/net/eth1/queues/rx-5/rps_cpus:020
/sys/class/net/eth1/queues/tx-6/xps_cpus:040
/sys/class/net/eth1/queues/rx-6/rps_cpus:040
/sys/class/net/eth1/queues/tx-7/xps_cpus:080
/sys/class/net/eth1/queues/rx-7/rps_cpus:080
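A quick way to sanity-check the resulting mapping without bqlmon is to read
the BQL counters directly; a sketch, with device, CPU and host as
placeholders:
 # With the 1:1 maps above, traffic from a process pinned to CPU 3 should only
 # show up on tx-3; the BQL "inflight" byte count is a handy per-queue indicator.
 taskset -c 3 netperf -H 198.18.0.2 -l 30 > /dev/null &
 watch -d 'grep . /sys/class/net/eth1/queues/tx-*/byte_queue_limits/inflight'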
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 6:56 ` Jesper Dangaard Brouer
2014-09-18 7:28 ` Jesper Dangaard Brouer
@ 2014-09-18 13:33 ` Eric Dumazet
2014-09-18 13:41 ` Eric Dumazet
1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2014-09-18 13:33 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 2014-09-18 at 08:56 +0200, Jesper Dangaard Brouer wrote:
> After setting up XPS to CPU 1:1 binding, it works most of the time.
> Meaning, most of the traffic will go through the TXQ I've bound the
> process to, BUT some packets can still choose another TXQ (observed
> monitoring tc output and blqmon).
Note that for TCP there are packets sent by the process doing the
send(), packets sent by the CPU doing TX completion (because of TSQ),
but also packets sent from ACK processing on the reverse path.
As Alexander explained, if the ACK packets are delivered to another
CPU, then you might select another TX queue.
This is mostly prevented by the ooo_okay logic, meaning that a busy
bulk flow should stick to a single TX queue, no matter what XPS says.
A TCP_RR workload is free to choose whatever queue it wants, because
every packet starting an RR block has ooo_okay set (as the prior data
was delivered and acknowledged by the remote peer).
>
> Could this be related to the missing RPS setup?
No, for this to really work you need hardware support, so that ACK
packets take the same RX queue as the sent packets.
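(On NICs that expose ntuple/flow-steering filters, that RX-side mapping can
be pinned by hand; whether a given igb device supports this is hardware
dependent, and the device, port and queue numbers below are placeholders:)
 ethtool -K eth1 ntuple on
 # deliver the return traffic of the test flow (example port) to RX queue 3
 ethtool -N eth1 flow-type tcp4 dst-port 12865 action 3
 ethtool -n eth1    # list the installed filters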
>
> Can I get some hints setting up RPS?
>
Documentation/networking/scaling.txt is full of hints...
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 13:33 ` Eric Dumazet
@ 2014-09-18 13:41 ` Eric Dumazet
2014-09-18 15:42 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2014-09-18 13:41 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 2014-09-18 at 06:33 -0700, Eric Dumazet wrote:
> Note that for TCP, there are packets sent by the process doing the
> send(), or packets sent by cpu doing TX completion (because of TSQ),
> but also packets sent by ACK processing done in the reverse way.
>
> As Alexander explained, if the ACK packets are delivered into another
> CPU, then you might select another TX queue.
>
> This is mostly prevented because of ooo_okay logic, meaning that a busy
> bulk flow should stick into a single TX queue, no matter of XPS says.
>
> A TCP_RR workload is free to chose whatever queue, because every packet
> starting a RR block has the ooo_okay set (As prior data was delivered
> and acknowledged by the opposite peer)
>
Last but not least, there is the fact that the networking stack uses
mod_timer() to arm timers, and that by default timer migration is on
(cf. /proc/sys/kernel/timer_migration).
We probably should use mod_timer_pinned(), but I could not really see
any difference.
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 13:41 ` Eric Dumazet
@ 2014-09-18 15:42 ` Eric Dumazet
2014-09-18 15:59 ` Jesper Dangaard Brouer
2014-09-18 16:07 ` Eric Dumazet
0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-09-18 15:42 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
> Last but not least, there is the fact that networking stacks use
> mod_timer() to arm timers, and that by default, timer migration is on
> ( cf /proc/sys/kernel/timer_migration )
>
> We probably should use mod_timer_pinned(), but I could not really see
> any difference.
Hmm... actually it's quite noticeable:
# ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
...
# echo 1 >/proc/sys/kernel/timer_migration
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 0 261178336 15812 1001880 0 0 5 1 185 217 0 4 96 0
0 0 0 261173456 15812 1001884 0 0 0 0 1548055 35472 0 15 85 0
2 0 0 261174880 15812 1001888 0 0 0 0 1533309 35163 0 15 85 0
3 0 0 261176768 15812 1001896 0 0 0 0 1533442 35694 0 15 85 0
2 0 0 261173584 15812 1001912 0 0 0 3 1524024 35489 0 16 83 0
3 0 0 261173344 15812 1001912 0 0 0 4 1525034 35392 0 15 85 0
2 0 0 261175840 15812 1001920 0 0 0 0 1545652 35772 0 15 84 0
3 0 0 261176800 15812 1001920 0 0 0 0 1513413 35703 0 15 85 0
0 0 0 261175136 15812 1001920 0 0 0 2 1528775 35639 0 15 85 0
1 0 0 261176480 15812 1001924 0 0 0 0 1510346 35364 0 15 85 0
0 0 0 261174624 15812 1001924 0 0 0 0 1523893 35669 0 15 85 0
0 0 0 261175568 15812 1001928 0 0 0 5 1524099 35605 0 15 85 0
2 0 0 261175776 15812 1001932 0 0 0 5 1510481 35631 0 15 85 0
2 0 0 261173776 15812 1001932 0 0 0 0 1528381 36127 0 15 84 0
3 0 0 261175424 15812 1001932 0 0 0 0 1508722 35402 0 15 85 0
1 0 0 261176048 15812 1001932 0 0 0 0 1495438 35280 0 15 85 0
^C
# echo 0 >/proc/sys/kernel/timer_migration
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 0 261172784 15812 1001936 0 0 5 1 165 228 0 5 95 0
1 0 0 261175776 15812 1001940 0 0 0 0 1187446 32238 0 12 88 0
2 0 0 261172752 15812 1001940 0 0 0 3 1166697 32060 0 12 88 0
1 0 0 261174528 15812 1001944 0 0 0 3 1156846 32048 0 12 88 0
1 0 0 261172688 15812 1001944 0 0 0 0 1152953 32048 0 12 88 0
0 0 0 261169888 15812 1001952 0 0 0 0 1143630 32710 0 12 88 0
2 0 0 261159936 15812 1001748 0 0 0 1016 1153256 32616 0 12 88 0
2 0 0 261162128 15812 1001936 0 0 0 0 1153065 32689 0 12 88 0
1 0 0 261171984 15812 1001936 0 0 0 3 1164407 32041 0 12 88 0
2 0 0 261169552 15812 1001936 0 0 0 5 1162068 31917 0 12 88 0
I am tempted to simply do:
diff --git a/net/core/sock.c b/net/core/sock.c
index 9c3f823e76a9..868c6bcd7221 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2288,10 +2288,10 @@ void sk_send_sigurg(struct sock *sk)
}
EXPORT_SYMBOL(sk_send_sigurg);
-void sk_reset_timer(struct sock *sk, struct timer_list* timer,
+void sk_reset_timer(struct sock *sk, struct timer_list *timer,
unsigned long expires)
{
- if (!mod_timer(timer, expires))
+ if (!mod_timer_pinned(timer, expires))
sock_hold(sk);
}
EXPORT_SYMBOL(sk_reset_timer);
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 15:42 ` Eric Dumazet
@ 2014-09-18 15:59 ` Jesper Dangaard Brouer
2014-09-18 16:34 ` Eric Dumazet
2014-09-18 16:07 ` Eric Dumazet
1 sibling, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-18 15:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 18 Sep 2014 08:42:31 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
>
> > Last but not least, there is the fact that networking stacks use
> > mod_timer() to arm timers, and that by default, timer migration is on
> > ( cf /proc/sys/kernel/timer_migration )
I don't have this proc file on my system, as I didn't select CONFIG_SCHED_DEBUG.
> > We probably should use mod_timer_pinned(), but I could not really see
> > any difference.
>
> Hmm... actually its quite noticeable :
Interesting impact.
I'm looking for some 1G hardware without multiqueue, so I can get
around this measurement constraint, and possibly turn it down to
100 Mbit/s so I can more easily measure the HoL blocking effect.
> # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> ...
Interesting option "--google-pacing-rate" ;-)
> # echo 1 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 0 0 261178336 15812 1001880 0 0 5 1 185 217 0 4 96 0
> 0 0 0 261173456 15812 1001884 0 0 0 0 1548055 35472 0 15 85 0
> 2 0 0 261174880 15812 1001888 0 0 0 0 1533309 35163 0 15 85 0
> 3 0 0 261176768 15812 1001896 0 0 0 0 1533442 35694 0 15 85 0
[]
> # echo 0 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 0 0 261172784 15812 1001936 0 0 5 1 165 228 0 5 95 0
> 1 0 0 261175776 15812 1001940 0 0 0 0 1187446 32238 0 12 88 0
> 2 0 0 261172752 15812 1001940 0 0 0 3 1166697 32060 0 12 88 0
Quite significant: both interrupts and especially CPU system usage drop.
> I am tempted to simply :
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 9c3f823e76a9..868c6bcd7221 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2288,10 +2288,10 @@ void sk_send_sigurg(struct sock *sk)
> }
> EXPORT_SYMBOL(sk_send_sigurg);
>
> -void sk_reset_timer(struct sock *sk, struct timer_list* timer,
> +void sk_reset_timer(struct sock *sk, struct timer_list *timer,
> unsigned long expires)
> {
> - if (!mod_timer(timer, expires))
> + if (!mod_timer_pinned(timer, expires))
> sock_hold(sk);
> }
> EXPORT_SYMBOL(sk_reset_timer);
>
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 15:59 ` Jesper Dangaard Brouer
@ 2014-09-18 16:34 ` Eric Dumazet
2014-09-18 18:57 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2014-09-18 16:34 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 2014-09-18 at 17:59 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 18 Sep 2014 08:42:31 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
> >
> > > Last but not least, there is the fact that networking stacks use
> > > mod_timer() to arm timers, and that by default, timer migration is on
> > > ( cf /proc/sys/kernel/timer_migration )
>
> I don't have this proc file on my system, as I didn't select CONFIG_SCHED_DEBUG.
Interesting... this timer_migration stuff seems a bit scary to me.
>
> > > We probably should use mod_timer_pinned(), but I could not really see
> > > any difference.
> >
> > Hmm... actually its quite noticeable :
>
> Interesting impact.
>
> I'm looking for some 1G hardware without multiqueue, so I can get
> around this measurement constraint. And possibly turning it down to
> 100Mbit/s, so I can more easily measure the HoL blocking effect.
>
ethtool -L eth0 rx 1 tx 1
(Or similar if combined is used)
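(For reference, the query/set commands look roughly like this; eth1 and the
speed value are placeholders:)
 ethtool -l eth1            # show the current channel/queue counts
 ethtool -L eth1 combined 1 # or: ethtool -L eth1 rx 1 tx 1
 # and, for the 100 Mbit/s HoL experiments:
 ethtool -s eth1 speed 100 duplex full autoneg off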
>
> > # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> > ...
>
> Interesting option "--google-pacing-rate" ;-)
It's using the upstream SO_MAX_PACING_RATE socket option, nothing fancy ;)
>
> > # echo 1 >/proc/sys/kernel/timer_migration
> > # vmstat 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 2 0 0 261178336 15812 1001880 0 0 5 1 185 217 0 4 96 0
> > 0 0 0 261173456 15812 1001884 0 0 0 0 1548055 35472 0 15 85 0
> > 2 0 0 261174880 15812 1001888 0 0 0 0 1533309 35163 0 15 85 0
> > 3 0 0 261176768 15812 1001896 0 0 0 0 1533442 35694 0 15 85 0
> []
>
> > # echo 0 >/proc/sys/kernel/timer_migration
> > # vmstat 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 2 0 0 261172784 15812 1001936 0 0 5 1 165 228 0 5 95 0
> > 1 0 0 261175776 15812 1001940 0 0 0 0 1187446 32238 0 12 88 0
> > 2 0 0 261172752 15812 1001940 0 0 0 3 1166697 32060 0 12 88 0
>
> Quite significant, both interrupts and especially CPU system usage drop.
>
Yep...
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 16:34 ` Eric Dumazet
@ 2014-09-18 18:57 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-18 18:57 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 18 Sep 2014 09:34:24 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-18 at 17:59 +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 18 Sep 2014 08:42:31 -0700
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
[...]
> > I'm looking for some 1G hardware without multiqueue, so I can get
> > around this measurement constraint. And possibly turning it down to
> > 100Mbit/s, so I can more easily measure the HoL blocking effect.
> >
>
> ethtool -L eth0 rx 1 tx 1
>
> (Or similar if combined is used)
Thanks! That solves my qdisc measurement problem :-)
And yes, I had to use:
ethtool -L eth1 combined 1
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: CPU scheduler to TXQ binding? (ixgbe vs. igb)
2014-09-18 15:42 ` Eric Dumazet
2014-09-18 15:59 ` Jesper Dangaard Brouer
@ 2014-09-18 16:07 ` Eric Dumazet
1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-09-18 16:07 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexander Duyck, netdev@vger.kernel.org, Tom Herbert
On Thu, 2014-09-18 at 08:42 -0700, Eric Dumazet wrote:
> On Thu, 2014-09-18 at 06:41 -0700, Eric Dumazet wrote:
>
> > Last but not least, there is the fact that networking stacks use
> > mod_timer() to arm timers, and that by default, timer migration is on
> > ( cf /proc/sys/kernel/timer_migration )
> >
> > We probably should use mod_timer_pinned(), but I could not really see
> > any difference.
>
> Hmm... actually its quite noticeable :
>
> # ./super_netperf 500 --google-pacing-rate 3000000 -H lpaa24 -l 1000 &
> ...
> # echo 1 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 0 0 261178336 15812 1001880 0 0 5 1 185 217 0 4 96 0
> 0 0 0 261173456 15812 1001884 0 0 0 0 1548055 35472 0 15 85 0
> 2 0 0 261174880 15812 1001888 0 0 0 0 1533309 35163 0 15 85 0
> 3 0 0 261176768 15812 1001896 0 0 0 0 1533442 35694 0 15 85 0
> 2 0 0 261173584 15812 1001912 0 0 0 3 1524024 35489 0 16 83 0
> 3 0 0 261173344 15812 1001912 0 0 0 4 1525034 35392 0 15 85 0
> 2 0 0 261175840 15812 1001920 0 0 0 0 1545652 35772 0 15 84 0
> 3 0 0 261176800 15812 1001920 0 0 0 0 1513413 35703 0 15 85 0
> 0 0 0 261175136 15812 1001920 0 0 0 2 1528775 35639 0 15 85 0
> 1 0 0 261176480 15812 1001924 0 0 0 0 1510346 35364 0 15 85 0
> 0 0 0 261174624 15812 1001924 0 0 0 0 1523893 35669 0 15 85 0
> 0 0 0 261175568 15812 1001928 0 0 0 5 1524099 35605 0 15 85 0
> 2 0 0 261175776 15812 1001932 0 0 0 5 1510481 35631 0 15 85 0
> 2 0 0 261173776 15812 1001932 0 0 0 0 1528381 36127 0 15 84 0
> 3 0 0 261175424 15812 1001932 0 0 0 0 1508722 35402 0 15 85 0
> 1 0 0 261176048 15812 1001932 0 0 0 0 1495438 35280 0 15 85 0
> ^C
> # echo 0 >/proc/sys/kernel/timer_migration
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 0 0 261172784 15812 1001936 0 0 5 1 165 228 0 5 95 0
> 1 0 0 261175776 15812 1001940 0 0 0 0 1187446 32238 0 12 88 0
> 2 0 0 261172752 15812 1001940 0 0 0 3 1166697 32060 0 12 88 0
> 1 0 0 261174528 15812 1001944 0 0 0 3 1156846 32048 0 12 88 0
> 1 0 0 261172688 15812 1001944 0 0 0 0 1152953 32048 0 12 88 0
> 0 0 0 261169888 15812 1001952 0 0 0 0 1143630 32710 0 12 88 0
> 2 0 0 261159936 15812 1001748 0 0 0 1016 1153256 32616 0 12 88 0
> 2 0 0 261162128 15812 1001936 0 0 0 0 1153065 32689 0 12 88 0
> 1 0 0 261171984 15812 1001936 0 0 0 3 1164407 32041 0 12 88 0
> 2 0 0 261169552 15812 1001936 0 0 0 5 1162068 31917 0 12 88 0
>
> I am tempted to simply :
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 9c3f823e76a9..868c6bcd7221 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2288,10 +2288,10 @@ void sk_send_sigurg(struct sock *sk)
> }
> EXPORT_SYMBOL(sk_send_sigurg);
>
> -void sk_reset_timer(struct sock *sk, struct timer_list* timer,
> +void sk_reset_timer(struct sock *sk, struct timer_list *timer,
> unsigned long expires)
> {
> - if (!mod_timer(timer, expires))
> + if (!mod_timer_pinned(timer, expires))
> sock_hold(sk);
> }
> EXPORT_SYMBOL(sk_reset_timer);
>
And/or changing all occurrences of HRTIMER_MODE_ABS in net/sched
to HRTIMER_MODE_ABS_PINNED,
because we _want_ the qdisc to be restarted on the right CPU for sure.
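(Mechanically, something along these lines, run against a kernel tree; an
unreviewed sketch only:)
 grep -rn 'HRTIMER_MODE_ABS' net/sched/
 # word-boundary match so already-pinned users aren't rewritten twice
 sed -i 's/\bHRTIMER_MODE_ABS\b/HRTIMER_MODE_ABS_PINNED/g' net/sched/*.c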
Thread overview: 13+ messages
2014-09-17 13:26 CPU scheduler to TXQ binding? (ixgbe vs. igb) Jesper Dangaard Brouer
2014-09-17 14:32 ` Eric Dumazet
2014-09-17 14:55 ` Jesper Dangaard Brouer
2014-09-17 14:59 ` Alexander Duyck
2014-09-18 6:56 ` Jesper Dangaard Brouer
2014-09-18 7:28 ` Jesper Dangaard Brouer
2014-09-18 13:33 ` Eric Dumazet
2014-09-18 13:41 ` Eric Dumazet
2014-09-18 15:42 ` Eric Dumazet
2014-09-18 15:59 ` Jesper Dangaard Brouer
2014-09-18 16:34 ` Eric Dumazet
2014-09-18 18:57 ` Jesper Dangaard Brouer
2014-09-18 16:07 ` Eric Dumazet