netdev.vger.kernel.org archive mirror
* Drops in qdisc on ifb interface
@ 2015-05-25 20:05 John A. Sullivan III
  2015-05-25 22:31 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2015-05-25 20:05 UTC (permalink / raw)
  To: netdev

Hello, all.  On one of our connections we are doing intensive traffic
shaping with tc.  We are using ifb interfaces for shaping ingress
traffic, and we also use ifb interfaces for egress so that we can
apply the same set of rules to multiple interfaces (e.g., tun and eth
interfaces operating on the same physical interface).
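
For reference, the redirection is done roughly like this (a minimal
sketch; device names are illustrative and the real rule set is much
larger):

  # bring up the ifb device used for ingress shaping
  ip link set dev ifb0 up
  # send everything arriving on eth0 through ifb0
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
      action mirred egress redirect dev ifb0
  # the HFSC tree shown below is then attached as the root qdisc on ifb0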

These are running on very powerful gateways; I have watched them
handle 16 Gbps with CPU utilization of only a few percent.  Yet I am
seeing drops on the ifb interfaces when I do a tc -s qdisc show.

Why would this be? I would expect that, if there were some kind of
problem, it would manifest as drops on the physical interfaces rather
than on the IFB interfaces.  We have played with queue lengths in both
directions.  We are using HFSC with SFQ leaves, so I would imagine
this overrides the very short qlen on the IFB interfaces (32).  These
are drops, not overlimits.
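
("Played with queue lengths" here means things like raising the
device txqueuelen, e.g.:

  ip link set dev ifb0 txqueuelen 1000

though with HFSC/SFQ attached, the leaf qdisc limits should be what
matters rather than the device qlen.)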

Ingress:

root@gwhq-2:~# tc -s qdisc show dev ifb0
qdisc hfsc 11: root refcnt 2 default 50
 Sent 198152831324 bytes 333838154 pkt (dropped 101509, overlimits 9850280 requeues 43871)
 backlog 0b 0p requeues 43871
qdisc sfq 1102: parent 11:10 limit 127p quantum 1514b divisor 4096
 Sent 208463490 bytes 1367761 pkt (dropped 234, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1202: parent 11:20 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1302: parent 11:30 limit 127p quantum 1514b divisor 4096
 Sent 13498600307 bytes 203705301 pkt (dropped 23358, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1402: parent 11:40 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1502: parent 11:50 limit 127p quantum 1514b divisor 4096
 Sent 184445767527 bytes 128765092 pkt (dropped 77990, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

root@gwhq-2:~# tc -s class show dev ifb0
class hfsc 11: root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 11:1 parent 11: ls m1 0bit d 0us m2 1000Mbit ul m1 0bit d 0us m2 1000Mbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 210766381 work 198152837828 bytes level 1

class hfsc 11:10 parent 11:1 leaf 1102: rt m1 0bit d 0us m2 1000Mbit
 Sent 208463490 bytes 1367761 pkt (dropped 234, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 work 208463490 bytes rtwork 208463490 bytes level 0

class hfsc 11:20 parent 11:1 leaf 1202: rt m1 186182Kbit d 2.2ms m2 100000Kbit ls m1 0bit d 0us m2 100000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 11:30 parent 11:1 leaf 1302: rt m1 0bit d 0us m2 100000Kbit ls m1 0bit d 0us m2 300000Kbit
 Sent 13498600307 bytes 203705301 pkt (dropped 23358, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 200073586 work 13498600307 bytes rtwork 10035553945 bytes level 0

class hfsc 11:40 parent 11:1 leaf 1402: rt m1 0bit d 0us m2 200000Kbit ls m1 0bit d 0us m2 500000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 11:50 parent 11:1 leaf 1502: rt m1 0bit d 0us m2 200000Kbit ls m1 0bit d 0us m2 100000Kbit
 Sent 184446394921 bytes 128765668 pkt (dropped 77917, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 11254219 work 184445774031 bytes rtwork 39040535823 bytes level 0


Egress:

root@gwhq-2:~# tc -s qdisc show dev ifb1
qdisc hfsc 1: root refcnt 2 default 40
 Sent 783335740812 bytes 551888729 pkt (dropped 9622, overlimits 8546933 requeues 7180)
 backlog 0b 0p requeues 7180
qdisc sfq 1101: parent 1:10 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1201: parent 1:20 limit 127p quantum 1514b divisor 4096
 Sent 345678 bytes 2800 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1301: parent 1:30 limit 127p quantum 1514b divisor 4096
 Sent 573479513 bytes 8689797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1401: parent 1:40 limit 127p quantum 1514b divisor 4096
 Sent 782761915621 bytes 543196132 pkt (dropped 9692, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

root@gwhq-2:~# tc -s class show dev ifb1
class hfsc 1: root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 1:10 parent 1:1 leaf 1101: rt m1 186182Kbit d 2.2ms m2 100000Kbit ls m1 0bit d 0us m2 100000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 1:1 parent 1: ls m1 0bit d 0us m2 1000Mbit ul m1 0bit d 0us m2 1000Mbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 27259167 work 783335741126 bytes level 1

class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 100000Kbit ls m1 0bit d 0us m2 300000Kbit
 Sent 345678 bytes 2800 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 2791 work 345678 bytes rtwork 285108 bytes level 0

class hfsc 1:30 parent 1:1 leaf 1301: rt m1 0bit d 0us m2 200000Kbit ls m1 0bit d 0us m2 500000Kbit
 Sent 573479513 bytes 8689797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 8689608 work 573479513 bytes rtwork 573479447 bytes level 0

class hfsc 1:40 parent 1:1 leaf 1401: rt m1 0bit d 0us m2 200000Kbit ls m1 0bit d 0us m2 100000Kbit
 Sent 782762094327 bytes 543196259 pkt (dropped 9622, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 19132342 work 782761915935 bytes rtwork 200858352128 bytes level 0


Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-25 20:05 Drops in qdisc on ifb interface John A. Sullivan III
@ 2015-05-25 22:31 ` Eric Dumazet
  2015-05-26  2:52   ` John A. Sullivan III
  2015-05-28 14:38   ` jsullivan
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2015-05-25 22:31 UTC (permalink / raw)
  To: John A. Sullivan III; +Cc: netdev

On Mon, 2015-05-25 at 16:05 -0400, John A. Sullivan III wrote:
> Hello, all.  On one of our connections we are doing intensive traffic
> shaping with tc.  We are using ifb interfaces for shaping ingress
> traffic and we also use ifb interfaces for egress so that we can apply
> the same set of rules to multiple interfaces (e.g., tun and eth
> interfaces operating on the same physical interface).
> 
> These are running on very powerful gateways; I have watched them
> handling 16 Gbps with CPU utilization at a handful of percent.  Yet, I
> am seeing drops on the ifb interfaces when I do a tc -s qdisc show.
> 
> Why would this be? I would expect if there was some kind of problem that
> it would manifest as drops on the physical interfaces and not the IFB
> interface.  We have played with queue lengths in both directions.  We
> are using HFSC with SFQ leaves so I would imagine this overrides the
> very short qlen on the IFB interfaces (32).  These are drops and not
> overlimits.

IFB is single threaded and a serious bottleneck.

Don't use this on egress; it destroys multiqueue capability.

And SFQ is pretty limited (127 packets).

You might try to change your NIC to have a single queue for RX,
so that you have a single cpu feeding your IFB queue.

(ethtool -L eth0 rx 1)
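
(Depending on the NIC, the queues may be exposed as combined channels
rather than separate rx/tx ones.  Something like:

  ethtool -l eth0            # show current/max channel counts
  ethtool -L eth0 combined 1 # if only combined channels are exposed

may be needed instead.)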


* Re: Drops in qdisc on ifb interface
  2015-05-25 22:31 ` Eric Dumazet
@ 2015-05-26  2:52   ` John A. Sullivan III
  2015-05-26  3:17     ` Eric Dumazet
  2015-05-28 14:38   ` jsullivan
  1 sibling, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2015-05-26  2:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Mon, 2015-05-25 at 15:31 -0700, Eric Dumazet wrote:
> On Mon, 2015-05-25 at 16:05 -0400, John A. Sullivan III wrote:
> > Hello, all.  On one of our connections we are doing intensive traffic
> > shaping with tc.  We are using ifb interfaces for shaping ingress
> > traffic and we also use ifb interfaces for egress so that we can apply
> > the same set of rules to multiple interfaces (e.g., tun and eth
> > interfaces operating on the same physical interface).
> > 
> > These are running on very powerful gateways; I have watched them
> > handling 16 Gbps with CPU utilization at a handful of percent.  Yet, I
> > am seeing drops on the ifb interfaces when I do a tc -s qdisc show.
> > 
> > Why would this be? I would expect if there was some kind of problem that
> > it would manifest as drops on the physical interfaces and not the IFB
> > interface.  We have played with queue lengths in both directions.  We
> > are using HFSC with SFQ leaves so I would imagine this overrides the
> > very short qlen on the IFB interfaces (32).  These are drops and not
> > overlimits.
> 
> IFB is single threaded and a serious bottleneck.
> 
> Don't use this on egress; it destroys multiqueue capability.
> 
> And SFQ is pretty limited (127 packets).
> 
> You might try to change your NIC to have a single queue for RX,
> so that you have a single cpu feeding your IFB queue.
> 
> (ethtool -L eth0 rx 1)
> 
> 
> 
> 
> 
Hmm . . . I've been thinking about that SFQ leaf qdisc.  I see that
newer kernels allow a much higher "limit" than 127, but it seems the
queue depth for any one flow is still capped at 127.  When we do
something like GRE/IPsec, I think the decrypted GRE traffic will
distribute across the queues, but the IPsec traffic will collapse all
the packets initially into one queue.  At 80 ms RTT and 1 Gbps wire
speed, I would need a queue of around 7500.  Thus, can one say that
SFQ is almost useless for high-BDP connections?
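
(Back-of-the-envelope: 1 Gbit/s over an 80 ms RTT is 0.080 * 10^9 / 8
= 10 MB in flight, and 10,000,000 / 1514 bytes per full-size packet is
roughly 6,600 packets - more with smaller packets, hence ~7500.)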

Is there a similar round-robin type qdisc that does not have this
limitation?

If I recall correctly, if one does not attach a qdisc explicitly to a
class, it defaults to pfifo_fast.  Is that correct? Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-26  2:52   ` John A. Sullivan III
@ 2015-05-26  3:17     ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2015-05-26  3:17 UTC (permalink / raw)
  To: John A. Sullivan III; +Cc: netdev

On Mon, 2015-05-25 at 22:52 -0400, John A. Sullivan III wrote:

> Hmm . . . I've been thinking about that SFQ leaf qdisc.  I see that
> newer kernels allow a much higher "limit" than 127 but it still seems
> that the queue depth limit for any one flow is still 127.  When we do
> something like GRE/IPSec, I think the decrypted GRE traffic will
> distribute across the queues but the IPSec traffic will collapse all the
> packets initially into one queue.  At 80 ms RTT and 1 Gbps wire speed, I
> would need a queue of around 7500.  Thus, can one say that SFQ is almost
> useless for high BDP connections?

I am a bit surprised, as your 'nstat' output showed no packet
retransmits. So no packets were lost in your sfq.

> 
> Is there a similar round-robin type qdisc that does not have this
> limitation?

fq_codel limit 10000 
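
Something like this on each leaf, reusing the handles from your
output (the limit value is just a starting point):

  tc qdisc replace dev ifb0 parent 11:50 handle 1502: fq_codel limit 10000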

> 
> If I recall correctly, if one does not attach a qdisc explicitly to a
> class, it defaults to pfifo_fast.  Is that correct? Thanks - John
> 

That would be pfifo.

pfifo_fast is the default root qdisc
(/proc/sys/net/core/default_qdisc).


* Re: Drops in qdisc on ifb interface
  2015-05-25 22:31 ` Eric Dumazet
  2015-05-26  2:52   ` John A. Sullivan III
@ 2015-05-28 14:38   ` jsullivan
  2015-05-28 15:14     ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: jsullivan @ 2015-05-28 14:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


> On May 25, 2015 at 6:31 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On Mon, 2015-05-25 at 16:05 -0400, John A. Sullivan III wrote:
> > Hello, all. On one of our connections we are doing intensive traffic
> > shaping with tc. We are using ifb interfaces for shaping ingress
> > traffic and we also use ifb interfaces for egress so that we can apply
> > the same set of rules to multiple interfaces (e.g., tun and eth
> > interfaces operating on the same physical interface).
> >
> > These are running on very powerful gateways; I have watched them
> > handling 16 Gbps with CPU utilization at a handful of percent. Yet, I
> > am seeing drops on the ifb interfaces when I do a tc -s qdisc show.
> >
> > Why would this be? I would expect if there was some kind of problem that
> > it would manifest as drops on the physical interfaces and not the IFB
> > interface. We have played with queue lengths in both directions. We
> > are using HFSC with SFQ leaves so I would imagine this overrides the
> > very short qlen on the IFB interfaces (32). These are drops and not
> > overlimits.
>
> IFB is single threaded and a serious bottleneck.
>
> Don't use this on egress; it destroys multiqueue capability.
>
> And SFQ is pretty limited (127 packets).
>
> You might try to change your NIC to have a single queue for RX,
> so that you have a single cpu feeding your IFB queue.
>
> (ethtool -L eth0 rx 1)
>
>
>
>
>
This has been an interesting exercise - thank you for your help along the way,
Eric.  IFB did not seem to bottleneck in our initial testing but there was
really only one flow of traffic during the test at around 1 Gbps.  However, on a
non-test system with many different flows, IFB does seem to be a serious
bottleneck - I assume this is the consequence of being single-threaded.

Single queue did not seem to help.

Am I correct to assume that IFB would be as much of a bottleneck on the
ingress side as on the egress side? If so, is there any way to do
high-performance ingress traffic shaping on Linux - a multi-threaded
version of IFB or a different approach? Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-28 14:38   ` jsullivan
@ 2015-05-28 15:14     ` Eric Dumazet
  2015-05-28 15:30       ` jsullivan
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2015-05-28 15:14 UTC (permalink / raw)
  To: jsullivan@opensourcedevel.com; +Cc: netdev

On Thu, 2015-05-28 at 10:38 -0400, jsullivan@opensourcedevel.com wrote:

> This has been an interesting exercise - thank you for your help along the way,
> Eric.  IFB did not seem to bottleneck in our initial testing but there was
> really only one flow of traffic during the test at around 1 Gbps.  However, on a
> non-test system with many different flows, IFB does seem to be a serious
> bottleneck - I assume this is the consequence of being single-threaded.
> 
> Single queue did not seem to help.
> 
> Am I correct to assume that IFB would be as much of a bottleneck on the ingress
> side as it would be on the egress side? If so, is there any way to do high
> performance ingress traffic shaping on Linux - a multi-threaded version of IFB
> or a different approach? Thanks - John

IFB still has a long way to go before being efficient.

In the meantime, you could play with the following patch, and set
/sys/class/net/eth0/gro_timeout to 20000.
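
(That is, something along the lines of:

  echo 20000 > /sys/class/net/eth0/gro_flush_timeout

with the attribute name adjusted to whatever the kernel exposes - on
recent trees it is gro_flush_timeout.)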

This way, the GRO aggregation will work even at 1Gbps, and your IFB will
get big GRO packets instead of single MSS segments.

Both IFB and the IP/TCP stack will have less work to do, and the
receiver will send fewer ACK packets as well.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index f287186192bb655ba2dc1a205fb251351d593e98..c37f6657c047d3eb9bd72b647572edd53b1881ac 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -151,7 +151,7 @@ static void igb_setup_dca(struct igb_adapter *);
 #endif /* CONFIG_IGB_DCA */
 static int igb_poll(struct napi_struct *, int);
 static bool igb_clean_tx_irq(struct igb_q_vector *);
-static bool igb_clean_rx_irq(struct igb_q_vector *, int);
+static unsigned int igb_clean_rx_irq(struct igb_q_vector *, int);
 static int igb_ioctl(struct net_device *, struct ifreq *, int cmd);
 static void igb_tx_timeout(struct net_device *);
 static void igb_reset_task(struct work_struct *);
@@ -6342,6 +6342,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
 						     struct igb_q_vector,
 						     napi);
 	bool clean_complete = true;
+	unsigned int packets = 0;
 
 #ifdef CONFIG_IGB_DCA
 	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
@@ -6350,15 +6351,17 @@ static int igb_poll(struct napi_struct *napi, int budget)
 	if (q_vector->tx.ring)
 		clean_complete = igb_clean_tx_irq(q_vector);
 
-	if (q_vector->rx.ring)
-		clean_complete &= igb_clean_rx_irq(q_vector, budget);
+	if (q_vector->rx.ring) {
+		packets = igb_clean_rx_irq(q_vector, budget);
+		clean_complete &= packets < budget;
+	}
 
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
 		return budget;
 
 	/* If not enough Rx work done, exit the polling mode */
-	napi_complete(napi);
+	napi_complete_done(napi, packets);
 	igb_ring_irq_enable(q_vector);
 
 	return 0;
@@ -6926,7 +6929,7 @@ static void igb_process_skb_fields(struct igb_ring *rx_ring,
 	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 }
 
-static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
+static unsigned int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 {
 	struct igb_ring *rx_ring = q_vector->rx.ring;
 	struct sk_buff *skb = rx_ring->skb;
@@ -7000,7 +7003,7 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 	if (cleaned_count)
 		igb_alloc_rx_buffers(rx_ring, cleaned_count);
 
-	return total_packets < budget;
+	return total_packets;
 }
 
 static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,


* Re: Drops in qdisc on ifb interface
  2015-05-28 15:14     ` Eric Dumazet
@ 2015-05-28 15:30       ` jsullivan
  2015-05-28 15:45         ` John Fastabend
  2015-05-28 15:51         ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: jsullivan @ 2015-05-28 15:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


> On May 28, 2015 at 11:14 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On Thu, 2015-05-28 at 10:38 -0400, jsullivan@opensourcedevel.com wrote:
>
<snip>
> IFB has still a long way before being efficient.
>
> In the mean time, you could play with following patch, and
> setup /sys/class/net/eth0/gro_timeout to 20000
>
> This way, the GRO aggregation will work even at 1Gbps, and your IFB will
> get big GRO packets instead of single MSS segments.
>
> Both IFB but also IP/TCP stack will have less work to do,
> and receiver will send fewer ACK packets as well.
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index
> f287186192bb655ba2dc1a205fb251351d593e98..c37f6657c047d3eb9bd72b647572edd53b1881ac
> 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -151,7 +151,7 @@ static void igb_setup_dca(struct igb_adapter *);
> #endif /* CONFIG_IGB_DCA */
<snip>

Interesting, but this is destined to become a critical production
system for a high-profile, internationally recognized product, so I am
hesitant to patch.  I doubt I can convince my company to do it, but is
improving IFB the sort of development effort that could be sponsored
and then executed in a moderately short period of time? Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-28 15:30       ` jsullivan
@ 2015-05-28 15:45         ` John Fastabend
  2015-05-28 16:26           ` Eric Dumazet
  2015-05-28 16:28           ` jsullivan
  2015-05-28 15:51         ` Eric Dumazet
  1 sibling, 2 replies; 16+ messages in thread
From: John Fastabend @ 2015-05-28 15:45 UTC (permalink / raw)
  To: jsullivan@opensourcedevel.com; +Cc: Eric Dumazet, netdev

On 05/28/2015 08:30 AM, jsullivan@opensourcedevel.com wrote:
>
>> On May 28, 2015 at 11:14 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>> On Thu, 2015-05-28 at 10:38 -0400, jsullivan@opensourcedevel.com wrote:
>>
> <snip>
>> IFB has still a long way before being efficient.
>>
>> In the mean time, you could play with following patch, and
>> setup /sys/class/net/eth0/gro_timeout to 20000
>>
>> This way, the GRO aggregation will work even at 1Gbps, and your IFB will
>> get big GRO packets instead of single MSS segments.
>>
>> Both IFB but also IP/TCP stack will have less work to do,
>> and receiver will send fewer ACK packets as well.
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
>> b/drivers/net/ethernet/intel/igb/igb_main.c
>> index
>> f287186192bb655ba2dc1a205fb251351d593e98..c37f6657c047d3eb9bd72b647572edd53b1881ac
>> 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -151,7 +151,7 @@ static void igb_setup_dca(struct igb_adapter *);
>> #endif /* CONFIG_IGB_DCA */
> <snip>
>
> Interesting but this is destined to become a critical production system for a
> high profile, internationally recognized product so I am hesitant to patch.  I
> doubt I can convince my company to do it but is improving IFB the sort of
> development effort that could be sponsored and then executed in a moderately
> short period of time? Thanks - John
> --

If you're experimenting, one thing you could do is create many
ifb devices and load balance across them from tc.  I'm not sure
whether this would be practical in your setup, but it might
be worth trying.
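
Completely untested, and the split here is deliberately dumb - you
would pick a saner hash and drop the existing single-ifb redirect -
but roughly:

  ip link add ifb2 type ifb
  ip link add ifb3 type ifb
  ip link set ifb2 up
  ip link set ifb3 up
  # crude static split of ingress by the top bit of the source address
  tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
      match ip src 0.0.0.0/1 action mirred egress redirect dev ifb2
  tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
      match ip src 128.0.0.0/1 action mirred egress redirect dev ifb3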

One thing I've been debating adding is the ability to match on the
current cpu_id in tc, which would allow you to load balance by cpu.
I could send you a patch if you wanted to test it.  I would expect
this to help somewhat with the 'single queue' issue, but sorry, I
haven't had time yet to test it out myself.

.John

-- 
John Fastabend         Intel Corporation


* Re: Drops in qdisc on ifb interface
  2015-05-28 15:30       ` jsullivan
  2015-05-28 15:45         ` John Fastabend
@ 2015-05-28 15:51         ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2015-05-28 15:51 UTC (permalink / raw)
  To: jsullivan@opensourcedevel.com, John Fastabend; +Cc: netdev

On Thu, 2015-05-28 at 11:30 -0400, jsullivan@opensourcedevel.com wrote:

> Interesting but this is destined to become a critical production system for a
> high profile, internationally recognized product so I am hesitant to patch.  I
> doubt I can convince my company to do it but is improving IFB the sort of
> development effort that could be sponsored and then executed in a moderately
> short period of time? Thanks - John

I intend to submit this patch very officially.

Note that some Google servers use the same feature with good success on
other NIC. This allowed us to remove interrupt coalescing, lowering RPC
latencies, but keeping good throughput and cpu efficiency for bulk
flows.

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=1a2881728211f0915c0fa1364770b9c73a67a073

While IFB might need quite a lot of efforts, I don't know.

You certainly can ask to John Fastabend if he has plans about it in the
short term.


* Re: Drops in qdisc on ifb interface
  2015-05-28 15:45         ` John Fastabend
@ 2015-05-28 16:26           ` Eric Dumazet
  2015-05-28 16:33             ` jsullivan
  2015-05-28 16:28           ` jsullivan
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2015-05-28 16:26 UTC (permalink / raw)
  To: John Fastabend; +Cc: jsullivan@opensourcedevel.com, netdev

 On Thu, 2015-05-28 at 08:45 -0700, John Fastabend wrote:
> If you're experimenting, one thing you could do is create many
> ifb devices and load balance across them from tc. I'm not
> sure if this would be practical in your setup or not but might
> be worth trying.
> 
> One thing I've been debating adding is the ability to match
> on current cpu_id in tc which would allow you to load balance by
> cpu. I could send you a patch if you wanted to test it. I would
> expect this to help somewhat with 'single queue' issue but sorry
> haven't had time yet to test it out myself.

It seems John uses a single 1Gbps flow, so only one cpu would receive
NIC interrupts.

The only way he could get better results would be to schedule IFB work
on another core.

(Assuming one cpu is 100% busy servicing NIC + IFB, but I really doubt
it...)


* Re: Drops in qdisc on ifb interface
  2015-05-28 15:45         ` John Fastabend
  2015-05-28 16:26           ` Eric Dumazet
@ 2015-05-28 16:28           ` jsullivan
  1 sibling, 0 replies; 16+ messages in thread
From: jsullivan @ 2015-05-28 16:28 UTC (permalink / raw)
  To: John Fastabend; +Cc: Eric Dumazet, netdev


> On May 28, 2015 at 11:45 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
>
> On 05/28/2015 08:30 AM, jsullivan@opensourcedevel.com wrote:
> >
> >> On May 28, 2015 at 11:14 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >> On Thu, 2015-05-28 at 10:38 -0400, jsullivan@opensourcedevel.com wrote:
> >>
> > <snip>
> >> IFB has still a long way before being efficient.
> >>
> >> In the mean time, you could play with following patch, and
> >> setup /sys/class/net/eth0/gro_timeout to 20000
> >>
> >> This way, the GRO aggregation will work even at 1Gbps, and your IFB will
> >> get big GRO packets instead of single MSS segments.
> >>
> >> Both IFB but also IP/TCP stack will have less work to do,
> >> and receiver will send fewer ACK packets as well.
> >>
> >> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> >> b/drivers/net/ethernet/intel/igb/igb_main.c
> >> index
> >> f287186192bb655ba2dc1a205fb251351d593e98..c37f6657c047d3eb9bd72b647572edd53b1881ac
> >> 100644
> >> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> >> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> >> @@ -151,7 +151,7 @@ static void igb_setup_dca(struct igb_adapter *);
> >> #endif /* CONFIG_IGB_DCA */
> > <snip>
> >
> > Interesting but this is destined to become a critical production system for
> > a
> > high profile, internationally recognized product so I am hesitant to patch.
> > I
> > doubt I can convince my company to do it but is improving IFB the sort of
> > development effort that could be sponsored and then executed in a moderately
> > short period of time? Thanks - John
> > --
>
> > If you're experimenting, one thing you could do is create many
> ifb devices and load balance across them from tc. I'm not
> sure if this would be practical in your setup or not but might
> be worth trying.
>
> One thing I've been debating adding is the ability to match
> on current cpu_id in tc which would allow you to load balance by
> cpu. I could send you a patch if you wanted to test it. I would
> expect this to help somewhat with 'single queue' issue but sorry
> haven't had time yet to test it out myself.
>
> .John
>
> --
> John Fastabend Intel Corporation

In the meantime, I've noticed something strange.  When testing traffic
between the two primary gateways, and thus identical traffic flows, I
see the bottleneck on the one which uses two bonded GbE igb interfaces
but not on the one which uses two bonded 10 GbE ixgbe interfaces.  The
ethtool -k settings are identical, e.g., gso, gro, lro.  The ring
buffer is larger on the ixgbe cards, but I would not think that would
affect this.  Identical kernels.  The gateway hardware is identical and
not working hard at all - no CPU or RAM pressure.
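
(Concretely, the comparison was along the lines of:

  ethtool -k eth0    # offload settings: gso/gro/lro
  ethtool -g eth0    # ring buffer sizes

and checking ethtool -S eth0 | grep -i drop for NIC-level drop
counters would be a reasonable next step.)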

Any idea why one bottlenecks and the other does not?

Returning to your idea, John, how would I load balance? I assume I
would need to attach several filters to the physical interfaces, each
redirecting traffic to a different IFB device.  However, couldn't this
work against the traffic shaping? Let's take an extreme example: all
the time-sensitive ingress packets find their way onto ifb0 and all
the bulk ingress packets find their way onto ifb1.  As these packets
are merged back to the physical interface, won't they simply be
treated in pfifo_fast (or other physical interface qdisc) order?
Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-28 16:26           ` Eric Dumazet
@ 2015-05-28 16:33             ` jsullivan
  2015-05-28 17:17               ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: jsullivan @ 2015-05-28 16:33 UTC (permalink / raw)
  To: Eric Dumazet, John Fastabend; +Cc: netdev


> On May 28, 2015 at 12:26 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On Thu, 2015-05-28 at 08:45 -0700, John Fastabend wrote:
> > If you're experimenting, one thing you could do is create many
> > ifb devices and load balance across them from tc. I'm not
> > sure if this would be practical in your setup or not but might
> > be worth trying.
> >
> > One thing I've been debating adding is the ability to match
> > on current cpu_id in tc which would allow you to load balance by
> > cpu. I could send you a patch if you wanted to test it. I would
> > expect this to help somewhat with 'single queue' issue but sorry
> > haven't had time yet to test it out myself.
>
> It seems John uses a single 1Gbps flow, so only one cpu would receive
> NIC interrupts.
>
> The only way he could get better results would be to schedule IFB work
> on another core.
>
> (Assuming one cpu is 100% busy servicing NIC + IFB, but I really doubt
> it...)
>
>
>
Our initial testing has been single flow, but the ultimate purpose is
processing real-time video in a complex application which ingests
associated metadata, posts to a consumer-facing cloud, and does
reporting back - so lots of different kinds of traffic with very
different demands - a perfect tc environment.

CPU utilization is remarkably light.  Every once in a while, we see a
single CPU about 50% utilized with si (softirq).  Thanks, all - John


* Re: Drops in qdisc on ifb interface
  2015-05-28 16:33             ` jsullivan
@ 2015-05-28 17:17               ` Eric Dumazet
  2015-05-28 17:31                 ` jsullivan
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2015-05-28 17:17 UTC (permalink / raw)
  To: jsullivan@opensourcedevel.com; +Cc: John Fastabend, netdev

On Thu, 2015-05-28 at 12:33 -0400, jsullivan@opensourcedevel.com wrote:

> Our initial testing has been single flow but the ultimate purpose is processing
> real time video in a complex application which ingests associated meta data,
> post to consumer facing cloud, does reporting back - so lots of different
> traffics with very different demands - a perfect tc environment.

Wait, do you really plan on using TCP for real-time video?


* Re: Drops in qdisc on ifb interface
  2015-05-28 17:17               ` Eric Dumazet
@ 2015-05-28 17:31                 ` jsullivan
  2015-05-28 17:49                   ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: jsullivan @ 2015-05-28 17:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, John Fastabend


> On May 28, 2015 at 1:17 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On Thu, 2015-05-28 at 12:33 -0400, jsullivan@opensourcedevel.com wrote:
>
> > Our initial testing has been single flow but the ultimate purpose is
> > processing
> > real time video in a complex application which ingests associated meta data,
> > post to consumer facing cloud, does reporting back - so lots of different
> > traffics with very different demands - a perfect tc environment.
>
> Wait, do you really plan using TCP for real time video ?
>
>
The overall product does, but the video source feeds come over a
different network via UDP.  There are, however, RTMP quality-control
feeds coming across this connection.  There may also occasionally be
test UDP source feeds on this connection, but those are not
production.  Thanks - John


* Re: Drops in qdisc on ifb interface
  2015-05-28 17:31                 ` jsullivan
@ 2015-05-28 17:49                   ` Eric Dumazet
  2015-05-28 17:54                     ` jsullivan
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2015-05-28 17:49 UTC (permalink / raw)
  To: jsullivan@opensourcedevel.com; +Cc: netdev, John Fastabend

On Thu, 2015-05-28 at 13:31 -0400, jsullivan@opensourcedevel.com wrote:

> The overall product does but the video source feeds come over a different
> network via UDP. There are, however, RTMP quality control feeds coming across
> this connection.  There may also occasionally be test UDP source feeds on this
> connection but those are not production.  Thanks - John

This is important to know, because UDP won't benefit from GRO.

I was assuming your receiver had to handle ~88000 packets per second,
so I doubted it could saturate one core, but maybe your target is very
different.


* Re: Drops in qdisc on ifb interface
  2015-05-28 17:49                   ` Eric Dumazet
@ 2015-05-28 17:54                     ` jsullivan
  0 siblings, 0 replies; 16+ messages in thread
From: jsullivan @ 2015-05-28 17:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, John Fastabend


> On May 28, 2015 at 1:49 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On Thu, 2015-05-28 at 13:31 -0400, jsullivan@opensourcedevel.com wrote:
>
> > The overall product does but the video source feeds come over a different
> > network via UDP. There are, however, RTMP quality control feeds coming
> > across
> > this connection. There may also occasionally be test UDP source feeds on
> > this
> > connection but those are not production. Thanks - John
>
> This is important to know, because UDP wont benefit from GRO.
>
> I was assuming your receiver had to handle ~88000 packets per second,
> so I was doubting it could saturate one core,
> but maybe your target is very different.
>
>
>
That PPS estimate seems accurate - the port speed and CIR on the
shaped connection are 1 Gbps.
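
(1 Gbit/s of full-size frames works out to roughly 10^9 / (1514 * 8),
about 82,500 packets per second, so ~88,000 is the right ballpark once
smaller packets are mixed in.)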

I'm still mystified as to why the GbE bottlenecks on IFB but the
10 GbE does not.  Thanks - John

