* [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
@ 2014-03-08 6:13 Ming Chen
2014-03-08 7:12 ` John Fastabend
2014-03-08 15:27 ` Eric Dumazet
0 siblings, 2 replies; 23+ messages in thread
From: Ming Chen @ 2014-03-08 6:13 UTC (permalink / raw)
To: netdev; +Cc: Erez Zadok, Dean Hildebrand, Geoff Kuenning
Hi,
We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
(64 per port, and we used only one port). We found that only 12 of the
tx queues are enabled, where 12 is the number of CPUs in our system.
We realized that, in the driver code, adapter->num_tx_queues (which
decides netdev->real_num_tx_queues) is indirectly set to "min_t(int,
IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like the limit is
for RSS. But why are tx queues also set to the same as rx queues?
The problem with having a small number of tx queues is a high
probability of hash collision in skb_tx_hash(). If we have a small
number of long-lived, data-intensive TCP flows, the hash collision can
cause unfairness. We found this problem while benchmarking NFS, when
identical NFS clients were getting very different throughput when
reading a big file from the server. We call this problem Hash-Cast. If
interested, you can take a look at this poster:
http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
Can anybody take a look at this? It would be better to have all tx
queues enabled by default. If that is unlikely to happen, is there a
way to reconfigure the NIC so that we can use all tx queues if we
want?
FYI, our kernel version is 3.12.0, but I found the same limit on tx
queues in the code of the latest kernel. I am counting the number of
enabled queues using "ls /sys/class/net/p3p1/queues | grep -c tx-"
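In more detail, the check and a quick look at how traffic spreads over
the queues look roughly like this (a sketch; the per-queue counter
names assume ixgbe-style ethtool statistics and may differ per driver):
  # count the tx queues the kernel has actually instantiated
  ls /sys/class/net/p3p1/queues | grep -c tx-
  # see how traffic is spread across them (counter names are driver-specific)
  ethtool -S p3p1 | grep 'tx_queue_.*_packets'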
Best,
Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 6:13 [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled Ming Chen
@ 2014-03-08 7:12 ` John Fastabend
2014-03-09 0:19 ` Ming Chen
2014-03-08 15:27 ` Eric Dumazet
1 sibling, 1 reply; 23+ messages in thread
From: John Fastabend @ 2014-03-08 7:12 UTC (permalink / raw)
To: Ming Chen
Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning, Eric Dumazet
On 3/7/2014 10:13 PM, Ming Chen wrote:
> Hi,
>
> We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
> (64 per port and we used only one port). We found only 12 of the tx
> queues are enabled, where 12 is the number of CPUs in our system.
>
> We realized that, in the driver code, adapter->num_tx_queues (which
> decides netdev->real_num_tx_queues) is indirectly set to "min_t(int,
> IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like the limit is
> for RSS. But why are tx queues also set to the same as rx queues?
>
> The problem of having a small number of tx queues is high probability
> of hash collision in skb_tx_hash(). If we have a small number of
> long-lived data-intensive TCP flows, the hash collision can cause
> unfairness. We found this problem during our benchmarking of NFS when
> identical NFS clients are getting very different throughput when
> reading a big file from the server. We call this problem Hash-Cast. If
> interested, you can take a look at this poster:
> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>
> Can anybody take a look at this? It would be better to have all tx
> queues enabled by default. If this is unlikely to happen, is there a
> way to reconfigure the NIC so that we can use all tx queues if we
> want?
One way to solve this would be to use XPS and cgroups. XPS will allow
you to map the queues to CPUs, and then you can use cgroups to map your
application (NFS here) onto the correct CPUs. Then the queue that is
picked is deterministic, and you could manage the hash-cast problem.
Having to use cgroups to do the management is not ideal, though.
Also, once you have many sessions on a single mq qdisc queue, you
should consider using fq_codel, configured via 'tc qdisc add ...',
to get nice fairness properties amongst flows sharing a queue.
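As a rough sketch of that setup (the queue-to-CPU masks are
illustrative, the default_qdisc trick assumes a kernel with
/proc/sys/net/core/default_qdisc, and mapping the NFS server threads
onto the chosen CPUs with a cpuset cgroup is the other half, omitted
here):
  # XPS: bind each tx queue to one CPU (hex bitmask of CPUs per queue)
  echo 1 > /sys/class/net/p3p1/queues/tx-0/xps_cpus    # tx-0 <- CPU0
  echo 2 > /sys/class/net/p3p1/queues/tx-1/xps_cpus    # tx-1 <- CPU1
  # make newly created qdiscs default to fq_codel, then recreate the root
  # so the per-queue children of mq pick it up instead of pfifo_fast
  echo fq_codel > /proc/sys/net/core/default_qdisc
  tc qdisc replace dev p3p1 root pfifo
  tc qdisc del dev p3p1 root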
>
> FYI, our kernel version is 3.12.0, but I found the same limit of tx
> queues in the code of the latest kernel. I am counting the number of
> enabled queues using "ls /sys/class/net/p3p1/queues| grep -c tx-"
It's been the same for some time. It should be reasonably easy to
allow this; I'll take a look, but I won't get to it until next week.
In the meantime I'll see what other sorts of comments pop up.
This is only observable with a small number of flows, correct? With
many flows the distribution should be fair.
>
> Best,
> Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 6:13 [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled Ming Chen
2014-03-08 7:12 ` John Fastabend
@ 2014-03-08 15:27 ` Eric Dumazet
2014-03-08 16:08 ` Eric Dumazet
2014-03-09 0:30 ` Ming Chen
1 sibling, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-03-08 15:27 UTC (permalink / raw)
To: Ming Chen; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, 2014-03-08 at 01:13 -0500, Ming Chen wrote:
> Hi,
>
> We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
> (64 per port and we used only one port). We found only 12 of the tx
> queues are enabled, where 12 is the number of CPUs in our system.
>
> We realized that, in the driver code, adapter->num_tx_queues (which
> decides netdev->real_num_tx_queues) is indirectly set to "min_t(int,
> IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like the limit is
> for RSS. But why are tx queues also set to the same as rx queues?
>
> The problem of having a small number of tx queues is high probability
> of hash collision in skb_tx_hash(). If we have a small number of
> long-lived data-intensive TCP flows, the hash collision can cause
> unfairness. We found this problem during our benchmarking of NFS when
> identical NFS clients are getting very different throughput when
> reading a big file from the server. We call this problem Hash-Cast. If
> interested, you can take a look at this poster:
> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>
> Can anybody take a look at this? It would be better to have all tx
> queues enabled by default. If this is unlikely to happen, is there a
> way to reconfigure the NIC so that we can use all tx queues if we
> want?
>
> FYI, our kernel version is 3.12.0, but I found the same limit of tx
> queues in the code of the latest kernel. I am counting the number of
> enabled queues using "ls /sys/class/net/p3p1/queues| grep -c tx-"
>
> Best,
Quite frankly, with a 1GbE link, I would just use FQ and your problem
would disappear.
(I also use FQ with 40GbE links, if that matters.)
For a 1GbE link, the following command is more than enough:
tc qdisc replace dev eth0 root fq
Also, the following patch would probably help fairness. I'll submit an
official and more complete patch later.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bc0fb0fc7552..296c201516d1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1911,8 +1911,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
          * of queued bytes to ensure line rate.
          * One example is wifi aggregation (802.11 AMPDU)
          */
-        limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
-                      sk->sk_pacing_rate >> 10);
+        limit = 2 * skb->truesize;
 
         if (atomic_read(&sk->sk_wmem_alloc) > limit) {
                 set_bit(TSQ_THROTTLED, &tp->tsq_flags);
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 15:27 ` Eric Dumazet
@ 2014-03-08 16:08 ` Eric Dumazet
2014-03-09 0:53 ` Ming Chen
2014-03-09 0:30 ` Ming Chen
1 sibling, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-03-08 16:08 UTC (permalink / raw)
To: Ming Chen; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, 2014-03-08 at 07:27 -0800, Eric Dumazet wrote:
> Quite frankly, with a 1Gbe link, I would just use FQ and your problem
> would disappear.
>
> (I also use FQ with 40Gbe links if that matters)
>
> For a 1Gbe link, the following command is more than enough.
>
> tc qdisc replace dev eth0 root fq
>
> Also, following patch would probably help fairness. I'll submit an
> official and more complete patch later.
Also try more recent kernels. The TCP stack changes a lot these days ;)
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git
contains a lot of TCP-related work that should land in Linux 3.15.
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 7:12 ` John Fastabend
@ 2014-03-09 0:19 ` Ming Chen
0 siblings, 0 replies; 23+ messages in thread
From: Ming Chen @ 2014-03-09 0:19 UTC (permalink / raw)
To: John Fastabend
Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning, Eric Dumazet
Hi John,
Thanks for the suggestion. Please find my comments inline.
On Sat, Mar 8, 2014 at 2:12 AM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 3/7/2014 10:13 PM, Ming Chen wrote:
>>
>> Hi,
>>
>> We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
>> (64 per port and we used only one port). We found only 12 of the tx
>> queues are enabled, where 12 is the number of CPUs in our system.
>>
>> We realized that, in the driver code, adapter->num_tx_queues (which
>> decides netdev->real_num_tx_queues) is indirectly set to "min_t(int,
>> IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like the limit is
>> for RSS. But why are tx queues also set to the same as rx queues?
>>
>> The problem of having a small number of tx queues is high probability
>> of hash collision in skb_tx_hash(). If we have a small number of
>> long-lived data-intensive TCP flows, the hash collision can cause
>> unfairness. We found this problem during our benchmarking of NFS when
>> identical NFS clients are getting very different throughput when
>> reading a big file from the server. We call this problem Hash-Cast. If
>> interested, you can take a look at this poster:
>> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>>
>> Can anybody take a look at this? It would be better to have all tx
>> queues enabled by default. If this is unlikely to happen, is there a
>> way to reconfigure the NIC so that we can use all tx queues if we
>> want?
>
>
> One way to solve this would be to use XPS and cgroups. XPS will allow
> you to map the queues to CPUs and then use cgroups to map your
> application (NFS here) onto the correct CPU. Then which queue is
> picked is deterministic and you could manage the hash-cast problem.
> Having to use cgroup to do the management is not ideal though.
>
> Also once you have many sessions on a single mq qdisc queue you
> should consider using fq-codel configured via 'tc qdisc add ...'
> to get nice fairness properties amongst flows sharing a queue.
>
Yeah, we can let all NFS flows share just one queue using XPS and
cgroups, and then use fq_codel to achieve fairness among them. But I
have two doubts: (1) Are cgroups also applicable to kernel processes?
We are using the in-kernel NFS server, and I have never used cgroups
before. (2) I have not tried it yet, but would the network throughput
be lower if we use just one tx queue instead of multiple? The NFS
server is the only thing we care about on this machine, but using just
one tx queue sounds like a waste of resources considering there are 64
in total. I will try this and measure the throughput.
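A simple way to compare the aggregate throughput in the two
configurations is to sample the interface byte counter over a fixed
window (a sketch; p3p1 and the 10-second window are just examples):
  b0=$(cat /sys/class/net/p3p1/statistics/tx_bytes); sleep 10
  b1=$(cat /sys/class/net/p3p1/statistics/tx_bytes)
  echo "$(( (b1 - b0) * 8 / 10 / 1000000 )) Mbit/s"   # average tx rate over the window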
>
>>
>> FYI, our kernel version is 3.12.0, but I found the same limit of tx
>> queues in the code of the latest kernel. I am counting the number of
>> enabled queues using "ls /sys/class/net/p3p1/queues| grep -c tx-"
>
>
> Its been the same for sometime. It should be reasonably easy to allow
> this I'll take a look but wont get to it until next week. In the
> meantime I'll see what other sort of comments pop up.
Thanks. It will be great if we can enable all tx queues somehow.
>
> This is only observable with a small number of flows correct? With
> many flows the distribution should be fair.
Right now we have only experimented with 5 flows. I am not sure about
larger numbers of flows.
>
>>
>> Best,
>> Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 15:27 ` Eric Dumazet
2014-03-08 16:08 ` Eric Dumazet
@ 2014-03-09 0:30 ` Ming Chen
2014-03-09 3:29 ` Eric Dumazet
1 sibling, 1 reply; 23+ messages in thread
From: Ming Chen @ 2014-03-09 0:30 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
Hi Eric,
Thanks for the suggestion. I believe "tc qdisc replace dev eth0 root
fq" can achieve fairness if we have only one queue. My understanding
is that we cannot directly apply FQ to a multiqueue device, can we?
If we apply FQ separately to each tx queue, then what if we have one
flow (fl-0) in tx-0 but two flows (fl-1 and fl-2) in tx-1? With FQ,
the two flows in tx-1 should get the same bandwidth, but how about
fl-0 and fl-1?
Best,
Ming
On Sat, Mar 8, 2014 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2014-03-08 at 01:13 -0500, Ming Chen wrote:
>> Hi,
>>
>> We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
>> (64 per port and we used only one port). We found only 12 of the tx
>> queues are enabled, where 12 is the number of CPUs in our system.
>>
>> We realized that, in the driver code, adapter->num_tx_queues (which
>> decides netdev->real_num_tx_queues) is indirectly set to "min_t(int,
>> IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like the limit is
>> for RSS. But why are tx queues also set to the same as rx queues?
>>
>> The problem of having a small number of tx queues is high probability
>> of hash collision in skb_tx_hash(). If we have a small number of
>> long-lived data-intensive TCP flows, the hash collision can cause
>> unfairness. We found this problem during our benchmarking of NFS when
>> identical NFS clients are getting very different throughput when
>> reading a big file from the server. We call this problem Hash-Cast. If
>> interested, you can take a look at this poster:
>> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>>
>> Can anybody take a look at this? It would be better to have all tx
>> queues enabled by default. If this is unlikely to happen, is there a
>> way to reconfigure the NIC so that we can use all tx queues if we
>> want?
>>
>> FYI, our kernel version is 3.12.0, but I found the same limit of tx
>> queues in the code of the latest kernel. I am counting the number of
>> enabled queues using "ls /sys/class/net/p3p1/queues| grep -c tx-"
>>
>> Best,
>
> Quite frankly, with a 1Gbe link, I would just use FQ and your problem
> would disappear.
>
> (I also use FQ with 40Gbe links if that matters)
>
> For a 1Gbe link, the following command is more than enough.
>
> tc qdisc replace dev eth0 root fq
>
> Also, following patch would probably help fairness. I'll submit an
> official and more complete patch later.
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index bc0fb0fc7552..296c201516d1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1911,8 +1911,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>           * of queued bytes to ensure line rate.
>           * One example is wifi aggregation (802.11 AMPDU)
>           */
> -        limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
> -                      sk->sk_pacing_rate >> 10);
> +        limit = 2 * skb->truesize;
>
>          if (atomic_read(&sk->sk_wmem_alloc) > limit) {
>                  set_bit(TSQ_THROTTLED, &tp->tsq_flags);
>
>
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-08 16:08 ` Eric Dumazet
@ 2014-03-09 0:53 ` Ming Chen
2014-03-09 3:37 ` Eric Dumazet
0 siblings, 1 reply; 23+ messages in thread
From: Ming Chen @ 2014-03-09 0:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
Hi Eric,
We noticed many changes in the TCP stack, and a lot of them come from you :-)
Actually, we have a question about this patch you submitted
(http://lwn.net/Articles/564979/) regarding an experiment we conducted
in the 3.12.0 kernel. The results we observed are shown in the second
figure of panel 6 in this poster:
http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
We repeated the same experiment 100 times, and results like that
appeared 4 times. For this experiment, we observed that all five flows
were using dedicated tx queues. What makes a big difference is the
average packet size of the flows: Client4 has an average packet size
of around 3KB, while all the other clients generate packet sizes over
50KB. We suspect it might be caused by the TSO Packets Automatic
Sizing feature. Our reasoning is this: if a TCP flow starts slowly,
this feature will assign it a small packet size. The packet size and
the sending rate can somehow form a feedback loop, which can force the
TCP flow's rate to stay low. What do you think about this?
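One way to probe this hypothesis during a run is to watch each flow's
cwnd and send rate on the server with ss (a sketch; it assumes the NFS
traffic is on port 2049, and the exact fields printed by ss -i vary
with the kernel and iproute2 version):
  # dump per-connection TCP internals for the NFS flows once per second
  while sleep 1; do ss -tin 'sport = :2049'; done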
We have not tried the latest kernel yet. Frankly speaking, as a
networking layman, I am already overwhelmed by the complexity of the
TCP stack in the 3.12.0 kernel. I guess that is the nature of
networking as we move to 40GbE or even 100GbE. But anyway, thanks for
your suggestion.
Best,
Ming
On Sat, Mar 8, 2014 at 11:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Also try more recent kernels. TCP stack changes a lot these days ;)
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git
>
> contains many TCP related stuff which should land in linux-3.15
>
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 0:30 ` Ming Chen
@ 2014-03-09 3:29 ` Eric Dumazet
2014-03-09 6:43 ` Ming Chen
0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 3:29 UTC (permalink / raw)
To: Ming Chen; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, 2014-03-08 at 19:30 -0500, Ming Chen wrote:
> Hi Eric,
>
> Thanks for the suggestion. I believe "tc qdisc replace dev eth0 root
> fq" can achieve fairness if we have only one queue. My understanding
> is that we cannot directly apply FQ to a multiqueue device, can we?
> If we apply FQ separately to each tx queue, then what if we have one
> flow (fl-0) in tx-0, but two flows (fl-1 and fl-2) in tx-1? With FQ,
> the two flows in tx-1 should get the same bandwidth. But how about
> fl-0 and fl-1?
You do not need multiqueue to send traffic for a few TCP flows, because
packets will reach the 64KB size very fast.
If you want fairness, then multiqueue won't do it, unless you add some
kind of shaper.
TCP handles one flow, not an arbitrary number of flows.
If you need fairness, then you need an AQM like FQ.
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 0:53 ` Ming Chen
@ 2014-03-09 3:37 ` Eric Dumazet
2014-03-09 3:52 ` John Fastabend
2014-03-09 6:47 ` Ming Chen
0 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 3:37 UTC (permalink / raw)
To: Ming Chen; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, 2014-03-08 at 19:53 -0500, Ming Chen wrote:
> Hi Eric,
>
> We noticed many changes in the TCP stack, and a lot of them come from you :-)
>
> Actually, we have a question about this patch you submitted
> (http://lwn.net/Articles/564979/) regarding an experiment we conducted
> in the 3.12.0 kernel. The results we observed are shown in the second
> figure of panel 6 in this poster at
> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
> . We have repeated the same experiment 100 times, and observed
> that results like that appeared 4 times. For this experiment, we
> observed that all five flows are using dedicated tx queues. But what
> makes a big difference is the average packet sizes of the flows.
> Client4 has an average packet size of around 3KB while all other
> clients generate packet sizes over 50KB. We suspect it might be caused
> by this TSO Packets Automatic Sizing feature. Our reasoning is this: if
> a TCP flow starts slowly, this feature will assign it a small packet
> size. The packet size and the sending rate can somehow form a feedback
> loop, which can force the TCP flow's rate to stay low. What do you
> think about this?
I think nothing at all. TCP is not fair. TCP tries to steal the whole
bandwidth by definition. One flow can have much more than its neighbour.
With FQ, you can force some fairness, but if you use multiqueue, there
is no guarantee at all, unless you make sure:
- no more than one flow per queue;
- the NIC is able to provide fairness among all active TX queues.
That's the ideal condition, and it is quite hard to meet.
The feedback loop you mention should be solved by the patch I sent
today: TCP Small Queues makes sure that you have no more than 2 packets
per flow on the qdisc / TX queues, so one 'fast' flow cannot have 90%
of the packets in the qdisc. cwnd is kept to very small values,
assuming the receiver is behaving normally.
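For reference, on an unpatched kernel the knob involved here is the
tcp_limit_output_bytes sysctl; as the lines removed by the patch
earlier in this thread show, the stock limit is max(that sysctl,
sk_pacing_rate >> 10), so the sysctl only sets a floor (a sketch for
inspecting it, not a tuning recommendation):
  # per-socket byte budget allowed to sit in qdisc/device queues (the floor of the max())
  sysctl net.ipv4.tcp_limit_output_bytes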
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 3:37 ` Eric Dumazet
@ 2014-03-09 3:52 ` John Fastabend
2014-03-09 4:11 ` Eric Dumazet
2014-03-09 6:56 ` Ming Chen
2014-03-09 6:47 ` Ming Chen
1 sibling, 2 replies; 23+ messages in thread
From: John Fastabend @ 2014-03-09 3:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Ming Chen, netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On 03/08/2014 07:37 PM, Eric Dumazet wrote:
> On Sat, 2014-03-08 at 19:53 -0500, Ming Chen wrote:
>> Hi Eric,
>>
>> We noticed many changes in the TCP stack, and a lot of them come from you :-)
>>
>> Actually, we have a question about this patch you submitted
>> (http://lwn.net/Articles/564979/) regarding an experiment we conducted
>> in the 3.12.0 kernel. The results we observed are shown in the second
>> figure of panel 6 in this poster at
>> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>> . We have repeated the same experiment 100 times, and observed
>> that results like that appeared 4 times. For this experiment, we
>> observed that all five flows are using dedicated tx queues. But what
>> makes a big difference is the average packet sizes of the flows.
>> Client4 has an average packet size of around 3KB while all other
>> clients generate packet sizes over 50KB. We suspect it might be caused
>> by this TSO Packets Automatic Sizing feature. Our reasoning is this: if
>> a TCP flow starts slowly, this feature will assign it a small packet
>> size. The packet size and the sending rate can somehow form a feedback
>> loop, which can force the TCP flow's rate to stay low. What do you
>> think about this?
>
> I think nothing at all. TCP is not fair. TCP tries to steal whole
> bandwidth by definition. One flow can have much more than the neighbour.
>
> With FQ, you can force some fairness, but if you use multiqueue, there
> is no guarantee at all, unless you make sure :
>
> - no more than one flow per queue.
> - Nic is able to provide fairness among all active TX queues.
>
The NIC by default will round-robin amongst the queues and should be
reasonably fair. We could increase the number of TX queues the driver
enables, and for a small number of flows the first condition would be
easier to meet, although it won't help as the flow count increases.
Using FQ as a root qdisc, though, I think will really hurt performance
for small packet sizes. For larger packet sizes it's probably less
noticeable. Each queue can use FQ, as noted previously.
> Thats the ideal condition, and that is quite hard to meet.
>
> The feedback loop you mention should be solved by the patch I sent
> today : TCP Small queue make sure that you have no more than 2 packets
> > per flow on qdisc / TX queues. So one 'fast' flow cannot have 90% of the
> packets in the qdisc. cwnd is maintained to very small values, assuming
> receiver is behaving normally.
>
>
>
--
John Fastabend Intel Corporation
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 3:52 ` John Fastabend
@ 2014-03-09 4:11 ` Eric Dumazet
2014-03-09 6:56 ` Ming Chen
1 sibling, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 4:11 UTC (permalink / raw)
To: John Fastabend
Cc: Ming Chen, netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, 2014-03-08 at 19:52 -0800, John Fastabend wrote:
> On 03/08/2014 07:37 PM, Eric Dumazet wrote:
> > On Sat, 2014-03-08 at 19:53 -0500, Ming Chen wrote:
> >> Hi Eric,
> >>
> >> We noticed many changes in the TCP stack, and a lot of them come from you :-)
> >>
> >> Actually, we have a question about this patch you submitted
> >> (http://lwn.net/Articles/564979/) regarding an experiment we conducted
> >> in the 3.12.0 kernel. The results we observed are shown in the second
> >> figure of panel 6 in this poster at
> >> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
> >> . We have repeated the same experiment 100 times, and observed
> >> that results like that appeared 4 times. For this experiment, we
> >> observed that all five flows are using dedicated tx queues. But what
> >> makes a big difference is the average packet sizes of the flows.
> >> Client4 has an average packet size of around 3KB while all other
> >> clients generate packet sizes over 50KB. We suspect it might be caused
> >> by this TSO Packets Automatic Sizing feature. Our reasoning is this: if
> >> a TCP flow starts slowly, this feature will assign it a small packet
> >> size. The packet size and the sending rate can somehow form a feedback
> >> loop, which can force the TCP flow's rate to stay low. What do you
> >> think about this?
> >
> > I think nothing at all. TCP is not fair. TCP tries to steal whole
> > bandwidth by definition. One flow can have much more than the neighbour.
> >
> > With FQ, you can force some fairness, but if you use multiqueue, there
> > is no guarantee at all, unless you make sure :
> >
> > - no more than one flow per queue.
> > - Nic is able to provide fairness among all active TX queues.
> >
>
> The NIC by default will round robin amongst the queues and should be
> reasonably fair. We could increase the number of TX queues the driver
> enables and for a small number of flows the first condition is easier
> to meet. Although it wont help as the flow count increases.
>
> Using FQ as a root qdisc though I think will really hurt performance
> on small packet sizes. For larger packet sizes its probably less
> noticeable. Each queue can use FQ as noted previously.
>
Note that Ming's case was using between 1 and 10 flows.
Of course, MQ+FQ is better for performance, but then the fairness
problem is back.
It all depends on what is really wanted; that's why we can tweak
things ;)
To play with fq (instead of pfifo_fast) and mq, it's as simple as:
echo fq >/proc/sys/net/core/default_qdisc
tc qdisc replace dev eth0 root pfifo
tc qdisc del dev eth0 root
And you now have MQ+FQ instead of MQ+pfifo_fast.
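A quick way to confirm the result of those three commands (a sketch;
qdisc handles are auto-assigned, so the exact numbers will differ):
  tc qdisc show dev eth0
  # expect a root mq qdisc with one fq child per tx queue,
  # where the pfifo_fast children used to be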
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 3:29 ` Eric Dumazet
@ 2014-03-09 6:43 ` Ming Chen
2014-03-09 13:44 ` Eric Dumazet
0 siblings, 1 reply; 23+ messages in thread
From: Ming Chen @ 2014-03-09 6:43 UTC (permalink / raw)
To: Eric Dumazet, John Fastabend
Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, Mar 8, 2014 at 10:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> You do not need multiqueue to send traffic for few TCP flows, because
> packets will reach 64KB size very fast.
I am trying to experiment with using only one queue for all the TCP
flows. But how do we force that? I was thinking about XPS, but I just
realized that /sys/class/net/p3p1/queues/tx-n/xps_cpus only sets which
CPUs can use each queue. How do we let all CPUs choose one single tx
queue? Or should I use mqprio and assign all flows to one tc that
contains only one tx queue?
>
> If you want fairness, then multiqueue wont do it, unless you add some
> kind of shaper.
>
> TCP is handling one flow, not an arbitrary number of flows.
>
> If you need fairness, then you need an AQM like FQ.
>
Yeah, you are right. However, if we really want fairness while using
mq, then should the AQM and FQ be aware of mq, or even of the
scheduling of the queues?
Best,
Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 3:37 ` Eric Dumazet
2014-03-09 3:52 ` John Fastabend
@ 2014-03-09 6:47 ` Ming Chen
2014-03-09 13:39 ` Eric Dumazet
1 sibling, 1 reply; 23+ messages in thread
From: Ming Chen @ 2014-03-09 6:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, Mar 8, 2014 at 10:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> The feedback loop you mention should be solved by the patch I sent
> today : TCP Small queue make sure that you have no more than 2 packets
> per flow on qdisc / TX queues. So one 'fast' flow cannot have 90% of the
> packets in the qdisc. cwnd is maintained to very small values, assuming
> receiver is behaving normally.
>
Thanks for the patch. I forgot to mention that an excessively big cwnd
(or bufferbloat) does play a part in Hash-Cast. We found that using
TCP Vegas does not show unfairness even in the presence of hash
collisions among TCP flows. Comparing Vegas and Cubic, we found that
Vegas has a much smaller cwnd.
Best,
Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 3:52 ` John Fastabend
2014-03-09 4:11 ` Eric Dumazet
@ 2014-03-09 6:56 ` Ming Chen
2014-03-11 4:56 ` Geoff Kuenning
1 sibling, 1 reply; 23+ messages in thread
From: Ming Chen @ 2014-03-09 6:56 UTC (permalink / raw)
To: John Fastabend
Cc: Eric Dumazet, netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sat, Mar 8, 2014 at 10:52 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> The NIC by default will round robin amongst the queues and should be
> reasonably fair. We could increase the number of TX queues the driver
> enables and for a small number of flows the first condition is easier
> to meet. Although it wont help as the flow count increases.
Yes, it is definitely better to be able to use all tx queues.
From a customer's point of view, if someone bought a NIC with 64
queues, then the vendor (and the NIC driver) should let him or her use
all of the queues if the customer really wants to.
In our case, we believe a larger number of tx queues will reduce the
probability of hash collisions.
Thanks,
Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 6:47 ` Ming Chen
@ 2014-03-09 13:39 ` Eric Dumazet
2014-03-09 22:31 ` David Miller
0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 13:39 UTC (permalink / raw)
To: Ming Chen; +Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
On Sun, 2014-03-09 at 01:47 -0500, Ming Chen wrote:
> Thanks for the patch. I forgot to mention that excessively big cwnd
> (or bufferbloat) does play a part in Hash-Cast. We found that using
> TCP Vegas does not show unfairness even in the presence of hash
> collision among TCP flows. Comparing Vegas and Cubic, we found that
> Vegas has a much smaller cwnd.
Of course, Vegas is delay-based. It does not need to increase cwnd to
insane values.
The problem is that it does not compete well in the presence of
non-Vegas flows.
And unfortunately, the Internet is filled with non-Vegas flows.
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 6:43 ` Ming Chen
@ 2014-03-09 13:44 ` Eric Dumazet
2014-03-09 19:22 ` Ming Chen
2014-03-10 5:40 ` Ming Chen
0 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 13:44 UTC (permalink / raw)
To: Ming Chen
Cc: John Fastabend, netdev, Erez Zadok, Dean Hildebrand,
Geoff Kuenning
On Sun, 2014-03-09 at 01:43 -0500, Ming Chen wrote:
> I am trying to experiment using only one queue for all the TCP flows.
> But, how do we force that. I was thinking about XPS. But I just
> realized that /sys/class/net/p3p1/queues/tx-n/xps_cpus only set which
> CPUs can use them. How to let all CPUs choose one single tx queue? Or,
> should I use mqprio and assign all flows to one tc that contains only
> one tx queue?
If you want to use a single queue, just declare a single queue on your
NIC...
ethtool -L eth0 rx 1 tx 1
Multiqueue is not a requirement in your case. You can easily reach line
rate with a single queue on a 10GbE NIC.
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 13:44 ` Eric Dumazet
@ 2014-03-09 19:22 ` Ming Chen
2014-03-09 19:37 ` Eric Dumazet
2014-03-09 19:41 ` Eric Dumazet
2014-03-10 5:40 ` Ming Chen
1 sibling, 2 replies; 23+ messages in thread
From: Ming Chen @ 2014-03-09 19:22 UTC (permalink / raw)
To: Eric Dumazet, John Fastabend
Cc: netdev, Erez Zadok, Dean Hildebrand, Geoff Kuenning
Hi Eric and John,
I could not get the command working:
#ethtool -l p3p1
Channel parameters for p3p1:
Pre-set maximums:
RX: 0
TX: 0
Other: 1
Combined: 63
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 6
#ethtool -L p3p1 rx 1 tx 1
Cannot set device channel parameters: Invalid argument
#ethtool -L p3p1 rx 1 tx 1 other 1
other unmodified, ignoring
Cannot set device channel parameters: Invalid argument
Best,
Ming
On Sun, Mar 9, 2014 at 9:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> If you want to use a single queue, just declare a single queue on your
> NIC...
>
> ethtool -L eth0 rx 1 tx 1
>
> Multiqueue is not a requirement in your case. You can easily reach line
> rate with a single queue on a 10Gbe NIC.
>
>
>
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 19:22 ` Ming Chen
@ 2014-03-09 19:37 ` Eric Dumazet
2014-03-09 19:41 ` Eric Dumazet
1 sibling, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 19:37 UTC (permalink / raw)
To: Ming Chen
Cc: John Fastabend, netdev, Erez Zadok, Dean Hildebrand,
Geoff Kuenning
On Sun, 2014-03-09 at 15:22 -0400, Ming Chen wrote:
> Hi Eric and John,
>
> I could not get the command working:
>
> #ethtool -l p3p1
> Channel parameters for p3p1:
> Pre-set maximums:
> RX: 0
> TX: 0
> Other: 1
> Combined: 63
> Current hardware settings:
> RX: 0
> TX: 0
> Other: 1
> Combined: 6
>
> #ethtool -L p3p1 rx 1 tx 1
> Cannot set device channel parameters: Invalid argument
>
> #ethtool -L p3p1 rx 1 tx 1 other 1
> other unmodified, ignoring
> Cannot set device channel parameters: Invalid argument
You need a more recent kernel for ixgbe support:
commit 4c696ca9fbabc5f94a3c6db7f009e73f0ef21831
Author: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Thu Jan 17 08:39:33 2013 +0000
ixgbe: Add support for set_channels ethtool operation
This change adds support for the ethtool set_channels operation.
Since the ixgbe driver has to support DCB as well as the other modes the
assumption I made here is that the number of channels in DCB modes refers
to the number of queues per traffic class, not the number of queues total.
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 19:22 ` Ming Chen
2014-03-09 19:37 ` Eric Dumazet
@ 2014-03-09 19:41 ` Eric Dumazet
2014-03-09 19:43 ` Ming Chen
1 sibling, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2014-03-09 19:41 UTC (permalink / raw)
To: Ming Chen
Cc: John Fastabend, netdev, Erez Zadok, Dean Hildebrand,
Geoff Kuenning
On Sun, 2014-03-09 at 15:22 -0400, Ming Chen wrote:
> Hi Eric and John,
>
> I could not get the command working:
>
> #ethtool -l p3p1
> Channel parameters for p3p1:
> Pre-set maximums:
> RX: 0
> TX: 0
> Other: 1
> Combined: 63
> Current hardware settings:
> RX: 0
> TX: 0
> Other: 1
> Combined: 6
>
> #ethtool -L p3p1 rx 1 tx 1
> Cannot set device channel parameters: Invalid argument
>
> #ethtool -L p3p1 rx 1 tx 1 other 1
> other unmodified, ignoring
> Cannot set device channel parameters: Invalid argument
You could try
ethtool -L p3p1 combined 1
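Afterwards, the checks from earlier in the thread should confirm the
change took effect (a sketch):
  ethtool -l p3p1                               # "Current hardware settings" should show Combined: 1
  ls /sys/class/net/p3p1/queues | grep -c tx-   # should now report a single tx queue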
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 19:41 ` Eric Dumazet
@ 2014-03-09 19:43 ` Ming Chen
0 siblings, 0 replies; 23+ messages in thread
From: Ming Chen @ 2014-03-09 19:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: John Fastabend, netdev, Erez Zadok, Dean Hildebrand,
Geoff Kuenning
Thanks. It works.
On Sun, Mar 9, 2014 at 3:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2014-03-09 at 15:22 -0400, Ming Chen wrote:
>> Hi Eric and John,
>>
>> I could not get the command working:
>>
>> #ethtool -l p3p1
>> Channel parameters for p3p1:
>> Pre-set maximums:
>> RX: 0
>> TX: 0
>> Other: 1
>> Combined: 63
>> Current hardware settings:
>> RX: 0
>> TX: 0
>> Other: 1
>> Combined: 6
>>
>> #ethtool -L p3p1 rx 1 tx 1
>> Cannot set device channel parameters: Invalid argument
>>
>> #ethtool -L p3p1 rx 1 tx 1 other 1
>> other unmodified, ignoring
>> Cannot set device channel parameters: Invalid argument
>
> You could try
>
> ethtool -L p3p1 combined 1
>
>
>
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 13:39 ` Eric Dumazet
@ 2014-03-09 22:31 ` David Miller
0 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2014-03-09 22:31 UTC (permalink / raw)
To: eric.dumazet; +Cc: v.mingchen, netdev, ezk, dhildeb, geoff
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Mar 2014 06:39:46 -0700
> On Sun, 2014-03-09 at 01:47 -0500, Ming Chen wrote:
>
>> Thanks for the patch. I forgot to mention that excessively big cwnd
>> (or bufferbloat) does play a part in Hash-Cast. We found that using
>> TCP Vegas does not show unfairness even in the presence of hash
>> collision among TCP flows. Comparing Vegas and Cubic, we found that
>> Vegas has a much smaller cwnd.
>
> Of course, Vegas is delay based. It does not need to increase cwnd to
> insane values.
>
> Problem is it does not compete well in presence of non Vegas flows.
>
> And unfortunately, Internet is filled by non Vegas flows.
Vegas also does not respond quickly enough to sudden increases in
available bandwidth.
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 13:44 ` Eric Dumazet
2014-03-09 19:22 ` Ming Chen
@ 2014-03-10 5:40 ` Ming Chen
1 sibling, 0 replies; 23+ messages in thread
From: Ming Chen @ 2014-03-10 5:40 UTC (permalink / raw)
To: Eric Dumazet
Cc: John Fastabend, netdev, Erez Zadok, Dean Hildebrand,
Geoff Kuenning
On Sun, Mar 9, 2014 at 9:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Multiqueue is not a requirement in your case. You can easily reach line
> rate with a single queue on a 10Gbe NIC.
>
I repeated the experiment 10 times using one tx queue with FQ, and all
clients got a fair share of the bandwidth. The overall throughput
showed no difference between the single-queue case and the mq case,
and the throughput in both cases is close to the line rate.
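For completeness, the single-queue + FQ configuration compared here
boils down to the two commands suggested earlier in the thread (a
sketch):
  ethtool -L p3p1 combined 1            # one combined rx/tx channel
  tc qdisc replace dev p3p1 root fq     # fair queuing across all flows on that queue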
Thanks,
Ming
* Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled
2014-03-09 6:56 ` Ming Chen
@ 2014-03-11 4:56 ` Geoff Kuenning
0 siblings, 0 replies; 23+ messages in thread
From: Geoff Kuenning @ 2014-03-11 4:56 UTC (permalink / raw)
To: Ming Chen
Cc: John Fastabend, Eric Dumazet, netdev, Erez Zadok, Dean Hildebrand
> In our case, we believe larger number of tx queues will reduce the
> probability of hash collision.
However, that's only true for fairly small numbers of flows. If you
have 64 queues and 65 flows, then two of the flows are guaranteed to
collide even in the best case.
Not that I have a brilliant solution; I'm just pointing it out...
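To put rough numbers on both points (a back-of-the-envelope sketch that
treats skb_tx_hash as a uniform random assignment of flows to queues;
the 5-flow count matches the experiments above):
  # P(no collision) for n flows over q queues = q!/((q-n)! * q^n)
  awk 'BEGIN {
    n = 5
    split("12 64", q_list, " ")
    for (j = 1; j <= 2; j++) {
      q = q_list[j]; p = 1
      for (i = 0; i < n; i++) p *= (q - i) / q
      printf "queues=%2d  P(at least one collision) = %.2f\n", q, 1 - p
    }
  }'
  # prints roughly 0.62 for 12 queues and 0.15 for 64 queues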
--
Geoff Kuenning geoff@cs.hmc.edu http://www.cs.hmc.edu/~geoff/
Orchestra retrospectively extremely satisfied with symphony [No. 1] as
result of barrel of free beer.
-- Gustav Mahler, post-premiere letter to Arnold Berliner