From: Eric Dumazet <dada1@cosmosbay.com>
To: Andrew Dickinson <andrew@whydna.net>
Cc: David Miller <davem@davemloft.net>,
jelaas@gmail.com, netdev@vger.kernel.org
Subject: Re: tx queue hashing hot-spots and poor performance (multiq, ixgbe)
Date: Fri, 01 May 2009 08:14:03 +0200
Message-ID: <49FA932B.4030405@cosmosbay.com>
In-Reply-To: <606676310904301653w28f3226fsc477dc92b6a7cdbc@mail.gmail.com>
Andrew Dickinson wrote:
> OK... I've got some more data on it...
>
> I passed a small number of packets through the system and added a ton
> of printks to it ;-P
>
> Here's the distribution of values as seen by
> skb_rx_queue_recorded()... count on the left, value on the right:
> 37 0
> 31 1
> 31 2
> 39 3
> 37 4
> 31 5
> 42 6
> 39 7
>
> That's nice and even.... Here's what's getting returned from the
> skb_tx_hash(). Again, count on the left, value on the right:
> 31 0
> 81 1
> 37 2
> 70 3
> 37 4
> 31 6
>
> Note that we're entirely missing 5 and 7 and that those interrupts
> seem to have gotten munged onto 1 and 3.
>
> I think the voodoo lies within:
> return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
>
> David, I made the change that you suggested:
> //hash = skb_get_rx_queue(skb);
> return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
>
> And now, I see a nice even mixing of interrupts on the TX side (yay!).
>
> However, my problem's not solved entirely... here's what top is showing me:
> top - 23:37:49 up 9 min, 1 user, load average: 3.93, 2.68, 1.21
> Tasks: 119 total, 5 running, 114 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
> Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 4.3%hi, 95.7%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 4.3%hi, 95.7%si, 0.0%st
> Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
> Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 2.0%id, 0.0%wa, 4.0%hi, 94.0%si, 0.0%st
> Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 5.6%id, 0.0%wa, 2.3%hi, 92.1%si, 0.0%st
> Mem: 16403476k total, 335884k used, 16067592k free, 10108k buffers
> Swap: 2096472k total, 0k used, 2096472k free, 146364k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 7 root 15 -5 0 0 0 R 100.2 0.0 5:35.24 ksoftirqd/1
> 13 root 15 -5 0 0 0 R 100.2 0.0 5:36.98 ksoftirqd/3
> 19 root 15 -5 0 0 0 R 97.8 0.0 5:34.52 ksoftirqd/5
> 25 root 15 -5 0 0 0 R 94.5 0.0 5:13.56 ksoftirqd/7
> 3905 root 20 0 12612 1084 820 R 0.3 0.0 0:00.14 top
> <snip>
>
>
> It appears that only the odd CPUs are actually handling the
> interrupts, which doesn't jibe with what /proc/interrupts shows me:
>         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
> 66:  2970565        0        0        0        0        0        0        0  PCI-MSI-edge  eth2-rx-0
> 67:       28   821122        0        0        0        0        0        0  PCI-MSI-edge  eth2-rx-1
> 68:       28        0  2943299        0        0        0        0        0  PCI-MSI-edge  eth2-rx-2
> 69:       28        0        0   817776        0        0        0        0  PCI-MSI-edge  eth2-rx-3
> 70:       28        0        0        0  2963924        0        0        0  PCI-MSI-edge  eth2-rx-4
> 71:       28        0        0        0        0   821032        0        0  PCI-MSI-edge  eth2-rx-5
> 72:       28        0        0        0        0        0  2979987        0  PCI-MSI-edge  eth2-rx-6
> 73:       28        0        0        0        0        0        0   845422  PCI-MSI-edge  eth2-rx-7
> 74:  4664732        0        0        0        0        0        0        0  PCI-MSI-edge  eth2-tx-0
> 75:       34  4679312        0        0        0        0        0        0  PCI-MSI-edge  eth2-tx-1
> 76:       28        0  4665014        0        0        0        0        0  PCI-MSI-edge  eth2-tx-2
> 77:       28        0        0  4681531        0        0        0        0  PCI-MSI-edge  eth2-tx-3
> 78:       28        0        0        0  4665793        0        0        0  PCI-MSI-edge  eth2-tx-4
> 79:       28        0        0        0        0  4671596        0        0  PCI-MSI-edge  eth2-tx-5
> 80:       28        0        0        0        0        0  4665279        0  PCI-MSI-edge  eth2-tx-6
> 81:       28        0        0        0        0        0        0  4664504  PCI-MSI-edge  eth2-tx-7
> 82:        2        0        0        0        0        0        0        0  PCI-MSI-edge  eth2:lsc
>
>
> Why would ksoftirqd only run on half of the cores (and only the odd
> ones to boot)? The one commonality that's striking me is that
> all the odd CPU#'s are on the same physical processor:
>
> -bash-3.2# cat /proc/cpuinfo | grep -E '(physical|processor)' | grep -v virtual
> processor : 0
> physical id : 0
> processor : 1
> physical id : 1
> processor : 2
> physical id : 0
> processor : 3
> physical id : 1
> processor : 4
> physical id : 0
> processor : 5
> physical id : 1
> processor : 6
> physical id : 0
> processor : 7
> physical id : 1
>
> I did compile the kernel with NUMA support... am I being bitten by
> something there? Any other thoughts on where I should look?
>
> Also... is there an incantation to get NAPI to work in the torvalds
> kernel? As you can see, I'm generating quite a few interrupts.
>
> -A
>
>
> On Thu, Apr 30, 2009 at 7:08 AM, David Miller <davem@davemloft.net> wrote:
>> From: Andrew Dickinson <andrew@whydna.net>
>> Date: Thu, 30 Apr 2009 07:04:33 -0700
>>
>>> I'll do some debugging around skb_tx_hash() and see if I can make
>>> sense of it. I'll let you know what I find. My hypothesis is that
>>> skb_record_rx_queue() isn't being called, but I should dig into it
>>> before I start making claims. ;-P
>> That's one possibility.
>>
>> Another is that the hashing isn't working out. One way to
>> play with that is to simply replace the:
>>
>> hash = skb_get_rx_queue(skb);
>>
>> in skb_tx_hash() with something like:
>>
>> return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
>>
>> and see if that improves the situation.
>>
Hi Andrew,

Please try the following patch (I don't have a multi-queue NIC, sorry).
I will do the follow-up patch if this one corrects the distribution problem
you noticed.

Thanks very much for all your findings.
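
For anyone following along, here is a minimal userspace sketch (not kernel
code) of the two mappings discussed above: the multiply-and-shift scaling
quoted from skb_tx_hash(), and the plain modulo David suggested. The mixer
mix32() is an assumed stand-in (the MurmurHash3 finalizer), not the kernel's
jhash_1word(), and queues_used_hashed() is a hypothetical helper. It only
illustrates that hashing a handful of fixed RX queue numbers gives no
guarantee that every TX queue is hit, while the modulo keeps a one-to-one
mapping when the RX and TX queue counts match.

```c
#include <stdint.h>

/* Stand-in 32-bit mixer (MurmurHash3 finalizer).
 * Assumption for illustration only: this is NOT jhash_1word(). */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Scale a 32-bit hash into [0, n) with a multiply and a shift,
 * as in the expression quoted from skb_tx_hash(). */
static uint16_t scale_hash(uint32_t hash, uint16_t n)
{
	return (uint16_t)(((uint64_t)hash * n) >> 32);
}

/* Direct mapping: RX queue index modulo the number of TX queues. */
static uint16_t map_modulo(uint16_t rx_queue, uint16_t n)
{
	return rx_queue % n;
}

/* Hypothetical helper: count how many distinct TX queues are selected
 * when RX queues 0..n_rx-1 go through the hashed path.
 * Assumes 1 <= n_tx <= 64 so the result fits in a 64-bit bitmap. */
static unsigned int queues_used_hashed(uint16_t n_rx, uint16_t n_tx,
				       uint32_t seed)
{
	uint64_t seen = 0;
	unsigned int used = 0;

	for (uint16_t q = 0; q < n_rx; q++) {
		uint16_t t = scale_hash(mix32(q ^ seed), n_tx);

		if (!(seen & (1ULL << t))) {
			seen |= 1ULL << t;
			used++;
		}
	}
	return used;
}
```

With 8 RX and 8 TX queues, map_modulo() returns the RX queue number
unchanged, while queues_used_hashed(8, 8, seed) may return fewer than 8
depending on the seed.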
[PATCH] net: skb_tx_hash() improvements

When skb_rx_queue_recorded() is true, we don't want to use the jhash
distribution, because the device driver told us exactly which queue was
selected at RX time. jhash makes a statistical shuffle, but this won't work
with 8 static inputs.

A later improvement would be to compute the reciprocal value of
real_num_tx_queues to avoid a divide here. But this computation should be
done once, when real_num_tx_queues is set. This needs a separate patch, and
a new field in struct net_device.

Reported-by: Andrew Dickinson <andrew@whydna.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/net/core/dev.c b/net/core/dev.c
index 308a7d0..e2e9e4a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1735,11 +1735,12 @@ u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
 {
 	u32 hash;
 
-	if (skb_rx_queue_recorded(skb)) {
-		hash = skb_get_rx_queue(skb);
-	} else if (skb->sk && skb->sk->sk_hash) {
+	if (skb_rx_queue_recorded(skb))
+		return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
+
+	if (skb->sk && skb->sk->sk_hash)
 		hash = skb->sk->sk_hash;
-	} else
+	else
 		hash = skb->protocol;
 
 	hash = jhash_1word(hash, skb_tx_hashrnd);
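
The reciprocal idea mentioned in the changelog could look roughly like the
userspace sketch below. Everything here is hypothetical: struct txq_recip
and the txq_recip_*() names are invented for illustration and stand in for
the "new field in struct net_device"; the kernel's own helpers in
<linux/reciprocal_div.h> are not used. The reciprocal is computed once when
the queue count is set, and each lookup then needs only a multiply, a shift
and a subtract. The sketch assumes a 16-bit queue index and a queue count
of at least 1.

```c
#include <stdint.h>

/* Hypothetical cached state, computed once per queue-count change. */
struct txq_recip {
	uint16_t n;	/* number of TX queues, must be >= 1 */
	uint64_t recip;	/* ceil(2^32 / n); 64 bits so n == 1 does not overflow */
};

/* Compute the reciprocal once, when the number of queues is set. */
static void txq_recip_init(struct txq_recip *r, uint16_t n)
{
	r->n = n;
	r->recip = ((1ULL << 32) + n - 1) / n;
}

/* Return rx_queue % n without a divide on the fast path:
 * quotient = (rx_queue * recip) >> 32, remainder = rx_queue - quotient * n. */
static uint16_t txq_recip_mod(const struct txq_recip *r, uint16_t rx_queue)
{
	uint32_t quot = (uint32_t)(((uint64_t)rx_queue * r->recip) >> 32);

	return (uint16_t)(rx_queue - quot * r->n);
}

/* Self-check: compare against the plain '%' operator for every
 * 16-bit queue index. Returns 1 if all results match, 0 otherwise. */
static int txq_recip_matches_modulo(uint16_t n)
{
	struct txq_recip r;

	txq_recip_init(&r, n);
	for (uint32_t q = 0; q <= 0xffff; q++)
		if (txq_recip_mod(&r, (uint16_t)q) != q % n)
			return 0;
	return 1;
}
```

The divide moves into txq_recip_init(), which would run only when the queue
count changes, so the per-packet path stays divide-free.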