From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Simon Xiao <sixiao@microsoft.com>, Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>,
netdev@vger.kernel.org, "K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
devel@linuxdriverproject.org, linux-kernel@vger.kernel.org,
David Miller <davem@davemloft.net>
Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct flow_keys layout
Date: Thu, 07 Jan 2016 14:28:26 +0100
Message-ID: <877fjlfrid.fsf@vitty.brq.redhat.com>
In-Reply-To: <1452171150.8255.207.camel@edumazet-glaptop2.roam.corp.google.com> (Eric Dumazet's message of "Thu, 07 Jan 2016 04:52:30 -0800")
[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Thu, 2016-01-07 at 10:33 +0100, Vitaly Kuznetsov wrote:
>> Recent changes to 'struct flow_keys' (e.g. commit d34af823ff40 ("net: Add
>> VLAN ID to flow_keys")) introduced a performance regression in the netvsc
>> driver. The problem is, however, not the above-mentioned commit but the
>> fact that the netvsc_set_hash() function made some assumptions about the
>> struct flow_keys data layout, and this is wrong. We need to extract the
>> data we need (src/dst addresses and ports) after the dissect.
>>
>> The issue could also be solved in a completely different way: as suggested
>> by Eric, instead of our own homegrown netvsc_set_hash() we could use
>> skb_get_hash(), which does more or less the same. Unfortunately, the
>> testing done by Simon showed that Hyper-V hosts are not happy with our
>> Jenkins hash; selecting the output queue with the current algorithm based
>> on Toeplitz hash works significantly better.
>
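To make the layout point above concrete, here is a minimal userspace sketch of copying the hash inputs out field by field instead of hashing the first N bytes of the dissected structure. `struct demo_flow` and `demo_pack_v4()` are hypothetical stand-ins invented for illustration, not the kernel's struct flow_keys or anything in the driver; the `vlan_id` member only models a field that was added ahead of the addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a dissected flow (NOT struct flow_keys).
 * All address/port members are kept in network byte order. */
struct demo_flow {
	uint16_t n_proto;
	uint8_t  ip_proto;
	uint16_t vlan_id;	/* models a member added before the addresses */
	uint32_t src;
	uint32_t dst;
	uint16_t sport;
	uint16_t dport;
};

/* Copy src, dst and (optionally) the ports into buf by member name, so the
 * result does not depend on where those members sit inside the struct.
 * Returns the number of bytes written: 12 with ports, 8 without. */
static size_t demo_pack_v4(const struct demo_flow *f, int with_ports,
			   uint8_t buf[12])
{
	size_t n = 0;

	assert(f != NULL && buf != NULL);

	memcpy(buf + n, &f->src, sizeof(f->src));
	n += sizeof(f->src);
	memcpy(buf + n, &f->dst, sizeof(f->dst));
	n += sizeof(f->dst);

	if (with_ports) {
		memcpy(buf + n, &f->sport, sizeof(f->sport));
		n += sizeof(f->sport);
		memcpy(buf + n, &f->dport, sizeof(f->dport));
		n += sizeof(f->dport);
	}

	return n;
}
```

Because the copy names each member, adding or reordering members in the structure does not change the bytes that reach the hash function.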
> Were tests done on IPv6 traffic ?
>
Simon, could you please test this patch for IPv6 and show us the numbers?
> Toeplitz hash takes at least 100 ns to hash 12 bytes (one iteration per
> bit : 96 iterations)
>
> For IPv6 it is 3 times this, since we have to hash 36 bytes.
>
> I do not see how it can compete with skb_get_hash() that directly gives
> skb->hash for local TCP flows.
>
My guess is that this is not the bottleneck; something is happening
behind the scenes with our packets in the Hyper-V host (e.g. re-distributing
them to hardware queues?), but I don't know the internals. Microsoft
folks could probably comment.
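For reference, the per-bit cost Eric mentions can be seen in a standalone userspace sketch of a Toeplitz hash. It follows the same shift-and-XOR scheme as comp_hash() in the driver but is not that code; `demo_toeplitz()` and the step counter are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts inner-loop iterations so the per-bit cost is visible. */
static unsigned long demo_toeplitz_steps;

/* Toeplitz hash sketch: for every set bit of the input (MSB first), XOR in
 * the current 32-bit window of the key, then slide the window one bit.
 * The key wraps around when it is shorter than the data requires. */
static uint32_t demo_toeplitz(const uint8_t *key, size_t klen,
			      const uint8_t *data, size_t dlen)
{
	uint32_t window, ret = 0;
	size_t next, i;
	int bit;

	assert(klen >= 4);

	/* The first 32 key bits form the initial window. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	next = 4 % klen;

	for (i = 0; i < dlen; i++) {
		uint8_t d = data[i];
		uint8_t k = key[next];	/* next key byte to shift in */

		next = (next + 1) % klen;
		for (bit = 0; bit < 8; bit++) {
			if (d & 0x80)
				ret ^= window;
			d <<= 1;
			window = (window << 1) | (uint32_t)(k >> 7);
			k <<= 1;
			demo_toeplitz_steps++;
		}
	}

	return ret;
}
```

Hashing 12 bytes runs the inner loop 96 times and hashing 36 bytes runs it 288 times, which is the 3x factor for IPv6 quoted above.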
> See commits b73c3d0e4f0e1961e15bec18720e48aabebe2109
> ("net: Save TX flow hash in sock and set in skbuf on xmit")
> and 877d1f6291f8e391237e324be58479a3e3a7407c
> ("net: Set sk_txhash from a random number")
>
> I understand Microsoft loves Toeplitz, but this looks not well placed
> here.
>
> I suspect there is another problem.
>
> Please share your numbers and test methodology, and the alternative
> patch Simon tested so that we can double check it.
>
An alternative patch which uses skb_get_hash() is attached. Simon, could you
please share the rest (environment, methodology, numbers) with us here?
Thanks!
> Thanks.
>
> PS: For the time being this patch can probably be applied on -net tree,
> as it fixes a real bug.
--
Vitaly
[-- Attachment #2: 0001-hv_netvsc-use-skb_get_hash-instead-of-a-homegrown-im.patch --]
[-- Type: text/x-patch, Size: 2420 bytes --]
From 0040e79c1303bd225ddbbce679ea944ea11ad0bd Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Wed, 6 Jan 2016 12:14:10 +0100
Subject: [PATCH] hv_netvsc: use skb_get_hash() instead of a homegrown
 implementation

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 67 ++---------------------------------------
 1 file changed, 3 insertions(+), 64 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 409b48e..038bf4f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -195,65 +195,6 @@ static void *init_ppi_data(struct rndis_message *msg, u32 ppi_size,
 	return ppi;
 }
 
-union sub_key {
-	u64 k;
-	struct {
-		u8 pad[3];
-		u8 kb;
-		u32 ka;
-	};
-};
-
-/* Toeplitz hash function
- * data: network byte order
- * return: host byte order
- */
-static u32 comp_hash(u8 *key, int klen, void *data, int dlen)
-{
-	union sub_key subk;
-	int k_next = 4;
-	u8 dt;
-	int i, j;
-	u32 ret = 0;
-
-	subk.k = 0;
-	subk.ka = ntohl(*(u32 *)key);
-
-	for (i = 0; i < dlen; i++) {
-		subk.kb = key[k_next];
-		k_next = (k_next + 1) % klen;
-		dt = ((u8 *)data)[i];
-		for (j = 0; j < 8; j++) {
-			if (dt & 0x80)
-				ret ^= subk.ka;
-			dt <<= 1;
-			subk.k <<= 1;
-		}
-	}
-
-	return ret;
-}
-
-static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
-{
-	struct flow_keys flow;
-	int data_len;
-
-	if (!skb_flow_dissect_flow_keys(skb, &flow, 0) ||
-	    !(flow.basic.n_proto == htons(ETH_P_IP) ||
-	      flow.basic.n_proto == htons(ETH_P_IPV6)))
-		return false;
-
-	if (flow.basic.ip_proto == IPPROTO_TCP)
-		data_len = 12;
-	else
-		data_len = 8;
-
-	*hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, &flow, data_len);
-
-	return true;
-}
-
 static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 			void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -266,11 +207,9 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 	if (nvsc_dev == NULL || ndev->real_num_tx_queues <= 1)
 		return 0;
 
-	if (netvsc_set_hash(&hash, skb)) {
-		q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] %
-			ndev->real_num_tx_queues;
-		skb_set_hash(skb, hash, PKT_HASH_TYPE_L3);
-	}
+	hash = skb_get_hash(skb);
+	q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] %
+		ndev->real_num_tx_queues;
 
 	return q_idx;
 }
--
2.4.3
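
The queue selection that remains after this patch is a two-step reduction: the hash indexes the send table, and the table entry is folded into the number of TX queues. A minimal userspace sketch follows, assuming a 16-entry table; `DEMO_SEND_TAB_SIZE` and `demo_pick_queue()` are hypothetical names standing in for VRSS_SEND_TAB_SIZE and the logic in netvsc_select_queue(), not the driver code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed table size for illustration; stands in for VRSS_SEND_TAB_SIZE. */
#define DEMO_SEND_TAB_SIZE 16

/* Map a flow hash to a TX queue index through an indirection table.
 * With one queue (or none) there is nothing to choose, so return 0. */
static uint16_t demo_pick_queue(uint32_t hash, const uint32_t *send_table,
				unsigned int num_queues)
{
	assert(send_table != NULL);

	if (num_queues <= 1)
		return 0;

	/* Step 1: hash selects a table slot. Step 2: the slot's value is
	 * reduced to a valid queue index. */
	return (uint16_t)(send_table[hash % DEMO_SEND_TAB_SIZE] % num_queues);
}
```

Any 32-bit hash works as input here, which is why the patch can substitute skb_get_hash() without touching the table lookup; whether the host distributes the resulting flows equally well is the open question in this thread.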