* [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
@ 2013-12-10 9:52 Francesco Fusco
2013-12-10 10:20 ` David Laight
2013-12-10 19:28 ` Jesse Gross
0 siblings, 2 replies; 9+ messages in thread
From: Francesco Fusco @ 2013-12-10 9:52 UTC (permalink / raw)
To: jesse; +Cc: netdev, dev, Daniel Borkmann, Thomas Graf
Currently OVS uses jhash2() for calculating flow hashes in its
internal flow_hash() function. The performance of the flow_hash()
function is critical, as the input data can be hundreds of bytes
long.
One possible direction to achieve higher performance consists
of replacing jhash2() with an architecture specific hash
function pretty much like the Intel folks have proposed in
DPDK. DPDK provides a very fast hash function that leverages
the 32bit crc32l instruction part of the Intel SSE4.2
instruction set.
OVS is largely deployed in x86_64 based datacenters.
Therefore, we argue that the performance critical fast path
of OVS should exploit underlying CPU features in order to reduce
the per packet processing costs.
We adapted and integrated Intel's hashing function from DPDK in
order to be used with OVS internally. Our patch greatly reduces
the hash footprint from ~200 cycles of jhash2() to around ~90
cycles in case of ovs_flow_hash_crc() (measured with rdtsc over
maximum length flow keys on an i7 Intel CPU).
Additionally, we wrote a microbenchmark to stress the flow table
performance. The benchmark inserts random flows into the flow
hash and then performs lookups. Our hash deployed on a CRC32
capable CPU reduces the lookup for 1000 flows, 100 masks from
~10,100us to ~6,700us, for example.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
---
net/openvswitch/flow_table.c | 12 ++++++--
net/openvswitch/flow_table.h | 66 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 75 insertions(+), 3 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index e425427..262d7ca 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -43,13 +43,16 @@
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/ndisc.h>
-
+#ifdef CONFIG_X86
+#include <asm/cpufeature.h>
+#endif
#include "datapath.h"
#define TBL_MIN_BUCKETS 1024
#define REHASH_INTERVAL (10 * 60 * HZ)
static struct kmem_cache *flow_cache;
+static u32 (*__flow_hash)(const u32 *data, u32 len, u32 seed) = jhash2;
static u16 range_n_bytes(const struct sw_flow_key_range *range)
{
@@ -362,7 +365,7 @@ static u32 flow_hash(const struct sw_flow_key *key, int key_start,
/* Make sure number of hash bytes are multiple of u32. */
BUILD_BUG_ON(sizeof(long) % sizeof(u32));
- return jhash2(hash_key, hash_u32s, 0);
+ return __flow_hash(hash_key, hash_u32s, 0);
}
static int flow_key_start(const struct sw_flow_key *key)
@@ -581,7 +584,10 @@ int ovs_flow_init(void)
0, NULL);
if (flow_cache == NULL)
return -ENOMEM;
-
+#ifdef CONFIG_X86_64
+ if (cpu_has_xmm4_2)
+ __flow_hash = ovs_flow_hash_crc;
+#endif
return 0;
}
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index fbe45d5..50cfd1e 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -14,6 +14,37 @@
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA
+ *
+ * Some portions derived from code covered by the following notice:
+ *
+ * Copyright (c) 2010-2013 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef FLOW_TABLE_H
@@ -78,4 +109,39 @@ bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
const struct sw_flow_mask *mask);
+
+#ifdef CONFIG_X86_64
+static inline u32 ovs_flow_hash_crc_4b(u32 crc, u32 val)
+{
+ asm ("crc32l %[val], %[crc]\n"
+ : [crc] "+r" (crc)
+ : [val] "rm" (val));
+ return crc;
+}
+
+static inline u32 ovs_flow_hash_crc(const u32 *data, u32 len, u32 seed)
+{
+ const u32 *p32 = (const u32 *) data;
+ u32 i, tmp = 0;
+
+ for (i = 0; i < len; i++)
+ seed = ovs_flow_hash_crc_4b(*p32++, seed);
+
+ switch (3 - ((len * 4) & 0x03)) {
+ case 0:
+ tmp |= *((const u8 *) p32 + 2) << 16;
+ /* Fallthrough */
+ case 1:
+ tmp |= *((const u8 *) p32 + 1) << 8;
+ /* Fallthrough */
+ case 2:
+ tmp |= *((const u8 *) p32);
+ seed = ovs_flow_hash_crc_4b(tmp, seed);
+ default:
+ break;
+ }
+
+ return seed;
+}
+#endif
#endif /* flow_table.h */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 9:52 [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available Francesco Fusco
@ 2013-12-10 10:20 ` David Laight
2013-12-10 19:28 ` Jesse Gross
1 sibling, 0 replies; 9+ messages in thread
From: David Laight @ 2013-12-10 10:20 UTC (permalink / raw)
To: Francesco Fusco, jesse; +Cc: netdev, dev, Daniel Borkmann, Thomas Graf
> From: Francesco Fusco
...
> One possible direction to achieve higher performance consists
> of replacing jhash2() with an architecture specific hash
> function pretty much like the Intel folks have proposed in
> DPDK. DPDK provides a very fast hash function that leverages
> the 32bit crc32l instruction part of the Intel SSE4.2
> instruction set.
...
IIRC the cpu can execute multiple crc32l instructions in parallel.
If you use a software lookup table to crc (say) 256 zero bytes
it should be possible to crc chunks of a large buffer in parallel.
> +#ifdef CONFIG_X86_64
Why not also i386?
> +static inline u32 ovs_flow_hash_crc_4b(u32 crc, u32 val)
> +{
> + asm ("crc32l %[val], %[crc]\n"
> + : [crc] "+r" (crc)
> + : [val] "rm" (val));
> + return crc;
> +}
> +
> +static inline u32 ovs_flow_hash_crc(const u32 *data, u32 len, u32 seed)
> +{
> + const u32 *p32 = (const u32 *) data;
> + u32 i, tmp = 0;
> +
> + for (i = 0; i < len; i++)
> + seed = ovs_flow_hash_crc_4b(*p32++, seed);
Doesn't that have the arguments if the wrong order ?
Maybe it doesn't affect the result, but the asm pattern works better
the other way around.
> + switch (3 - ((len * 4) & 0x03)) {
This looked like an obscure way to calculate the remainder,
then I noticed the 'len *4' which means it is always 3
(no residual).
> + case 0:
> + tmp |= *((const u8 *) p32 + 2) << 16;
> + /* Fallthrough */
> + case 1:
> + tmp |= *((const u8 *) p32 + 1) << 8;
> + /* Fallthrough */
> + case 2:
> + tmp |= *((const u8 *) p32);
> + seed = ovs_flow_hash_crc_4b(tmp, seed);
> + default:
> + break;
> + }
> +
> + return seed;
> +}
> +#endif
David
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 9:52 [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available Francesco Fusco
2013-12-10 10:20 ` David Laight
@ 2013-12-10 19:28 ` Jesse Gross
2013-12-10 19:36 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Jesse Gross @ 2013-12-10 19:28 UTC (permalink / raw)
To: Francesco Fusco; +Cc: netdev, dev@openvswitch.org, Daniel Borkmann, Thomas Graf
On Tue, Dec 10, 2013 at 1:52 AM, Francesco Fusco <ffusco@redhat.com> wrote:
> Currently OVS uses jhash2() for calculating flow hashes in its
> internal flow_hash() function. The performance of the flow_hash()
> function is critical, as the input data can be hundreds of bytes
> long.
>
> One possible direction to achieve higher performance consists
> of replacing jhash2() with an architecture specific hash
> function pretty much like the Intel folks have proposed in
> DPDK. DPDK provides a very fast hash function that leverages
> the 32bit crc32l instruction part of the Intel SSE4.2
> instruction set.
>
> OVS is largely deployed in x86_64 based datacenters.
> Therefore, we argue that the performance critical fast path
> of OVS should exploit underlying CPU features in order to reduce
> the per packet processing costs.
>
> We adapted and integrated Intel's hashing function from DPDK in
> order to be used with OVS internally. Our patch greatly reduces
> the hash footprint from ~200 cycles of jhash2() to around ~90
> cycles in case of ovs_flow_hash_crc() (measured with rdtsc over
> maximum length flow keys on an i7 Intel CPU).
>
> Additionally, we wrote a microbenchmark to stress the flow table
> performance. The benchmark inserts random flows into the flow
> hash and then performs lookups. Our hash deployed on a CRC32
> capable CPU reduces the lookup for 1000 flows, 100 masks from
> ~10,100us to ~6,700us, for example.
>
> Signed-off-by: Francesco Fusco <ffusco@redhat.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Thomas Graf <tgraf@redhat.com>
I think this is definitely a good optimization to do given that so
much of the work that OVS does is hashing. However, isn't there a
library where there would be a more appropriate place to put this?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 19:28 ` Jesse Gross
@ 2013-12-10 19:36 ` David Miller
2013-12-10 20:47 ` Thomas Graf
2013-12-10 21:27 ` Tom Herbert
0 siblings, 2 replies; 9+ messages in thread
From: David Miller @ 2013-12-10 19:36 UTC (permalink / raw)
To: jesse; +Cc: ffusco, netdev, dev, dborkman, tgraf
From: Jesse Gross <jesse@nicira.com>
Date: Tue, 10 Dec 2013 11:28:08 -0800
> I think this is definitely a good optimization to do given that so
> much of the work that OVS does is hashing. However, isn't there a
> library where there would be a more appropriate place to put this?
I also honestly don't see why we want to special case OVS at all
here. This faster hashing would be useful for socket demux and
other locations in the kernel.
When I see changes like this my only reaction is "sad face".
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 19:36 ` David Miller
@ 2013-12-10 20:47 ` Thomas Graf
2013-12-10 21:36 ` Jesse Gross
2013-12-10 22:38 ` David Miller
2013-12-10 21:27 ` Tom Herbert
1 sibling, 2 replies; 9+ messages in thread
From: Thomas Graf @ 2013-12-10 20:47 UTC (permalink / raw)
To: David Miller, jesse; +Cc: ffusco, netdev, dev, dborkman
On 12/10/2013 08:36 PM, David Miller wrote:
> From: Jesse Gross <jesse@nicira.com>
> Date: Tue, 10 Dec 2013 11:28:08 -0800
>
>> I think this is definitely a good optimization to do given that so
>> much of the work that OVS does is hashing. However, isn't there a
>> library where there would be a more appropriate place to put this?
>
> I also honestly don't see why we want to special case OVS at all
> here. This faster hashing would be useful for socket demux and
> other locations in the kernel.
>
> When I see changes like this my only reaction is "sad face".
We discussed this heavily and decided to go with the minimal approach
first to collect some feedback. We were not sure whether the runtime
check, function pointer and hardware dependencies are something other
subsystems that are less x86_64 centric would want to live with.
That said, we are _very_ willing to do the work and move it to lib/
to make it available to other consumers but one should be aware that
crc32 based hashes are not as generally usable as jhash.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 20:47 ` Thomas Graf
@ 2013-12-10 21:36 ` Jesse Gross
2013-12-10 22:38 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: Jesse Gross @ 2013-12-10 21:36 UTC (permalink / raw)
To: Thomas Graf
Cc: David Miller, ffusco, netdev, dev@openvswitch.org,
Daniel Borkmann
On Tue, Dec 10, 2013 at 12:47 PM, Thomas Graf <tgraf@redhat.com> wrote:
> On 12/10/2013 08:36 PM, David Miller wrote:
>>
>> From: Jesse Gross <jesse@nicira.com>
>> Date: Tue, 10 Dec 2013 11:28:08 -0800
>>
>>> I think this is definitely a good optimization to do given that so
>>> much of the work that OVS does is hashing. However, isn't there a
>>> library where there would be a more appropriate place to put this?
>>
>>
>> I also honestly don't see why we want to special case OVS at all
>> here. This faster hashing would be useful for socket demux and
>> other locations in the kernel.
>>
>> When I see changes like this my only reaction is "sad face".
>
>
> We discussed this heavily and decided to go with the minimal approach
> first to collect some feedback. We were not sure whether the runtime
> check, function pointer and hardware dependencies are something other
> subsystems that are less x86_64 centric would want to live with.
>
> That said, we are _very_ willing to do the work and move it to lib/
> to make it available to other consumers but one should be aware that
> crc32 based hashes are not as generally usable as jhash.
Most hash users probably don't care what the actual hash function is,
as long as it is consistent. Couldn't you have a function that uses
CRC32 if it is available (or whatever is appropriate on a given
architecture) and jhash otherwise? Most users would then be able to
call this function without knowing or caring about the various quirks.
We could also potentially use the static key stuff to get the right
hash function as well.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 20:47 ` Thomas Graf
2013-12-10 21:36 ` Jesse Gross
@ 2013-12-10 22:38 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2013-12-10 22:38 UTC (permalink / raw)
To: tgraf; +Cc: jesse, ffusco, netdev, dev, dborkman
From: Thomas Graf <tgraf@redhat.com>
Date: Tue, 10 Dec 2013 21:47:48 +0100
> We were not sure whether the runtime check, function pointer and
> hardware dependencies are something other subsystems that are less
> x86_64 centric would want to live with.
The things you are listing as implicit overhead can be eliminated
almost entirely using static branches.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 19:36 ` David Miller
2013-12-10 20:47 ` Thomas Graf
@ 2013-12-10 21:27 ` Tom Herbert
2013-12-10 21:40 ` Jesse Gross
1 sibling, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2013-12-10 21:27 UTC (permalink / raw)
To: David Miller
Cc: Jesse Gross, ffusco, Linux Netdev List, dev, Daniel Borkmann,
tgraf
On Tue, Dec 10, 2013 at 11:36 AM, David Miller <davem@davemloft.net> wrote:
> From: Jesse Gross <jesse@nicira.com>
> Date: Tue, 10 Dec 2013 11:28:08 -0800
>
>> I think this is definitely a good optimization to do given that so
>> much of the work that OVS does is hashing. However, isn't there a
>> library where there would be a more appropriate place to put this?
>
> I also honestly don't see why we want to special case OVS at all
> here. This faster hashing would be useful for socket demux and
> other locations in the kernel.
>
Also, we already do a lot of work to compute flow hashes for packets
in both TX and RX. Can these be leveraged? For instance, if OVS is
computing a 5-tuple hash on an RX packet it's probably redundant to
skb->rxhash.
> When I see changes like this my only reaction is "sad face".
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available
2013-12-10 21:27 ` Tom Herbert
@ 2013-12-10 21:40 ` Jesse Gross
0 siblings, 0 replies; 9+ messages in thread
From: Jesse Gross @ 2013-12-10 21:40 UTC (permalink / raw)
To: Tom Herbert
Cc: David Miller, Francesco Fusco, Linux Netdev List,
dev@openvswitch.org, Daniel Borkmann, Thomas Graf
On Tue, Dec 10, 2013 at 1:27 PM, Tom Herbert <therbert@google.com> wrote:
> On Tue, Dec 10, 2013 at 11:36 AM, David Miller <davem@davemloft.net> wrote:
>> From: Jesse Gross <jesse@nicira.com>
>> Date: Tue, 10 Dec 2013 11:28:08 -0800
>>
>>> I think this is definitely a good optimization to do given that so
>>> much of the work that OVS does is hashing. However, isn't there a
>>> library where there would be a more appropriate place to put this?
>>
>> I also honestly don't see why we want to special case OVS at all
>> here. This faster hashing would be useful for socket demux and
>> other locations in the kernel.
>>
>
> Also, we already do a lot of work to compute flow hashes for packets
> in both TX and RX. Can these be leveraged? For instance, if OVS is
> computing a 5-tuple hash on an RX packet it's probably redundant to
> skb->rxhash.
That's true if we care about exactly the 5-tuple. However, the set of
fields that are included in the OVS flow lookup is dynamically
generated down to a bit mask so I don't know if it's worth special
casing. (In fact, I don't think that current OVS userspace will ever
actually emit exactly a 5-tuple.)
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-12-10 22:38 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-10 9:52 [PATCH net-next] net: ovs: use CRC32 accelerated flow hash if available Francesco Fusco
2013-12-10 10:20 ` David Laight
2013-12-10 19:28 ` Jesse Gross
2013-12-10 19:36 ` David Miller
2013-12-10 20:47 ` Thomas Graf
2013-12-10 21:36 ` Jesse Gross
2013-12-10 22:38 ` David Miller
2013-12-10 21:27 ` Tom Herbert
2013-12-10 21:40 ` Jesse Gross
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).