* [PATCH net-next 0/4] Allow configuration of multipath hash seed
@ 2024-05-29 11:18 Petr Machata
2024-05-29 11:18 ` [PATCH net-next 1/4] net: ipv4,ipv6: Pass multipath hash computation through a helper Petr Machata
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Petr Machata @ 2024-05-29 11:18 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern
Let me just quote the commit message of patch #2 here to inform the
motivation and some of the implementation:
When calculating hashes for the purpose of multipath forwarding,
both IPv4 and IPv6 code currently fall back on
flow_hash_from_keys(). That uses a randomly-generated seed. That's a
fine choice by default, but unfortunately some deployments may need
a tighter control over the seed used.
In this patchset, make the seed configurable by adding a new sysctl
key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
is used specifically for multipath forwarding and not for the other
concerns that flow_hash_from_keys() is used for, such as queue
selection. Expose the knob as sysctl because other such settings,
such as headers to hash, are also handled that way.
Despite being placed in the net.ipv4 namespace, the multipath seed
sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
TCP variables. Like those, the multipath hash seed is a per-netns
variable.
The new sysctl is added with permissions 0600 so that the hash is
only readable and writable by root.
The seed used by flow_hash_from_keys() is a 128-bit quantity.
However it seems that usually the seed is a much more modest value.
32 bits seem typical (Cisco, Cumulus), some systems go even lower.
For that reason, and to decouple the user interface from
implementation details, go with a 32-bit quantity, which is then
quadruplicated to form the siphash key.
One example of use of this interface is avoiding hash polarization,
where two ECMP routers, one behind the other, happen to make consistent
hashing decisions, and as a result, part of the ECMP space of the latter
router is never used. Another is a load balancer where several machines
forward traffic to one of a number of leaves, and the forwarding
decisions need to be made consistently. (This is a case of a desired
hash polarization, mentioned e.g. in chapter 6.3 of [0].)
There has already been a proposal to include a hash seed control
interface in the past[1]. This patchset uses broadly the same ideas, but
limits the externally visible seed size to 32 bits.
- Patches #1-#2 contain the substance of the work
- Patch #3 is a mlxsw offload
- Patch #4 is a selftest
[0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
[1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
Petr Machata (4):
net: ipv4,ipv6: Pass multipath hash computation through a helper
net: ipv4: Add a sysctl to set multipath hash seed
mlxsw: spectrum_router: Apply user-defined multipath hash seed
selftests: forwarding: router_mpath_hash: Add a new selftest
Documentation/networking/ip-sysctl.rst | 10 +
.../ethernet/mellanox/mlxsw/spectrum_router.c | 14 +-
include/net/flow_dissector.h | 2 +
include/net/ip_fib.h | 24 ++
include/net/netns/ipv4.h | 10 +
net/core/flow_dissector.c | 7 +
net/ipv4/route.c | 12 +-
net/ipv4/sysctl_net_ipv4.c | 82 +++++
net/ipv6/route.c | 12 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../net/forwarding/router_mpath_seed.sh | 322 ++++++++++++++++++
11 files changed, 482 insertions(+), 14 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
--
2.45.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH net-next 1/4] net: ipv4,ipv6: Pass multipath hash computation through a helper
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
@ 2024-05-29 11:18 ` Petr Machata
2024-05-29 11:18 ` [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed Petr Machata
` (3 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-05-29 11:18 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern
The following patches will add a sysctl to control multipath hash seed. On
a system where the seed is not set by hand, the algorithm will need to fall
back to the default system-wide flow hash. In order to centralize this
dispatch, add a helper, fib_multipath_hash_from_keys(), and have all IPv4
and IPv6 route.c invocations of flow_hash_from_keys() go through this
helper instead.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
include/net/ip_fib.h | 7 +++++++
net/ipv4/route.c | 12 ++++++------
net/ipv6/route.c | 12 ++++++------
3 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 9b2f69ba5e49..b8b3c07e8f7b 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -521,6 +521,13 @@ void fib_nhc_update_mtu(struct fib_nh_common *nhc, u32 new, u32 orig);
int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
const struct sk_buff *skb, struct flow_keys *flkeys);
#endif
+
+static inline u32 fib_multipath_hash_from_keys(const struct net *net,
+ struct flow_keys *keys)
+{
+ return flow_hash_from_keys(keys);
+}
+
int fib_check_nh(struct net *net, struct fib_nh *nh, u32 table, u8 scope,
struct netlink_ext_ack *extack);
void fib_select_multipath(struct fib_result *res, int hash);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5fd54103174f..daaccfb37802 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1930,7 +1930,7 @@ static u32 fib_multipath_custom_hash_outer(const struct net *net,
hash_keys.ports.dst = keys.ports.dst;
*p_has_inner = !!(keys.control.flags & FLOW_DIS_ENCAPSULATION);
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
static u32 fib_multipath_custom_hash_inner(const struct net *net,
@@ -1979,7 +1979,7 @@ static u32 fib_multipath_custom_hash_inner(const struct net *net,
if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_PORT)
hash_keys.ports.dst = keys.ports.dst;
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
static u32 fib_multipath_custom_hash_skb(const struct net *net,
@@ -2016,7 +2016,7 @@ static u32 fib_multipath_custom_hash_fl4(const struct net *net,
if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT)
hash_keys.ports.dst = fl4->fl4_dport;
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
/* if skb is set it will be used and fl4 can be NULL */
@@ -2037,7 +2037,7 @@ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
hash_keys.addrs.v4addrs.src = fl4->saddr;
hash_keys.addrs.v4addrs.dst = fl4->daddr;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 1:
/* skb is currently provided only when forwarding */
@@ -2071,7 +2071,7 @@ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
hash_keys.ports.dst = fl4->fl4_dport;
hash_keys.basic.ip_proto = fl4->flowi4_proto;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 2:
memset(&hash_keys, 0, sizeof(hash_keys));
@@ -2102,7 +2102,7 @@ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
hash_keys.addrs.v4addrs.src = fl4->saddr;
hash_keys.addrs.v4addrs.dst = fl4->daddr;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 3:
if (skb)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bbc2a0dd9314..9d561b9f0f75 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2372,7 +2372,7 @@ static u32 rt6_multipath_custom_hash_outer(const struct net *net,
hash_keys.ports.dst = keys.ports.dst;
*p_has_inner = !!(keys.control.flags & FLOW_DIS_ENCAPSULATION);
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
static u32 rt6_multipath_custom_hash_inner(const struct net *net,
@@ -2421,7 +2421,7 @@ static u32 rt6_multipath_custom_hash_inner(const struct net *net,
if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_PORT)
hash_keys.ports.dst = keys.ports.dst;
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
static u32 rt6_multipath_custom_hash_skb(const struct net *net,
@@ -2460,7 +2460,7 @@ static u32 rt6_multipath_custom_hash_fl6(const struct net *net,
if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT)
hash_keys.ports.dst = fl6->fl6_dport;
- return flow_hash_from_keys(&hash_keys);
+ return fib_multipath_hash_from_keys(net, &hash_keys);
}
/* if skb is set it will be used and fl6 can be NULL */
@@ -2482,7 +2482,7 @@ u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
hash_keys.basic.ip_proto = fl6->flowi6_proto;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 1:
if (skb) {
@@ -2514,7 +2514,7 @@ u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
hash_keys.ports.dst = fl6->fl6_dport;
hash_keys.basic.ip_proto = fl6->flowi6_proto;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 2:
memset(&hash_keys, 0, sizeof(hash_keys));
@@ -2551,7 +2551,7 @@ u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
hash_keys.basic.ip_proto = fl6->flowi6_proto;
}
- mhash = flow_hash_from_keys(&hash_keys);
+ mhash = fib_multipath_hash_from_keys(net, &hash_keys);
break;
case 3:
if (skb)
--
2.45.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
2024-05-29 11:18 ` [PATCH net-next 1/4] net: ipv4,ipv6: Pass multipath hash computation through a helper Petr Machata
@ 2024-05-29 11:18 ` Petr Machata
2024-05-31 1:00 ` Jakub Kicinski
2024-06-01 8:46 ` Eric Dumazet
2024-05-29 11:18 ` [PATCH net-next 3/4] mlxsw: spectrum_router: Apply user-defined " Petr Machata
` (2 subsequent siblings)
4 siblings, 2 replies; 22+ messages in thread
From: Petr Machata @ 2024-05-29 11:18 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Jonathan Corbet,
linux-doc, Simon Horman
When calculating hashes for the purpose of multipath forwarding, both IPv4
and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
randomly-generated seed. That's a fine choice by default, but unfortunately
some deployments may need a tighter control over the seed used.
In this patch, make the seed configurable by adding a new sysctl key,
net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
specifically for multipath forwarding and not for the other concerns that
flow_hash_from_keys() is used for, such as queue selection. Expose the knob
as sysctl because other such settings, such as headers to hash, are also
handled that way. Like those, the multipath hash seed is a per-netns
variable.
Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
variables.
The seed used by flow_hash_from_keys() is a 128-bit quantity. However it
seems that usually the seed is a much more modest value. 32 bits seem
typical (Cisco, Cumulus), some systems go even lower. For that reason, and
to decouple the user interface from implementation details, go with a
32-bit quantity, which is then quadruplicated to form the siphash key.
For locking, use RTNL instead of a custom lock. This based on feedback
given to a patch from Pavel Balaev, which also aimed to introduce multipath
hash seed control [0].
[0] https://lore.kernel.org/netdev/20210413.161521.2301224176572441397.davem@davemloft.net/
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Simon Horman <horms@kernel.org>
Documentation/networking/ip-sysctl.rst | 10 ++++
include/net/flow_dissector.h | 2 +
include/net/ip_fib.h | 19 +++++-
include/net/netns/ipv4.h | 10 ++++
net/core/flow_dissector.c | 7 +++
net/ipv4/sysctl_net_ipv4.c | 82 ++++++++++++++++++++++++++
6 files changed, 129 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index bd50df6a5a42..afcf3f323965 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -131,6 +131,16 @@ fib_multipath_hash_fields - UNSIGNED INTEGER
Default: 0x0007 (source IP, destination IP and IP protocol)
+fib_multipath_hash_seed - UNSIGNED INTEGER
+ The seed value used when calculating hash for multipath routes. Applies
+ to both IPv4 and IPv6 datapath. Only valid for kernels built with
+ CONFIG_IP_ROUTE_MULTIPATH enabled.
+
+ When set to 0, the seed value used for multipath routing defaults to an
+ internal random-generated one.
+
+ Default: 0 (random)
+
fib_sync_mem - UNSIGNED INTEGER
Amount of dirty memory from fib entries that can be backlogged before
synchronize_rcu is forced.
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 9ab376d1a677..a5423219dee1 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -433,6 +433,8 @@ static inline bool flow_keys_have_l4(const struct flow_keys *keys)
}
u32 flow_hash_from_keys(struct flow_keys *keys);
+u32 flow_hash_from_keys_seed(struct flow_keys *keys,
+ const siphash_key_t *keyval);
void skb_flow_get_icmp_tci(const struct sk_buff *skb,
struct flow_dissector_key_icmp *key_icmp,
const void *data, int thoff, int hlen);
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index b8b3c07e8f7b..785c571e2cef 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -520,13 +520,30 @@ void fib_nhc_update_mtu(struct fib_nh_common *nhc, u32 new, u32 orig);
#ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
const struct sk_buff *skb, struct flow_keys *flkeys);
-#endif
+static inline u32 fib_multipath_hash_from_keys(const struct net *net,
+ struct flow_keys *keys)
+{
+ struct sysctl_fib_multipath_hash_seed *mphs;
+ u32 ret;
+
+ rcu_read_lock();
+ mphs = rcu_dereference(net->ipv4.sysctl_fib_multipath_hash_seed);
+ if (likely(!mphs))
+ ret = flow_hash_from_keys(keys);
+ else
+ ret = flow_hash_from_keys_seed(keys, &mphs->seed);
+ rcu_read_unlock();
+
+ return ret;
+}
+#else
static inline u32 fib_multipath_hash_from_keys(const struct net *net,
struct flow_keys *keys)
{
return flow_hash_from_keys(keys);
}
+#endif
int fib_check_nh(struct net *net, struct fib_nh *nh, u32 table, u8 scope,
struct netlink_ext_ack *extack);
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c356c458b340..1f5043d32cb0 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -40,6 +40,14 @@ struct inet_timewait_death_row {
struct tcp_fastopen_context;
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+struct sysctl_fib_multipath_hash_seed {
+ siphash_aligned_key_t seed;
+ u32 user_seed;
+ struct rcu_head rcu;
+};
+#endif
+
struct netns_ipv4 {
/* Cacheline organization can be found documented in
* Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst.
@@ -245,6 +253,8 @@ struct netns_ipv4 {
#endif
#endif
#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ struct sysctl_fib_multipath_hash_seed
+ __rcu *sysctl_fib_multipath_hash_seed;
u32 sysctl_fib_multipath_hash_fields;
u8 sysctl_fib_multipath_use_neigh;
u8 sysctl_fib_multipath_hash_policy;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f82e9a7d3b37..7b3283ad5b39 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1792,6 +1792,13 @@ u32 flow_hash_from_keys(struct flow_keys *keys)
}
EXPORT_SYMBOL(flow_hash_from_keys);
+u32 flow_hash_from_keys_seed(struct flow_keys *keys,
+ const siphash_key_t *keyval)
+{
+ return __flow_hash_from_keys(keys, keyval);
+}
+EXPORT_SYMBOL(flow_hash_from_keys_seed);
+
static inline u32 ___skb_get_hash(const struct sk_buff *skb,
struct flow_keys *keys,
const siphash_key_t *keyval)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d7892f34a15b..18fae2bf881c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -464,6 +464,72 @@ static int proc_fib_multipath_hash_fields(struct ctl_table *table, int write,
return ret;
}
+
+static void
+proc_fib_multipath_hash_construct_seed(u32 user_seed, siphash_key_t *key)
+{
+ u64 user_seed_64 = user_seed;
+
+ key->key[0] = (user_seed_64 << 32) | user_seed_64;
+ key->key[1] = key->key[0];
+}
+
+static int proc_fib_multipath_hash_set_seed(struct net *net, u32 user_seed)
+{
+ struct sysctl_fib_multipath_hash_seed *mphs = NULL;
+ struct sysctl_fib_multipath_hash_seed *old;
+
+ if (user_seed) {
+ mphs = kzalloc(sizeof(*mphs), GFP_KERNEL);
+ if (!mphs)
+ return -ENOMEM;
+
+ mphs->user_seed = user_seed;
+ proc_fib_multipath_hash_construct_seed(user_seed, &mphs->seed);
+ }
+
+ rtnl_lock();
+ old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
+ mphs);
+ rtnl_unlock();
+
+ if (old)
+ kfree_rcu(old, rcu);
+
+ return 0;
+}
+
+static int proc_fib_multipath_hash_seed(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ struct sysctl_fib_multipath_hash_seed *mphs;
+ struct net *net = table->data;
+ struct ctl_table tmp;
+ u32 user_seed = 0;
+ int ret;
+
+ rcu_read_lock();
+ mphs = rcu_dereference(net->ipv4.sysctl_fib_multipath_hash_seed);
+ if (mphs)
+ user_seed = mphs->user_seed;
+ rcu_read_unlock();
+
+ tmp = *table;
+ tmp.data = &user_seed;
+
+ ret = proc_douintvec_minmax(&tmp, write, buffer, lenp, ppos);
+
+ if (write && ret == 0) {
+ ret = proc_fib_multipath_hash_set_seed(net, user_seed);
+ if (ret)
+ return ret;
+
+ call_netevent_notifiers(NETEVENT_IPV4_MPATH_HASH_UPDATE, net);
+ }
+
+ return ret;
+}
#endif
static struct ctl_table ipv4_table[] = {
@@ -1072,6 +1138,13 @@ static struct ctl_table ipv4_net_table[] = {
.extra1 = SYSCTL_ONE,
.extra2 = &fib_multipath_hash_fields_all_mask,
},
+ {
+ .procname = "fib_multipath_hash_seed",
+ .data = &init_net,
+ .maxlen = sizeof(u32),
+ .mode = 0644,
+ .proc_handler = proc_fib_multipath_hash_seed,
+ },
#endif
{
.procname = "ip_unprivileged_port_start",
@@ -1557,6 +1630,15 @@ static __net_exit void ipv4_sysctl_exit_net(struct net *net)
{
const struct ctl_table *table;
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ {
+ struct sysctl_fib_multipath_hash_seed *mphs;
+
+ mphs = rcu_dereference_raw(net->ipv4.sysctl_fib_multipath_hash_seed);
+ kfree(mphs);
+ }
+#endif
+
kfree(net->ipv4.sysctl_local_reserved_ports);
table = net->ipv4.ipv4_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->ipv4.ipv4_hdr);
--
2.45.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 3/4] mlxsw: spectrum_router: Apply user-defined multipath hash seed
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
2024-05-29 11:18 ` [PATCH net-next 1/4] net: ipv4,ipv6: Pass multipath hash computation through a helper Petr Machata
2024-05-29 11:18 ` [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed Petr Machata
@ 2024-05-29 11:18 ` Petr Machata
2024-05-29 11:18 ` [PATCH net-next 4/4] selftests: forwarding: router_mpath_hash: Add a new selftest Petr Machata
2024-05-29 19:57 ` [PATCH net-next 0/4] Allow configuration of multipath hash seed Nikolay Aleksandrov
4 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-05-29 11:18 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern
When Spectrum machines compute hash for the purposes of ECMP routing, they
use a seed specified through RECR_v2 (Router ECMP Configuration Register).
Up until now mlxsw computed the seed by hashing the machine's base MAC.
Now that we can optionally have a user-provided seed, use that if possible.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_router.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 40ba314fbc72..e5b669f0822d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -11449,13 +11449,23 @@ static int mlxsw_sp_mp_hash_parsing_depth_adjust(struct mlxsw_sp *mlxsw_sp,
static int mlxsw_sp_mp_hash_init(struct mlxsw_sp *mlxsw_sp)
{
bool old_inc_parsing_depth, new_inc_parsing_depth;
+ struct sysctl_fib_multipath_hash_seed *mphs;
struct mlxsw_sp_mp_hash_config config = {};
+ struct net *net = mlxsw_sp_net(mlxsw_sp);
char recr2_pl[MLXSW_REG_RECR2_LEN];
unsigned long bit;
- u32 seed;
+ u32 seed = 0;
int err;
- seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), 0);
+ rcu_read_lock();
+ mphs = rcu_dereference(net->ipv4.sysctl_fib_multipath_hash_seed);
+ if (mphs)
+ seed = mphs->user_seed;
+ rcu_read_unlock();
+
+ if (!seed)
+ seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), 0);
+
mlxsw_reg_recr2_pack(recr2_pl, seed);
mlxsw_sp_mp4_hash_init(mlxsw_sp, &config);
mlxsw_sp_mp6_hash_init(mlxsw_sp, &config);
--
2.45.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 4/4] selftests: forwarding: router_mpath_hash: Add a new selftest
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
` (2 preceding siblings ...)
2024-05-29 11:18 ` [PATCH net-next 3/4] mlxsw: spectrum_router: Apply user-defined " Petr Machata
@ 2024-05-29 11:18 ` Petr Machata
2024-05-29 19:57 ` [PATCH net-next 0/4] Allow configuration of multipath hash seed Nikolay Aleksandrov
4 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-05-29 11:18 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Shuah Khan,
linux-kselftest
Add a selftest that exercises the sysctl added in the previous patches.
Test that set/get works as expected; that across seeds we eventually hit
all NHs (test_mpath_seed_*); and that a given seed keeps hitting the same
NHs even across seed changes (test_mpath_seed_stability_*).
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
.../testing/selftests/net/forwarding/Makefile | 1 +
.../net/forwarding/router_mpath_seed.sh | 322 ++++++++++++++++++
2 files changed, 323 insertions(+)
create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index fa7b59ff4029..99576d7ecbf6 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -70,6 +70,7 @@ TEST_PROGS = bridge_fdb_learning_limit.sh \
router_broadcast.sh \
router_mpath_nh_res.sh \
router_mpath_nh.sh \
+ router_mpath_seed.sh \
router_multicast.sh \
router_multipath.sh \
router_nh.sh \
diff --git a/tools/testing/selftests/net/forwarding/router_mpath_seed.sh b/tools/testing/selftests/net/forwarding/router_mpath_seed.sh
new file mode 100755
index 000000000000..0ef3687da8b2
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/router_mpath_seed.sh
@@ -0,0 +1,322 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-------------------------+ +-------------------------+
+# | H1 | | H2 |
+# | $h1 + | | + $h2 |
+# | 192.0.2.1/28 | | | | 192.0.2.34/28 |
+# | 2001:db8:1::1/64 | | | | 2001:db8:3::2/64 |
+# +-------------------|-----+ +-|-----------------------+
+# | |
+# +-------------------|-----+ +-|-----------------------+
+# | R1 | | | | R2 |
+# | $rp11 + | | + $rp21 |
+# | 192.0.2.2/28 | | 192.0.2.33/28 |
+# | 2001:db8:1::2/64 | | 2001:db8:3::1/64 |
+# | | | |
+# | $rp12 + | | + $rp22 |
+# | 192.0.2.17/28 | | | | 192.0.2.18..27/28 |
+# | 2001:db8:2::17/64 | | | | 2001:db8:2::18..27/64 |
+# +-------------------|-----+ +-|-----------------------+
+# | |
+# `----------'
+
+ALL_TESTS="
+ ping_ipv4
+ ping_ipv6
+ test_mpath_seed_get
+ test_mpath_seed_ipv4
+ test_mpath_seed_ipv6
+ test_mpath_seed_stability_ipv4
+ test_mpath_seed_stability_ipv6
+"
+NUM_NETIFS=6
+source lib.sh
+
+h1_create()
+{
+ simple_if_init $h1 192.0.2.1/28 2001:db8:1::1/64
+ ip -4 route add 192.0.2.32/28 vrf v$h1 nexthop via 192.0.2.2
+ ip -6 route add 2001:db8:3::/64 vrf v$h1 nexthop via 2001:db8:1::2
+}
+
+h1_destroy()
+{
+ ip -6 route del 2001:db8:3::/64 vrf v$h1 nexthop via 2001:db8:1::2
+ ip -4 route del 192.0.2.32/28 vrf v$h1 nexthop via 192.0.2.2
+ simple_if_fini $h1 192.0.2.1/28 2001:db8:1::1/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 192.0.2.34/28 2001:db8:3::2/64
+ ip -4 route add 192.0.2.0/28 vrf v$h2 nexthop via 192.0.2.33
+ ip -6 route add 2001:db8:1::/64 vrf v$h2 nexthop via 2001:db8:3::1
+}
+
+h2_destroy()
+{
+ ip -6 route del 2001:db8:1::/64 vrf v$h2 nexthop via 2001:db8:3::1
+ ip -4 route del 192.0.2.0/28 vrf v$h2 nexthop via 192.0.2.33
+ simple_if_fini $h2 192.0.2.34/28 2001:db8:3::2/64
+}
+
+router1_create()
+{
+ simple_if_init $rp11 192.0.2.2/28 2001:db8:1::2/64
+ __simple_if_init $rp12 v$rp11 192.0.2.17/28 2001:db8:2::17/64
+}
+
+router1_destroy()
+{
+ __simple_if_fini $rp12 192.0.2.17/28 2001:db8:2::17/64
+ simple_if_fini $rp11 192.0.2.2/28 2001:db8:1::2/64
+}
+
+router2_create()
+{
+ simple_if_init $rp21 192.0.2.33/28 2001:db8:3::1/64
+ __simple_if_init $rp22 v$rp21 192.0.2.18/28 2001:db8:2::18/64
+ ip -4 route add 192.0.2.0/28 vrf v$rp21 nexthop via 192.0.2.17
+ ip -6 route add 2001:db8:1::/64 vrf v$rp21 nexthop via 2001:db8:2::17
+}
+
+router2_destroy()
+{
+ ip -6 route del 2001:db8:1::/64 vrf v$rp21 nexthop via 2001:db8:2::17
+ ip -4 route del 192.0.2.0/28 vrf v$rp21 nexthop via 192.0.2.17
+ __simple_if_fini $rp22 192.0.2.18/28 2001:db8:2::18/64
+ simple_if_fini $rp21 192.0.2.33/28 2001:db8:3::1/64
+}
+
+nexthops_create()
+{
+ local i
+ for i in $(seq 10); do
+ ip nexthop add id $((1000 + i)) via 192.0.2.18 dev $rp12
+ ip nexthop add id $((2000 + i)) via 2001:db8:2::18 dev $rp12
+ done
+
+ ip nexthop add id 1000 group $(seq -s / 1001 1010) hw_stats on
+ ip nexthop add id 2000 group $(seq -s / 2001 2010) hw_stats on
+ ip -4 route add 192.0.2.32/28 vrf v$rp11 nhid 1000
+ ip -6 route add 2001:db8:3::/64 vrf v$rp11 nhid 2000
+}
+
+nexthops_destroy()
+{
+ local i
+
+ ip -6 route del 2001:db8:3::/64 vrf v$rp11 nhid 2000
+ ip -4 route del 192.0.2.32/28 vrf v$rp11 nhid 1000
+ ip nexthop del id 2000
+ ip nexthop del id 1000
+
+ for i in $(seq 10 -1 1); do
+ ip nexthop del id $((2000 + i))
+ ip nexthop del id $((1000 + i))
+ done
+}
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ rp11=${NETIFS[p2]}
+
+ rp12=${NETIFS[p3]}
+ rp22=${NETIFS[p4]}
+
+ rp21=${NETIFS[p5]}
+ h2=${NETIFS[p6]}
+
+ sysctl_set net.ipv4.fib_multipath_hash_seed 0
+
+ vrf_prepare
+
+ h1_create
+ h2_create
+ router1_create
+ router2_create
+
+ forwarding_enable
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ forwarding_restore
+
+ nexthops_destroy
+ router2_destroy
+ router1_destroy
+ h2_destroy
+ h1_destroy
+
+ vrf_cleanup
+
+ sysctl_restore net.ipv4.fib_multipath_hash_seed
+}
+
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.34
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:3::2
+}
+
+test_mpath_seed_get()
+{
+ RET=0
+
+ local i
+ for ((i = 0; i < 100; i++)); do
+ local seed_w=$((999331 * i))
+ sysctl -qw net.ipv4.fib_multipath_hash_seed=$seed_w
+ local seed_r=$(sysctl -n net.ipv4.fib_multipath_hash_seed)
+ ((seed_r == seed_w))
+ check_err $? "mpath seed written as $seed_w, but read as $seed_r"
+ done
+
+ log_test "mpath seed set/get"
+}
+
+nh_stats_snapshot()
+{
+ local group_id=$1; shift
+
+ ip -j -s -s nexthop show id $group_id |
+ jq -c '[.[].group_stats | sort_by(.id) | .[].packets]'
+}
+
+get_active_nh()
+{
+ local s0=$1; shift
+ local s1=$1; shift
+
+ jq -n --argjson s0 "$s0" --argjson s1 "$s1" -f /dev/stdin <<-"EOF"
+ [range($s0 | length)] |
+ map($s1[.] - $s0[.]) |
+ map(if . > 8 then 1 else 0 end) |
+ index(1)
+ EOF
+}
+
+probe_seed()
+{
+ local group_id=$1; shift
+ local seed=$1; shift
+ local -a mz=("$@")
+
+ sysctl -qw net.ipv4.fib_multipath_hash_seed=$seed
+
+ local s0=$(nh_stats_snapshot $group_id)
+ "${mz[@]}"
+ local s1=$(nh_stats_snapshot $group_id)
+
+ get_active_nh "$s0" "$s1"
+}
+
+test_mpath_seed()
+{
+ local group_id=$1; shift
+ local what=$1; shift
+ local -a mz=("$@")
+ local ii
+
+ RET=0
+
+ local -a tally=(0 0 0 0 0 0 0 0 0 0)
+ for ((ii = 0; ii < 100; ii++)); do
+ local act=$(probe_seed $group_id $((999331 * ii)) "${mz[@]}")
+ ((tally[act]++))
+ done
+
+ local tally_str="${tally[@]}"
+ for ((ii = 0; ii < ${#tally[@]}; ii++)); do
+ ((tally[ii] > 0))
+ check_err $? "NH #$ii not hit, tally='$tally_str'"
+ done
+
+ log_test "mpath seed $what"
+ sysctl -qw net.ipv4.fib_multipath_hash_seed=0
+}
+
+test_mpath_seed_ipv4()
+{
+ test_mpath_seed 1000 IPv4 \
+ $MZ $h1 -A 192.0.2.1 -B 192.0.2.34 -q \
+ -p 64 -d 0 -c 10 -t udp
+}
+
+test_mpath_seed_ipv6()
+{
+ test_mpath_seed 2000 IPv6 \
+ $MZ -6 $h1 -A 2001:db8:1::1 -B 2001:db8:3::2 -q \
+ -p 64 -d 0 -c 10 -t udp
+}
+
+check_mpath_seed_stability()
+{
+ local seed=$1; shift
+ local act_0=$1; shift
+ local act_1=$1; shift
+
+ ((act_0 == act_1))
+ check_err $? "seed $seed: active NH moved from $act_0 to $act_1 after seed change"
+}
+
+test_mpath_seed_stability()
+{
+ local group_id=$1; shift
+ local what=$1; shift
+ local -a mz=("$@")
+
+ RET=0
+
+ local seed_0=0
+ local seed_1=3221338814
+ local seed_2=3735928559
+
+ local act_0_0=$(probe_seed $group_id $seed_0 "${mz[@]}")
+ local act_1_0=$(probe_seed $group_id $seed_1 "${mz[@]}")
+ local act_2_0=$(probe_seed $group_id $seed_2 "${mz[@]}")
+
+ local act_0_1=$(probe_seed $group_id $seed_0 "${mz[@]}")
+ local act_1_1=$(probe_seed $group_id $seed_1 "${mz[@]}")
+ local act_2_1=$(probe_seed $group_id $seed_2 "${mz[@]}")
+
+ check_mpath_seed_stability $seed_0 $act_0_0 $act_0_1
+ check_mpath_seed_stability $seed_1 $act_1_0 $act_1_1
+ check_mpath_seed_stability $seed_2 $act_2_0 $act_2_1
+
+ log_test "mpath seed stability $what"
+ sysctl -qw net.ipv4.fib_multipath_hash_seed=0
+}
+
+test_mpath_seed_stability_ipv4()
+{
+ test_mpath_seed_stability 1000 IPv4 \
+ $MZ $h1 -A 192.0.2.1 -B 192.0.2.34 -q \
+ -p 64 -d 0 -c 10 -t udp
+}
+
+test_mpath_seed_stability_ipv6()
+{
+ test_mpath_seed_stability 2000 IPv6 \
+ $MZ -6 $h1 -A 2001:db8:1::1 -B 2001:db8:3::2 -q \
+ -p 64 -d 0 -c 10 -t udp
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+nexthops_create
+
+tests_run
+
+exit $EXIT_STATUS
--
2.45.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
` (3 preceding siblings ...)
2024-05-29 11:18 ` [PATCH net-next 4/4] selftests: forwarding: router_mpath_hash: Add a new selftest Petr Machata
@ 2024-05-29 19:57 ` Nikolay Aleksandrov
2024-05-30 15:25 ` Petr Machata
4 siblings, 1 reply; 22+ messages in thread
From: Nikolay Aleksandrov @ 2024-05-29 19:57 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, David Ahern
On 5/29/24 14:18, Petr Machata wrote:
> Let me just quote the commit message of patch #2 here to inform the
> motivation and some of the implementation:
>
> When calculating hashes for the purpose of multipath forwarding,
> both IPv4 and IPv6 code currently fall back on
> flow_hash_from_keys(). That uses a randomly-generated seed. That's a
> fine choice by default, but unfortunately some deployments may need
> a tighter control over the seed used.
>
> In this patchset, make the seed configurable by adding a new sysctl
> key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
> is used specifically for multipath forwarding and not for the other
> concerns that flow_hash_from_keys() is used for, such as queue
> selection. Expose the knob as sysctl because other such settings,
> such as headers to hash, are also handled that way.
>
> Despite being placed in the net.ipv4 namespace, the multipath seed
> sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
> TCP variables. Like those, the multipath hash seed is a per-netns
> variable.
>
> The new sysctl is added with permissions 0600 so that the hash is
> only readable and writable by root.
>
> The seed used by flow_hash_from_keys() is a 128-bit quantity.
> However it seems that usually the seed is a much more modest value.
> 32 bits seem typical (Cisco, Cumulus), some systems go even lower.
> For that reason, and to decouple the user interface from
> implementation details, go with a 32-bit quantity, which is then
> quadruplicated to form the siphash key.
>
> One example of use of this interface is avoiding hash polarization,
> where two ECMP routers, one behind the other, happen to make consistent
> hashing decisions, and as a result, part of the ECMP space of the latter
> router is never used. Another is a load balancer where several machines
> forward traffic to one of a number of leaves, and the forwarding
> decisions need to be made consistently. (This is a case of a desired
> hash polarization, mentioned e.g. in chapter 6.3 of [0].)
>
> There has already been a proposal to include a hash seed control
> interface in the past[1]. This patchset uses broadly the same ideas, but
> limits the externally visible seed size to 32 bits.
>
> - Patches #1-#2 contain the substance of the work
> - Patch #3 is a mlxsw offload
> - Patch #4 is a selftest
>
> [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
> [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
>
> Petr Machata (4):
> net: ipv4,ipv6: Pass multipath hash computation through a helper
> net: ipv4: Add a sysctl to set multipath hash seed
> mlxsw: spectrum_router: Apply user-defined multipath hash seed
> selftests: forwarding: router_mpath_hash: Add a new selftest
>
> Documentation/networking/ip-sysctl.rst | 10 +
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 14 +-
> include/net/flow_dissector.h | 2 +
> include/net/ip_fib.h | 24 ++
> include/net/netns/ipv4.h | 10 +
> net/core/flow_dissector.c | 7 +
> net/ipv4/route.c | 12 +-
> net/ipv4/sysctl_net_ipv4.c | 82 +++++
> net/ipv6/route.c | 12 +-
> .../testing/selftests/net/forwarding/Makefile | 1 +
> .../net/forwarding/router_mpath_seed.sh | 322 ++++++++++++++++++
> 11 files changed, 482 insertions(+), 14 deletions(-)
> create mode 100755 tools/testing/selftests/net/forwarding/router_mpath_seed.sh
>
Hi,
I think that using memory management for such simple task is an
overkill. Would it be simpler to define 2 x 4 byte seed variables
in netns_ipv4 (e.g. user_seed, mp_seed). One is set only by the
user through the sysctl, which would also set mp_seed. Then you
can use mp_seed in the fast-path to construct that siphash key.
If the user_seed is set to 0 then you reset to some static init
hash value that is generated on init_net's creation. The idea
is to avoid leaking that initial seed, to have the same seed
for all netns (known behaviour), be able to recognize when a
seed was set and if the user sets a seed then overwrite it for
that ns, but to be able to reset it as well.
Since 32 bits are enough I don't see why we should be using
the flow hash seed, note that init_net's initialization already
uses get_random_bytes() for hashes. This seems like a simpler
scheme that doesn't require memory management for a 32 bit seed.
Also it has the benefit that it will remove the test when generating
a hash because in the initial/non-user-set case we just have the
initial seed in mp_seed which is used to generate the siphash key,
i.e. we always use that internal seed for the hash, regardless if
it was set by the user or it's the initial seed.
That's just one suggestion, if you decide to use more memory you
can keep the whole key in netns_ipv4 instead, the point is I don't
think we need memory management for this value.
Cheers,
Nik
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-29 19:57 ` [PATCH net-next 0/4] Allow configuration of multipath hash seed Nikolay Aleksandrov
@ 2024-05-30 15:25 ` Petr Machata
2024-05-30 17:27 ` Nikolay Aleksandrov
0 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2024-05-30 15:25 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, David Ahern
Nikolay Aleksandrov <razor@blackwall.org> writes:
> I think that using memory management for such simple task is an
> overkill. Would it be simpler to define 2 x 4 byte seed variables
> in netns_ipv4 (e.g. user_seed, mp_seed). One is set only by the
> user through the sysctl, which would also set mp_seed. Then you
> can use mp_seed in the fast-path to construct that siphash key.
> If the user_seed is set to 0 then you reset to some static init
> hash value that is generated on init_net's creation. The idea
Currently the flow dissector siphash key is initialized lazily so that
the pool of random bytes is full when the initialization is done:
https://lore.kernel.org/all/20131023180527.GC2403@order.stressinduktion.org
https://lore.kernel.org/all/20131023111219.GA31531@order.stressinduktion.org
I'm not sure how important that is -- the mailing does not really
discuss much in the way of rationale, and admits it's not critical. But
initializing the seed during net init would undo that. At the same time,
initializing it lazily would be a bit of a mess, as we would have to
possibly retroactively update mp_hash as well, which would be racy vs.
user_hash update unless locked. So dunno.
If you are OK with giving up on the siphash key "quality", I'm fine with
this.
Alternatively I can keep the dispatch in like it currently is. I.e.:
if (user_seed) {
sip_hash = construct(user_seed)
return flow_hash_from_keys_seed(sip_hash)
} else {
return flow_hash_from_keys()
}
I wanted to have the flow dispatcher hash init early as well, as it made
the code branch-free like you note below, but then Ido dug out that
there are $reasons for how it's currently done.
> is to avoid leaking that initial seed, to have the same seed
> for all netns (known behaviour), be able to recognize when a
> seed was set and if the user sets a seed then overwrite it for
> that ns, but to be able to reset it as well.
> Since 32 bits are enough I don't see why we should be using
> the flow hash seed, note that init_net's initialization already
No deep reason in using the dissector hash as far as I'm concerned.
I just didn't want to change things arbitrarily, so kept the current
behavior except where I needed it to change.
> uses get_random_bytes() for hashes. This seems like a simpler
> scheme that doesn't require memory management for a 32 bit seed.
> Also it has the benefit that it will remove the test when generating
> a hash because in the initial/non-user-set case we just have the
> initial seed in mp_seed which is used to generate the siphash key,
> i.e. we always use that internal seed for the hash, regardless if
> it was set by the user or it's the initial seed.
>
> That's just one suggestion, if you decide to use more memory you
> can keep the whole key in netns_ipv4 instead, the point is I don't
> think we need memory management for this value.
I kept the RCU stuff in because it makes it easy to precompute the
siphash key while allowing readers to access it lock-free. I could
inline it and guard with a seqlock instead, but that's a bit messier
code-wise. Or indeed construct in-situ, it's an atomic access plus like
four instructions or something like that.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-30 15:25 ` Petr Machata
@ 2024-05-30 17:27 ` Nikolay Aleksandrov
2024-05-30 18:07 ` Nikolay Aleksandrov
0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Aleksandrov @ 2024-05-30 17:27 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern
On 5/30/24 18:25, Petr Machata wrote:
>
> Nikolay Aleksandrov <razor@blackwall.org> writes:
>
>> I think that using memory management for such simple task is an
>> overkill. Would it be simpler to define 2 x 4 byte seed variables
>> in netns_ipv4 (e.g. user_seed, mp_seed). One is set only by the
>> user through the sysctl, which would also set mp_seed. Then you
>> can use mp_seed in the fast-path to construct that siphash key.
>> If the user_seed is set to 0 then you reset to some static init
>> hash value that is generated on init_net's creation. The idea
>
> Currently the flow dissector siphash key is initialized lazily so that
> the pool of random bytes is full when the initialization is done:
>
> https://lore.kernel.org/all/20131023180527.GC2403@order.stressinduktion.org
> https://lore.kernel.org/all/20131023111219.GA31531@order.stressinduktion.org
>
> I'm not sure how important that is -- the mailing does not really
> discuss much in the way of rationale, and admits it's not critical. But
> initializing the seed during net init would undo that. At the same time,
> initializing it lazily would be a bit of a mess, as we would have to
> possibly retroactively update mp_hash as well, which would be racy vs.
> user_hash update unless locked. So dunno.
If you want to keep the late init, untested:
init_mp_seed_once() -> DO_ONCE(func (net_get_random_once(&init_mp_seed),
memcpy(&init_net.mp_seed, &init_mp_seed))
>
> If you are OK with giving up on the siphash key "quality", I'm fine with
> this.
>
IMO that's fine, early init of the seed wouldn't be a problem. net's
hash_mix is already initialized early.
> Alternatively I can keep the dispatch in like it currently is. I.e.:
>
> if (user_seed) {
> sip_hash = construct(user_seed)
> return flow_hash_from_keys_seed(sip_hash)
> } else {
> return flow_hash_from_keys()
> }
>
> I wanted to have the flow dispatcher hash init early as well, as it made
> the code branch-free like you note below, but then Ido dug out that
+1
> there are $reasons for how it's currently done.
> >> is to avoid leaking that initial seed, to have the same seed
>> for all netns (known behaviour), be able to recognize when a
>> seed was set and if the user sets a seed then overwrite it for
>> that ns, but to be able to reset it as well.
>> Since 32 bits are enough I don't see why we should be using
>> the flow hash seed, note that init_net's initialization already
>
> No deep reason in using the dissector hash as far as I'm concerned.
> I just didn't want to change things arbitrarily, so kept the current
> behavior except where I needed it to change.
>
>> uses get_random_bytes() for hashes. This seems like a simpler
>> scheme that doesn't require memory management for a 32 bit seed.
>> Also it has the benefit that it will remove the test when generating
>> a hash because in the initial/non-user-set case we just have the
>> initial seed in mp_seed which is used to generate the siphash key,
>> i.e. we always use that internal seed for the hash, regardless if
>> it was set by the user or it's the initial seed.
>>
>> That's just one suggestion, if you decide to use more memory you
>> can keep the whole key in netns_ipv4 instead, the point is I don't
>> think we need memory management for this value.
>
> I kept the RCU stuff in because it makes it easy to precompute the
> siphash key while allowing readers to access it lock-free. I could
> inline it and guard with a seqlock instead, but that's a bit messier
> code-wise. Or indeed construct in-situ, it's an atomic access plus like
> four instructions or something like that.
You can READ/WRITE_ONCE() the full 8 bytes every time so it's lock-free
and consistent view of both values for observers. For fast-path it'll
only be accessing one of the two values, so it's fine either way. You
can use barriers to ensure latest value is seen by interested readers,
but for most eventual consistency would be enough.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-30 17:27 ` Nikolay Aleksandrov
@ 2024-05-30 18:07 ` Nikolay Aleksandrov
2024-05-30 21:34 ` Nikolay Aleksandrov
0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Aleksandrov @ 2024-05-30 18:07 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern
On 5/30/24 20:27, Nikolay Aleksandrov wrote:
> On 5/30/24 18:25, Petr Machata wrote:
>>
>> Nikolay Aleksandrov <razor@blackwall.org> writes:
>>
>>> I think that using memory management for such simple task is an
>>> overkill. Would it be simpler to define 2 x 4 byte seed variables
>>> in netns_ipv4 (e.g. user_seed, mp_seed). One is set only by the
>>> user through the sysctl, which would also set mp_seed. Then you
>>> can use mp_seed in the fast-path to construct that siphash key.
>>> If the user_seed is set to 0 then you reset to some static init
>>> hash value that is generated on init_net's creation. The idea
>>
>> Currently the flow dissector siphash key is initialized lazily so that
>> the pool of random bytes is full when the initialization is done:
>>
>> https://lore.kernel.org/all/20131023180527.GC2403@order.stressinduktion.org
>> https://lore.kernel.org/all/20131023111219.GA31531@order.stressinduktion.org
>>
>> I'm not sure how important that is -- the mailing does not really
>> discuss much in the way of rationale, and admits it's not critical. But
>> initializing the seed during net init would undo that. At the same time,
>> initializing it lazily would be a bit of a mess, as we would have to
>> possibly retroactively update mp_hash as well, which would be racy vs.
>> user_hash update unless locked. So dunno.
>
> If you want to keep the late init, untested:
> init_mp_seed_once() -> DO_ONCE(func (net_get_random_once(&init_mp_seed),
> memcpy(&init_net.mp_seed, &init_mp_seed))
>
Blah, just get_random_bytes() instead of net_get*. :)
It'll be executed once anyway. But again I think we can do without all
of this.
>>
>> If you are OK with giving up on the siphash key "quality", I'm fine with
>> this.
>>
>
> IMO that's fine, early init of the seed wouldn't be a problem. net's
> hash_mix is already initialized early.
>
[snip]
>>
>> I kept the RCU stuff in because it makes it easy to precompute the
>> siphash key while allowing readers to access it lock-free. I could
>> inline it and guard with a seqlock instead, but that's a bit messier
>> code-wise. Or indeed construct in-situ, it's an atomic access plus like
>> four instructions or something like that.
>
> You can READ/WRITE_ONCE() the full 8 bytes every time so it's lock-free
> and consistent view of both values for observers. For fast-path it'll
> only be accessing one of the two values, so it's fine either way. You
> can use barriers to ensure latest value is seen by interested readers,
> but for most eventual consistency would be enough.
>
Actually aren't we interested only in user_seed in the external reader
case? We don't care what's in mp_seed, so this is much simpler.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-30 18:07 ` Nikolay Aleksandrov
@ 2024-05-30 21:34 ` Nikolay Aleksandrov
2024-06-03 9:21 ` Petr Machata
0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Aleksandrov @ 2024-05-30 21:34 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern
On 5/30/24 21:07, Nikolay Aleksandrov wrote:
> On 5/30/24 20:27, Nikolay Aleksandrov wrote:
>> On 5/30/24 18:25, Petr Machata wrote:
>>>
>>> Nikolay Aleksandrov <razor@blackwall.org> writes:
>>>
>>
> [snip]
[snip snip]
>>>
>>> I kept the RCU stuff in because it makes it easy to precompute the
>>> siphash key while allowing readers to access it lock-free. I could
>>> inline it and guard with a seqlock instead, but that's a bit messier
>>> code-wise. Or indeed construct in-situ, it's an atomic access plus like
>>> four instructions or something like that.
>>
>> You can READ/WRITE_ONCE() the full 8 bytes every time so it's lock-free
>> and consistent view of both values for observers. For fast-path it'll
>> only be accessing one of the two values, so it's fine either way. You
>> can use barriers to ensure latest value is seen by interested readers,
>> but for most eventual consistency would be enough.
>>
>
> Actually aren't we interested only in user_seed in the external reader
> case? We don't care what's in mp_seed, so this is much simpler.
>
Oh, I misunderstood you, didn't I? :) Were you talking about
constructing the siphash key in the fast-path above? If yes,
then sure it's a few instructions but nothing conditional.
I don't think we need anything atomic in that case.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-05-29 11:18 ` [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed Petr Machata
@ 2024-05-31 1:00 ` Jakub Kicinski
2024-06-02 11:15 ` Ido Schimmel
2024-06-03 9:51 ` Petr Machata
2024-06-01 8:46 ` Eric Dumazet
1 sibling, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2024-05-31 1:00 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Ido Schimmel,
David Ahern, Jonathan Corbet, linux-doc, Simon Horman
On Wed, 29 May 2024 13:18:42 +0200 Petr Machata wrote:
> +fib_multipath_hash_seed - UNSIGNED INTEGER
> + The seed value used when calculating hash for multipath routes. Applies
nits..
For RSS we call it key rather than seed, is calling it seed well
established for ECMP?
Can we also call out that hashing implementation is not well defined?
> + to both IPv4 and IPv6 datapath. Only valid for kernels built with
s/valid/present/ ?
> + CONFIG_IP_ROUTE_MULTIPATH enabled.
> +
> + When set to 0, the seed value used for multipath routing defaults to an
> + internal random-generated one.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-05-29 11:18 ` [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed Petr Machata
2024-05-31 1:00 ` Jakub Kicinski
@ 2024-06-01 8:46 ` Eric Dumazet
2024-06-03 7:29 ` Toke Høiland-Jørgensen
2024-06-03 9:50 ` Petr Machata
1 sibling, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2024-06-01 8:46 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
>
> When calculating hashes for the purpose of multipath forwarding, both IPv4
> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
> randomly-generated seed. That's a fine choice by default, but unfortunately
> some deployments may need a tighter control over the seed used.
>
> In this patch, make the seed configurable by adding a new sysctl key,
> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
> specifically for multipath forwarding and not for the other concerns that
> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
> as sysctl because other such settings, such as headers to hash, are also
> handled that way. Like those, the multipath hash seed is a per-netns
> variable.
>
> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
> variables.
>
...
> + rtnl_lock();
> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
> + mphs);
> + rtnl_unlock();
> +
In case you keep RCU for the next version, please do not use rtnl_lock() here.
A simple xchg() will work just fine.
old = xchg((__force struct struct sysctl_fib_multipath_hash_seed
**)&net->ipv4.sysctl_fib_multipath_hash_seed,
mphs);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-05-31 1:00 ` Jakub Kicinski
@ 2024-06-02 11:15 ` Ido Schimmel
2024-06-03 6:51 ` Nicolas Dichtel
2024-06-03 9:51 ` Petr Machata
1 sibling, 1 reply; 22+ messages in thread
From: Ido Schimmel @ 2024-06-02 11:15 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
David Ahern, Jonathan Corbet, linux-doc, Simon Horman
On Thu, May 30, 2024 at 06:00:34PM -0700, Jakub Kicinski wrote:
> On Wed, 29 May 2024 13:18:42 +0200 Petr Machata wrote:
> > +fib_multipath_hash_seed - UNSIGNED INTEGER
> > + The seed value used when calculating hash for multipath routes. Applies
>
> nits..
>
> For RSS we call it key rather than seed, is calling it seed well
> established for ECMP?
I have only seen documentation where it is called "seed". Examples:
Cumulus:
https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-59/Layer-3/Routing/Equal-Cost-Multipath-Load-Sharing/#unique-hash-seed
Arista:
https://arista.my.site.com/AristaCommunity/s/article/hashing-for-l2-port-channels-and-l3-ecmp
Research from Fastly around load balancing (Section 6.3):
https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-02 11:15 ` Ido Schimmel
@ 2024-06-03 6:51 ` Nicolas Dichtel
0 siblings, 0 replies; 22+ messages in thread
From: Nicolas Dichtel @ 2024-06-03 6:51 UTC (permalink / raw)
To: Ido Schimmel, Jakub Kicinski
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
David Ahern, Jonathan Corbet, linux-doc, Simon Horman
Le 02/06/2024 à 13:15, Ido Schimmel a écrit :
> On Thu, May 30, 2024 at 06:00:34PM -0700, Jakub Kicinski wrote:
>> On Wed, 29 May 2024 13:18:42 +0200 Petr Machata wrote:
>>> +fib_multipath_hash_seed - UNSIGNED INTEGER
>>> + The seed value used when calculating hash for multipath routes. Applies
>>
>> nits..
>>
>> For RSS we call it key rather than seed, is calling it seed well
>> established for ECMP?
It seems standard for me (we call it like this in our products).
>
> I have only seen documentation where it is called "seed". Examples:
>
> Cumulus:
> https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-59/Layer-3/Routing/Equal-Cost-Multipath-Load-Sharing/#unique-hash-seed
>
> Arista:
> https://arista.my.site.com/AristaCommunity/s/article/hashing-for-l2-port-channels-and-l3-ecmp
>
> Research from Fastly around load balancing (Section 6.3):
> https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
>
You can add some others:
https://www.juniper.net/documentation/us/en/software/junos/interfaces-ethernet-switches/topics/topic-map/switches-interface-resilient-hashing.html
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/unicast/configuration/guide/l3_cli_nxos/l3_manage-routes.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-01 8:46 ` Eric Dumazet
@ 2024-06-03 7:29 ` Toke Høiland-Jørgensen
2024-06-03 8:25 ` Eric Dumazet
2024-06-03 9:50 ` Petr Machata
1 sibling, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-06-03 7:29 UTC (permalink / raw)
To: Eric Dumazet, Petr Machata
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
Eric Dumazet <edumazet@google.com> writes:
> On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
>>
>> When calculating hashes for the purpose of multipath forwarding, both IPv4
>> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
>> randomly-generated seed. That's a fine choice by default, but unfortunately
>> some deployments may need a tighter control over the seed used.
>>
>> In this patch, make the seed configurable by adding a new sysctl key,
>> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
>> specifically for multipath forwarding and not for the other concerns that
>> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
>> as sysctl because other such settings, such as headers to hash, are also
>> handled that way. Like those, the multipath hash seed is a per-netns
>> variable.
>>
>> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
>> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
>> variables.
>>
> ...
>
>> + rtnl_lock();
>> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
>> + mphs);
>> + rtnl_unlock();
>> +
>
> In case you keep RCU for the next version, please do not use rtnl_lock() here.
>
> A simple xchg() will work just fine.
>
> old = xchg((__force struct struct sysctl_fib_multipath_hash_seed
> **)&net->ipv4.sysctl_fib_multipath_hash_seed,
> mphs);
We added a macro to do this kind of thing without triggering any of the
RCU type linter warnings, in:
76c8eaafe4f0 ("rcu: Create an unrcu_pointer() to remove __rcu from a pointer")
So as an alternative to open-coding the cast, something like this could
work - I guess it's mostly a matter of taste:
old = unrcu_pointer(xchg(&net->ipv4.sysctl_fib_multipath_hash_seed, RCU_INITIALIZER(mphs)));
-Toke
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-03 7:29 ` Toke Høiland-Jørgensen
@ 2024-06-03 8:25 ` Eric Dumazet
2024-06-03 8:58 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-06-03 8:25 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Petr Machata, David S. Miller, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
On Mon, Jun 3, 2024 at 9:30 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Eric Dumazet <edumazet@google.com> writes:
>
> > On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
> >>
> >> When calculating hashes for the purpose of multipath forwarding, both IPv4
> >> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
> >> randomly-generated seed. That's a fine choice by default, but unfortunately
> >> some deployments may need a tighter control over the seed used.
> >>
> >> In this patch, make the seed configurable by adding a new sysctl key,
> >> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
> >> specifically for multipath forwarding and not for the other concerns that
> >> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
> >> as sysctl because other such settings, such as headers to hash, are also
> >> handled that way. Like those, the multipath hash seed is a per-netns
> >> variable.
> >>
> >> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
> >> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
> >> variables.
> >>
> > ...
> >
> >> + rtnl_lock();
> >> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
> >> + mphs);
> >> + rtnl_unlock();
> >> +
> >
> > In case you keep RCU for the next version, please do not use rtnl_lock() here.
> >
> > A simple xchg() will work just fine.
> >
> > old = xchg((__force struct struct sysctl_fib_multipath_hash_seed
> > **)&net->ipv4.sysctl_fib_multipath_hash_seed,
> > mphs);
>
> We added a macro to do this kind of thing without triggering any of the
> RCU type linter warnings, in:
>
> 76c8eaafe4f0 ("rcu: Create an unrcu_pointer() to remove __rcu from a pointer")
>
> So as an alternative to open-coding the cast, something like this could
> work - I guess it's mostly a matter of taste:
>
> old = unrcu_pointer(xchg(&net->ipv4.sysctl_fib_multipath_hash_seed, RCU_INITIALIZER(mphs)));
Good to know, thanks.
Not sure why __kernel qualifier has been put there.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-03 8:25 ` Eric Dumazet
@ 2024-06-03 8:58 ` Toke Høiland-Jørgensen
2024-06-03 13:53 ` Paul E. McKenney
0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-06-03 8:58 UTC (permalink / raw)
To: Eric Dumazet, Paul E . McKenney
Cc: Petr Machata, David S. Miller, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
Eric Dumazet <edumazet@google.com> writes:
> On Mon, Jun 3, 2024 at 9:30 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Eric Dumazet <edumazet@google.com> writes:
>>
>> > On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
>> >>
>> >> When calculating hashes for the purpose of multipath forwarding, both IPv4
>> >> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
>> >> randomly-generated seed. That's a fine choice by default, but unfortunately
>> >> some deployments may need a tighter control over the seed used.
>> >>
>> >> In this patch, make the seed configurable by adding a new sysctl key,
>> >> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
>> >> specifically for multipath forwarding and not for the other concerns that
>> >> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
>> >> as sysctl because other such settings, such as headers to hash, are also
>> >> handled that way. Like those, the multipath hash seed is a per-netns
>> >> variable.
>> >>
>> >> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
>> >> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
>> >> variables.
>> >>
>> > ...
>> >
>> >> + rtnl_lock();
>> >> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
>> >> + mphs);
>> >> + rtnl_unlock();
>> >> +
>> >
>> > In case you keep RCU for the next version, please do not use rtnl_lock() here.
>> >
>> > A simple xchg() will work just fine.
>> >
>> > old = xchg((__force struct struct sysctl_fib_multipath_hash_seed
>> > **)&net->ipv4.sysctl_fib_multipath_hash_seed,
>> > mphs);
>>
>> We added a macro to do this kind of thing without triggering any of the
>> RCU type linter warnings, in:
>>
>> 76c8eaafe4f0 ("rcu: Create an unrcu_pointer() to remove __rcu from a pointer")
>>
>> So as an alternative to open-coding the cast, something like this could
>> work - I guess it's mostly a matter of taste:
>>
>> old = unrcu_pointer(xchg(&net->ipv4.sysctl_fib_multipath_hash_seed, RCU_INITIALIZER(mphs)));
>
> Good to know, thanks.
>
> Not sure why __kernel qualifier has been put there.
Not sure either. Paul, care to enlighten us? :)
-Toke
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 0/4] Allow configuration of multipath hash seed
2024-05-30 21:34 ` Nikolay Aleksandrov
@ 2024-06-03 9:21 ` Petr Machata
0 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-06-03 9:21 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, David Ahern
Nikolay Aleksandrov <razor@blackwall.org> writes:
> On 5/30/24 21:07, Nikolay Aleksandrov wrote:
>> On 5/30/24 20:27, Nikolay Aleksandrov wrote:
>>> On 5/30/24 18:25, Petr Machata wrote:
>>>>
>>>> I kept the RCU stuff in because it makes it easy to precompute the
>>>> siphash key while allowing readers to access it lock-free. I could
>>>> inline it and guard with a seqlock instead, but that's a bit messier
>>>> code-wise. Or indeed construct in-situ, it's an atomic access plus like
>>>> four instructions or something like that.
>>>
>>> You can READ/WRITE_ONCE() the full 8 bytes every time so it's lock-free
>>> and consistent view of both values for observers. For fast-path it'll
>>> only be accessing one of the two values, so it's fine either way. You
>>> can use barriers to ensure latest value is seen by interested readers,
>>> but for most eventual consistency would be enough.
>>
>> Actually aren't we interested only in user_seed in the external reader
>> case? We don't care what's in mp_seed, so this is much simpler.
>
> Oh, I misunderstood you, didn't I? :) Were you talking about
> constructing the siphash key in the fast-path above? If yes,
> then sure it's a few instructions but nothing conditional.
That's what I meant. I tried to be concise and went overboard.
> I don't think we need anything atomic in that case.
Hmm, right, no competing increments of any sort, so WRITE_ONCE in the
control path and READ_ONCE in fastpath should be enough.
Thanks for the feedback, I'll send v2 this week.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-01 8:46 ` Eric Dumazet
2024-06-03 7:29 ` Toke Høiland-Jørgensen
@ 2024-06-03 9:50 ` Petr Machata
1 sibling, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-06-03 9:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: Petr Machata, David S. Miller, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
Eric Dumazet <edumazet@google.com> writes:
> On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
>>
>> When calculating hashes for the purpose of multipath forwarding, both IPv4
>> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
>> randomly-generated seed. That's a fine choice by default, but unfortunately
>> some deployments may need a tighter control over the seed used.
>>
>> In this patch, make the seed configurable by adding a new sysctl key,
>> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
>> specifically for multipath forwarding and not for the other concerns that
>> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
>> as sysctl because other such settings, such as headers to hash, are also
>> handled that way. Like those, the multipath hash seed is a per-netns
>> variable.
>>
>> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
>> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
>> variables.
>>
> ...
>
>> + rtnl_lock();
>> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
>> + mphs);
>> + rtnl_unlock();
>> +
>
> In case you keep RCU for the next version, please do not use rtnl_lock() here.
Thanks. It looks like it's going to be inline and key constructed at the
point of use, so no RCU.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-05-31 1:00 ` Jakub Kicinski
2024-06-02 11:15 ` Ido Schimmel
@ 2024-06-03 9:51 ` Petr Machata
2024-06-03 11:37 ` Petr Machata
1 sibling, 1 reply; 22+ messages in thread
From: Petr Machata @ 2024-06-03 9:51 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
Jakub Kicinski <kuba@kernel.org> writes:
> On Wed, 29 May 2024 13:18:42 +0200 Petr Machata wrote:
>> +fib_multipath_hash_seed - UNSIGNED INTEGER
>> + The seed value used when calculating hash for multipath routes. Applies
>
> nits..
>
> For RSS we call it key rather than seed, is calling it seed well
> established for ECMP?
>
> Can we also call out that hashing implementation is not well defined?
As others note, this seems to be industry nomenclature, so I'll keep it.
>> + to both IPv4 and IPv6 datapath. Only valid for kernels built with
>
> s/valid/present/ ?
Ack.
>> + CONFIG_IP_ROUTE_MULTIPATH enabled.
>> +
>> + When set to 0, the seed value used for multipath routing defaults to an
>> + internal random-generated one.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-03 9:51 ` Petr Machata
@ 2024-06-03 11:37 ` Petr Machata
0 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2024-06-03 11:37 UTC (permalink / raw)
To: Petr Machata
Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
netdev, Ido Schimmel, David Ahern, Jonathan Corbet, linux-doc,
Simon Horman
Petr Machata <petrm@nvidia.com> writes:
> Jakub Kicinski <kuba@kernel.org> writes:
>
>> On Wed, 29 May 2024 13:18:42 +0200 Petr Machata wrote:
>>> +fib_multipath_hash_seed - UNSIGNED INTEGER
>>> + The seed value used when calculating hash for multipath routes. Applies
>>
>> nits..
>>
>> For RSS we call it key rather than seed, is calling it seed well
>> established for ECMP?
>>
>> Can we also call out that hashing implementation is not well defined?
>
> As others note, this seems to be industry nomenclature, so I'll keep it.
I meant the "seed" name, I'll mention the algorithm is undefined and
doesn't constitute an ABI.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed
2024-06-03 8:58 ` Toke Høiland-Jørgensen
@ 2024-06-03 13:53 ` Paul E. McKenney
0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2024-06-03 13:53 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Eric Dumazet, Petr Machata, David S. Miller, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, David Ahern, Jonathan Corbet,
linux-doc, Simon Horman
On Mon, Jun 03, 2024 at 10:58:18AM +0200, Toke Høiland-Jørgensen wrote:
> Eric Dumazet <edumazet@google.com> writes:
>
> > On Mon, Jun 3, 2024 at 9:30 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Eric Dumazet <edumazet@google.com> writes:
> >>
> >> > On Wed, May 29, 2024 at 1:21 PM Petr Machata <petrm@nvidia.com> wrote:
> >> >>
> >> >> When calculating hashes for the purpose of multipath forwarding, both IPv4
> >> >> and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
> >> >> randomly-generated seed. That's a fine choice by default, but unfortunately
> >> >> some deployments may need a tighter control over the seed used.
> >> >>
> >> >> In this patch, make the seed configurable by adding a new sysctl key,
> >> >> net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
> >> >> specifically for multipath forwarding and not for the other concerns that
> >> >> flow_hash_from_keys() is used for, such as queue selection. Expose the knob
> >> >> as sysctl because other such settings, such as headers to hash, are also
> >> >> handled that way. Like those, the multipath hash seed is a per-netns
> >> >> variable.
> >> >>
> >> >> Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
> >> >> is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
> >> >> variables.
> >> >>
> >> > ...
> >> >
> >> >> + rtnl_lock();
> >> >> + old = rcu_replace_pointer_rtnl(net->ipv4.sysctl_fib_multipath_hash_seed,
> >> >> + mphs);
> >> >> + rtnl_unlock();
> >> >> +
> >> >
> >> > In case you keep RCU for the next version, please do not use rtnl_lock() here.
> >> >
> >> > A simple xchg() will work just fine.
> >> >
> >> > old = xchg((__force struct struct sysctl_fib_multipath_hash_seed
> >> > **)&net->ipv4.sysctl_fib_multipath_hash_seed,
> >> > mphs);
> >>
> >> We added a macro to do this kind of thing without triggering any of the
> >> RCU type linter warnings, in:
> >>
> >> 76c8eaafe4f0 ("rcu: Create an unrcu_pointer() to remove __rcu from a pointer")
> >>
> >> So as an alternative to open-coding the cast, something like this could
> >> work - I guess it's mostly a matter of taste:
> >>
> >> old = unrcu_pointer(xchg(&net->ipv4.sysctl_fib_multipath_hash_seed, RCU_INITIALIZER(mphs)));
> >
> > Good to know, thanks.
> >
> > Not sure why __kernel qualifier has been put there.
>
> Not sure either. Paul, care to enlighten us? :)
Because __kernel says "just plain kernel access". Here are the options:
# define __kernel __attribute__((address_space(0)))
# define __user __attribute__((noderef, address_space(__user)))
# define __iomem __attribute__((noderef, address_space(__iomem)))
# define __percpu __attribute__((noderef, address_space(__percpu)))
# define __rcu __attribute__((noderef, address_space(__rcu)))
So casting to __kernel removes the __rcu, thus avoiding the sparse
complaint.
Thanx, Paul
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-06-03 13:53 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29 11:18 [PATCH net-next 0/4] Allow configuration of multipath hash seed Petr Machata
2024-05-29 11:18 ` [PATCH net-next 1/4] net: ipv4,ipv6: Pass multipath hash computation through a helper Petr Machata
2024-05-29 11:18 ` [PATCH net-next 2/4] net: ipv4: Add a sysctl to set multipath hash seed Petr Machata
2024-05-31 1:00 ` Jakub Kicinski
2024-06-02 11:15 ` Ido Schimmel
2024-06-03 6:51 ` Nicolas Dichtel
2024-06-03 9:51 ` Petr Machata
2024-06-03 11:37 ` Petr Machata
2024-06-01 8:46 ` Eric Dumazet
2024-06-03 7:29 ` Toke Høiland-Jørgensen
2024-06-03 8:25 ` Eric Dumazet
2024-06-03 8:58 ` Toke Høiland-Jørgensen
2024-06-03 13:53 ` Paul E. McKenney
2024-06-03 9:50 ` Petr Machata
2024-05-29 11:18 ` [PATCH net-next 3/4] mlxsw: spectrum_router: Apply user-defined " Petr Machata
2024-05-29 11:18 ` [PATCH net-next 4/4] selftests: forwarding: router_mpath_hash: Add a new selftest Petr Machata
2024-05-29 19:57 ` [PATCH net-next 0/4] Allow configuration of multipath hash seed Nikolay Aleksandrov
2024-05-30 15:25 ` Petr Machata
2024-05-30 17:27 ` Nikolay Aleksandrov
2024-05-30 18:07 ` Nikolay Aleksandrov
2024-05-30 21:34 ` Nikolay Aleksandrov
2024-06-03 9:21 ` Petr Machata
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).