From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Shuah Khan <shuah@kernel.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-kselftest@vger.kernel.org,
Yuri Benditovich <yuri.benditovich@daynix.com>,
Andrew Melnychenko <andrew@daynix.com>,
Stephen Hemminger <stephen@networkplumber.org>,
gur.stavi@huawei.com, Lei Yang <leiyang@redhat.com>,
Simon Horman <horms@kernel.org>
Subject: Re: [PATCH net-next v9 1/6] virtio_net: Add functions for hashing
Date: Mon, 10 Mar 2025 15:53:42 +0900
Message-ID: <2e27f18b-1fc9-433d-92e9-8b2e3b1b65dc@daynix.com>
In-Reply-To: <CACGkMEvxkwe9OJRZPb7zz-sRfVpeuoYSz4c2kh9_jjtGbkb_qA@mail.gmail.com>
On 2025/03/10 12:55, Jason Wang wrote:
> On Fri, Mar 7, 2025 at 7:01 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> They are useful to implement VIRTIO_NET_F_RSS and
>> VIRTIO_NET_F_HASH_REPORT.
>>
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> Tested-by: Lei Yang <leiyang@redhat.com>
>> ---
>> include/linux/virtio_net.h | 188 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 188 insertions(+)
>>
>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>> index 02a9f4dc594d02372a6c1850cd600eff9d000d8d..426f33b4b82440d61b2af9fdc4c0b0d4c571b2c5 100644
>> --- a/include/linux/virtio_net.h
>> +++ b/include/linux/virtio_net.h
>> @@ -9,6 +9,194 @@
>> #include <uapi/linux/tcp.h>
>> #include <uapi/linux/virtio_net.h>
>>
>> +struct virtio_net_hash {
>> + u32 value;
>> + u16 report;
>> +};
>> +
>> +struct virtio_net_toeplitz_state {
>> + u32 hash;
>> + const u32 *key;
>> +};
>> +
>> +#define VIRTIO_NET_SUPPORTED_HASH_TYPES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
>> + VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
>> + VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
>> + VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
>> + VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
>> + VIRTIO_NET_RSS_HASH_TYPE_UDPv6)
>
> Let's explain why
>
> #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7
> #define VIRTIO_NET_HASH_REPORT_TCPv6_EX 8
> #define VIRTIO_NET_HASH_REPORT_UDPv6_EX 9
>
> are missing here.
Because they require parsing IPv6 options, and I'm not sure how many
options we need to parse. QEMU's eBPF program has a hard-coded limit of
30 options; it gives some rationale for this limit, but that rationale
does not seem definitive either:
https://gitlab.com/qemu-project/qemu/-/commit/f3fa412de28ae3cb31d38811d30a77e4e20456cc#6ec48fc8af2f802e92f5127425e845c4c213ff60_0_165
Instead, this patch series adds an ioctl to query the capability; it
lets me leave those hash types unimplemented, and it is crucial anyway
for keeping the interface extensible as hash types are added in the
future. Anyone who finds these hash types useful can implement them
later.
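To illustrate why the *_EX types need a parsing limit, here is a minimal
sketch of walking an IPv6 extension header chain with a fixed bound, the
kind of bounded loop an eBPF program needs to satisfy the verifier.
It is hypothetical: ipv6_skip_exthdrs() is not a kernel or QEMU function,
it handles only the Hop-by-Hop, Routing, Fragment and Destination Options
headers defined in RFC 8200, and the bound is a parameter rather than
QEMU's actual constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 Next Header values for the extension headers handled here. */
#define NEXTHDR_HOP      0   /* Hop-by-Hop Options */
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_DEST     60  /* Destination Options */

#define IPV6_HDR_LEN 40      /* fixed IPv6 header */

/*
 * Hypothetical helper: skip at most max_hdrs extension headers of the
 * IPv6 packet in pkt[0..len). On success, *proto is the upper-layer
 * protocol and *off is its offset. Fails on truncation or when the
 * bound is reached before the chain ends.
 */
static bool ipv6_skip_exthdrs(const uint8_t *pkt, size_t len,
                              unsigned int max_hdrs,
                              uint8_t *proto, size_t *off)
{
	size_t pos = IPV6_HDR_LEN;
	uint8_t nexthdr;

	if (len < IPV6_HDR_LEN)
		return false;
	nexthdr = pkt[6];  /* Next Header field of the fixed header */

	for (unsigned int n = 0; ; n++) {
		size_t hdrlen;

		if (nexthdr != NEXTHDR_HOP && nexthdr != NEXTHDR_ROUTING &&
		    nexthdr != NEXTHDR_FRAGMENT && nexthdr != NEXTHDR_DEST) {
			*proto = nexthdr;
			*off = pos;
			return true;
		}
		if (n == max_hdrs || len - pos < 8)
			return false;  /* bound reached or truncated header */

		/* The Fragment header is always 8 bytes; the others store
		 * their length in 8-octet units, excluding the first 8. */
		hdrlen = nexthdr == NEXTHDR_FRAGMENT ?
			 8 : ((size_t)pkt[pos + 1] + 1) * 8;
		if (len - pos < hdrlen)
			return false;
		nexthdr = pkt[pos];  /* first byte: Next Header */
		pos += hdrlen;
	}
}
```

Every additional header the hash must look through costs another loop
iteration, so any fixed bound is a trade-off rather than a value the
specification dictates.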
>
> And explain how we could maintain migration compatibility
>
> 1) Do those three work for the userspace datapath in QEMU? If yes,
> migration will be broken.
They work with the userspace datapath, so my RFC patch series for QEMU
uses TUNGETVNETHASHCAP to avoid breaking migration:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
That series first adds configuration options that let users choose
hash types. QEMU then automatically picks one of the following
implementations, listed from most to least preferred:
1) The hash capability of vhost hardware
2) The hash capability I'm proposing here
3) The eBPF program
4) The pure userspace implementation
This decision depends on the following:
- The required hash types; supported ones are queried for 1) and 2)
- Whether vhost is enabled or not and what vhost backend is used
- Whether hash reporting is enabled; 3) is incompatible with it
The network device will not be realized if no implementation satisfies
the requirements.
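The selection described above could be sketched roughly as follows.
Everything here is hypothetical: the enum, the struct rss_env fields and
pick_rss_impl() are illustrative names, not QEMU's code. "What vhost
backend is used" is simplified to whether the backend reports its own
hash capability, and two rules are assumptions not stated in this
thread: the eBPF program is treated as handling every configured hash
type, and the pure userspace implementation as usable only when vhost is
off (so packets pass through QEMU).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four implementations, in preference order. */
enum rss_impl {
	RSS_IMPL_NONE,       /* nothing fits: the device is not realized */
	RSS_IMPL_VHOST_HW,   /* 1) hash capability of the vhost backend */
	RSS_IMPL_TUN,        /* 2) TUNGETVNETHASHCAP / TUNSETVNETHASH */
	RSS_IMPL_EBPF,       /* 3) eBPF steering program */
	RSS_IMPL_USERSPACE,  /* 4) pure userspace */
};

/* Hypothetical summary of the inputs the decision depends on. */
struct rss_env {
	uint32_t required;     /* hash types the user configured */
	bool hash_report;      /* hash reporting requested */
	bool vhost;            /* any vhost backend in use */
	uint32_t vhost_hw_cap; /* types the vhost backend hashes; 0 if none */
	uint32_t tun_cap;      /* TUNGETVNETHASHCAP result; 0 if unsupported */
	bool ebpf;             /* eBPF steering can be loaded */
};

/* True if every required type is in the capability mask. */
static bool covers(uint32_t cap, uint32_t required)
{
	return (required & ~cap) == 0;
}

static enum rss_impl pick_rss_impl(const struct rss_env *env)
{
	/* 1) and 2): compare required types with the queried capability. */
	if (env->vhost_hw_cap && covers(env->vhost_hw_cap, env->required))
		return RSS_IMPL_VHOST_HW;
	if (env->tun_cap && covers(env->tun_cap, env->required))
		return RSS_IMPL_TUN;
	/* 3) cannot report hashes to the guest. */
	if (env->ebpf && !env->hash_report)
		return RSS_IMPL_EBPF;
	/* 4) assumed to require packets to go through QEMU. */
	if (!env->vhost)
		return RSS_IMPL_USERSPACE;
	return RSS_IMPL_NONE;
}
```

For example, with hash reporting enabled under vhost and a kernel that
lacks TUNGETVNETHASHCAP, this sketch returns RSS_IMPL_NONE, matching the
rule that the device is not realized when nothing satisfies the
requirements.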
> 2) What happens once we support those three in the future? For example,
> is QEMU expected to probe this via TUNGETVNETHASHCAP on the destination
> and fail the migration?
QEMU is expected to use TUNGETVNETHASHCAP, but it can selectively enable
hash types with TUNSETVNETHASH to keep migration working.
In summary, this patch series gives userspace what it needs to keep
extensibility and migration compatibility from conflicting:
TUNGETVNETHASHCAP exposes all of the kernel's capabilities, and
TUNSETVNETHASH allows userspace to restrict them.
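A minimal sketch of the destination-side check, assuming the source's
enabled hash types travel in the migration stream. restore_hash_types()
is a hypothetical name, and the real ioctl argument layouts are not
modeled; only the VIRTIO_NET_RSS_HASH_TYPE_* bit values are taken from
include/uapi/linux/virtio_net.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values from include/uapi/linux/virtio_net.h. */
#define VIRTIO_NET_RSS_HASH_TYPE_IPv4   (1 << 0)
#define VIRTIO_NET_RSS_HASH_TYPE_TCPv4  (1 << 1)
#define VIRTIO_NET_RSS_HASH_TYPE_UDPv4  (1 << 2)
#define VIRTIO_NET_RSS_HASH_TYPE_IPv6   (1 << 3)
#define VIRTIO_NET_RSS_HASH_TYPE_TCPv6  (1 << 4)
#define VIRTIO_NET_RSS_HASH_TYPE_UDPv6  (1 << 5)
#define VIRTIO_NET_RSS_HASH_TYPE_IP_EX  (1 << 6)
#define VIRTIO_NET_RSS_HASH_TYPE_TCP_EX (1 << 7)
#define VIRTIO_NET_RSS_HASH_TYPE_UDP_EX (1 << 8)

/*
 * Hypothetical: src_types is what the source had enabled; dst_cap is
 * what TUNGETVNETHASHCAP reports on the destination kernel. Returns
 * false if migration must fail; otherwise *to_set is the mask to hand
 * to TUNSETVNETHASH.
 */
static bool restore_hash_types(uint32_t src_types, uint32_t dst_cap,
                               uint32_t *to_set)
{
	if (src_types & ~dst_cap)
		return false;      /* destination cannot hash an enabled type */
	*to_set = src_types;       /* enable exactly the source's set, no more */
	return true;
}
```

Masking down to the source's set is what keeps a newer destination
kernel, which might one day also report the *_EX bits, from changing the
guest-visible behavior after migration.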
Regards,
Akihiko Odaki
>
> Thanks
>
>
>
>> +
>> +#define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
>> +
>> +static inline void virtio_net_toeplitz_convert_key(u32 *input, size_t len)
>> +{
>> + while (len >= sizeof(*input)) {
>> + *input = be32_to_cpu((__force __be32)*input);
>> + input++;
>> + len -= sizeof(*input);
>> + }
>> +}
>> +
>> +static inline void virtio_net_toeplitz_calc(struct virtio_net_toeplitz_state *state,
>> + const __be32 *input, size_t len)
>> +{
>> + while (len >= sizeof(*input)) {
>> + for (u32 map = be32_to_cpu(*input); map; map &= (map - 1)) {
>> + u32 i = ffs(map);
>> +
>> + state->hash ^= state->key[0] << (32 - i) |
>> + (u32)((u64)state->key[1] >> i);
>> + }
>> +
>> + state->key++;
>> + input++;
>> + len -= sizeof(*input);
>> + }
>> +}
>> +
>> +static inline u8 virtio_net_hash_key_length(u32 types)
>> +{
>> + size_t len = 0;
>> +
>> + if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
>> + len = max(len,
>> + sizeof(struct flow_dissector_key_ipv4_addrs));
>> +
>> + if (types &
>> + (VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | VIRTIO_NET_RSS_HASH_TYPE_UDPv4))
>> + len = max(len,
>> + sizeof(struct flow_dissector_key_ipv4_addrs) +
>> + sizeof(struct flow_dissector_key_ports));
>> +
>> + if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
>> + len = max(len,
>> + sizeof(struct flow_dissector_key_ipv6_addrs));
>> +
>> + if (types &
>> + (VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | VIRTIO_NET_RSS_HASH_TYPE_UDPv6))
>> + len = max(len,
>> + sizeof(struct flow_dissector_key_ipv6_addrs) +
>> + sizeof(struct flow_dissector_key_ports));
>> +
>> + return len + sizeof(u32);
>> +}
>> +
>> +static inline u32 virtio_net_hash_report(u32 types,
>> + const struct flow_keys_basic *keys)
>> +{
>> + switch (keys->basic.n_proto) {
>> + case cpu_to_be16(ETH_P_IP):
>> + if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
>> + if (keys->basic.ip_proto == IPPROTO_TCP &&
>> + (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4))
>> + return VIRTIO_NET_HASH_REPORT_TCPv4;
>> +
>> + if (keys->basic.ip_proto == IPPROTO_UDP &&
>> + (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4))
>> + return VIRTIO_NET_HASH_REPORT_UDPv4;
>> + }
>> +
>> + if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
>> + return VIRTIO_NET_HASH_REPORT_IPv4;
>> +
>> + return VIRTIO_NET_HASH_REPORT_NONE;
>> +
>> + case cpu_to_be16(ETH_P_IPV6):
>> + if (!(keys->control.flags & FLOW_DIS_IS_FRAGMENT)) {
>> + if (keys->basic.ip_proto == IPPROTO_TCP &&
>> + (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6))
>> + return VIRTIO_NET_HASH_REPORT_TCPv6;
>> +
>> + if (keys->basic.ip_proto == IPPROTO_UDP &&
>> + (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6))
>> + return VIRTIO_NET_HASH_REPORT_UDPv6;
>> + }
>> +
>> + if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
>> + return VIRTIO_NET_HASH_REPORT_IPv6;
>> +
>> + return VIRTIO_NET_HASH_REPORT_NONE;
>> +
>> + default:
>> + return VIRTIO_NET_HASH_REPORT_NONE;
>> + }
>> +}
>> +
>> +static inline void virtio_net_hash_rss(const struct sk_buff *skb,
>> + u32 types, const u32 *key,
>> + struct virtio_net_hash *hash)
>> +{
>> + struct virtio_net_toeplitz_state toeplitz_state = { .key = key };
>> + struct flow_keys flow;
>> + struct flow_keys_basic flow_basic;
>> + u16 report;
>> +
>> + if (!skb_flow_dissect_flow_keys(skb, &flow, 0)) {
>> + hash->report = VIRTIO_NET_HASH_REPORT_NONE;
>> + return;
>> + }
>> +
>> + flow_basic = (struct flow_keys_basic) {
>> + .control = flow.control,
>> + .basic = flow.basic
>> + };
>> +
>> + report = virtio_net_hash_report(types, &flow_basic);
>> +
>> + switch (report) {
>> + case VIRTIO_NET_HASH_REPORT_IPv4:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v4addrs,
>> + sizeof(flow.addrs.v4addrs));
>> + break;
>> +
>> + case VIRTIO_NET_HASH_REPORT_TCPv4:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v4addrs,
>> + sizeof(flow.addrs.v4addrs));
>> + virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
>> + sizeof(flow.ports.ports));
>> + break;
>> +
>> + case VIRTIO_NET_HASH_REPORT_UDPv4:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v4addrs,
>> + sizeof(flow.addrs.v4addrs));
>> + virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
>> + sizeof(flow.ports.ports));
>> + break;
>> +
>> + case VIRTIO_NET_HASH_REPORT_IPv6:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v6addrs,
>> + sizeof(flow.addrs.v6addrs));
>> + break;
>> +
>> + case VIRTIO_NET_HASH_REPORT_TCPv6:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v6addrs,
>> + sizeof(flow.addrs.v6addrs));
>> + virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
>> + sizeof(flow.ports.ports));
>> + break;
>> +
>> + case VIRTIO_NET_HASH_REPORT_UDPv6:
>> + virtio_net_toeplitz_calc(&toeplitz_state,
>> + (__be32 *)&flow.addrs.v6addrs,
>> + sizeof(flow.addrs.v6addrs));
>> + virtio_net_toeplitz_calc(&toeplitz_state, &flow.ports.ports,
>> + sizeof(flow.ports.ports));
>> + break;
>> +
>> + default:
>> + hash->report = VIRTIO_NET_HASH_REPORT_NONE;
>> + return;
>> + }
>> +
>> + hash->value = toeplitz_state.hash;
>> + hash->report = report;
>> +}
>> +
>> static inline bool virtio_net_hdr_match_proto(__be16 protocol, __u8 gso_type)
>> {
>> switch (gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
>>
>> --
>> 2.48.1
>>
>
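For readers checking virtio_net_toeplitz_calc(), here is a userspace
sketch of the same set-bit iteration: for every set input bit, XOR in
the 32-bit key window that starts at that bit, with the window built
from two adjacent key words. It is an illustration, not the kernel
code: toeplitz() and load_be32() are hypothetical names, and the sketch
loads big-endian words on the fly instead of converting the key once as
virtio_net_toeplitz_convert_key() does. The assertion mirrors the rule
in virtio_net_hash_key_length(): the key must be 4 bytes longer than
the input, so the largest input (IPv6 addresses plus ports, 32 + 4 = 36
bytes) needs 40 bytes, matching VIRTIO_NET_RSS_MAX_KEY_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key and input are byte strings, most significant bit first. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Toeplitz hash over in[0..in_len); in_len must be a multiple of 4. */
static uint32_t toeplitz(const uint8_t *key, size_t key_len,
                         const uint8_t *in, size_t in_len)
{
	uint32_t hash = 0;

	/* The window for the last input bit reaches 31 bits past it. */
	assert(in_len % 4 == 0 && key_len >= in_len + 4);

	for (size_t w = 0; w < in_len; w += 4) {
		uint32_t k0 = load_be32(key + w);
		uint32_t k1 = load_be32(key + w + 4);

		/* Visit set bits from least significant; clear each in turn. */
		for (uint32_t map = load_be32(in + w); map; map &= map - 1) {
			unsigned int i = 1;  /* 1-based, like ffs() */

			while (!(map & (1u << (i - 1))))
				i++;
			/* Window starts 32 - i bits into k0; the 64-bit
			 * cast keeps the shift defined when i == 32. */
			hash ^= (k0 << (32 - i)) |
				(uint32_t)((uint64_t)k1 >> i);
		}
	}
	return hash;
}
```

Fed the 40-byte key and first IPv4 vector from Microsoft's RSS
verification table (source 66.9.149.187:2794, destination
161.142.100.80:1766, input ordered source address, destination address,
source port, destination port), the published results are 0x323e8fc2
for the addresses alone and 0x51ccc178 with the ports appended.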