From: Yonghong Song <yonghong.song@linux.dev>
To: Eduard Zingerman <eddyz87@gmail.com>, Daniel Xu <dxu@dxuuu.xyz>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Shuah Khan <shuah@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Steffen Klassert <steffen.klassert@secunet.com>,
antony.antony@secunet.com, Mykola Lysenko <mykolal@fb.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
devel@linux-ipsec.org,
Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH ipsec-next v1 6/7] bpf: selftests: test_tunnel: Disable CO-RE relocations
Date: Sun, 26 Nov 2023 21:44:48 -0800
Message-ID: <42f9bf0d-695a-412d-bea5-cb7036fa7418@linux.dev>
In-Reply-To: <0535eb913f1a0c2d3c291478fde07e0aa2b333f1.camel@gmail.com>

On 11/26/23 8:52 PM, Eduard Zingerman wrote:
> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote:
> [...]
>>> Tbh I'm not sure. This test passes with preserve_static_offset
>>> because it suppresses preserve_access_index. In general clang
>>> translates bitfield access to a set of IR statements like:
>>>
>>> C:
>>> struct foo {
>>> unsigned _;
>>> unsigned a:1;
>>> ...
>>> };
>>> ... foo->a ...
>>>
>>> IR:
>>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
>>> %bf.load = load i8, ptr %a, align 4
>>> %bf.clear = and i8 %bf.load, 1
>>> %bf.cast = zext i8 %bf.clear to i32
>>>
>>> With preserve_static_offset the getelementptr+load are replaced by a
>>> single statement which is preserved as-is till code generation,
>>> thus load with align 4 is preserved.
>>>
>>> On the other hand, I'm not sure that clang guarantees that load or
>>> stores used for bitfield access would be always aligned according to
>>> verifier expectations.
>>>
>>> I think we should check if there are some clang knobs that prevent
>>> generation of unaligned memory access. I'll take a look.
>> Is there a reason to prefer fixing in compiler? I'm not opposed to it,
>> but the downside to compiler fix is it takes years to propagate and
>> sprinkles ifdefs into the code.
>>
>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()?
> Well, the contraption below passes verification, tunnel selftest
> appears to work. I might have messed up some shifts in the macro, though.
I didn't test it, but at a high level it should work.
>
> Still, if clang would pick an unlucky BYTE_{OFFSET,SIZE} for a particular
> field, the access might be unaligned.
clang should pick a sensible BYTE_SIZE/BYTE_OFFSET that meets the
alignment requirement. BPF_CORE_READ_BITFIELD relies on this as well.
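
To make the alignment point concrete, here is a rough host-side sketch of
the load-then-shift shape a bitfield read takes. It is not the libbpf
macro: read_bitfield() is a hypothetical helper, and its four layout
arguments stand in for the BYTE_OFFSET, BYTE_SIZE, LSHIFT_U64 and
RSHIFT_U64 relocations. It also assumes an unsigned field and a base
pointer that is itself suitably aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a CO-RE bitfield read. In a BPF program the
 * layout values would come from relocations; here they are plain arguments
 * so the arithmetic can be checked on the host.
 */
static uint64_t read_bitfield(const void *base, unsigned byte_off,
			      unsigned byte_size, unsigned lshift,
			      unsigned rshift)
{
	const unsigned char *p = (const unsigned char *)base + byte_off;
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;
	uint64_t val = 0;

	/* Only power-of-two load sizes are handled in this sketch. */
	assert(byte_size == 1 || byte_size == 2 ||
	       byte_size == 4 || byte_size == 8);
	/* The property under discussion: the load is naturally aligned. */
	assert(byte_off % byte_size == 0);

	switch (byte_size) {
	case 1: memcpy(&v8, p, sizeof(v8)); val = v8; break;
	case 2: memcpy(&v16, p, sizeof(v16)); val = v16; break;
	case 4: memcpy(&v32, p, sizeof(v32)); val = v32; break;
	case 8: memcpy(&val, p, sizeof(val)); break;
	}

	val <<= lshift;	/* push the bits above the field out of the top */
	val >>= rshift;	/* drop the bits below it (unsigned field assumed) */
	return val;
}
```

With a one-byte load of 0x78, lshift=56 and rshift=60 select the upper
nibble, and lshift=60 and rshift=63 select bit 3.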
>
> ---
>
> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> index 3065a716544d..41cd913ac7ff 100644
> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> @@ -9,6 +9,7 @@
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_endian.h>
> +#include <bpf/bpf_core_read.h>
> #include "bpf_kfuncs.h"
> #include "bpf_tracing_net.h"
>
> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb)
> return TC_ACT_OK;
> }
>
> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \
> + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
> + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \
> + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \
> + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \
> + unsigned bit_size = (rshift - lshift); \
> + unsigned long long nval, val, hi, lo; \
> + \
> + asm volatile("" : "=r"(p) : "0"(p)); \
Use asm volatile("" : "+r"(p)) ?
> + \
> + switch (byte_size) { \
> + case 1: val = *(unsigned char *)p; break; \
> + case 2: val = *(unsigned short *)p; break; \
> + case 4: val = *(unsigned int *)p; break; \
> + case 8: val = *(unsigned long long *)p; break; \
> + } \
> + hi = val >> (bit_size + rshift); \
> + hi <<= bit_size + rshift; \
> + lo = val << (bit_size + lshift); \
> + lo >>= bit_size + lshift; \
> + nval = new_val; \
> + nval <<= lshift; \
> + nval >>= rshift; \
> + val = hi | nval | lo; \
> + switch (byte_size) { \
> + case 1: *(unsigned char *)p = val; break; \
> + case 2: *(unsigned short *)p = val; break; \
> + case 4: *(unsigned int *)p = val; break; \
> + case 8: *(unsigned long long *)p = val; break; \
> + } \
> +})
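
On the asm constraint question above: the two spellings should be
equivalent. A minimal sketch follows (launder_ptr() and
launder_ptr_tied() are hypothetical names, and this is GCC/Clang
extended asm, not standard C). In both cases the empty asm forces the
compiler to treat p as a value it can no longer reason about.

```c
/* "+r": p is a single read-write register operand. */
static inline void *launder_ptr(void *p)
{
	__asm__ volatile("" : "+r"(p));
	return p;
}

/* "=r" output tied to input operand "0": same effect, spelled in two parts. */
static inline void *launder_ptr_tied(void *p)
{
	__asm__ volatile("" : "=r"(p) : "0"(p));
	return p;
}

/* Usage: both variants hand back the pointer they were given. */
static int launder_demo(void)
{
	int x = 0;

	return launder_ptr(&x) == (void *)&x &&
	       launder_ptr_tied(&x) == (void *)&x;
}
```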
I think this macro should go into a libbpf public header, but I'm not
sure which one. bpf_core_read.h, even though it is a CO-RE write?
On the other hand, this is a bitfield write to a uapi struct, so
strictly speaking a CO-RE write is unnecessary here. It would be
great if we could relieve users from dealing with such unnecessary
CO-RE writes. For this particular case, I would prefer rewriting the
code to use byte-level stores...
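
Something along these lines is what I mean by byte-level stores. This is
only a sketch: set_bits_u8() is a hypothetical helper, and the bit
positions used with it would have to be taken from the uapi header for
the target endianness. Nothing here encodes the real erspan layout.

```c
#include <stdint.h>

/*
 * Replace `width` bits of *byte, starting at bit `shift`, with `val`.
 * The whole byte is loaded and stored, so the compiler never has to emit
 * a bitfield access and no CO-RE relocation is involved.
 * Assumes shift + width <= 8.
 */
static inline void set_bits_u8(uint8_t *byte, unsigned shift,
			       unsigned width, unsigned val)
{
	uint8_t mask = (uint8_t)(((1u << width) - 1u) << shift);

	*byte = (uint8_t)((*byte & ~mask) | ((val << shift) & mask));
}
```

A caller would take a __u8 pointer to the byte that holds the field and
pass the shift and width for that field.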
> +
> SEC("tc")
> int erspan_set_tunnel(struct __sk_buff *skb)
> {
> @@ -173,9 +206,9 @@ int erspan_set_tunnel(struct __sk_buff *skb)
> __u8 hwid = 7;
>
> md.version = 2;
> - md.u.md2.dir = direction;
> - md.u.md2.hwid = hwid & 0xf;
> - md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
> #endif
>
> ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
> @@ -214,8 +247,9 @@ int erspan_get_tunnel(struct __sk_buff *skb)
> bpf_printk("\tindex %x\n", index);
> #else
> bpf_printk("\tdirection %d hwid %x timestamp %u\n",
> - md.u.md2.dir,
> - (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
> + BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
> + (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
> + BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
> bpf_ntohl(md.u.md2.timestamp));
> #endif
>
> @@ -252,9 +286,9 @@ int ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
> __u8 hwid = 17;
>
> md.version = 2;
> - md.u.md2.dir = direction;
> - md.u.md2.hwid = hwid & 0xf;
> - md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
> + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
> #endif
>
> ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
> @@ -294,8 +328,9 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
> bpf_printk("\tindex %x\n", index);
> #else
> bpf_printk("\tdirection %d hwid %x timestamp %u\n",
> - md.u.md2.dir,
> - (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
> + BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
> + (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
> + BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
> bpf_ntohl(md.u.md2.timestamp));
> #endif
>