All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Xu Kuohai <xukuohai@huaweicloud.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH bpf-next] bpf: Fold LSH and ARSH pair to a single MOVSX for sign-extension
Date: Mon, 29 Apr 2024 09:32:47 -0700	[thread overview]
Message-ID: <c4b2cdf7-0978-4e72-a833-a809655fa84f@linux.dev> (raw)
In-Reply-To: <20240429152036.3411628-1-xukuohai@huaweicloud.com>


On 4/29/24 8:20 AM, Xu Kuohai wrote:
> From: Xu Kuohai <xukuohai@huawei.com>
>
> As shown in the ExpandSEXTINREG function in [1], LLVM generates SRL and
> SRA instruction pair to implement sign-extension. For x86 and arm64,
> this instruction pair will be folded to a single instruction, but the
> LLVM BPF backend does not do such folding.

With -mcpu=v4, sign-extention will be generated and in selftest
test_progs-cpuv4 will test with -mcpu=v4. The cpu v4 support
is added in llvm18, and I hope once llvm18 is widely available, we
might be able to make test_progs-cpuv4 as the default test_progs.

So I think this optimization is not needed.

>
> For example, the following C code:
>
> long f(int x)
> {
> 	return x;
> }
>
> will be compiled to:
>
> r0 = r1
> r0 <<= 0x20
> r0 s>>= 0x20
> exit
>
> Since 32-bit to 64-bit sign-extension is a common case and we already
> have MOVSX instruction for sign-extension, this patch tries to fold
> the 32-bit to 64-bit LSH and ARSH pair to a single MOVSX instruction.
>
> [1] https://github.com/llvm/llvm-project/blob/4523a267829c807f3fc8fab8e5e9613985a51565/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp#L1228
>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> ---
>   include/linux/filter.h |  8 ++++++++
>   kernel/bpf/verifier.c  | 46 ++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 7a27f19bf44d..7cc90a32ed9a 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -173,6 +173,14 @@ struct ctl_table_header;
>   		.off   = 0,					\
>   		.imm   = 0 })
>   
> +#define BPF_MOV64_SEXT_REG(DST, SRC, OFF)			\
> +	((struct bpf_insn) {					\
> +		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
> +		.dst_reg = DST,					\
> +		.src_reg = SRC,					\
> +		.off   = OFF,					\
> +		.imm   = 0 })
> +
>   #define BPF_MOV32_REG(DST, SRC)					\
>   	((struct bpf_insn) {					\
>   		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4e474ef44e9c..6bcee052d90d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -20659,6 +20659,49 @@ static int optimize_bpf_loop(struct bpf_verifier_env *env)
>   	return 0;
>   }
>   
> +static bool is_sext32(struct bpf_insn *insn1, struct bpf_insn *insn2)
> +{
> +	if (insn1->code != (BPF_ALU64 | BPF_K | BPF_LSH) || insn1->imm != 32)
> +		return false;
> +
> +	if (insn2->code != (BPF_ALU64 | BPF_K | BPF_ARSH) || insn2->imm != 32)
> +		return false;
> +
> +	if (insn1->dst_reg != insn2->dst_reg)
> +		return false;
> +
> +	return true;
> +}
> +
> +/* LLVM generates sign-extension with LSH and ARSH pair, replace it with MOVSX.
> + *
> + * Before:
> + * DST <<= 32
> + * DST s>>= 32
> + *
> + * After:
> + * DST = (s32)DST
> + */
> +static int optimize_sext32_insns(struct bpf_verifier_env *env)
> +{
> +	int i, err;
> +	int insn_cnt = env->prog->len;
> +	struct bpf_insn *insn = env->prog->insnsi;
> +
> +	for (i = 0; i < insn_cnt; i++, insn++) {
> +		if (i + 1 >= insn_cnt || !is_sext32(insn, insn + 1))
> +			continue;
> +		/* patch current insn to MOVSX */
> +		*insn = BPF_MOV64_SEXT_REG(insn->dst_reg, insn->dst_reg, 32);
> +		/* remove next insn */
> +		err = verifier_remove_insns(env, i + 1, 1);
> +		if (err)
> +			return err;
> +		insn_cnt--;
> +	}
> +	return 0;
> +}
> +
>   static void free_states(struct bpf_verifier_env *env)
>   {
>   	struct bpf_verifier_state_list *sl, *sln;
> @@ -21577,6 +21620,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>   	if (ret == 0)
>   		ret = optimize_bpf_loop(env);
>   
> +	if (ret == 0)
> +		ret = optimize_sext32_insns(env);
> +
>   	if (is_priv) {
>   		if (ret == 0)
>   			opt_hard_wire_dead_code_branches(env);

      reply	other threads:[~2024-04-29 16:32 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-29 15:20 [PATCH bpf-next] bpf: Fold LSH and ARSH pair to a single MOVSX for sign-extension Xu Kuohai
2024-04-29 16:32 ` Yonghong Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c4b2cdf7-0978-4e72-a833-a809655fa84f@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=xukuohai@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.