From: Puranjay Mohan
To: "Russell King (Oracle)"
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
 Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh,
 Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf] arm32, bpf: Fix sign-extension mov instruction
References: <20240409095038.26356-1-puranjay@kernel.org>
Date: Fri, 19 Apr 2024 18:52:31 +0000
"Russell King (Oracle)" writes:

> On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote:
>> The current implementation of the mov instruction with sign extension
>> clobbers the source register because it sign extends the source and then
>> moves it to the destination.
>>
>> Fix this by moving the src to a temporary register before doing the sign
>> extension only if src is not an emulated register (on the scratch stack).
>>
>> Also fix the emit_a32_movsx_r64() to put the register back on scratch
>> stack if that register is emulated on stack.
>
> It would be good to include in the commit message an example or two of
> the resulting assembly code so that it's clear what the expected
> generation is. Instead, I'm going to have to work it out myself, but
> I'm quite sure this is information you already have.
>
>> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
>> Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
>> Signed-off-by: Puranjay Mohan
>> ---
>>  arch/arm/net/bpf_jit_32.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 1d672457d02f..8fde6ab66cb4 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
>>
>>  	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
>>  	if (off && off != 32) {
>> +		/* If rt is not a stacked register, move it to tmp, so it doesn't
>> +		 * get clobbered by the shift operations.
>> +		 */
>> +		if (rt == src) {
>> +			emit(ARM_MOV_R(tmp[0], rt), ctx);
>> +			rt = tmp[0];
>> +		}
>
> This change is adding inefficiency, don't we want to have the JIT
> creating as efficient code as possible within the bounds of
> reasonableness?
>
>> +		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
>> +		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
>
> LSL and ASR can very easily take a different source register to the
> destination register. All this needs to be is:
>
> 	emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx);
> 	emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx);
> 	rt = tmp[0];
>
> This will generate:
>
> 	lsl	tmp[0], src, #32-off
> 	asr	tmp[0], tmp[0], #32-off
>
> and then the store to the output register will occur.
>
> What about the high-32 bits of the register pair - should that be
> taking any value?
>
>>  	}
>
> I notice in passing that the comments are out of sync with the
> code - please update the comments along with code changes.
>
>> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
>>  	const s8 *tmp = bpf2a32[TMP_REG_1];
>>  	const s8 *rt;
>>
>> -	rt = arm_bpf_get_reg64(dst, tmp, ctx);
>> -
>>  	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
>>  	if (!is64) {
>>  		if (!ctx->prog->aux->verifier_zext)
>>  			/* Zero out high 4 bytes */
>>  			emit_a32_mov_i(dst_hi, 0, ctx);
>>  	} else {
>> +		rt = arm_bpf_get_reg64(dst, tmp, ctx);
>>  		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
>> +		arm_bpf_put_reg64(dst, rt, ctx);
>>  	}
>>  }
>
> Why oh why oh why are we emitting code to read the source register
> (which may be a load), then write it to the destination (which may
> be a store) to only then immediately reload from the destination
> to then do the sign extension? This is madness.
>
> Please... apply some thought to the code generation from the JIT...
> or I will remove you from being a maintainer of this code. I spent
> time crafting some parts of the JIT to generate efficient code and
> I'm seeing that a lot of that work is now being thrown away by
> someone who seemingly doesn't care about generating "good" code.
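For reference, the shift-based sign extension under discussion can be
modelled in plain C as below. This is only a rough sketch with made-up
names (sext32/sext64 are not kernel helpers, and the prints are just a
sanity check); it assumes the usual two's-complement behaviour of casts
and arithmetic right shifts on signed values:

	#include <stdint.h>
	#include <stdio.h>

	/* Sign-extend the low 'off' bits of src (off = 8, 16 or 32, as in
	 * BPF's movsx) into a full 32-bit value.  The result lands in a
	 * separate "register" (the return value), so src itself is never
	 * clobbered, which is the same idea as shifting into tmp[0] above.
	 */
	static int32_t sext32(uint32_t src, unsigned int off)
	{
		if (off == 0 || off == 32)
			return (int32_t)src;	/* plain mov, nothing to extend */

		/* lsl tmp, src, #32-off ; asr tmp, tmp, #32-off */
		uint32_t tmp = src << (32 - off);
		return (int32_t)tmp >> (32 - off);
	}

	/* For the 64-bit case the high word is just the sign bit of the
	 * already extended low word replicated, i.e. asr dst_hi, dst_lo, #31.
	 */
	static int64_t sext64(uint32_t src, unsigned int off)
	{
		int32_t lo = sext32(src, off);
		int32_t hi = lo >> 31;		/* 0 or -1 */

		return ((int64_t)hi << 32) | (uint32_t)lo;
	}

	int main(void)
	{
		printf("%d\n", sext32(0xF0, 8));		/* 0xF0 as s8 is -16 */
		printf("%lld\n", (long long)sext64(0xF0, 8));	/* -16 */
		return 0;
	}

This only models the arithmetic; the point above is that the JIT can
produce the same result with a single lsl/asr pair into a temporary
(plus one asr for the high word) and no extra loads or stores.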
Sorry for this; I also want to make sure the JITs are as efficient as
possible. I was too focused on fixing this quickly and didn't pay enough
attention that day.

I have reimplemented the whole thing to make sure all the bugs are fixed.
The commit message has the generated assembly for all cases:

https://lore.kernel.org/all/20240419182832.27707-1-puranjay@kernel.org/

Thanks,
Puranjay

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel