From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3BF37156E8 for ; Wed, 16 Aug 2023 16:22:18 +0000 (UTC) Received: from out-30.mta0.migadu.com (out-30.mta0.migadu.com [91.218.175.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 057642711 for ; Wed, 16 Aug 2023 09:22:16 -0700 (PDT) Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1692202933; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pe266H6yLoZFPljTbXhxvPbLZcJenRDQ7MYvn3RPzL8=; b=OJUcKpHlMNTfKEyOrzJZyK2AiNdJ8Lur50Zi6qK4cl5teKPYBdncLbuS/keif1DpLFi7rD A7fE5s8z2+UL8wtxblG141xy7G7mDbLdqGb9aaJkjpvd3qmh8kPii658X0g46dWl52/5/N DonJk5PkOdZxrVFEgLv9JX3THwDNgwQ= Date: Wed, 16 Aug 2023 09:22:09 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Reply-To: yonghong.song@linux.dev Subject: Re: Masks and overflow of signed immediates in BPF instructions Content-Language: en-US To: "Jose E. Marchesi" Cc: bpf@vger.kernel.org, david.faust@oracle.com, cupertino.miranda@oracle.com References: <877cpwgzgh.fsf@oracle.com> <87leec44v1.fsf@oracle.com> <87wmxv2ut4.fsf@oracle.com> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: <87wmxv2ut4.fsf@oracle.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net On 8/16/23 2:36 AM, Jose E. Marchesi wrote: > >>> On 8/15/23 7:19 AM, Jose E. Marchesi wrote: >>>> Hello. >>>> The selftest progs/verifier_masking.c contains inline assembly code >>>> like: >>>> w1 = 0xffffffff; >>>> The 32-bit immediate of that instruction is signed. Therefore, GAS >>>> complains that the above instruction overflows its field: >>>> /tmp/ccNOXFQy.s:46: Error: signed immediate out of range, shall >>>> fit in 32 bits >>>> The llvm assembler is likely relying on signed overflow for the >>>> above to >>>> work. >>> >>> Not really. >>> >>> def _ri_32 : ALU_RI>> (outs GPR32:$dst), >>> (ins GPR32:$src2, i32imm:$imm), >>> "$dst "#OpcodeStr#" $imm", >>> [(set GPR32:$dst, (OpNode GPR32:$src2, >>> i32immSExt32:$imm))]>; >>> >>> >>> If generating from source, the pattern [(set GPR32:$dst, (OpNode >>> GPR32:$src2, i32immSExt32:$imm))] so value 0xffffffff is not SExt32 >>> and it won't match and eventually a LDimm_64 insn will be generated. >> >> If by "generating from source" you mean compiling from C, then sure, I >> wasn't implying clang was generating `r1 = 0xffffffff' for assigning >> that positive value to a register. >> >>> But for inline asm, we will have >>> (outs GPR32:$dst) >>> (ins GPR32:$src2, i32imm:$imm) >>> >>> and i32imm is defined as >>> def i32imm : Operand; >>> which is a unsigned 32bit value, so it is recognized properly >>> and the insn is encoded properly. >> >> We thought the imm32 operand in ALU instructions is signed, not >> unsigned. Is it really unsigned?? > > I am going through all the BPF instructions that get 32-bit, 16-bit and > 64-bit immediates, because it seems to me that we may need to > distinguish between two different levels: > > - Value encoded in the instruction immediate: interpreted as signed or > as unsigned. The 'imm' in the instruction is a 32-bit signed insn. I think we have no dispute here. > > - How the assembler interprets a written number for the corresponding > instruction operand: for example, for which instructions the assemler > shall accept 0xfffffffe and 4294967294 and -2 all to denote the same > value, what value is it (negative or positive) or shall it emit an > overflow error. In llvm, for inline asm, 0xfffffffe, 4294967294 and -2 have the same 4-byte bit-wise encoding, so they will be all encoded the same 0xfffffffe in the actual insn. The following is an example for x86 target in llvm: $ cat t.c int foo() { int a, b; asm volatile("movl $0xfffffffe, %0" : "=r"(a) :); asm volatile("movl $-2, %0" : "=r"(b) :); return a + b; } $ clang -O2 -c t.c $ llvm-objdump -d t.o t.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 : 0: b9 fe ff ff ff movl $0xfffffffe, %ecx # imm = 0xFFFFFFFE 5: b8 fe ff ff ff movl $0xfffffffe, %eax # imm = 0xFFFFFFFE a: 01 c8 addl %ecx, %eax c: c3 retq $ Whether it is 0xfffffffe or -2, the insn encoding is the same and disasm prints out 0xfffffffe. > > Will follow up with a summary that hopefully will serve to clarify this. > >>>> Using negative numbers to denote masks is ugly and obfuscating (for >>>> non-obvious cases like -1/0xffffffff) so I suggest we introduce a >>>> pseudo-op so we can do: >>>> w1 = %mask(0xffffffff) >>> >>> I changed above >>> w1 = 0xffffffff; >>> to >>> w1 = %mask(0xffffffff) >>> and hit the following compilation failure. >>> >>> progs/verifier_masking.c:54:9: error: invalid % escape in inline >>> assembly string >>> 53 | asm volatile (" \ >>> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> 54 | w1 = %mask(0xffffffff); \ >>> | ^ >>> 1 error generated. >>> >>> Do you have documentation what is '%mask' thing? >> >> It doesn't exist. >> >> I am suggesting to add support for that pseudo-op to the BPF assemblers: >> both GAS and the llvm BPF assembler. >> >>>> allowing the assembler to do the right thing (TM) converting and >>>> checking that the mask is valid and not relying on UB. >>>> Thoughts? >>>>