* Register encoding in assembly for load/store instructions @ 2023-07-25 17:29 Jose E. Marchesi 2023-07-25 18:47 ` Yonghong Song 0 siblings, 1 reply; 16+ messages in thread From: Jose E. Marchesi @ 2023-07-25 17:29 UTC (permalink / raw) To: Yonghong Song; +Cc: bpf Hello Yonghong. We have noticed that the llvm disassembler uses different notations for registers in load and store instructions, depending somehow on the width of the data being loaded or stored. For example, this is an excerpt from the assembler-disassembler.s test file in llvm: // Note: For the group below w1 is used as a destination for sizes u8, u16, u32. // This is disassembler quirk, but is technically not wrong, as there are // no different encodings for 'r1 = load' vs 'w1 = load'. // // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) r1 = *(u8*)(r2 + 42) r1 = *(u16*)(r2 + 42) r1 = *(u32*)(r2 + 42) r1 = *(u64*)(r2 + 42) The comment there clarifies that the usage of wN instead of rN in the u8, u16 and u32 cases is a "disassembler quirk". Anyway, the problem is that it seems that `clang -S' actually emits these forms with wN. Is that intended? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 17:29 Register encoding in assembly for load/store instructions Jose E. Marchesi @ 2023-07-25 18:47 ` Yonghong Song 2023-07-25 18:56 ` Jose E. Marchesi 0 siblings, 1 reply; 16+ messages in thread From: Yonghong Song @ 2023-07-25 18:47 UTC (permalink / raw) To: Jose E. Marchesi, Yonghong Song; +Cc: bpf On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > > Hello Yonghong. > > We have noticed that the llvm disassembler uses different notations for > registers in load and store instructions, depending somehow on the width > of the data being loaded or stored. > > For example, this is an excerpt from the assembler-disassembler.s test > file in llvm: > > // Note: For the group below w1 is used as a destination for sizes u8, u16, u32. > // This is disassembler quirk, but is technically not wrong, as there are > // no different encodings for 'r1 = load' vs 'w1 = load'. > // > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > r1 = *(u8*)(r2 + 42) > r1 = *(u16*)(r2 + 42) > r1 = *(u32*)(r2 + 42) > r1 = *(u64*)(r2 + 42) > > The comment there clarifies that the usage of wN instead of rN in the > u8, u16 and u32 cases is a "disassembler quirk". > > Anyway, the problem is that it seems that `clang -S' actually emits > these forms with wN. > > Is that intended? Yes, this is intended since alu32 mode is enabled where w* registers are used for 8/16/32 bit load. Note that for newer sign-extended loads, even at alu32 mode, only r* register is used since the sign-extension extends upto 64 bits for all variants (8/16/32). > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 18:47 ` Yonghong Song @ 2023-07-25 18:56 ` Jose E. Marchesi 2023-07-25 19:11 ` Jose E. Marchesi 2023-07-25 19:45 ` Yonghong Song 0 siblings, 2 replies; 16+ messages in thread From: Jose E. Marchesi @ 2023-07-25 18:56 UTC (permalink / raw) To: Yonghong Song; +Cc: Yonghong Song, bpf > On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >> Hello Yonghong. >> We have noticed that the llvm disassembler uses different notations >> for >> registers in load and store instructions, depending somehow on the width >> of the data being loaded or stored. >> For example, this is an excerpt from the assembler-disassembler.s >> test >> file in llvm: >> // Note: For the group below w1 is used as a destination for >> sizes u8, u16, u32. >> // This is disassembler quirk, but is technically not wrong, as there are >> // no different encodings for 'r1 = load' vs 'w1 = load'. >> // >> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >> r1 = *(u8*)(r2 + 42) >> r1 = *(u16*)(r2 + 42) >> r1 = *(u32*)(r2 + 42) >> r1 = *(u64*)(r2 + 42) >> The comment there clarifies that the usage of wN instead of rN in >> the >> u8, u16 and u32 cases is a "disassembler quirk". >> Anyway, the problem is that it seems that `clang -S' actually emits >> these forms with wN. >> Is that intended? > > Yes, this is intended since alu32 mode is enabled where > w* registers are used for 8/16/32 bit load. So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > Note that for newer sign-extended loads, even at alu32 mode, > only r* register is used since the sign-extension extends > upto 64 bits for all variants (8/16/32). Yes we noticed that :) > > > >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 18:56 ` Jose E. Marchesi @ 2023-07-25 19:11 ` Jose E. Marchesi 2023-07-25 19:59 ` Yonghong Song 2023-07-25 19:45 ` Yonghong Song 1 sibling, 1 reply; 16+ messages in thread From: Jose E. Marchesi @ 2023-07-25 19:11 UTC (permalink / raw) To: Yonghong Song; +Cc: Yonghong Song, bpf >> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>> Hello Yonghong. >>> We have noticed that the llvm disassembler uses different notations >>> for >>> registers in load and store instructions, depending somehow on the width >>> of the data being loaded or stored. >>> For example, this is an excerpt from the assembler-disassembler.s >>> test >>> file in llvm: >>> // Note: For the group below w1 is used as a destination for >>> sizes u8, u16, u32. >>> // This is disassembler quirk, but is technically not wrong, as >>> there are >>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>> // >>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>> r1 = *(u8*)(r2 + 42) >>> r1 = *(u16*)(r2 + 42) >>> r1 = *(u32*)(r2 + 42) >>> r1 = *(u64*)(r2 + 42) >>> The comment there clarifies that the usage of wN instead of rN in >>> the >>> u8, u16 and u32 cases is a "disassembler quirk". >>> Anyway, the problem is that it seems that `clang -S' actually emits >>> these forms with wN. >>> Is that intended? >> >> Yes, this is intended since alu32 mode is enabled where >> w* registers are used for 8/16/32 bit load. > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? Sorry my keyboard num-lock activated mid-sentence. I meant 'r1 = (u8*)(r2 + 42)'. Why supporting that syntax as well as 'w1 = (u8*)(r2 + 42)'? > >> Note that for newer sign-extended loads, even at alu32 mode, >> only r* register is used since the sign-extension extends >> upto 64 bits for all variants (8/16/32). > > Yes we noticed that :) > >> >> >> >>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 19:11 ` Jose E. Marchesi @ 2023-07-25 19:59 ` Yonghong Song 0 siblings, 0 replies; 16+ messages in thread From: Yonghong Song @ 2023-07-25 19:59 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: Yonghong Song, bpf On 7/25/23 12:11 PM, Jose E. Marchesi wrote: > >>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>> Hello Yonghong. >>>> We have noticed that the llvm disassembler uses different notations >>>> for >>>> registers in load and store instructions, depending somehow on the width >>>> of the data being loaded or stored. >>>> For example, this is an excerpt from the assembler-disassembler.s >>>> test >>>> file in llvm: >>>> // Note: For the group below w1 is used as a destination for >>>> sizes u8, u16, u32. >>>> // This is disassembler quirk, but is technically not wrong, as >>>> there are >>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>> // >>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>> r1 = *(u8*)(r2 + 42) >>>> r1 = *(u16*)(r2 + 42) >>>> r1 = *(u32*)(r2 + 42) >>>> r1 = *(u64*)(r2 + 42) >>>> The comment there clarifies that the usage of wN instead of rN in >>>> the >>>> u8, u16 and u32 cases is a "disassembler quirk". >>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>> these forms with wN. >>>> Is that intended? >>> >>> Yes, this is intended since alu32 mode is enabled where >>> w* registers are used for 8/16/32 bit load. >> >> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still >> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > Sorry my keyboard num-lock activated mid-sentence. > > I meant 'r1 = (u8*)(r2 + 42)'. > Why supporting that syntax as well as 'w1 = (u8*)(r2 + 42)'? alu32 mode. Original intention is that if w1 = *(u8 *)(r2 + 42) is specified that the hardware will actually only load the value to the 32-bit sub-register. And then hardware will be doing 32-to-64 zero extension automatically. This is different from r1 = *(u8 *)(r2 + 42) where the value will actually load into the 64-bit register by insn itself. > >> >>> Note that for newer sign-extended loads, even at alu32 mode, >>> only r* register is used since the sign-extension extends >>> upto 64 bits for all variants (8/16/32). >> >> Yes we noticed that :) >> >>> >>> >>> >>>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 18:56 ` Jose E. Marchesi 2023-07-25 19:11 ` Jose E. Marchesi @ 2023-07-25 19:45 ` Yonghong Song 2023-07-25 20:09 ` Jose E. Marchesi 1 sibling, 1 reply; 16+ messages in thread From: Yonghong Song @ 2023-07-25 19:45 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: Yonghong Song, bpf On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > >> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>> Hello Yonghong. >>> We have noticed that the llvm disassembler uses different notations >>> for >>> registers in load and store instructions, depending somehow on the width >>> of the data being loaded or stored. >>> For example, this is an excerpt from the assembler-disassembler.s >>> test >>> file in llvm: >>> // Note: For the group below w1 is used as a destination for >>> sizes u8, u16, u32. >>> // This is disassembler quirk, but is technically not wrong, as there are >>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>> // >>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>> r1 = *(u8*)(r2 + 42) >>> r1 = *(u16*)(r2 + 42) >>> r1 = *(u32*)(r2 + 42) >>> r1 = *(u64*)(r2 + 42) >>> The comment there clarifies that the usage of wN instead of rN in >>> the >>> u8, u16 and u32 cases is a "disassembler quirk". >>> Anyway, the problem is that it seems that `clang -S' actually emits >>> these forms with wN. >>> Is that intended? >> >> Yes, this is intended since alu32 mode is enabled where >> w* registers are used for 8/16/32 bit load. > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, then w* register is used. If no alu32 mode for objdump, then r* register is used. Basically the same insn, disasm is different depending on alu32 mode or not. u8/u16/u32 is not enough to differentiate. > >> Note that for newer sign-extended loads, even at alu32 mode, >> only r* register is used since the sign-extension extends >> upto 64 bits for all variants (8/16/32). > > Yes we noticed that :) > >> >> >> >>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 19:45 ` Yonghong Song @ 2023-07-25 20:09 ` Jose E. Marchesi 2023-07-25 22:10 ` Yonghong Song 0 siblings, 1 reply; 16+ messages in thread From: Jose E. Marchesi @ 2023-07-25 20:09 UTC (permalink / raw) To: Yonghong Song; +Cc: Yonghong Song, bpf > On 7/25/23 11:56 AM, Jose E. Marchesi wrote: >> >>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>> Hello Yonghong. >>>> We have noticed that the llvm disassembler uses different notations >>>> for >>>> registers in load and store instructions, depending somehow on the width >>>> of the data being loaded or stored. >>>> For example, this is an excerpt from the assembler-disassembler.s >>>> test >>>> file in llvm: >>>> // Note: For the group below w1 is used as a destination for >>>> sizes u8, u16, u32. >>>> // This is disassembler quirk, but is technically not wrong, as there are >>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>> // >>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>> r1 = *(u8*)(r2 + 42) >>>> r1 = *(u16*)(r2 + 42) >>>> r1 = *(u32*)(r2 + 42) >>>> r1 = *(u64*)(r2 + 42) >>>> The comment there clarifies that the usage of wN instead of rN in >>>> the >>>> u8, u16 and u32 cases is a "disassembler quirk". >>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>> these forms with wN. >>>> Is that intended? >>> >>> Yes, this is intended since alu32 mode is enabled where >>> w* registers are used for 8/16/32 bit load. >> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is >> still >> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > then w* register is used. If no alu32 mode for objdump, then r* register > is used. Basically the same insn, disasm is different depending on > alu32 mode or not. u8/u16/u32 is not enough to differentiate. Ok, so the llvm objdump has a switch that tells when to use rN or wN when printing these particular instructions. Thats the "disassembler quirk". To what purpose? Isnt the person passing the command line switch the same person reading the disassembled program? Is this "alu32 mode" more than a cosmetic thing? But what concern us is the assembler, not the disassembler. clang -S (which is not objdump) seems to generate these instructions with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for example) and we assume the output of clang -S is intended to be passed to an assembler, much like with gcc -S. So, should we support both syntaxes as _input_ syntax in the assembler? >> >>> Note that for newer sign-extended loads, even at alu32 mode, >>> only r* register is used since the sign-extension extends >>> upto 64 bits for all variants (8/16/32). >> Yes we noticed that :) >> >>> >>> >>> >>>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 20:09 ` Jose E. Marchesi @ 2023-07-25 22:10 ` Yonghong Song 2023-07-25 22:26 ` Jose E. Marchesi 0 siblings, 1 reply; 16+ messages in thread From: Yonghong Song @ 2023-07-25 22:10 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: Yonghong Song, bpf On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > > >> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: >>> >>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>>> Hello Yonghong. >>>>> We have noticed that the llvm disassembler uses different notations >>>>> for >>>>> registers in load and store instructions, depending somehow on the width >>>>> of the data being loaded or stored. >>>>> For example, this is an excerpt from the assembler-disassembler.s >>>>> test >>>>> file in llvm: >>>>> // Note: For the group below w1 is used as a destination for >>>>> sizes u8, u16, u32. >>>>> // This is disassembler quirk, but is technically not wrong, as there are >>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>>> // >>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>>> r1 = *(u8*)(r2 + 42) >>>>> r1 = *(u16*)(r2 + 42) >>>>> r1 = *(u32*)(r2 + 42) >>>>> r1 = *(u64*)(r2 + 42) >>>>> The comment there clarifies that the usage of wN instead of rN in >>>>> the >>>>> u8, u16 and u32 cases is a "disassembler quirk". >>>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>>> these forms with wN. >>>>> Is that intended? >>>> >>>> Yes, this is intended since alu32 mode is enabled where >>>> w* registers are used for 8/16/32 bit load. >>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is >>> still >>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? >> >> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? >> >> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, >> then w* register is used. If no alu32 mode for objdump, then r* register >> is used. Basically the same insn, disasm is different depending on >> alu32 mode or not. u8/u16/u32 is not enough to differentiate. > > Ok, so the llvm objdump has a switch that tells when to use rN or wN > when printing these particular instructions. Thats the "disassembler > quirk". To what purpose? Isnt the person passing the command line > switch the same person reading the disassembled program? Is this "alu32 > mode" more than a cosmetic thing? > > But what concern us is the assembler, not the disassembler. > > clang -S (which is not objdump) seems to generate these instructions > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > example) and we assume the output of clang -S is intended to be passed > to an assembler, much like with gcc -S. > > So, should we support both syntaxes as _input_ syntax in the assembler? Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing list), and -mcpu=v3 has alu32 enabled by default. So I think gcc can start to emit insn assuming alu32 mode is on by default. So w1 = *(u8 *)(r2 + 42) is preferred. > >>> >>>> Note that for newer sign-extended loads, even at alu32 mode, >>>> only r* register is used since the sign-extension extends >>>> upto 64 bits for all variants (8/16/32). >>> Yes we noticed that :) >>> >>>> >>>> >>>> >>>>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 22:10 ` Yonghong Song @ 2023-07-25 22:26 ` Jose E. Marchesi 2023-07-26 0:31 ` Alexei Starovoitov 0 siblings, 1 reply; 16+ messages in thread From: Jose E. Marchesi @ 2023-07-25 22:26 UTC (permalink / raw) To: Yonghong Song; +Cc: Yonghong Song, bpf > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: >> >>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: >>>> >>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>>>> Hello Yonghong. >>>>>> We have noticed that the llvm disassembler uses different notations >>>>>> for >>>>>> registers in load and store instructions, depending somehow on the width >>>>>> of the data being loaded or stored. >>>>>> For example, this is an excerpt from the assembler-disassembler.s >>>>>> test >>>>>> file in llvm: >>>>>> // Note: For the group below w1 is used as a destination for >>>>>> sizes u8, u16, u32. >>>>>> // This is disassembler quirk, but is technically not wrong, as there are >>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>>>> // >>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>>>> r1 = *(u8*)(r2 + 42) >>>>>> r1 = *(u16*)(r2 + 42) >>>>>> r1 = *(u32*)(r2 + 42) >>>>>> r1 = *(u64*)(r2 + 42) >>>>>> The comment there clarifies that the usage of wN instead of rN in >>>>>> the >>>>>> u8, u16 and u32 cases is a "disassembler quirk". >>>>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>>>> these forms with wN. >>>>>> Is that intended? >>>>> >>>>> Yes, this is intended since alu32 mode is enabled where >>>>> w* registers are used for 8/16/32 bit load. >>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is >>>> still >>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? >>> >>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? >>> >>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, >>> then w* register is used. If no alu32 mode for objdump, then r* register >>> is used. Basically the same insn, disasm is different depending on >>> alu32 mode or not. u8/u16/u32 is not enough to differentiate. >> Ok, so the llvm objdump has a switch that tells when to use rN or wN >> when printing these particular instructions. Thats the "disassembler >> quirk". To what purpose? Isnt the person passing the command line >> switch the same person reading the disassembled program? Is this "alu32 >> mode" more than a cosmetic thing? >> But what concern us is the assembler, not the disassembler. >> clang -S (which is not objdump) seems to generate these instructions >> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for >> example) and we assume the output of clang -S is intended to be passed >> to an assembler, much like with gcc -S. >> So, should we support both syntaxes as _input_ syntax in the >> assembler? > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > list), and -mcpu=v3 has alu32 enabled by default. So I think > gcc can start to emit insn assuming alu32 mode is on by default. > So > w1 = *(u8 *)(r2 + 42) > is preferred. We have V4 by default now. So we can emit w1 = *(u8 *)(r2 + 42) when -mcpu is v3 or higher, or if -malu32 is specified, and r1 = *(u8 *)(r2 + 42) when -mcpu is v2 or lower, or if -mnoalu32 is specified. Sounds good? However this implies that the assembler should indeed recognize both forms of instructions. But note that it will assembly them to the exactly same encoded instruction. This includes inline asm (remember GCC does not have an integrated assembler.) > >> >>>> >>>>> Note that for newer sign-extended loads, even at alu32 mode, >>>>> only r* register is used since the sign-extension extends >>>>> upto 64 bits for all variants (8/16/32). >>>> Yes we noticed that :) >>>> >>>>> >>>>> >>>>> >>>>>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-25 22:26 ` Jose E. Marchesi @ 2023-07-26 0:31 ` Alexei Starovoitov 2023-07-26 0:39 ` Eduard Zingerman 0 siblings, 1 reply; 16+ messages in thread From: Alexei Starovoitov @ 2023-07-26 0:31 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: Yonghong Song, Yonghong Song, bpf On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi <jose.marchesi@oracle.com> wrote: > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > >> > >>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > >>>> > >>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > >>>>>> Hello Yonghong. > >>>>>> We have noticed that the llvm disassembler uses different notations > >>>>>> for > >>>>>> registers in load and store instructions, depending somehow on the width > >>>>>> of the data being loaded or stored. > >>>>>> For example, this is an excerpt from the assembler-disassembler.s > >>>>>> test > >>>>>> file in llvm: > >>>>>> // Note: For the group below w1 is used as a destination for > >>>>>> sizes u8, u16, u32. > >>>>>> // This is disassembler quirk, but is technically not wrong, as there are > >>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. > >>>>>> // > >>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > >>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > >>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > >>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > >>>>>> r1 = *(u8*)(r2 + 42) > >>>>>> r1 = *(u16*)(r2 + 42) > >>>>>> r1 = *(u32*)(r2 + 42) > >>>>>> r1 = *(u64*)(r2 + 42) > >>>>>> The comment there clarifies that the usage of wN instead of rN in > >>>>>> the > >>>>>> u8, u16 and u32 cases is a "disassembler quirk". > >>>>>> Anyway, the problem is that it seems that `clang -S' actually emits > >>>>>> these forms with wN. > >>>>>> Is that intended? > >>>>> > >>>>> Yes, this is intended since alu32 mode is enabled where > >>>>> w* registers are used for 8/16/32 bit load. > >>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > >>>> still > >>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > >>> > >>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > >>> > >>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > >>> then w* register is used. If no alu32 mode for objdump, then r* register > >>> is used. Basically the same insn, disasm is different depending on > >>> alu32 mode or not. u8/u16/u32 is not enough to differentiate. > >> Ok, so the llvm objdump has a switch that tells when to use rN or wN > >> when printing these particular instructions. Thats the "disassembler > >> quirk". To what purpose? Isnt the person passing the command line > >> switch the same person reading the disassembled program? Is this "alu32 > >> mode" more than a cosmetic thing? > >> But what concern us is the assembler, not the disassembler. > >> clang -S (which is not objdump) seems to generate these instructions > >> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > >> example) and we assume the output of clang -S is intended to be passed > >> to an assembler, much like with gcc -S. > >> So, should we support both syntaxes as _input_ syntax in the > >> assembler? > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > gcc can start to emit insn assuming alu32 mode is on by default. > > So > > w1 = *(u8 *)(r2 + 42) > > is preferred. > > We have V4 by default now. So we can emit > > w1 = *(u8 *)(r2 + 42) > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > r1 = *(u8 *)(r2 + 42) > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > Sounds good? > > However this implies that the assembler should indeed recognize both > forms of instructions. But note that it will assembly them to the > exactly same encoded instruction. This includes inline asm (remember > GCC does not have an integrated assembler.) Good point. I think we made a mistake in clang. We shouldn't be printing w1 = *(u8 *)(r2 + 42) since such instruction doesn't exist in BPF ISA and it's confusing. There is only one instruction: r1 = *(u8 *)(r2 + 42) which is an 8-bit load that zero extends into 64-bit. x86 JIT actually implements it as 8-bit load that stores into a 32-bit subregister, so it kinda matches w1, but that's an implementation detail of the JIT. I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) regardless of alu32 or not. In gas and clang assembler we can support both w1= and r1= flavors for backward compat. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-26 0:31 ` Alexei Starovoitov @ 2023-07-26 0:39 ` Eduard Zingerman 2023-07-26 4:16 ` Yonghong Song 0 siblings, 1 reply; 16+ messages in thread From: Eduard Zingerman @ 2023-07-26 0:39 UTC (permalink / raw) To: Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, Yonghong Song, bpf On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi > <jose.marchesi@oracle.com> wrote: > > > > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > > > > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > > > > > > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > > > > > > > > Hello Yonghong. > > > > > > > > We have noticed that the llvm disassembler uses different notations > > > > > > > > for > > > > > > > > registers in load and store instructions, depending somehow on the width > > > > > > > > of the data being loaded or stored. > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s > > > > > > > > test > > > > > > > > file in llvm: > > > > > > > > // Note: For the group below w1 is used as a destination for > > > > > > > > sizes u8, u16, u32. > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'. > > > > > > > > // > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > > > > > > > > r1 = *(u8*)(r2 + 42) > > > > > > > > r1 = *(u16*)(r2 + 42) > > > > > > > > r1 = *(u32*)(r2 + 42) > > > > > > > > r1 = *(u64*)(r2 + 42) > > > > > > > > The comment there clarifies that the usage of wN instead of rN in > > > > > > > > the > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk". > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits > > > > > > > > these forms with wN. > > > > > > > > Is that intended? > > > > > > > > > > > > > > Yes, this is intended since alu32 mode is enabled where > > > > > > > w* registers are used for 8/16/32 bit load. > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > > > > > > still > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > > > > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > > > > > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > > > > > then w* register is used. If no alu32 mode for objdump, then r* register > > > > > is used. Basically the same insn, disasm is different depending on > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate. > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN > > > > when printing these particular instructions. Thats the "disassembler > > > > quirk". To what purpose? Isnt the person passing the command line > > > > switch the same person reading the disassembled program? Is this "alu32 > > > > mode" more than a cosmetic thing? > > > > But what concern us is the assembler, not the disassembler. > > > > clang -S (which is not objdump) seems to generate these instructions > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > > > > example) and we assume the output of clang -S is intended to be passed > > > > to an assembler, much like with gcc -S. > > > > So, should we support both syntaxes as _input_ syntax in the > > > > assembler? > > > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > > gcc can start to emit insn assuming alu32 mode is on by default. > > > So > > > w1 = *(u8 *)(r2 + 42) > > > is preferred. > > > > We have V4 by default now. So we can emit > > > > w1 = *(u8 *)(r2 + 42) > > > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > > > r1 = *(u8 *)(r2 + 42) > > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > > > Sounds good? > > > > However this implies that the assembler should indeed recognize both > > forms of instructions. But note that it will assembly them to the > > exactly same encoded instruction. This includes inline asm (remember > > GCC does not have an integrated assembler.) > > Good point. > I think we made a mistake in clang. > We shouldn't be printing > w1 = *(u8 *)(r2 + 42) > since such instruction doesn't exist in BPF ISA > and it's confusing. > There is only one instruction: > r1 = *(u8 *)(r2 + 42) > which is an 8-bit load that zero extends into 64-bit. > x86 JIT actually implements it as 8-bit load that stores > into a 32-bit subregister, so it kinda matches w1, > but that's an implementation detail of the JIT. > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) > regardless of alu32 or not. > In gas and clang assembler we can support both w1= and r1= > flavors for backward compat. > I agree with Alexei (the ... disassembler quirk ... comment is left by me :). Can dig into clang part of things if this is a consensus. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-26 0:39 ` Eduard Zingerman @ 2023-07-26 4:16 ` Yonghong Song 2023-07-26 14:41 ` Eduard Zingerman 2023-07-28 16:58 ` Eduard Zingerman 0 siblings, 2 replies; 16+ messages in thread From: Yonghong Song @ 2023-07-26 4:16 UTC (permalink / raw) To: Eduard Zingerman, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf On 7/25/23 5:39 PM, Eduard Zingerman wrote: > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: >> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi >> <jose.marchesi@oracle.com> wrote: >>> >>> >>>> On 7/25/23 1:09 PM, Jose E. Marchesi wrote: >>>>> >>>>>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: >>>>>>> >>>>>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>>>>>>> Hello Yonghong. >>>>>>>>> We have noticed that the llvm disassembler uses different notations >>>>>>>>> for >>>>>>>>> registers in load and store instructions, depending somehow on the width >>>>>>>>> of the data being loaded or stored. >>>>>>>>> For example, this is an excerpt from the assembler-disassembler.s >>>>>>>>> test >>>>>>>>> file in llvm: >>>>>>>>> // Note: For the group below w1 is used as a destination for >>>>>>>>> sizes u8, u16, u32. >>>>>>>>> // This is disassembler quirk, but is technically not wrong, as there are >>>>>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>>>>>>> // >>>>>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>>>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>>>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>>>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>>>>>>> r1 = *(u8*)(r2 + 42) >>>>>>>>> r1 = *(u16*)(r2 + 42) >>>>>>>>> r1 = *(u32*)(r2 + 42) >>>>>>>>> r1 = *(u64*)(r2 + 42) >>>>>>>>> The comment there clarifies that the usage of wN instead of rN in >>>>>>>>> the >>>>>>>>> u8, u16 and u32 cases is a "disassembler quirk". >>>>>>>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>>>>>>> these forms with wN. >>>>>>>>> Is that intended? >>>>>>>> >>>>>>>> Yes, this is intended since alu32 mode is enabled where >>>>>>>> w* registers are used for 8/16/32 bit load. >>>>>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is >>>>>>> still >>>>>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? >>>>>> >>>>>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? >>>>>> >>>>>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, >>>>>> then w* register is used. If no alu32 mode for objdump, then r* register >>>>>> is used. Basically the same insn, disasm is different depending on >>>>>> alu32 mode or not. u8/u16/u32 is not enough to differentiate. >>>>> Ok, so the llvm objdump has a switch that tells when to use rN or wN >>>>> when printing these particular instructions. Thats the "disassembler >>>>> quirk". To what purpose? Isnt the person passing the command line >>>>> switch the same person reading the disassembled program? Is this "alu32 >>>>> mode" more than a cosmetic thing? >>>>> But what concern us is the assembler, not the disassembler. >>>>> clang -S (which is not objdump) seems to generate these instructions >>>>> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for >>>>> example) and we assume the output of clang -S is intended to be passed >>>>> to an assembler, much like with gcc -S. >>>>> So, should we support both syntaxes as _input_ syntax in the >>>>> assembler? >>>> >>>> Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing >>>> list), and -mcpu=v3 has alu32 enabled by default. So I think >>>> gcc can start to emit insn assuming alu32 mode is on by default. >>>> So >>>> w1 = *(u8 *)(r2 + 42) >>>> is preferred. >>> >>> We have V4 by default now. So we can emit >>> >>> w1 = *(u8 *)(r2 + 42) >>> >>> when -mcpu is v3 or higher, or if -malu32 is specified, and >>> >>> r1 = *(u8 *)(r2 + 42) >>> >>> when -mcpu is v2 or lower, or if -mnoalu32 is specified. >>> >>> Sounds good? >>> >>> However this implies that the assembler should indeed recognize both >>> forms of instructions. But note that it will assembly them to the >>> exactly same encoded instruction. This includes inline asm (remember >>> GCC does not have an integrated assembler.) >> >> Good point. >> I think we made a mistake in clang. >> We shouldn't be printing >> w1 = *(u8 *)(r2 + 42) >> since such instruction doesn't exist in BPF ISA >> and it's confusing. >> There is only one instruction: >> r1 = *(u8 *)(r2 + 42) >> which is an 8-bit load that zero extends into 64-bit. >> x86 JIT actually implements it as 8-bit load that stores >> into a 32-bit subregister, so it kinda matches w1, >> but that's an implementation detail of the JIT. >> >> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) >> regardless of alu32 or not. >> In gas and clang assembler we can support both w1= and r1= >> flavors for backward compat. >> > > I agree with Alexei (the ... disassembler quirk ... comment is left by me :). > Can dig into clang part of things if this is a consensus. For disassembler, we have stx as well may use w* registers with alu32. In llvm BPFDisassembler.cpp, we have if ((InstClass == BPF_LDX || InstClass == BPF_STX) && getInstSize(Insn) != BPF_DW && (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && STI.hasFeature(BPF::ALU32)) Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address, this, STI); else Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this, STI); Maybe we should just do Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this, STI); So we already disassemble based on non-alu32 mode? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-26 4:16 ` Yonghong Song @ 2023-07-26 14:41 ` Eduard Zingerman 2023-07-28 16:58 ` Eduard Zingerman 1 sibling, 0 replies; 16+ messages in thread From: Eduard Zingerman @ 2023-07-26 14:41 UTC (permalink / raw) To: yonghong.song, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote: > > On 7/25/23 5:39 PM, Eduard Zingerman wrote: > > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: > > > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi > > > <jose.marchesi@oracle.com> wrote: > > > > > > > > > > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > > > > > > > > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > > > > > > > > > > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > > > > > > > > > > Hello Yonghong. > > > > > > > > > > We have noticed that the llvm disassembler uses different notations > > > > > > > > > > for > > > > > > > > > > registers in load and store instructions, depending somehow on the width > > > > > > > > > > of the data being loaded or stored. > > > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s > > > > > > > > > > test > > > > > > > > > > file in llvm: > > > > > > > > > > // Note: For the group below w1 is used as a destination for > > > > > > > > > > sizes u8, u16, u32. > > > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are > > > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'. > > > > > > > > > > // > > > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > > > > > > > > > > r1 = *(u8*)(r2 + 42) > > > > > > > > > > r1 = *(u16*)(r2 + 42) > > > > > > > > > > r1 = *(u32*)(r2 + 42) > > > > > > > > > > r1 = *(u64*)(r2 + 42) > > > > > > > > > > The comment there clarifies that the usage of wN instead of rN in > > > > > > > > > > the > > > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk". > > > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits > > > > > > > > > > these forms with wN. > > > > > > > > > > Is that intended? > > > > > > > > > > > > > > > > > > Yes, this is intended since alu32 mode is enabled where > > > > > > > > > w* registers are used for 8/16/32 bit load. > > > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > > > > > > > > still > > > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > > > > > > > > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > > > > > > > > > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > > > > > > > then w* register is used. If no alu32 mode for objdump, then r* register > > > > > > > is used. Basically the same insn, disasm is different depending on > > > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate. > > > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN > > > > > > when printing these particular instructions. Thats the "disassembler > > > > > > quirk". To what purpose? Isnt the person passing the command line > > > > > > switch the same person reading the disassembled program? Is this "alu32 > > > > > > mode" more than a cosmetic thing? > > > > > > But what concern us is the assembler, not the disassembler. > > > > > > clang -S (which is not objdump) seems to generate these instructions > > > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > > > > > > example) and we assume the output of clang -S is intended to be passed > > > > > > to an assembler, much like with gcc -S. > > > > > > So, should we support both syntaxes as _input_ syntax in the > > > > > > assembler? > > > > > > > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > > > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > > > > gcc can start to emit insn assuming alu32 mode is on by default. > > > > > So > > > > > w1 = *(u8 *)(r2 + 42) > > > > > is preferred. > > > > > > > > We have V4 by default now. So we can emit > > > > > > > > w1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > > > > > > > r1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > > > > > > > Sounds good? > > > > > > > > However this implies that the assembler should indeed recognize both > > > > forms of instructions. But note that it will assembly them to the > > > > exactly same encoded instruction. This includes inline asm (remember > > > > GCC does not have an integrated assembler.) > > > > > > Good point. > > > I think we made a mistake in clang. > > > We shouldn't be printing > > > w1 = *(u8 *)(r2 + 42) > > > since such instruction doesn't exist in BPF ISA > > > and it's confusing. > > > There is only one instruction: > > > r1 = *(u8 *)(r2 + 42) > > > which is an 8-bit load that zero extends into 64-bit. > > > x86 JIT actually implements it as 8-bit load that stores > > > into a 32-bit subregister, so it kinda matches w1, > > > but that's an implementation detail of the JIT. > > > > > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) > > > regardless of alu32 or not. > > > In gas and clang assembler we can support both w1= and r1= > > > flavors for backward compat. > > > > > > > I agree with Alexei (the ... disassembler quirk ... comment is left by me :). > > Can dig into clang part of things if this is a consensus. > > For disassembler, we have stx as well may use w* registers with alu32. > In llvm BPFDisassembler.cpp, we have > > if ((InstClass == BPF_LDX || InstClass == BPF_STX) && > getInstSize(Insn) != BPF_DW && > (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && > STI.hasFeature(BPF::ALU32)) > Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, > Address, > this, STI); > else > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, > STI); > > Maybe we should just do > > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, STI); > > So we already disassemble based on non-alu32 mode? > Yes, this changes llvm-objdump behavior to emit 64-bit registers on LHS. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-26 4:16 ` Yonghong Song 2023-07-26 14:41 ` Eduard Zingerman @ 2023-07-28 16:58 ` Eduard Zingerman 2023-07-28 21:29 ` Alexei Starovoitov 2023-07-28 23:25 ` Yonghong Song 1 sibling, 2 replies; 16+ messages in thread From: Eduard Zingerman @ 2023-07-28 16:58 UTC (permalink / raw) To: yonghong.song, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote: > > On 7/25/23 5:39 PM, Eduard Zingerman wrote: > > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: > > > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi > > > <jose.marchesi@oracle.com> wrote: > > > > > > > > > > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > > > > > > > > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > > > > > > > > > > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > > > > > > > > > > Hello Yonghong. > > > > > > > > > > We have noticed that the llvm disassembler uses different notations > > > > > > > > > > for > > > > > > > > > > registers in load and store instructions, depending somehow on the width > > > > > > > > > > of the data being loaded or stored. > > > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s > > > > > > > > > > test > > > > > > > > > > file in llvm: > > > > > > > > > > // Note: For the group below w1 is used as a destination for > > > > > > > > > > sizes u8, u16, u32. > > > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are > > > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'. > > > > > > > > > > // > > > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > > > > > > > > > > r1 = *(u8*)(r2 + 42) > > > > > > > > > > r1 = *(u16*)(r2 + 42) > > > > > > > > > > r1 = *(u32*)(r2 + 42) > > > > > > > > > > r1 = *(u64*)(r2 + 42) > > > > > > > > > > The comment there clarifies that the usage of wN instead of rN in > > > > > > > > > > the > > > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk". > > > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits > > > > > > > > > > these forms with wN. > > > > > > > > > > Is that intended? > > > > > > > > > > > > > > > > > > Yes, this is intended since alu32 mode is enabled where > > > > > > > > > w* registers are used for 8/16/32 bit load. > > > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > > > > > > > > still > > > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > > > > > > > > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > > > > > > > > > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > > > > > > > then w* register is used. If no alu32 mode for objdump, then r* register > > > > > > > is used. Basically the same insn, disasm is different depending on > > > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate. > > > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN > > > > > > when printing these particular instructions. Thats the "disassembler > > > > > > quirk". To what purpose? Isnt the person passing the command line > > > > > > switch the same person reading the disassembled program? Is this "alu32 > > > > > > mode" more than a cosmetic thing? > > > > > > But what concern us is the assembler, not the disassembler. > > > > > > clang -S (which is not objdump) seems to generate these instructions > > > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > > > > > > example) and we assume the output of clang -S is intended to be passed > > > > > > to an assembler, much like with gcc -S. > > > > > > So, should we support both syntaxes as _input_ syntax in the > > > > > > assembler? > > > > > > > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > > > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > > > > gcc can start to emit insn assuming alu32 mode is on by default. > > > > > So > > > > > w1 = *(u8 *)(r2 + 42) > > > > > is preferred. > > > > > > > > We have V4 by default now. So we can emit > > > > > > > > w1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > > > > > > > r1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > > > > > > > Sounds good? > > > > > > > > However this implies that the assembler should indeed recognize both > > > > forms of instructions. But note that it will assembly them to the > > > > exactly same encoded instruction. This includes inline asm (remember > > > > GCC does not have an integrated assembler.) > > > > > > Good point. > > > I think we made a mistake in clang. > > > We shouldn't be printing > > > w1 = *(u8 *)(r2 + 42) > > > since such instruction doesn't exist in BPF ISA > > > and it's confusing. > > > There is only one instruction: > > > r1 = *(u8 *)(r2 + 42) > > > which is an 8-bit load that zero extends into 64-bit. > > > x86 JIT actually implements it as 8-bit load that stores > > > into a 32-bit subregister, so it kinda matches w1, > > > but that's an implementation detail of the JIT. > > > > > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) > > > regardless of alu32 or not. > > > In gas and clang assembler we can support both w1= and r1= > > > flavors for backward compat. > > > > > > > I agree with Alexei (the ... disassembler quirk ... comment is left by me :). > > Can dig into clang part of things if this is a consensus. > > For disassembler, we have stx as well may use w* registers with alu32. > In llvm BPFDisassembler.cpp, we have > > if ((InstClass == BPF_LDX || InstClass == BPF_STX) && > getInstSize(Insn) != BPF_DW && > (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && > STI.hasFeature(BPF::ALU32)) > Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, > Address, > this, STI); > else > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, > STI); > > Maybe we should just do > > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, STI); > > So we already disassemble based on non-alu32 mode? > Yonghong, Alexei, I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc instructions in LLVM BPF backend, thus removing the syntactic difference. I think it simplifies BPFInstrInfo.td a bit but that's up to debate. Should I proceed with it? [1] https://reviews.llvm.org/D156559 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-28 16:58 ` Eduard Zingerman @ 2023-07-28 21:29 ` Alexei Starovoitov 2023-07-28 23:25 ` Yonghong Song 1 sibling, 0 replies; 16+ messages in thread From: Alexei Starovoitov @ 2023-07-28 21:29 UTC (permalink / raw) To: Eduard Zingerman; +Cc: Yonghong Song, Jose E. Marchesi, Yonghong Song, bpf On Fri, Jul 28, 2023 at 9:58 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > For disassembler, we have stx as well may use w* registers with alu32. > > In llvm BPFDisassembler.cpp, we have > > > > if ((InstClass == BPF_LDX || InstClass == BPF_STX) && > > getInstSize(Insn) != BPF_DW && > > (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && > > STI.hasFeature(BPF::ALU32)) > > Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, > > Address, > > this, STI); > > else > > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > > this, > > STI); > > > > Maybe we should just do > > > > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > > this, STI); > > > > So we already disassemble based on non-alu32 mode? > > > > Yonghong, Alexei, > > I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc > instructions in LLVM BPF backend, thus removing the syntactic > difference. I think it simplifies BPFInstrInfo.td a bit but that's up > to debate. > > Should I proceed with it? > > [1] https://reviews.llvm.org/D156559 Makes sense to me. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions 2023-07-28 16:58 ` Eduard Zingerman 2023-07-28 21:29 ` Alexei Starovoitov @ 2023-07-28 23:25 ` Yonghong Song 1 sibling, 0 replies; 16+ messages in thread From: Yonghong Song @ 2023-07-28 23:25 UTC (permalink / raw) To: Eduard Zingerman, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf On 7/28/23 9:58 AM, Eduard Zingerman wrote: > On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote: >> >> On 7/25/23 5:39 PM, Eduard Zingerman wrote: >>> On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: >>>> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi >>>> <jose.marchesi@oracle.com> wrote: >>>>> >>>>> >>>>>> On 7/25/23 1:09 PM, Jose E. Marchesi wrote: >>>>>>> >>>>>>>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: >>>>>>>>> >>>>>>>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: >>>>>>>>>>> Hello Yonghong. >>>>>>>>>>> We have noticed that the llvm disassembler uses different notations >>>>>>>>>>> for >>>>>>>>>>> registers in load and store instructions, depending somehow on the width >>>>>>>>>>> of the data being loaded or stored. >>>>>>>>>>> For example, this is an excerpt from the assembler-disassembler.s >>>>>>>>>>> test >>>>>>>>>>> file in llvm: >>>>>>>>>>> // Note: For the group below w1 is used as a destination for >>>>>>>>>>> sizes u8, u16, u32. >>>>>>>>>>> // This is disassembler quirk, but is technically not wrong, as there are >>>>>>>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. >>>>>>>>>>> // >>>>>>>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) >>>>>>>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) >>>>>>>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) >>>>>>>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) >>>>>>>>>>> r1 = *(u8*)(r2 + 42) >>>>>>>>>>> r1 = *(u16*)(r2 + 42) >>>>>>>>>>> r1 = *(u32*)(r2 + 42) >>>>>>>>>>> r1 = *(u64*)(r2 + 42) >>>>>>>>>>> The comment there clarifies that the usage of wN instead of rN in >>>>>>>>>>> the >>>>>>>>>>> u8, u16 and u32 cases is a "disassembler quirk". >>>>>>>>>>> Anyway, the problem is that it seems that `clang -S' actually emits >>>>>>>>>>> these forms with wN. >>>>>>>>>>> Is that intended? >>>>>>>>>> >>>>>>>>>> Yes, this is intended since alu32 mode is enabled where >>>>>>>>>> w* registers are used for 8/16/32 bit load. >>>>>>>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is >>>>>>>>> still >>>>>>>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? >>>>>>>> >>>>>>>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? >>>>>>>> >>>>>>>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, >>>>>>>> then w* register is used. If no alu32 mode for objdump, then r* register >>>>>>>> is used. Basically the same insn, disasm is different depending on >>>>>>>> alu32 mode or not. u8/u16/u32 is not enough to differentiate. >>>>>>> Ok, so the llvm objdump has a switch that tells when to use rN or wN >>>>>>> when printing these particular instructions. Thats the "disassembler >>>>>>> quirk". To what purpose? Isnt the person passing the command line >>>>>>> switch the same person reading the disassembled program? Is this "alu32 >>>>>>> mode" more than a cosmetic thing? >>>>>>> But what concern us is the assembler, not the disassembler. >>>>>>> clang -S (which is not objdump) seems to generate these instructions >>>>>>> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for >>>>>>> example) and we assume the output of clang -S is intended to be passed >>>>>>> to an assembler, much like with gcc -S. >>>>>>> So, should we support both syntaxes as _input_ syntax in the >>>>>>> assembler? >>>>>> >>>>>> Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing >>>>>> list), and -mcpu=v3 has alu32 enabled by default. So I think >>>>>> gcc can start to emit insn assuming alu32 mode is on by default. >>>>>> So >>>>>> w1 = *(u8 *)(r2 + 42) >>>>>> is preferred. >>>>> >>>>> We have V4 by default now. So we can emit >>>>> >>>>> w1 = *(u8 *)(r2 + 42) >>>>> >>>>> when -mcpu is v3 or higher, or if -malu32 is specified, and >>>>> >>>>> r1 = *(u8 *)(r2 + 42) >>>>> >>>>> when -mcpu is v2 or lower, or if -mnoalu32 is specified. >>>>> >>>>> Sounds good? >>>>> >>>>> However this implies that the assembler should indeed recognize both >>>>> forms of instructions. But note that it will assembly them to the >>>>> exactly same encoded instruction. This includes inline asm (remember >>>>> GCC does not have an integrated assembler.) >>>> >>>> Good point. >>>> I think we made a mistake in clang. >>>> We shouldn't be printing >>>> w1 = *(u8 *)(r2 + 42) >>>> since such instruction doesn't exist in BPF ISA >>>> and it's confusing. >>>> There is only one instruction: >>>> r1 = *(u8 *)(r2 + 42) >>>> which is an 8-bit load that zero extends into 64-bit. >>>> x86 JIT actually implements it as 8-bit load that stores >>>> into a 32-bit subregister, so it kinda matches w1, >>>> but that's an implementation detail of the JIT. >>>> >>>> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) >>>> regardless of alu32 or not. >>>> In gas and clang assembler we can support both w1= and r1= >>>> flavors for backward compat. >>>> >>> >>> I agree with Alexei (the ... disassembler quirk ... comment is left by me :). >>> Can dig into clang part of things if this is a consensus. >> >> For disassembler, we have stx as well may use w* registers with alu32. >> In llvm BPFDisassembler.cpp, we have >> >> if ((InstClass == BPF_LDX || InstClass == BPF_STX) && >> getInstSize(Insn) != BPF_DW && >> (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && >> STI.hasFeature(BPF::ALU32)) >> Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, >> Address, >> this, STI); >> else >> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, >> this, >> STI); >> >> Maybe we should just do >> >> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, >> this, STI); >> >> So we already disassemble based on non-alu32 mode? >> > > Yonghong, Alexei, > > I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc > instructions in LLVM BPF backend, thus removing the syntactic > difference. I think it simplifies BPFInstrInfo.td a bit but that's up > to debate. > > Should I proceed with it? > > [1] https://reviews.llvm.org/D156559 I made a comment to the diff w.r.t. backward compatibility issue. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-07-28 23:25 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-25 17:29 Register encoding in assembly for load/store instructions Jose E. Marchesi 2023-07-25 18:47 ` Yonghong Song 2023-07-25 18:56 ` Jose E. Marchesi 2023-07-25 19:11 ` Jose E. Marchesi 2023-07-25 19:59 ` Yonghong Song 2023-07-25 19:45 ` Yonghong Song 2023-07-25 20:09 ` Jose E. Marchesi 2023-07-25 22:10 ` Yonghong Song 2023-07-25 22:26 ` Jose E. Marchesi 2023-07-26 0:31 ` Alexei Starovoitov 2023-07-26 0:39 ` Eduard Zingerman 2023-07-26 4:16 ` Yonghong Song 2023-07-26 14:41 ` Eduard Zingerman 2023-07-28 16:58 ` Eduard Zingerman 2023-07-28 21:29 ` Alexei Starovoitov 2023-07-28 23:25 ` Yonghong Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox