* Register encoding in assembly for load/store instructions
@ 2023-07-25 17:29 Jose E. Marchesi
2023-07-25 18:47 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Jose E. Marchesi @ 2023-07-25 17:29 UTC (permalink / raw)
To: Yonghong Song; +Cc: bpf
Hello Yonghong.
We have noticed that the llvm disassembler uses different notations for
registers in load and store instructions, depending somehow on the width
of the data being loaded or stored.
For example, this is an excerpt from the assembler-disassembler.s test
file in llvm:
// Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
// This is disassembler quirk, but is technically not wrong, as there are
// no different encodings for 'r1 = load' vs 'w1 = load'.
//
// CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
// CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
// CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
// CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
r1 = *(u8*)(r2 + 42)
r1 = *(u16*)(r2 + 42)
r1 = *(u32*)(r2 + 42)
r1 = *(u64*)(r2 + 42)
The comment there clarifies that the usage of wN instead of rN in the
u8, u16 and u32 cases is a "disassembler quirk".
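To make the "no different encodings" remark concrete, here is a minimal sketch that splits the four byte sequences from the test excerpt into their fields. It assumes the standard 8-byte little-endian BPF instruction layout (opcode, dst/src nibbles, 16-bit offset, 32-bit immediate); the function name decode_ldx is made up for illustration and is not part of any tool.

```python
import struct

# Size field (bits 3-4 of the opcode) for BPF_LDX | BPF_MEM loads.
SIZE_NAMES = {0x00: "u32", 0x08: "u16", 0x10: "u8", 0x18: "u64"}


def decode_ldx(raw: bytes) -> dict:
    """Split one 8-byte little-endian BPF load into its fields (illustrative helper)."""
    opcode, regs, off, imm = struct.unpack("<BBhi", raw)
    if opcode & 0x07 != 0x01:
        raise ValueError("not a BPF_LDX instruction")
    if opcode & 0xE0 != 0x60:
        raise ValueError("not BPF_MEM mode")
    return {
        "size": SIZE_NAMES[opcode & 0x18],
        "dst": regs & 0x0F,   # destination register number, low nibble
        "src": regs >> 4,     # source register number, high nibble
        "off": off,
        "imm": imm,
    }


if __name__ == "__main__":
    for hexbytes in ("71212a0000000000", "69212a0000000000",
                     "61212a0000000000", "79212a0000000000"):
        print(hexbytes, decode_ldx(bytes.fromhex(hexbytes)))
```

The decoded fields carry a register number and an access size, but nothing that records whether the destination was spelled rN or wN in the source text.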
Anyway, the problem is that it seems that `clang -S' actually emits
these forms with wN.
Is that intended?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Register encoding in assembly for load/store instructions
2023-07-25 17:29 Register encoding in assembly for load/store instructions Jose E. Marchesi
@ 2023-07-25 18:47 ` Yonghong Song
2023-07-25 18:56 ` Jose E. Marchesi
0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2023-07-25 18:47 UTC (permalink / raw)
To: Jose E. Marchesi, Yonghong Song; +Cc: bpf
On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
>
> Hello Yonghong.
>
> We have noticed that the llvm disassembler uses different notations for
> registers in load and store instructions, depending somehow on the width
> of the data being loaded or stored.
>
> For example, this is an excerpt from the assembler-disassembler.s test
> file in llvm:
>
> // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
> // This is disassembler quirk, but is technically not wrong, as there are
> // no different encodings for 'r1 = load' vs 'w1 = load'.
> //
> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
> r1 = *(u8*)(r2 + 42)
> r1 = *(u16*)(r2 + 42)
> r1 = *(u32*)(r2 + 42)
> r1 = *(u64*)(r2 + 42)
>
> The comment there clarifies that the usage of wN instead of rN in the
> u8, u16 and u32 cases is a "disassembler quirk".
>
> Anyway, the problem is that it seems that `clang -S' actually emits
> these forms with wN.
>
> Is that intended?
Yes, this is intended since alu32 mode is enabled where
w* registers are used for 8/16/32 bit load.
Note that for newer sign-extended loads, even in alu32 mode,
only r* registers are used, since the sign extension extends
up to 64 bits for all variants (8/16/32).
* Re: Register encoding in assembly for load/store instructions
2023-07-25 18:47 ` Yonghong Song
@ 2023-07-25 18:56 ` Jose E. Marchesi
2023-07-25 19:11 ` Jose E. Marchesi
2023-07-25 19:45 ` Yonghong Song
0 siblings, 2 replies; 16+ messages in thread
From: Jose E. Marchesi @ 2023-07-25 18:56 UTC (permalink / raw)
To: Yonghong Song; +Cc: Yonghong Song, bpf
> Yes, this is intended since alu32 mode is enabled where
> w* registers are used for 8/16/32 bit load.
So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still
alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
> Note that for newer sign-extended loads, even in alu32 mode,
> only r* registers are used, since the sign extension extends
> up to 64 bits for all variants (8/16/32).
Yes we noticed that :)
* Re: Register encoding in assembly for load/store instructions
2023-07-25 18:56 ` Jose E. Marchesi
@ 2023-07-25 19:11 ` Jose E. Marchesi
2023-07-25 19:59 ` Yonghong Song
2023-07-25 19:45 ` Yonghong Song
1 sibling, 1 reply; 16+ messages in thread
From: Jose E. Marchesi @ 2023-07-25 19:11 UTC (permalink / raw)
To: Yonghong Song; +Cc: Yonghong Song, bpf
>> Yes, this is intended since alu32 mode is enabled where
>> w* registers are used for 8/16/32 bit load.
>
> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still
> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
Sorry, my keyboard's num-lock activated mid-sentence.
I meant 'r1 = *(u8 *)(r2 + 42)'.
Why support that syntax as well as 'w1 = *(u8 *)(r2 + 42)'?
* Re: Register encoding in assembly for load/store instructions
2023-07-25 18:56 ` Jose E. Marchesi
2023-07-25 19:11 ` Jose E. Marchesi
@ 2023-07-25 19:45 ` Yonghong Song
2023-07-25 20:09 ` Jose E. Marchesi
1 sibling, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2023-07-25 19:45 UTC (permalink / raw)
To: Jose E. Marchesi; +Cc: Yonghong Song, bpf
On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>> Yes, this is intended since alu32 mode is enabled where
>> w* registers are used for 8/16/32 bit load.
>
> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still
> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
For u8/u16/u32 loads, if objdump is run with the option that indicates
alu32 mode, then a w* register is used. Without alu32 mode, an r* register
is used. Basically it is the same insn; the disasm differs depending on
whether alu32 mode is on. u8/u16/u32 alone is not enough to differentiate.
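As a rough illustration of the behavior described here, the sketch below prints the same decoded load two ways depending on an alu32 switch. The function format_load and its parameters are hypothetical names chosen for this example; this is not how the LLVM printer is actually written.

```python
def format_load(size: str, dst: int, src: int, off: int, alu32: bool) -> str:
    """Hypothetical printer: only the register prefix depends on the alu32 switch."""
    narrow = size in ("u8", "u16", "u32")
    # Assumption for this sketch: wN is printed only for narrow loads in alu32 mode.
    prefix = "w" if (alu32 and narrow) else "r"
    return f"{prefix}{dst} = *({size} *)(r{src} + {off:#x})"


if __name__ == "__main__":
    # Same fields (i.e. the same encoded instruction), two textual forms.
    print(format_load("u8", 1, 2, 42, alu32=True))
    print(format_load("u8", 1, 2, 42, alu32=False))
    print(format_load("u64", 1, 2, 42, alu32=True))
```

The input fields are identical in the first two calls; the difference in output comes entirely from the switch, which is why the access size alone cannot select the spelling.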
* Re: Register encoding in assembly for load/store instructions
2023-07-25 19:11 ` Jose E. Marchesi
@ 2023-07-25 19:59 ` Yonghong Song
0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2023-07-25 19:59 UTC (permalink / raw)
To: Jose E. Marchesi; +Cc: Yonghong Song, bpf
On 7/25/23 12:11 PM, Jose E. Marchesi wrote:
>>> Yes, this is intended since alu32 mode is enabled where
>>> w* registers are used for 8/16/32 bit load.
>>
>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is still
>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>
> Sorry, my keyboard's num-lock activated mid-sentence.
>
> I meant 'r1 = *(u8 *)(r2 + 42)'.
> Why support that syntax as well as 'w1 = *(u8 *)(r2 + 42)'?
Because of alu32 mode. The original intention is that if
w1 = *(u8 *)(r2 + 42)
is specified, the hardware will actually load the value
only into the 32-bit sub-register, and the hardware will then
do the 32-to-64 zero extension automatically.
This is different from
r1 = *(u8 *)(r2 + 42)
where the insn itself loads the value into the 64-bit
register.
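The two descriptions above can be modelled side by side. The following is only a sketch of the arithmetic, with invented function names; it assumes an unsigned 8-bit load and shows that "write the 32-bit sub-register, then zero-extend to 64 bits" and "zero-extend directly to 64 bits" leave the same value in the register.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def load_u8_via_subregister(byte: int) -> int:
    """Model of the 'w1 = load' reading: fill the 32-bit half, then zero-extend."""
    sub = byte & 0xFF          # 8-bit value zero-extended into 32 bits
    return sub & MASK32        # implicit 32-to-64 zero extension clears bits 32..63


def load_u8_direct(byte: int) -> int:
    """Model of the 'r1 = load' reading: zero-extend straight into 64 bits."""
    return (byte & 0xFF) & MASK64


if __name__ == "__main__":
    same = all(load_u8_via_subregister(b) == load_u8_direct(b) for b in range(256))
    print("identical for every byte value:", same)
```

Under this model the final register contents cannot tell the two readings apart, which is consistent with both spellings sharing one encoding.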
* Re: Register encoding in assembly for load/store instructions
2023-07-25 19:45 ` Yonghong Song
@ 2023-07-25 20:09 ` Jose E. Marchesi
2023-07-25 22:10 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Jose E. Marchesi @ 2023-07-25 20:09 UTC (permalink / raw)
To: Yonghong Song; +Cc: Yonghong Song, bpf
> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>>> Yes, this is intended since alu32 mode is enabled where
>>> w* registers are used for 8/16/32 bit load.
>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
>> still
>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>
> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>
> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> then w* register is used. If no alu32 mode for objdump, then r* register
> is used. Basically the same insn, disasm is different depending on
> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
Ok, so the llvm objdump has a switch that tells when to use rN or wN
when printing these particular instructions. That's the "disassembler
quirk". To what purpose? Isn't the person passing the command line
switch the same person reading the disassembled program? Is this "alu32
mode" more than a cosmetic thing?
But what concerns us is the assembler, not the disassembler.
clang -S (which is not objdump) seems to generate these instructions
with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
example) and we assume the output of clang -S is intended to be passed
to an assembler, much like with gcc -S.
So, should we support both syntaxes as _input_ syntax in the assembler?
* Re: Register encoding in assembly for load/store instructions
2023-07-25 20:09 ` Jose E. Marchesi
@ 2023-07-25 22:10 ` Yonghong Song
2023-07-25 22:26 ` Jose E. Marchesi
0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2023-07-25 22:10 UTC (permalink / raw)
To: Jose E. Marchesi; +Cc: Yonghong Song, bpf
On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
>
>
>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>>>> Yes, this is intended since alu32 mode is enabled where
>>>> w* registers are used for 8/16/32 bit load.
>>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
>>> still
>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>>
>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>>
>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
>> then w* register is used. If no alu32 mode for objdump, then r* register
>> is used. Basically the same insn, disasm is different depending on
>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
>
> Ok, so the llvm objdump has a switch that tells when to use rN or wN
> when printing these particular instructions. That's the "disassembler
> quirk". To what purpose? Isn't the person passing the command line
> switch the same person reading the disassembled program? Is this "alu32
> mode" more than a cosmetic thing?
>
> But what concerns us is the assembler, not the disassembler.
>
> clang -S (which is not objdump) seems to generate these instructions
> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> example) and we assume the output of clang -S is intended to be passed
> to an assembler, much like with gcc -S.
>
> So, should we support both syntaxes as _input_ syntax in the assembler?
-mcpu=v3 is the recommended cpu flavor (at least on the bpf mailing
list), and -mcpu=v3 has alu32 enabled by default. So I think
gcc can start to emit insns assuming alu32 mode is on by default.
So
w1 = *(u8 *)(r2 + 42)
is preferred.
* Re: Register encoding in assembly for load/store instructions
2023-07-25 22:10 ` Yonghong Song
@ 2023-07-25 22:26 ` Jose E. Marchesi
2023-07-26 0:31 ` Alexei Starovoitov
0 siblings, 1 reply; 16+ messages in thread
From: Jose E. Marchesi @ 2023-07-25 22:26 UTC (permalink / raw)
To: Yonghong Song; +Cc: Yonghong Song, bpf
> On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
>>
>>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>>>>> Yes, this is intended since alu32 mode is enabled where
>>>>> w* registers are used for 8/16/32 bit load.
>>>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
>>>> still
>>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>>>
>>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>>>
>>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
>>> then w* register is used. If no alu32 mode for objdump, then r* register
>>> is used. Basically the same insn, disasm is different depending on
>>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
>> Ok, so the llvm objdump has a switch that tells when to use rN or wN
>> when printing these particular instructions. That's the "disassembler
>> quirk". To what purpose? Isn't the person passing the command line
>> switch the same person reading the disassembled program? Is this "alu32
>> mode" more than a cosmetic thing?
>> But what concerns us is the assembler, not the disassembler.
>> clang -S (which is not objdump) seems to generate these instructions
>> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
>> example) and we assume the output of clang -S is intended to be passed
>> to an assembler, much like with gcc -S.
>> So, should we support both syntaxes as _input_ syntax in the
>> assembler?
>
> Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
> list), and -mcpu=v3 has alu32 enabled by default. So I think
> gcc can start to emit insn assuming alu32 mode is on by default.
> So
> w1 = *(u8 *)(r2 + 42)
> is preferred.
We have V4 by default now. So we can emit
w1 = *(u8 *)(r2 + 42)
when -mcpu is v3 or higher, or if -malu32 is specified, and
r1 = *(u8 *)(r2 + 42)
when -mcpu is v2 or lower, or if -mnoalu32 is specified.
Sounds good?
However this implies that the assembler should indeed recognize both
forms of instructions. But note that it will assemble them to
exactly the same encoded instruction. This includes inline asm (remember
GCC does not have an integrated assembler.)
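The selection rule proposed above can be written down as a small sketch. The function name dest_prefix and its parameters are invented for illustration, and one detail is an assumption not spelled out in the proposal: an explicit -malu32 / -mnoalu32 is taken to override the default implied by -mcpu.

```python
from typing import Optional


def dest_prefix(cpu_version: int, alu32_flag: Optional[bool] = None) -> str:
    """Pick the register prefix for a u8/u16/u32 load destination (sketch only).

    cpu_version: numeric part of -mcpu=vN.
    alu32_flag:  True for -malu32, False for -mnoalu32, None if neither was given.
    """
    # Assumption: an explicit flag wins; otherwise v3 and later imply alu32.
    alu32 = alu32_flag if alu32_flag is not None else cpu_version >= 3
    return "w" if alu32 else "r"


if __name__ == "__main__":
    for cpu, flag in ((4, None), (3, None), (2, None), (2, True), (4, False)):
        print(f"-mcpu=v{cpu}, alu32 flag={flag}: {dest_prefix(cpu, flag)}1 = *(u8 *)(r2 + 42)")
```

Whichever prefix is chosen, the assembler still has to accept both spellings, since hand-written and inline asm may use either.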
* Re: Register encoding in assembly for load/store instructions
2023-07-25 22:26 ` Jose E. Marchesi
@ 2023-07-26 0:31 ` Alexei Starovoitov
2023-07-26 0:39 ` Eduard Zingerman
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2023-07-26 0:31 UTC (permalink / raw)
To: Jose E. Marchesi; +Cc: Yonghong Song, Yonghong Song, bpf
On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> >>
> >>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> >>>>> Yes, this is intended since alu32 mode is enabled where
> >>>>> w* registers are used for 8/16/32 bit load.
> >>>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
> >>>> still
> >>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
> >>>
> >>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> >>>
> >>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> >>> then w* register is used. If no alu32 mode for objdump, then r* register
> >>> is used. Basically the same insn, disasm is different depending on
> >>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> >> Ok, so the llvm objdump has a switch that tells when to use rN or wN
> >> when printing these particular instructions. That's the "disassembler
> >> quirk". To what purpose? Isn't the person passing the command line
> >> switch the same person reading the disassembled program? Is this "alu32
> >> mode" more than a cosmetic thing?
> >> But what concerns us is the assembler, not the disassembler.
> >> clang -S (which is not objdump) seems to generate these instructions
> >> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> >> example) and we assume the output of clang -S is intended to be passed
> >> to an assembler, much like with gcc -S.
> >> So, should we support both syntaxes as _input_ syntax in the
> >> assembler?
> >
> > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
> > list), and -mcpu=v3 has alu32 enabled by default. So I think
> > gcc can start to emit insn assuming alu32 mode is on by default.
> > So
> > w1 = *(u8 *)(r2 + 42)
> > is preferred.
>
> We have V4 by default now. So we can emit
>
> w1 = *(u8 *)(r2 + 42)
>
> when -mcpu is v3 or higher, or if -malu32 is specified, and
>
> r1 = *(u8 *)(r2 + 42)
>
> when -mcpu is v2 or lower, or if -mnoalu32 is specified.
>
> Sounds good?
>
> However this implies that the assembler should indeed recognize both
> forms of instructions. But note that it will assemble them to
> exactly the same encoded instruction. This includes inline asm (remember
> GCC does not have an integrated assembler.)
Good point.
I think we made a mistake in clang.
We shouldn't be printing
w1 = *(u8 *)(r2 + 42)
since no such instruction exists in the BPF ISA
and it's confusing.
There is only one instruction:
r1 = *(u8 *)(r2 + 42)
which is an 8-bit load that zero extends into 64-bit.
x86 JIT actually implements it as 8-bit load that stores
into a 32-bit subregister, so it kinda matches w1,
but that's an implementation detail of the JIT.
I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
regardless of alu32 or not.
In gas and clang assembler we can support both w1= and r1=
flavors for backward compat.
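A minimal sketch of the backward-compatible input side described here: a toy parser that accepts both the w1= and r1= spellings of a plain load and produces the same bytes for either. Everything in it (assemble_load, the regular expression, rejecting wN for u64, handling only non-negative offsets) is an assumption made for illustration and does not reflect how gas or the clang assembler are implemented.

```python
import re
import struct

SIZE_BITS = {"u32": 0x00, "u16": 0x08, "u8": 0x10, "u64": 0x18}

# Accepts e.g. "w1 = *(u8 *)(r2 + 42)" or "r1 = *(u8*)(r2 + 0x2a)".
LOAD_RE = re.compile(
    r"^\s*([rw])(\d+)\s*=\s*\*\(\s*(u8|u16|u32|u64)\s*\*\s*\)"
    r"\(\s*r(\d+)\s*\+\s*(0x[0-9a-fA-F]+|\d+)\s*\)\s*$"
)


def assemble_load(text: str) -> bytes:
    """Encode a plain BPF_LDX | BPF_MEM load; the r/w prefix does not affect the bytes."""
    m = LOAD_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized load: {text!r}")
    prefix, dst, size, src, off = m.groups()
    if prefix == "w" and size == "u64":
        # Assumption of this sketch: a 32-bit name for a 64-bit load is rejected.
        raise ValueError("wN destination is not accepted for a u64 load")
    opcode = 0x01 | 0x60 | SIZE_BITS[size]   # BPF_LDX | BPF_MEM | size
    regs = (int(src) << 4) | int(dst)        # src in high nibble, dst in low nibble
    return struct.pack("<BBhi", opcode, regs, int(off, 0), 0)


if __name__ == "__main__":
    a = assemble_load("w1 = *(u8 *)(r2 + 42)")
    b = assemble_load("r1 = *(u8 *)(r2 + 42)")
    print(a.hex(" "), "| identical:", a == b)
```

With this shape, the printers can settle on a single spelling while older assembly sources that use the other one keep assembling to the same instruction.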
* Re: Register encoding in assembly for load/store instructions
2023-07-26 0:31 ` Alexei Starovoitov
@ 2023-07-26 0:39 ` Eduard Zingerman
2023-07-26 4:16 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Eduard Zingerman @ 2023-07-26 0:39 UTC (permalink / raw)
To: Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, Yonghong Song, bpf
On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
> <jose.marchesi@oracle.com> wrote:
> >
> >
> > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> > > >
> > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> > > > > >
> > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
> > > > > > > > Hello Yonghong.
> > > > > > > > We have noticed that the llvm disassembler uses different notations
> > > > > > > > for
> > > > > > > > registers in load and store instructions, depending somehow on the width
> > > > > > > > of the data being loaded or stored.
> > > > > > > > For example, this is an excerpt from the assembler-disassembler.s
> > > > > > > > test
> > > > > > > > file in llvm:
> > > > > > > > // Note: For the group below w1 is used as a destination for
> > > > > > > > sizes u8, u16, u32.
> > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are
> > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'.
> > > > > > > > //
> > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
> > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
> > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
> > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
> > > > > > > > r1 = *(u8*)(r2 + 42)
> > > > > > > > r1 = *(u16*)(r2 + 42)
> > > > > > > > r1 = *(u32*)(r2 + 42)
> > > > > > > > r1 = *(u64*)(r2 + 42)
> > > > > > > > The comment there clarifies that the usage of wN instead of rN in
> > > > > > > > the
> > > > > > > > u8, u16 and u32 cases is a "disassembler quirk".
> > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits
> > > > > > > > these forms with wN.
> > > > > > > > Is that intended?
> > > > > > >
> > > > > > > Yes, this is intended since alu32 mode is enabled where
> > > > > > > w* registers are used for 8/16/32 bit load.
> > > > > > So then why support 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
> > > > > > still
> > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
> > > > >
> > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> > > > >
> > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> > > > > then w* register is used. If no alu32 mode for objdump, then r* register
> > > > > is used. Basically the same insn, disasm is different depending on
> > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN
> > > > when printing these particular instructions. That's the "disassembler
> > > > quirk". To what purpose? Isn't the person passing the command line
> > > > switch the same person reading the disassembled program? Is this "alu32
> > > > mode" more than a cosmetic thing?
> > > > But what concerns us is the assembler, not the disassembler.
> > > > clang -S (which is not objdump) seems to generate these instructions
> > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> > > > example) and we assume the output of clang -S is intended to be passed
> > > > to an assembler, much like with gcc -S.
> > > > So, should we support both syntaxes as _input_ syntax in the
> > > > assembler?
> > >
> > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
> > > list), and -mcpu=v3 has alu32 enabled by default. So I think
> > > gcc can start to emit insn assuming alu32 mode is on by default.
> > > So
> > > w1 = *(u8 *)(r2 + 42)
> > > is preferred.
> >
> > We have V4 by default now. So we can emit
> >
> > w1 = *(u8 *)(r2 + 42)
> >
> > when -mcpu is v3 or higher, or if -malu32 is specified, and
> >
> > r1 = *(u8 *)(r2 + 42)
> >
> > when -mcpu is v2 or lower, or if -mnoalu32 is specified.
> >
> > Sounds good?
> >
> > However this implies that the assembler should indeed recognize both
> > forms of instructions. But note that it will assembly them to the
> > exactly same encoded instruction. This includes inline asm (remember
> > GCC does not have an integrated assembler.)
>
> Good point.
> I think we made a mistake in clang.
> We shouldn't be printing
> w1 = *(u8 *)(r2 + 42)
> since such instruction doesn't exist in BPF ISA
> and it's confusing.
> There is only one instruction:
> r1 = *(u8 *)(r2 + 42)
> which is an 8-bit load that zero extends into 64-bit.
> x86 JIT actually implements it as 8-bit load that stores
> into a 32-bit subregister, so it kinda matches w1,
> but that's an implementation detail of the JIT.
>
> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
> regardless of alu32 or not.
> In gas and clang assembler we can support both w1= and r1=
> flavors for backward compat.
>
I agree with Alexei (the "disassembler quirk" comment was left by me :).
I can dig into the clang part of things if this is the consensus.
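For readers following along, the point that 'w1 = load' and 'r1 = load' share one encoding can be checked against the CHECK lines quoted above. The following is a minimal sketch, not code from LLVM or binutils: encode_ldx and Size are hypothetical names introduced here, and a little-endian instruction layout is assumed. The opcode constants are the ones defined by the BPF instruction set.

```cpp
#include <array>
#include <cstdint>

// Opcode fields as defined by the BPF instruction set.
constexpr std::uint8_t BPF_LDX = 0x01;  // instruction class: load into a register
constexpr std::uint8_t BPF_MEM = 0x60;  // mode: regular memory access
enum class Size : std::uint8_t { W = 0x00, H = 0x08, B = 0x10, DW = 0x18 };

// Hypothetical helper: encode `dst = *(size *)(src + off)` as 8 bytes,
// assuming a little-endian layout. There is no parameter for "r" vs "w":
// the encoding carries only the register number.
inline std::array<std::uint8_t, 8> encode_ldx(Size size, unsigned dst,
                                              unsigned src, std::int16_t off) {
  const auto uoff = static_cast<std::uint16_t>(off);
  return {static_cast<std::uint8_t>(BPF_LDX | static_cast<std::uint8_t>(size) | BPF_MEM),
          static_cast<std::uint8_t>(((src & 0x0f) << 4) | (dst & 0x0f)),
          static_cast<std::uint8_t>(uoff & 0xff),   // offset, low byte
          static_cast<std::uint8_t>(uoff >> 8),     // offset, high byte
          0, 0, 0, 0};                              // imm is unused for loads
}
```

Calling encode_ldx(Size::B, 1, 2, 42) yields the same bytes whether the source text spelled the destination as r1 or w1, which is why an assembler can accept both spellings.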
* Re: Register encoding in assembly for load/store instructions
2023-07-26 0:39 ` Eduard Zingerman
@ 2023-07-26 4:16 ` Yonghong Song
2023-07-26 14:41 ` Eduard Zingerman
2023-07-28 16:58 ` Eduard Zingerman
0 siblings, 2 replies; 16+ messages in thread
From: Yonghong Song @ 2023-07-26 4:16 UTC (permalink / raw)
To: Eduard Zingerman, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf
On 7/25/23 5:39 PM, Eduard Zingerman wrote:
> On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
>> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
>> <jose.marchesi@oracle.com> wrote:
>>>
>>>
>>>> On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
>>>>>
>>>>>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>>>>>>>
>>>>>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
>>>>>>>>> Hello Yonghong.
>>>>>>>>> We have noticed that the llvm disassembler uses different notations
>>>>>>>>> for
>>>>>>>>> registers in load and store instructions, depending somehow on the width
>>>>>>>>> of the data being loaded or stored.
>>>>>>>>> For example, this is an excerpt from the assembler-disassembler.s
>>>>>>>>> test
>>>>>>>>> file in llvm:
>>>>>>>>> // Note: For the group below w1 is used as a destination for
>>>>>>>>> sizes u8, u16, u32.
>>>>>>>>> // This is disassembler quirk, but is technically not wrong, as there are
>>>>>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'.
>>>>>>>>> //
>>>>>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
>>>>>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
>>>>>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
>>>>>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
>>>>>>>>> r1 = *(u8*)(r2 + 42)
>>>>>>>>> r1 = *(u16*)(r2 + 42)
>>>>>>>>> r1 = *(u32*)(r2 + 42)
>>>>>>>>> r1 = *(u64*)(r2 + 42)
>>>>>>>>> The comment there clarifies that the usage of wN instead of rN in
>>>>>>>>> the
>>>>>>>>> u8, u16 and u32 cases is a "disassembler quirk".
>>>>>>>>> Anyway, the problem is that it seems that `clang -S' actually emits
>>>>>>>>> these forms with wN.
>>>>>>>>> Is that intended?
>>>>>>>>
>>>>>>>> Yes, this is intended since alu32 mode is enabled where
>>>>>>>> w* registers are used for 8/16/32 bit load.
>>>>>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
>>>>>>> still
>>>>>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>>>>>>
>>>>>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>>>>>>
>>>>>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
>>>>>> then w* register is used. If no alu32 mode for objdump, then r* register
>>>>>> is used. Basically the same insn, disasm is different depending on
>>>>>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
>>>>> Ok, so the llvm objdump has a switch that tells when to use rN or wN
>>>>> when printing these particular instructions. Thats the "disassembler
>>>>> quirk". To what purpose? Isnt the person passing the command line
>>>>> switch the same person reading the disassembled program? Is this "alu32
>>>>> mode" more than a cosmetic thing?
>>>>> But what concern us is the assembler, not the disassembler.
>>>>> clang -S (which is not objdump) seems to generate these instructions
>>>>> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
>>>>> example) and we assume the output of clang -S is intended to be passed
>>>>> to an assembler, much like with gcc -S.
>>>>> So, should we support both syntaxes as _input_ syntax in the
>>>>> assembler?
>>>>
>>>> Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
>>>> list), and -mcpu=v3 has alu32 enabled by default. So I think
>>>> gcc can start to emit insn assuming alu32 mode is on by default.
>>>> So
>>>> w1 = *(u8 *)(r2 + 42)
>>>> is preferred.
>>>
>>> We have V4 by default now. So we can emit
>>>
>>> w1 = *(u8 *)(r2 + 42)
>>>
>>> when -mcpu is v3 or higher, or if -malu32 is specified, and
>>>
>>> r1 = *(u8 *)(r2 + 42)
>>>
>>> when -mcpu is v2 or lower, or if -mnoalu32 is specified.
>>>
>>> Sounds good?
>>>
>>> However this implies that the assembler should indeed recognize both
>>> forms of instructions. But note that it will assembly them to the
>>> exactly same encoded instruction. This includes inline asm (remember
>>> GCC does not have an integrated assembler.)
>>
>> Good point.
>> I think we made a mistake in clang.
>> We shouldn't be printing
>> w1 = *(u8 *)(r2 + 42)
>> since such instruction doesn't exist in BPF ISA
>> and it's confusing.
>> There is only one instruction:
>> r1 = *(u8 *)(r2 + 42)
>> which is an 8-bit load that zero extends into 64-bit.
>> x86 JIT actually implements it as 8-bit load that stores
>> into a 32-bit subregister, so it kinda matches w1,
>> but that's an implementation detail of the JIT.
>>
>> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
>> regardless of alu32 or not.
>> In gas and clang assembler we can support both w1= and r1=
>> flavors for backward compat.
>>
>
> I agree with Alexei (the ... disassembler quirk ... comment is left by me :).
> Can dig into clang part of things if this is a consensus.
For the disassembler, stx instructions may also use w* registers with alu32.
In llvm BPFDisassembler.cpp, we have

   if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
       getInstSize(Insn) != BPF_DW &&
       (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
       STI.hasFeature(BPF::ALU32))
     Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address,
                                this, STI);
   else
     Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this,
                                STI);

Maybe we should just do

   Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
                              this, STI);

So we always disassemble based on non-alu32 mode?
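As an illustration of what the table selection above means for the printed register name, here is a standalone model, assuming the raw opcode byte is inspected directly. It is only a sketch: reg_prefix_current and reg_prefix_proposed are hypothetical names, and the real disassembler picks a decoder table rather than a prefix character.

```cpp
#include <cstdint>

// Opcode bit fields from the BPF instruction set encoding.
constexpr std::uint8_t CLASS_MASK = 0x07, SIZE_MASK = 0x18, MODE_MASK = 0xe0;
constexpr std::uint8_t BPF_LDX = 0x01, BPF_STX = 0x03;
constexpr std::uint8_t BPF_DW = 0x18;
constexpr std::uint8_t BPF_MEM = 0x60, BPF_ATOMIC = 0xc0;

// Hypothetical model of the current behavior: narrow loads and stores are
// printed with a w* register only when alu32 is enabled.
inline char reg_prefix_current(std::uint8_t opcode, bool alu32) {
  const unsigned cls = opcode & CLASS_MASK;
  const unsigned size = opcode & SIZE_MASK;
  const unsigned mode = opcode & MODE_MASK;
  const bool narrow_mem_access = (cls == BPF_LDX || cls == BPF_STX) &&
                                 size != BPF_DW &&
                                 (mode == BPF_MEM || mode == BPF_ATOMIC);
  return (narrow_mem_access && alu32) ? 'w' : 'r';
}

// Hypothetical model of the suggested simplification: always print r*.
inline char reg_prefix_proposed(std::uint8_t /*opcode*/, bool /*alu32*/) {
  return 'r';
}
```

Under this model the byte 0x71 (an 8-bit load) prints as w1 or r1 depending only on the alu32 flag, while the proposed variant ignores the flag.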
* Re: Register encoding in assembly for load/store instructions
2023-07-26 4:16 ` Yonghong Song
@ 2023-07-26 14:41 ` Eduard Zingerman
2023-07-28 16:58 ` Eduard Zingerman
1 sibling, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2023-07-26 14:41 UTC (permalink / raw)
To: yonghong.song, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf
On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote:
>
> On 7/25/23 5:39 PM, Eduard Zingerman wrote:
> > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
> > > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
> > > <jose.marchesi@oracle.com> wrote:
> > > >
> > > >
> > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> > > > > >
> > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> > > > > > > >
> > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
> > > > > > > > > > Hello Yonghong.
> > > > > > > > > > We have noticed that the llvm disassembler uses different notations
> > > > > > > > > > for
> > > > > > > > > > registers in load and store instructions, depending somehow on the width
> > > > > > > > > > of the data being loaded or stored.
> > > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s
> > > > > > > > > > test
> > > > > > > > > > file in llvm:
> > > > > > > > > > // Note: For the group below w1 is used as a destination for
> > > > > > > > > > sizes u8, u16, u32.
> > > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are
> > > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'.
> > > > > > > > > > //
> > > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
> > > > > > > > > > r1 = *(u8*)(r2 + 42)
> > > > > > > > > > r1 = *(u16*)(r2 + 42)
> > > > > > > > > > r1 = *(u32*)(r2 + 42)
> > > > > > > > > > r1 = *(u64*)(r2 + 42)
> > > > > > > > > > The comment there clarifies that the usage of wN instead of rN in
> > > > > > > > > > the
> > > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk".
> > > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits
> > > > > > > > > > these forms with wN.
> > > > > > > > > > Is that intended?
> > > > > > > > >
> > > > > > > > > Yes, this is intended since alu32 mode is enabled where
> > > > > > > > > w* registers are used for 8/16/32 bit load.
> > > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
> > > > > > > > still
> > > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
> > > > > > >
> > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> > > > > > >
> > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> > > > > > > then w* register is used. If no alu32 mode for objdump, then r* register
> > > > > > > is used. Basically the same insn, disasm is different depending on
> > > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> > > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN
> > > > > > when printing these particular instructions. Thats the "disassembler
> > > > > > quirk". To what purpose? Isnt the person passing the command line
> > > > > > switch the same person reading the disassembled program? Is this "alu32
> > > > > > mode" more than a cosmetic thing?
> > > > > > But what concern us is the assembler, not the disassembler.
> > > > > > clang -S (which is not objdump) seems to generate these instructions
> > > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> > > > > > example) and we assume the output of clang -S is intended to be passed
> > > > > > to an assembler, much like with gcc -S.
> > > > > > So, should we support both syntaxes as _input_ syntax in the
> > > > > > assembler?
> > > > >
> > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
> > > > > list), and -mcpu=v3 has alu32 enabled by default. So I think
> > > > > gcc can start to emit insn assuming alu32 mode is on by default.
> > > > > So
> > > > > w1 = *(u8 *)(r2 + 42)
> > > > > is preferred.
> > > >
> > > > We have V4 by default now. So we can emit
> > > >
> > > > w1 = *(u8 *)(r2 + 42)
> > > >
> > > > when -mcpu is v3 or higher, or if -malu32 is specified, and
> > > >
> > > > r1 = *(u8 *)(r2 + 42)
> > > >
> > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified.
> > > >
> > > > Sounds good?
> > > >
> > > > However this implies that the assembler should indeed recognize both
> > > > forms of instructions. But note that it will assembly them to the
> > > > exactly same encoded instruction. This includes inline asm (remember
> > > > GCC does not have an integrated assembler.)
> > >
> > > Good point.
> > > I think we made a mistake in clang.
> > > We shouldn't be printing
> > > w1 = *(u8 *)(r2 + 42)
> > > since such instruction doesn't exist in BPF ISA
> > > and it's confusing.
> > > There is only one instruction:
> > > r1 = *(u8 *)(r2 + 42)
> > > which is an 8-bit load that zero extends into 64-bit.
> > > x86 JIT actually implements it as 8-bit load that stores
> > > into a 32-bit subregister, so it kinda matches w1,
> > > but that's an implementation detail of the JIT.
> > >
> > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
> > > regardless of alu32 or not.
> > > In gas and clang assembler we can support both w1= and r1=
> > > flavors for backward compat.
> > >
> >
> > I agree with Alexei (the ... disassembler quirk ... comment is left by me :).
> > Can dig into clang part of things if this is a consensus.
>
> For disassembler, we have stx as well may use w* registers with alu32.
> In llvm BPFDisassembler.cpp, we have
>
> if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
> getInstSize(Insn) != BPF_DW &&
> (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
> STI.hasFeature(BPF::ALU32))
> Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn,
> Address,
> this, STI);
> else
> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> this,
> STI);
>
> Maybe we should just do
>
> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> this, STI);
>
> So we already disassemble based on non-alu32 mode?
>
Yes, this changes llvm-objdump's behavior so that it emits 64-bit registers on the LHS.
* Re: Register encoding in assembly for load/store instructions
2023-07-26 4:16 ` Yonghong Song
2023-07-26 14:41 ` Eduard Zingerman
@ 2023-07-28 16:58 ` Eduard Zingerman
2023-07-28 21:29 ` Alexei Starovoitov
2023-07-28 23:25 ` Yonghong Song
1 sibling, 2 replies; 16+ messages in thread
From: Eduard Zingerman @ 2023-07-28 16:58 UTC (permalink / raw)
To: yonghong.song, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf
On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote:
>
> On 7/25/23 5:39 PM, Eduard Zingerman wrote:
> > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
> > > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
> > > <jose.marchesi@oracle.com> wrote:
> > > >
> > > >
> > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> > > > > >
> > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> > > > > > > >
> > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
> > > > > > > > > > Hello Yonghong.
> > > > > > > > > > We have noticed that the llvm disassembler uses different notations
> > > > > > > > > > for
> > > > > > > > > > registers in load and store instructions, depending somehow on the width
> > > > > > > > > > of the data being loaded or stored.
> > > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s
> > > > > > > > > > test
> > > > > > > > > > file in llvm:
> > > > > > > > > > // Note: For the group below w1 is used as a destination for
> > > > > > > > > > sizes u8, u16, u32.
> > > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are
> > > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'.
> > > > > > > > > > //
> > > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
> > > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
> > > > > > > > > > r1 = *(u8*)(r2 + 42)
> > > > > > > > > > r1 = *(u16*)(r2 + 42)
> > > > > > > > > > r1 = *(u32*)(r2 + 42)
> > > > > > > > > > r1 = *(u64*)(r2 + 42)
> > > > > > > > > > The comment there clarifies that the usage of wN instead of rN in
> > > > > > > > > > the
> > > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk".
> > > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits
> > > > > > > > > > these forms with wN.
> > > > > > > > > > Is that intended?
> > > > > > > > >
> > > > > > > > > Yes, this is intended since alu32 mode is enabled where
> > > > > > > > > w* registers are used for 8/16/32 bit load.
> > > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
> > > > > > > > still
> > > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
> > > > > > >
> > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> > > > > > >
> > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> > > > > > > then w* register is used. If no alu32 mode for objdump, then r* register
> > > > > > > is used. Basically the same insn, disasm is different depending on
> > > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> > > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN
> > > > > > when printing these particular instructions. Thats the "disassembler
> > > > > > quirk". To what purpose? Isnt the person passing the command line
> > > > > > switch the same person reading the disassembled program? Is this "alu32
> > > > > > mode" more than a cosmetic thing?
> > > > > > But what concern us is the assembler, not the disassembler.
> > > > > > clang -S (which is not objdump) seems to generate these instructions
> > > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> > > > > > example) and we assume the output of clang -S is intended to be passed
> > > > > > to an assembler, much like with gcc -S.
> > > > > > So, should we support both syntaxes as _input_ syntax in the
> > > > > > assembler?
> > > > >
> > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
> > > > > list), and -mcpu=v3 has alu32 enabled by default. So I think
> > > > > gcc can start to emit insn assuming alu32 mode is on by default.
> > > > > So
> > > > > w1 = *(u8 *)(r2 + 42)
> > > > > is preferred.
> > > >
> > > > We have V4 by default now. So we can emit
> > > >
> > > > w1 = *(u8 *)(r2 + 42)
> > > >
> > > > when -mcpu is v3 or higher, or if -malu32 is specified, and
> > > >
> > > > r1 = *(u8 *)(r2 + 42)
> > > >
> > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified.
> > > >
> > > > Sounds good?
> > > >
> > > > However this implies that the assembler should indeed recognize both
> > > > forms of instructions. But note that it will assembly them to the
> > > > exactly same encoded instruction. This includes inline asm (remember
> > > > GCC does not have an integrated assembler.)
> > >
> > > Good point.
> > > I think we made a mistake in clang.
> > > We shouldn't be printing
> > > w1 = *(u8 *)(r2 + 42)
> > > since such instruction doesn't exist in BPF ISA
> > > and it's confusing.
> > > There is only one instruction:
> > > r1 = *(u8 *)(r2 + 42)
> > > which is an 8-bit load that zero extends into 64-bit.
> > > x86 JIT actually implements it as 8-bit load that stores
> > > into a 32-bit subregister, so it kinda matches w1,
> > > but that's an implementation detail of the JIT.
> > >
> > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
> > > regardless of alu32 or not.
> > > In gas and clang assembler we can support both w1= and r1=
> > > flavors for backward compat.
> > >
> >
> > I agree with Alexei (the ... disassembler quirk ... comment is left by me :).
> > Can dig into clang part of things if this is a consensus.
>
> For disassembler, we have stx as well may use w* registers with alu32.
> In llvm BPFDisassembler.cpp, we have
>
> if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
> getInstSize(Insn) != BPF_DW &&
> (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
> STI.hasFeature(BPF::ALU32))
> Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn,
> Address,
> this, STI);
> else
> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> this,
> STI);
>
> Maybe we should just do
>
> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> this, STI);
>
> So we already disassemble based on non-alu32 mode?
>
Yonghong, Alexei,
I have a prototype [1] that consolidates the STW/STW32, LDW/LDW32, etc.
instructions in the LLVM BPF backend, thus removing the syntactic
difference. I think it simplifies BPFInstrInfo.td a bit, but that's up
for debate.
Should I proceed with it?
[1] https://reviews.llvm.org/D156559
* Re: Register encoding in assembly for load/store instructions
2023-07-28 16:58 ` Eduard Zingerman
@ 2023-07-28 21:29 ` Alexei Starovoitov
2023-07-28 23:25 ` Yonghong Song
1 sibling, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2023-07-28 21:29 UTC (permalink / raw)
To: Eduard Zingerman; +Cc: Yonghong Song, Jose E. Marchesi, Yonghong Song, bpf
On Fri, Jul 28, 2023 at 9:58 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > For disassembler, we have stx as well may use w* registers with alu32.
> > In llvm BPFDisassembler.cpp, we have
> >
> > if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
> > getInstSize(Insn) != BPF_DW &&
> > (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
> > STI.hasFeature(BPF::ALU32))
> > Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn,
> > Address,
> > this, STI);
> > else
> > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> > this,
> > STI);
> >
> > Maybe we should just do
> >
> > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
> > this, STI);
> >
> > So we already disassemble based on non-alu32 mode?
> >
>
> Yonghong, Alexei,
>
> I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc
> instructions in LLVM BPF backend, thus removing the syntactic
> difference. I think it simplifies BPFInstrInfo.td a bit but that's up
> to debate.
>
> Should I proceed with it?
>
> [1] https://reviews.llvm.org/D156559
Makes sense to me.
* Re: Register encoding in assembly for load/store instructions
2023-07-28 16:58 ` Eduard Zingerman
2023-07-28 21:29 ` Alexei Starovoitov
@ 2023-07-28 23:25 ` Yonghong Song
1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2023-07-28 23:25 UTC (permalink / raw)
To: Eduard Zingerman, Alexei Starovoitov, Jose E. Marchesi; +Cc: Yonghong Song, bpf
On 7/28/23 9:58 AM, Eduard Zingerman wrote:
> On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote:
>>
>> On 7/25/23 5:39 PM, Eduard Zingerman wrote:
>>> On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
>>>> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
>>>> <jose.marchesi@oracle.com> wrote:
>>>>>
>>>>>
>>>>>> On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
>>>>>>>
>>>>>>>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>>>>>>>>>
>>>>>>>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
>>>>>>>>>>> Hello Yonghong.
>>>>>>>>>>> We have noticed that the llvm disassembler uses different notations
>>>>>>>>>>> for
>>>>>>>>>>> registers in load and store instructions, depending somehow on the width
>>>>>>>>>>> of the data being loaded or stored.
>>>>>>>>>>> For example, this is an excerpt from the assembler-disassembler.s
>>>>>>>>>>> test
>>>>>>>>>>> file in llvm:
>>>>>>>>>>> // Note: For the group below w1 is used as a destination for
>>>>>>>>>>> sizes u8, u16, u32.
>>>>>>>>>>> // This is disassembler quirk, but is technically not wrong, as there are
>>>>>>>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'.
>>>>>>>>>>> //
>>>>>>>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
>>>>>>>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
>>>>>>>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
>>>>>>>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
>>>>>>>>>>> r1 = *(u8*)(r2 + 42)
>>>>>>>>>>> r1 = *(u16*)(r2 + 42)
>>>>>>>>>>> r1 = *(u32*)(r2 + 42)
>>>>>>>>>>> r1 = *(u64*)(r2 + 42)
>>>>>>>>>>> The comment there clarifies that the usage of wN instead of rN in
>>>>>>>>>>> the
>>>>>>>>>>> u8, u16 and u32 cases is a "disassembler quirk".
>>>>>>>>>>> Anyway, the problem is that it seems that `clang -S' actually emits
>>>>>>>>>>> these forms with wN.
>>>>>>>>>>> Is that intended?
>>>>>>>>>>
>>>>>>>>>> Yes, this is intended since alu32 mode is enabled where
>>>>>>>>>> w* registers are used for 8/16/32 bit load.
>>>>>>>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is
>>>>>>>>> still
>>>>>>>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate?
>>>>>>>>
>>>>>>>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>>>>>>>>
>>>>>>>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
>>>>>>>> then w* register is used. If no alu32 mode for objdump, then r* register
>>>>>>>> is used. Basically the same insn, disasm is different depending on
>>>>>>>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
>>>>>>> Ok, so the llvm objdump has a switch that tells when to use rN or wN
>>>>>>> when printing these particular instructions. Thats the "disassembler
>>>>>>> quirk". To what purpose? Isnt the person passing the command line
>>>>>>> switch the same person reading the disassembled program? Is this "alu32
>>>>>>> mode" more than a cosmetic thing?
>>>>>>> But what concern us is the assembler, not the disassembler.
>>>>>>> clang -S (which is not objdump) seems to generate these instructions
>>>>>>> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
>>>>>>> example) and we assume the output of clang -S is intended to be passed
>>>>>>> to an assembler, much like with gcc -S.
>>>>>>> So, should we support both syntaxes as _input_ syntax in the
>>>>>>> assembler?
>>>>>>
>>>>>> Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
>>>>>> list), and -mcpu=v3 has alu32 enabled by default. So I think
>>>>>> gcc can start to emit insn assuming alu32 mode is on by default.
>>>>>> So
>>>>>> w1 = *(u8 *)(r2 + 42)
>>>>>> is preferred.
>>>>>
>>>>> We have V4 by default now. So we can emit
>>>>>
>>>>> w1 = *(u8 *)(r2 + 42)
>>>>>
>>>>> when -mcpu is v3 or higher, or if -malu32 is specified, and
>>>>>
>>>>> r1 = *(u8 *)(r2 + 42)
>>>>>
>>>>> when -mcpu is v2 or lower, or if -mnoalu32 is specified.
>>>>>
>>>>> Sounds good?
>>>>>
>>>>> However this implies that the assembler should indeed recognize both
>>>>> forms of instructions. But note that it will assembly them to the
>>>>> exactly same encoded instruction. This includes inline asm (remember
>>>>> GCC does not have an integrated assembler.)
>>>>
>>>> Good point.
>>>> I think we made a mistake in clang.
>>>> We shouldn't be printing
>>>> w1 = *(u8 *)(r2 + 42)
>>>> since such instruction doesn't exist in BPF ISA
>>>> and it's confusing.
>>>> There is only one instruction:
>>>> r1 = *(u8 *)(r2 + 42)
>>>> which is an 8-bit load that zero extends into 64-bit.
>>>> x86 JIT actually implements it as 8-bit load that stores
>>>> into a 32-bit subregister, so it kinda matches w1,
>>>> but that's an implementation detail of the JIT.
>>>>
>>>> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
>>>> regardless of alu32 or not.
>>>> In gas and clang assembler we can support both w1= and r1=
>>>> flavors for backward compat.
>>>>
>>>
>>> I agree with Alexei (the ... disassembler quirk ... comment is left by me :).
>>> Can dig into clang part of things if this is a consensus.
>>
>> For disassembler, we have stx as well may use w* registers with alu32.
>> In llvm BPFDisassembler.cpp, we have
>>
>> if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
>> getInstSize(Insn) != BPF_DW &&
>> (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
>> STI.hasFeature(BPF::ALU32))
>> Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn,
>> Address,
>> this, STI);
>> else
>> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
>> this,
>> STI);
>>
>> Maybe we should just do
>>
>> Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
>> this, STI);
>>
>> So we already disassemble based on non-alu32 mode?
>>
>
> Yonghong, Alexei,
>
> I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc
> instructions in LLVM BPF backend, thus removing the syntactic
> difference. I think it simplifies BPFInstrInfo.td a bit but that's up
> to debate.
>
> Should I proceed with it?
>
> [1] https://reviews.llvm.org/D156559
I left a comment on the diff regarding the backward compatibility issue.
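The backward-compatibility concern raised in this thread is that assemblers should keep accepting both the w1= and r1= spellings for narrow loads and stores. A minimal sketch of that input rule follows, assuming registers 0 through 10 as in the BPF register file; parse_load_dst is a hypothetical helper, not an LLVM or gas function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: accept either "rN" or "wN" as the destination of a
// load and return the register number (0..10), or -1 if the token is invalid.
// Both spellings map to the same number, so the encoded instruction is the same.
inline int parse_load_dst(const std::string &tok) {
  if (tok.size() < 2 || tok.size() > 3) return -1;
  if (tok[0] != 'r' && tok[0] != 'w') return -1;
  if (tok.size() == 3 && tok[1] == '0') return -1;  // reject forms like "r01"
  int n = 0;
  for (std::size_t i = 1; i < tok.size(); ++i) {
    if (tok[i] < '0' || tok[i] > '9') return -1;
    n = n * 10 + (tok[i] - '0');
  }
  return n <= 10 ? n : -1;
}
```

With this rule, existing sources written with w1 keep assembling after a compiler switches its output to r1.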
Thread overview: 16+ messages
2023-07-25 17:29 Register encoding in assembly for load/store instructions Jose E. Marchesi
2023-07-25 18:47 ` Yonghong Song
2023-07-25 18:56 ` Jose E. Marchesi
2023-07-25 19:11 ` Jose E. Marchesi
2023-07-25 19:59 ` Yonghong Song
2023-07-25 19:45 ` Yonghong Song
2023-07-25 20:09 ` Jose E. Marchesi
2023-07-25 22:10 ` Yonghong Song
2023-07-25 22:26 ` Jose E. Marchesi
2023-07-26 0:31 ` Alexei Starovoitov
2023-07-26 0:39 ` Eduard Zingerman
2023-07-26 4:16 ` Yonghong Song
2023-07-26 14:41 ` Eduard Zingerman
2023-07-28 16:58 ` Eduard Zingerman
2023-07-28 21:29 ` Alexei Starovoitov
2023-07-28 23:25 ` Yonghong Song