* [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
@ 2023-02-20 22:37 Dave Thaler
2023-02-22 22:10 ` patchwork-bot+netdevbpf
2023-02-22 22:10 ` [Bpf] " Alexei Starovoitov
0 siblings, 2 replies; 8+ messages in thread
From: Dave Thaler @ 2023-02-20 22:37 UTC (permalink / raw)
To: bpf; +Cc: bpf, Dave Thaler, David Vernet
From: Dave Thaler <dthaler@microsoft.com>
Document the discussion from the email thread on the IETF bpf list,
where it was explained that the raw format varies by endianness
of the processor.
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
---
V1 -> V2: rebased on top of latest master
---
Documentation/bpf/instruction-set.rst | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index af515de5fc3..1d473f060fa 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -38,8 +38,9 @@ eBPF has two instruction encodings:
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
constant) value after the basic instruction for a total of 128 bits.
-The basic instruction encoding is as follows, where MSB and LSB mean the most significant
-bits and least significant bits, respectively:
+The basic instruction encoding looks as follows for a little-endian processor,
+where MSB and LSB mean the most significant bits and least significant bits,
+respectively:
============= ======= ======= ======= ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
@@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode
**opcode**
operation to perform
+and as follows for a big-endian processor:
+
+============= ======= ==================== =============== ============
+32 bits (MSB) 16 bits 4 bits               4 bits          8 bits (LSB)
+============= ======= ==================== =============== ============
+immediate     offset  destination register source register opcode
+============= ======= ==================== =============== ============
+
+Multi-byte fields ('immediate' and 'offset') are similarly stored in
+the byte order of the processor.
+
Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
--
2.33.4
* Re: [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-20 22:37 [PATCH bpf-next v2] bpf, docs: Add explanation of endianness Dave Thaler
@ 2023-02-22 22:10 ` patchwork-bot+netdevbpf
2023-02-22 22:10 ` [Bpf] " Alexei Starovoitov
1 sibling, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-02-22 22:10 UTC (permalink / raw)
To: Dave Thaler; +Cc: bpf, bpf, dthaler, void
Hello:
This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Mon, 20 Feb 2023 22:37:42 +0000 you wrote:
> From: Dave Thaler <dthaler@microsoft.com>
>
> Document the discussion from the email thread on the IETF bpf list,
> where it was explained that the raw format varies by endianness
> of the processor.
>
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
>
> [...]
Here is the summary with links:
- [bpf-next,v2] bpf, docs: Add explanation of endianness
https://git.kernel.org/bpf/bpf-next/c/746ce7671285
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-20 22:37 [PATCH bpf-next v2] bpf, docs: Add explanation of endianness Dave Thaler
2023-02-22 22:10 ` patchwork-bot+netdevbpf
@ 2023-02-22 22:10 ` Alexei Starovoitov
2023-02-22 23:23 ` Jose E. Marchesi
1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2023-02-22 22:10 UTC (permalink / raw)
To: Dave Thaler; +Cc: bpf, bpf, Dave Thaler, David Vernet
On Mon, Feb 20, 2023 at 2:37 PM Dave Thaler
<dthaler1968=40googlemail.com@dmarc.ietf.org> wrote:
>
> From: Dave Thaler <dthaler@microsoft.com>
>
> Document the discussion from the email thread on the IETF bpf list,
> where it was explained that the raw format varies by endianness
> of the processor.
>
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
>
> Acked-by: David Vernet <void@manifault.com>
> ---
>
> V1 -> V2: rebased on top of latest master
> ---
> Documentation/bpf/instruction-set.rst | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index af515de5fc3..1d473f060fa 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,8 +38,9 @@ eBPF has two instruction encodings:
> * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
> constant) value after the basic instruction for a total of 128 bits.
>
> -The basic instruction encoding is as follows, where MSB and LSB mean the most significant
> -bits and least significant bits, respectively:
> +The basic instruction encoding looks as follows for a little-endian processor,
> +where MSB and LSB mean the most significant bits and least significant bits,
> +respectively:
>
> ============= ======= ======= ======= ============
> 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
> @@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode
> **opcode**
> operation to perform
>
> +and as follows for a big-endian processor:
> +
> +============= ======= ==================== =============== ============
> +32 bits (MSB) 16 bits 4 bits               4 bits          8 bits (LSB)
> +============= ======= ==================== =============== ============
> +immediate     offset  destination register source register opcode
> +============= ======= ==================== =============== ============
I've changed it to:

imm offset dst_reg src_reg opcode

to match the little-endian table,
but now one of the tables feels wrong.
The encoding is always done by applying the C standard to this struct:

struct bpf_insn {
	__u8	code;		/* opcode */
	__u8	dst_reg:4;	/* dest register */
	__u8	src_reg:4;	/* source register */
	__s16	off;		/* signed offset */
	__s32	imm;		/* signed immediate constant */
};

I'm not sure how to express this clearly in the table.
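
A rough sketch of why a single table is awkward here: the C standard leaves the allocation order of bit-fields within a byte to the implementation, so the same struct text can produce a different register byte on different targets. The code below is only an illustration. `demo_insn`, `insn_to_bytes` and `dump_host_encoding` are hypothetical names, the `<stdint.h>` types stand in for `__u8`/`__s16`/`__s32` so that it builds outside the kernel tree, and it assumes a compiler such as GCC or Clang that accepts bit-fields of type `uint8_t`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for struct bpf_insn (hypothetical name). */
struct demo_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* dest register */
	uint8_t src_reg:4;	/* source register */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate constant */
};

/* The basic encoding is 64 bits wide; fail the build if the layout differs. */
_Static_assert(sizeof(struct demo_insn) == 8, "unexpected struct layout");

/* Copy the host's in-memory image of one instruction into out[8]. */
void insn_to_bytes(const struct demo_insn *insn, uint8_t out[8])
{
	memcpy(out, insn, sizeof(*insn));
}

/* Print the host's byte image of "r1 += 0x11223344" (opcode 0x07). */
void dump_host_encoding(void)
{
	struct demo_insn insn = {
		.code = 0x07, .dst_reg = 1, .src_reg = 0,
		.off = 0, .imm = 0x11223344,
	};
	uint8_t raw[8];
	size_t i;

	insn_to_bytes(&insn, raw);
	for (i = 0; i < sizeof(raw); i++)
		printf("%02x ", raw[i]);
	printf("\n");
}
```

Calling `dump_host_encoding()` on hosts of different endianness is expected to show a different second byte and a different byte order for the immediate, while the opcode stays in the first byte.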
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-22 22:10 ` [Bpf] " Alexei Starovoitov
@ 2023-02-22 23:23 ` Jose E. Marchesi
2023-02-23 1:56 ` Alexei Starovoitov
0 siblings, 1 reply; 8+ messages in thread
From: Jose E. Marchesi @ 2023-02-22 23:23 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Dave Thaler, bpf, bpf, Dave Thaler, David Vernet
> I've changed it to:
> imm offset dst_reg src_reg opcode
>
> to match the little-endian table,
> but now one of the tables feels wrong.
> The encoding is always done by applying the C standard to this struct:
> struct bpf_insn {
> 	__u8	code;		/* opcode */
> 	__u8	dst_reg:4;	/* dest register */
> 	__u8	src_reg:4;	/* source register */
> 	__s16	off;		/* signed offset */
> 	__s32	imm;		/* signed immediate constant */
> };
> I'm not sure how to express this clearly in the table.

Perhaps it would be simpler to document how the instruction bytes are
stored (be it in an ELF file or as bytes in a memory buffer to be loaded
into the kernel or some other BPF consumer) as opposed to what the
instructions look like once loaded (as a 64-bit word) by a little-endian
or big-endian kernel?

Stored little-endian BPF instructions:

  code src_reg dst_reg off imm

  foo-le.o:     file format elf64-bpfle

  0000000000000000 <.text>:
     0:   07 01 00 00 ef be ad de     r1 += 0xdeadbeef

Stored big-endian BPF instructions:

  code dst_reg src_reg off imm

  foo-be.o:     file format elf64-bpfbe

  0000000000000000 <.text>:
     0:   07 10 00 00 de ad be ef     r1 += 0xdeadbeef

i.e. in the stored bytes the code always comes first, then the
registers, then the offset, then the immediate, regardless of
endianness.

This may be easier to understand for implementors looking to generate
and/or consume bytes conforming to the BPF instruction encoding.
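
To make the stored layout concrete, here is a minimal decoding sketch that reads the eight stored bytes directly instead of going through bit-fields, so it behaves the same on any host. `decoded_insn` and `decode_insn` are hypothetical names used only for this illustration, and the `big_endian` flag is assumed to be supplied by the caller (for example from the ELF header); nothing here is taken from an existing loader.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of one basic (64-bit) instruction. */
struct decoded_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Decode 8 stored bytes; big_endian is nonzero for a big-endian program. */
struct decoded_insn decode_insn(const uint8_t b[8], int big_endian)
{
	struct decoded_insn d;
	uint16_t off;
	uint32_t imm;

	d.code = b[0];			/* the opcode always comes first */
	if (big_endian) {
		d.dst_reg = b[1] >> 4;	/* dst in the high nibble */
		d.src_reg = b[1] & 0x0f;
		off = (uint16_t)((b[2] << 8) | b[3]);
		imm = ((uint32_t)b[4] << 24) | ((uint32_t)b[5] << 16) |
		      ((uint32_t)b[6] << 8) | (uint32_t)b[7];
	} else {
		d.src_reg = b[1] >> 4;	/* src in the high nibble */
		d.dst_reg = b[1] & 0x0f;
		off = (uint16_t)(b[2] | (b[3] << 8));
		imm = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
		      ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);
	}
	/* Reinterpret the raw bits as the signed fields. */
	memcpy(&d.off, &off, sizeof(d.off));
	memcpy(&d.imm, &imm, sizeof(d.imm));
	return d;
}
```

Fed the two dumps above, both calls should yield the same fields: code 0x07, dst_reg 1, src_reg 0, off 0, and the same 32 immediate bits.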
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-22 23:23 ` Jose E. Marchesi
@ 2023-02-23 1:56 ` Alexei Starovoitov
2023-02-23 13:18 ` Jose E. Marchesi
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2023-02-23 1:56 UTC (permalink / raw)
To: Jose E. Marchesi; +Cc: Dave Thaler, bpf, bpf, Dave Thaler, David Vernet
On Wed, Feb 22, 2023 at 3:23 PM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
> Perhaps it would be simpler to document how the instruction bytes are
> stored (be it in an ELF file or as bytes in a memory buffer to be loaded
> into the kernel or some other BPF consumer) as opposed to what the
> instructions look like once loaded (as a 64-bit word) by a little-endian
> or big-endian kernel?
>
> Stored little-endian BPF instructions:
>
>   code src_reg dst_reg off imm
>
>   foo-le.o:     file format elf64-bpfle
>
>   0000000000000000 <.text>:
>      0:   07 01 00 00 ef be ad de     r1 += 0xdeadbeef
>
> Stored big-endian BPF instructions:
>
>   code dst_reg src_reg off imm
>
>   foo-be.o:     file format elf64-bpfbe
>
>   0000000000000000 <.text>:
>      0:   07 10 00 00 de ad be ef     r1 += 0xdeadbeef
>
> i.e. in the stored bytes the code always comes first, then the
> registers, then the offset, then the immediate, regardless of
> endianness.
>
> This may be easier to understand for implementors looking to generate
> and/or consume bytes conforming to the BPF instruction encoding.

+1
I like this format more as well.
Maybe we can drop the table and use a diagram of some kind?

opcode  src dst  offset  imm          assembly
07      0   1    00 00   ef be ad de  r1 += 0xdeadbeef // little
07      1   0    00 00   de ad be ef  r1 += 0xdeadbeef // big
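
The two rows of the diagram can also be produced mechanically. The following is a small encoding sketch under the assumption, taken from the byte dumps earlier in this thread, that the destination register sits in the low nibble for little-endian and in the high nibble for big-endian. `encode_insn` is a hypothetical helper name, not an existing kernel or libbpf function.

```c
#include <stdint.h>

/*
 * Write one basic instruction into out[8] in stored byte order.
 * big_endian is nonzero when targeting a big-endian BPF program.
 */
void encode_insn(uint8_t out[8], uint8_t code, uint8_t dst_reg,
		 uint8_t src_reg, int16_t off, int32_t imm, int big_endian)
{
	uint16_t o = (uint16_t)off;	/* modular conversion keeps the bit pattern */
	uint32_t v = (uint32_t)imm;

	out[0] = code;			/* opcode first in both byte orders */
	if (big_endian) {
		out[1] = (uint8_t)(((dst_reg & 0x0f) << 4) | (src_reg & 0x0f));
		out[2] = (uint8_t)(o >> 8);
		out[3] = (uint8_t)o;
		out[4] = (uint8_t)(v >> 24);
		out[5] = (uint8_t)(v >> 16);
		out[6] = (uint8_t)(v >> 8);
		out[7] = (uint8_t)v;
	} else {
		out[1] = (uint8_t)(((src_reg & 0x0f) << 4) | (dst_reg & 0x0f));
		out[2] = (uint8_t)o;
		out[3] = (uint8_t)(o >> 8);
		out[4] = (uint8_t)v;
		out[5] = (uint8_t)(v >> 8);
		out[6] = (uint8_t)(v >> 16);
		out[7] = (uint8_t)(v >> 24);
	}
}
```

Encoding opcode 0x07 with dst_reg 1, src_reg 0, offset 0 and the immediate bits 0xdeadbeef should reproduce the "little" row with `big_endian` set to 0 and the "big" row with it set to 1.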
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-23 1:56 ` Alexei Starovoitov
@ 2023-02-23 13:18 ` Jose E. Marchesi
2023-02-23 16:40 ` Alexei Starovoitov
0 siblings, 1 reply; 8+ messages in thread
From: Jose E. Marchesi @ 2023-02-23 13:18 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jose E. Marchesi, Dave Thaler, bpf, bpf, Dave Thaler,
David Vernet
> +1
> I like this format more as well.
> Maybe we can drop the table and use a diagram of some kind?
>
> opcode  src dst  offset  imm          assembly
> 07      0   1    00 00   ef be ad de  r1 += 0xdeadbeef // little
> 07      1   0    00 00   de ad be ef  r1 += 0xdeadbeef // big

Good idea.  What about something like this:

  opcode         offset imm          assembly
         src dst
  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
         dst src
  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big

I changed the immediate because 0xdeadbeef is negative and it may be
confusing in the assembly part: strictly it would be r1 += -559038737.
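
The arithmetic behind that remark can be checked with a few lines of C. This is only a sketch: `imm_as_signed` is a hypothetical helper name, and it uses `memcpy` so that the reinterpretation of the 32 bits does not rely on implementation-defined integer conversion.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 immediate bits as the signed value of the imm field. */
int32_t imm_as_signed(uint32_t bits)
{
	int32_t value;

	memcpy(&value, &bits, sizeof(value));
	return value;
}
```

With this helper, `imm_as_signed(0xdeadbeefu)` gives -559038737 (that is, 0xdeadbeef minus 2^32), whereas `imm_as_signed(0x11223344u)` stays positive, which is why the second immediate reads the same in hex and in the assembly.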
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-23 13:18 ` Jose E. Marchesi
@ 2023-02-23 16:40 ` Alexei Starovoitov
2023-02-23 16:42 ` Jose E. Marchesi
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2023-02-23 16:40 UTC (permalink / raw)
To: Jose E. Marchesi
Cc: Jose E. Marchesi, Dave Thaler, bpf, bpf, Dave Thaler,
David Vernet
On Thu, Feb 23, 2023 at 5:19 AM Jose E. Marchesi <jemarch@gnu.org> wrote:
> Good idea.  What about something like this:
>
>   opcode         offset imm          assembly
>          src dst
>   07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>          dst src
>   07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>
> I changed the immediate because 0xdeadbeef is negative and it may be
> confusing in the assembly part: strictly it would be r1 += -559038737.
Looks great to me. Do you want to send your first kernel patch? :)
* Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness
2023-02-23 16:40 ` Alexei Starovoitov
@ 2023-02-23 16:42 ` Jose E. Marchesi
0 siblings, 0 replies; 8+ messages in thread
From: Jose E. Marchesi @ 2023-02-23 16:42 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jose E. Marchesi, Dave Thaler, bpf, bpf, Dave Thaler,
David Vernet
> Looks great to me. Do you want to send your first kernel patch? :)
Sure, will prepare it :)