[PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes
@ 2023-02-27 21:05 Jose E. Marchesi
  2023-02-27 21:18 ` David Vernet
  2023-02-27 21:49 ` [Bpf] " Dave Thaler
  0 siblings, 2 replies; 5+ messages in thread
From: Jose E. Marchesi @ 2023-02-27 21:05 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, bpf, David Vernet


[Changes from V2:
- Use src and dst consistently in the document.
- Use a more graphical depiction of the 128-bit instruction.
- Remove `Where:' fragment.
- Clarify that unused bits are reserved and shall be zeroed.]

This patch modifies instruction-set.rst so it documents the encoding
of BPF instructions in terms of how the bytes are stored (be it in an
ELF file or as bytes in a memory buffer to be loaded into the kernel
or some other BPF consumer) as opposed to how the instruction looks
like once loaded.

This is hopefully easier to understand by implementors looking to
generate and/or consume bytes conforming BPF instructions.

The patch also clarifies that the unused bytes in a pseudo-instruction
shall be cleared with zeros.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
---
 Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 01802ed9b29b..fae2e48d6a0b 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -38,15 +38,11 @@ eBPF has two instruction encodings:
 * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
   constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding looks as follows for a little-endian processor,
-where MSB and LSB mean the most significant bits and least significant bits,
-respectively:
+The fields conforming an encoded basic instruction are stored in the
+following order::
 
-=============  =======  =======  =======  ============
-32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
-=============  =======  =======  =======  ============
-imm            offset   src_reg  dst_reg  opcode
-=============  =======  =======  =======  ============
+  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
 
 **imm**
   signed integer immediate value
@@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
 **offset**
   signed integer offset used with pointer arithmetic
 
-**src_reg**
+**src**
   the source register number (0-10), except where otherwise specified
   (`64-bit immediate instructions`_ reuse this field for other purposes)
 
-**dst_reg**
+**dst**
   destination register number (0-10)
 
 **opcode**
   operation to perform
 
-and as follows for a big-endian processor:
+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
 
-=============  =======  =======  =======  ============
-32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
-=============  =======  =======  =======  ============
-imm            offset   dst_reg  src_reg  opcode
-=============  =======  =======  =======  ============
+For example::
 
-Multi-byte fields ('imm' and 'offset') are similarly stored in
-the byte order of the processor.
+  opcode         offset imm          assembly
+         src dst
+  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst src
+  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
 
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
 
-As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
-instruction uses a 64-bit immediate value that is constructed as follows.
-The 64 bits following the basic instruction contain a pseudo instruction
-using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
-and imm containing the high 32 bits of the immediate value.
+As discussed below in `64-bit immediate instructions`_, a 64-bit
+immediate instruction uses a 64-bit immediate value that is
+constructed as follows.  The 64 bits following the basic instruction
+contain a pseudo instruction using the same format but with opcode,
+dst, src, and offset all set to zero, and imm containing the high 32
+bits of the immediate value.
 
-=================  ==================
-64 bits (MSB)      64 bits (LSB)
-=================  ==================
-basic instruction  pseudo instruction
-=================  ==================
+This is depicted in the following figure::
+
+        basic_instruction
+  .-----------------------------.
+  |                             |
+  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
+                                  |              |
+                                  '--------------'
+                                 pseudo instruction
 
 Thus the 64-bit immediate value is constructed as follows:
 
   imm64 = (next_imm << 32) | imm
 
 where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.
+following the basic instruction.  The unused bytes in the pseudo
+instruction are reserved and shall be cleared to zero.
 
 Instruction classes
 -------------------
@@ -137,7 +140,7 @@ code            source  instruction class
   source  value  description
   ======  =====  ==============================================
   BPF_K   0x00   use 32-bit 'imm' value as source operand
-  BPF_X   0x08   use 'src_reg' register value as source operand
+  BPF_X   0x08   use 'src' register value as source operand
   ======  =====  ==============================================
 
 **instruction class**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes
  2023-02-27 21:05 [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes Jose E. Marchesi
@ 2023-02-27 21:18 ` David Vernet
  2023-02-27 21:49 ` [Bpf] " Dave Thaler
  1 sibling, 0 replies; 5+ messages in thread
From: David Vernet @ 2023-02-27 21:18 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: bpf, Alexei Starovoitov, bpf

On Mon, Feb 27, 2023 at 10:05:51PM +0100, Jose E. Marchesi wrote:
> 
> [Changes from V2:
> - Use src and dst consistently in the document.
> - Use a more graphical depiction of the 128-bit instruction.
> - Remove `Where:' fragment.
> - Clarify that unused bits are reserved and shall be zeroed.]
> 
> This patch modifies instruction-set.rst so it documents the encoding
> of BPF instructions in terms of how the bytes are stored (be it in an
> ELF file or as bytes in a memory buffer to be loaded into the kernel
> or some other BPF consumer) as opposed to how the instruction looks
> like once loaded.
> 
> This is hopefully easier to understand by implementors looking to
> generate and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction
> shall be cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>

Looks great, thanks. Just one more small nit below that I'm only adding
because this doc is part of the standardization (perhaps Alexei or
someone else could also just add it when applying).

Acked-by: David Vernet <void@manifault.com>

> ---
>  Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..fae2e48d6a0b 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,11 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
>  
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order::
>  
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>  
>  **imm**
>    signed integer immediate value
> @@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
>  **offset**
>    signed integer offset used with pointer arithmetic
>  
> -**src_reg**
> +**src**
>    the source register number (0-10), except where otherwise specified
>    (`64-bit immediate instructions`_ reuse this field for other purposes)
>  
> -**dst_reg**
> +**dst**
>    destination register number (0-10)
>  
>  **opcode**
>    operation to perform
>  
> -and as follows for a big-endian processor:
> +Note that the contents of multi-byte fields ('imm' and 'offset') are
> +stored using big-endian byte ordering in big-endian BPF and
> +little-endian byte ordering in little-endian BPF.
>  
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   dst_reg  src_reg  opcode
> -=============  =======  =======  =======  ============
> +For example::
>  
> -Multi-byte fields ('imm' and 'offset') are similarly stored in
> -the byte order of the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>  
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.

We should probably also say "Unused fields are reserved and shall be
cleared to zero" here.

>  
> -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
> -instruction uses a 64-bit immediate value that is constructed as follows.
> -The 64 bits following the basic instruction contain a pseudo instruction
> -using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
> -and imm containing the high 32 bits of the immediate value.
> +As discussed below in `64-bit immediate instructions`_, a 64-bit
> +immediate instruction uses a 64-bit immediate value that is
> +constructed as follows.  The 64 bits following the basic instruction
> +contain a pseudo instruction using the same format but with opcode,
> +dst, src, and offset all set to zero, and imm containing the high 32
> +bits of the immediate value.
>  
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure::
> +
> +        basic_instruction
> +  .-----------------------------.
> +  |                             |
> +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
> +                                  |              |
> +                                  '--------------'
> +                                 pseudo instruction
>  
>  Thus the 64-bit immediate value is constructed as follows:
>  
>    imm64 = (next_imm << 32) | imm
>  
>  where 'next_imm' refers to the imm value of the pseudo instruction
> -following the basic instruction.
> +following the basic instruction.  The unused bytes in the pseudo
> +instruction are reserved and shall be cleared to zero.
>  
>  Instruction classes
>  -------------------
> @@ -137,7 +140,7 @@ code            source  instruction class
>    source  value  description
>    ======  =====  ==============================================
>    BPF_K   0x00   use 32-bit 'imm' value as source operand
> -  BPF_X   0x08   use 'src_reg' register value as source operand
> +  BPF_X   0x08   use 'src' register value as source operand
>    ======  =====  ==============================================
>  
>  **instruction class**
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes
  2023-02-27 21:05 [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes Jose E. Marchesi
  2023-02-27 21:18 ` David Vernet
@ 2023-02-27 21:49 ` Dave Thaler
  2023-02-27 23:42   ` David Vernet
  2023-02-28  1:05   ` Jose E. Marchesi
  1 sibling, 2 replies; 5+ messages in thread
From: Dave Thaler @ 2023-02-27 21:49 UTC (permalink / raw)
  To: Jose E. Marchesi, bpf; +Cc: Alexei Starovoitov, bpf@ietf.org, David Vernet

> -----Original Message-----
> From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
> Sent: Monday, February 27, 2023 1:06 PM
> To: bpf <bpf@vger.kernel.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org; David
> Vernet <void@manifault.com>
> Subject: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of
> stored bytes
> 
> 
> [Changes from V2:
> - Use src and dst consistently in the document.

Since my earlier patch, src and dst refer to the values whereas
src_reg and dst_reg refer to register numbers.

> - Use a more graphical depiction of the 128-bit instruction.
> - Remove `Where:' fragment.
> - Clarify that unused bits are reserved and shall be zeroed.]
> 
> This patch modifies instruction-set.rst so it documents the encoding of BPF
> instructions in terms of how the bytes are stored (be it in an ELF file or as
> bytes in a memory buffer to be loaded into the kernel or some other BPF
> consumer) as opposed to how the instruction looks like once loaded.
> 
> This is hopefully easier to understand by implementors looking to generate
> and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> ---
>  Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst
> b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..fae2e48d6a0b 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,11 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate
> (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
> 
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order::
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

I think those should be src_reg and dst_reg (as the register numbers)
not src and dst (which are the values) or this will be a documentation
regression.

Right now I think this is a regression since if I understand right, with this
patch, "src" and "dst" now refer to both in different places which is
confusing.

Dave

>  **imm**
>    signed integer immediate value
> @@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
>  **offset**
>    signed integer offset used with pointer arithmetic
> 
> -**src_reg**
> +**src**
>    the source register number (0-10), except where otherwise specified
>    (`64-bit immediate instructions`_ reuse this field for other purposes)
> 
> -**dst_reg**
> +**dst**
>    destination register number (0-10)
> 
>  **opcode**
>    operation to perform
> 
> -and as follows for a big-endian processor:
> +Note that the contents of multi-byte fields ('imm' and 'offset') are
> +stored using big-endian byte ordering in big-endian BPF and
> +little-endian byte ordering in little-endian BPF.
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   dst_reg  src_reg  opcode
> -=============  =======  =======  =======  ============
> +For example::
> 
> -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
> the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
> 
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
> 
> -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate -
> instruction uses a 64-bit immediate value that is constructed as follows.
> -The 64 bits following the basic instruction contain a pseudo instruction -
> using the same format but with opcode, dst_reg, src_reg, and offset all set to
> zero, -and imm containing the high 32 bits of the immediate value.
> +As discussed below in `64-bit immediate instructions`_, a 64-bit
> +immediate instruction uses a 64-bit immediate value that is constructed
> +as follows.  The 64 bits following the basic instruction contain a
> +pseudo instruction using the same format but with opcode, dst, src, and
> +offset all set to zero, and imm containing the high 32 bits of the
> +immediate value.
> 
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure::
> +
> +        basic_instruction
> +  .-----------------------------.
> +  |                             |
> +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
> +                                  |              |
> +                                  '--------------'
> +                                 pseudo instruction
> 
>  Thus the 64-bit immediate value is constructed as follows:
> 
>    imm64 = (next_imm << 32) | imm
> 
>  where 'next_imm' refers to the imm value of the pseudo instruction -
> following the basic instruction.
> +following the basic instruction.  The unused bytes in the pseudo
> +instruction are reserved and shall be cleared to zero.
> 
>  Instruction classes
>  -------------------
> @@ -137,7 +140,7 @@ code            source  instruction class
>    source  value  description
>    ======  =====  ==============================================
>    BPF_K   0x00   use 32-bit 'imm' value as source operand
> -  BPF_X   0x08   use 'src_reg' register value as source operand
> +  BPF_X   0x08   use 'src' register value as source operand
>    ======  =====  ==============================================
> 
>  **instruction class**
> --
> 2.30.2
> 
> --
> Bpf mailing list
> Bpf@ietf.org
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .ietf.org%2Fmailman%2Flistinfo%2Fbpf&data=05%7C01%7Cdthaler%40micro
> soft.com%7C65d83bf2fe834f73f84908db19067400%7C72f988bf86f141af91ab
> 2d7cd011db47%7C1%7C0%7C638131287757978381%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> Mn0%3D%7C3000%7C%7C%7C&sdata=8il1%2B8I1T8GBqn3U%2B7YJehIKjS6s
> gvxTRWS2CTpg%2FZY%3D&reserved=0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes
  2023-02-27 21:49 ` [Bpf] " Dave Thaler
@ 2023-02-27 23:42   ` David Vernet
  2023-02-28  1:05   ` Jose E. Marchesi
  1 sibling, 0 replies; 5+ messages in thread
From: David Vernet @ 2023-02-27 23:42 UTC (permalink / raw)
  To: Dave Thaler; +Cc: Jose E. Marchesi, bpf, Alexei Starovoitov, bpf@ietf.org

On Mon, Feb 27, 2023 at 09:49:15PM +0000, Dave Thaler wrote:
> > -----Original Message-----
> > From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
> > Sent: Monday, February 27, 2023 1:06 PM
> > To: bpf <bpf@vger.kernel.org>
> > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org; David
> > Vernet <void@manifault.com>
> > Subject: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of
> > stored bytes
> > 
> > 
> > [Changes from V2:
> > - Use src and dst consistently in the document.
> 
> Since my earlier patch, src and dst refer to the values whereas
> src_reg and dst_reg refer to register numbers.
> 
> > - Use a more graphical depiction of the 128-bit instruction.
> > - Remove `Where:' fragment.
> > - Clarify that unused bits are reserved and shall be zeroed.]
> > 
> > This patch modifies instruction-set.rst so it documents the encoding of BPF
> > instructions in terms of how the bytes are stored (be it in an ELF file or as
> > bytes in a memory buffer to be loaded into the kernel or some other BPF
> > consumer) as opposed to how the instruction looks like once loaded.
> > 
> > This is hopefully easier to understand by implementors looking to generate
> > and/or consume bytes conforming BPF instructions.
> > 
> > The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> > cleared with zeros.
> > 
> > Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
> > ---
> >  Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
> >  1 file changed, 33 insertions(+), 30 deletions(-)
> > 
> > diff --git a/Documentation/bpf/instruction-set.rst
> > b/Documentation/bpf/instruction-set.rst
> > index 01802ed9b29b..fae2e48d6a0b 100644
> > --- a/Documentation/bpf/instruction-set.rst
> > +++ b/Documentation/bpf/instruction-set.rst
> > @@ -38,15 +38,11 @@ eBPF has two instruction encodings:
> >  * the wide instruction encoding, which appends a second 64-bit immediate
> > (i.e.,
> >    constant) value after the basic instruction for a total of 128 bits.
> > 
> > -The basic instruction encoding looks as follows for a little-endian processor,
> > -where MSB and LSB mean the most significant bits and least significant bits,
> > -respectively:
> > +The fields conforming an encoded basic instruction are stored in the
> > +following order::
> > 
> > -=============  =======  =======  =======  ============
> > -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> > -=============  =======  =======  =======  ============
> > -imm            offset   src_reg  dst_reg  opcode
> > -=============  =======  =======  =======  ============
> > +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> > +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
> 
> I think those should be src_reg and dst_reg (as the register numbers)
> not src and dst (which are the values) or this will be a documentation
> regression.
> 
> Right now I think this is a regression since if I understand right, with this
> patch, "src" and "dst" now refer to both in different places which is
> confusing.

Fair enough -- this was my suggestion, and in hindsight I agree that it
would probably be best to avoid ambiguity by using src_reg and dst_reg
here. Apologies for the churn, Jose.

> 
> Dave
> 
> >  **imm**
> >    signed integer immediate value
> > @@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
> >  **offset**
> >    signed integer offset used with pointer arithmetic
> > 
> > -**src_reg**
> > +**src**
> >    the source register number (0-10), except where otherwise specified
> >    (`64-bit immediate instructions`_ reuse this field for other purposes)
> > 
> > -**dst_reg**
> > +**dst**
> >    destination register number (0-10)
> > 
> >  **opcode**
> >    operation to perform
> > 
> > -and as follows for a big-endian processor:
> > +Note that the contents of multi-byte fields ('imm' and 'offset') are
> > +stored using big-endian byte ordering in big-endian BPF and
> > +little-endian byte ordering in little-endian BPF.
> > 
> > -=============  =======  =======  =======  ============
> > -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> > -=============  =======  =======  =======  ============
> > -imm            offset   dst_reg  src_reg  opcode
> > -=============  =======  =======  =======  ============
> > +For example::
> > 
> > -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
> > the processor.
> > +  opcode         offset imm          assembly
> > +         src dst
> > +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> > +         dst src
> > +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
> > 
> >  Note that most instructions do not use all of the fields.
> >  Unused fields shall be cleared to zero.
> > 
> > -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate -
> > instruction uses a 64-bit immediate value that is constructed as follows.
> > -The 64 bits following the basic instruction contain a pseudo instruction -
> > using the same format but with opcode, dst_reg, src_reg, and offset all set to
> > zero, -and imm containing the high 32 bits of the immediate value.
> > +As discussed below in `64-bit immediate instructions`_, a 64-bit
> > +immediate instruction uses a 64-bit immediate value that is constructed
> > +as follows.  The 64 bits following the basic instruction contain a
> > +pseudo instruction using the same format but with opcode, dst, src, and
> > +offset all set to zero, and imm containing the high 32 bits of the
> > +immediate value.
> > 
> > -=================  ==================
> > -64 bits (MSB)      64 bits (LSB)
> > -=================  ==================
> > -basic instruction  pseudo instruction
> > -=================  ==================
> > +This is depicted in the following figure::
> > +
> > +        basic_instruction
> > +  .-----------------------------.
> > +  |                             |
> > +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
> > +                                  |              |
> > +                                  '--------------'
> > +                                 pseudo instruction
> > 
> >  Thus the 64-bit immediate value is constructed as follows:
> > 
> >    imm64 = (next_imm << 32) | imm
> > 
> >  where 'next_imm' refers to the imm value of the pseudo instruction -
> > following the basic instruction.
> > +following the basic instruction.  The unused bytes in the pseudo
> > +instruction are reserved and shall be cleared to zero.
> > 
> >  Instruction classes
> >  -------------------
> > @@ -137,7 +140,7 @@ code            source  instruction class
> >    source  value  description
> >    ======  =====  ==============================================
> >    BPF_K   0x00   use 32-bit 'imm' value as source operand
> > -  BPF_X   0x08   use 'src_reg' register value as source operand
> > +  BPF_X   0x08   use 'src' register value as source operand
> >    ======  =====  ==============================================
> > 
> >  **instruction class**
> > --
> > 2.30.2
> > 
> > --
> > Bpf mailing list
> > Bpf@ietf.org
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> > .ietf.org%2Fmailman%2Flistinfo%2Fbpf&data=05%7C01%7Cdthaler%40micro
> > soft.com%7C65d83bf2fe834f73f84908db19067400%7C72f988bf86f141af91ab
> > 2d7cd011db47%7C1%7C0%7C638131287757978381%7CUnknown%7CTWFpb
> > GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> > Mn0%3D%7C3000%7C%7C%7C&sdata=8il1%2B8I1T8GBqn3U%2B7YJehIKjS6s
> > gvxTRWS2CTpg%2FZY%3D&reserved=0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes
  2023-02-27 21:49 ` [Bpf] " Dave Thaler
  2023-02-27 23:42   ` David Vernet
@ 2023-02-28  1:05   ` Jose E. Marchesi
  1 sibling, 0 replies; 5+ messages in thread
From: Jose E. Marchesi @ 2023-02-28  1:05 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf, Alexei Starovoitov, bpf@ietf.org, David Vernet


>> -----Original Message-----
>> From: Bpf <bpf-bounces@ietf.org> On Behalf Of Jose E. Marchesi
>> Sent: Monday, February 27, 2023 1:06 PM
>> To: bpf <bpf@vger.kernel.org>
>> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>; bpf@ietf.org; David
>> Vernet <void@manifault.com>
>> Subject: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of
>> stored bytes
>> 
>> 
>> [Changes from V2:
>> - Use src and dst consistently in the document.
>
> Since my earlier patch, src and dst refer to the values whereas
> src_reg and dst_reg refer to register numbers.

Oh, I didn't realize that!
Yeah sure, resending with src_reg and dst_reg.


>> - Use a more graphical depiction of the 128-bit instruction.
>> - Remove `Where:' fragment.
>> - Clarify that unused bits are reserved and shall be zeroed.]
>> 
>> This patch modifies instruction-set.rst so it documents the encoding of BPF
>> instructions in terms of how the bytes are stored (be it in an ELF file or as
>> bytes in a memory buffer to be loaded into the kernel or some other BPF
>> consumer) as opposed to how the instruction looks like once loaded.
>> 
>> This is hopefully easier to understand by implementors looking to generate
>> and/or consume bytes conforming BPF instructions.
>> 
>> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
>> cleared with zeros.
>> 
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
>> ---
>>  Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
>>  1 file changed, 33 insertions(+), 30 deletions(-)
>> 
>> diff --git a/Documentation/bpf/instruction-set.rst
>> b/Documentation/bpf/instruction-set.rst
>> index 01802ed9b29b..fae2e48d6a0b 100644
>> --- a/Documentation/bpf/instruction-set.rst
>> +++ b/Documentation/bpf/instruction-set.rst
>> @@ -38,15 +38,11 @@ eBPF has two instruction encodings:
>>  * the wide instruction encoding, which appends a second 64-bit immediate
>> (i.e.,
>>    constant) value after the basic instruction for a total of 128 bits.
>> 
>> -The basic instruction encoding looks as follows for a little-endian processor,
>> -where MSB and LSB mean the most significant bits and least significant bits,
>> -respectively:
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order::
>> 
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   src_reg  dst_reg  opcode
>> -=============  =======  =======  =======  ============
>> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
>> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> I think those should be src_reg and dst_reg (as the register numbers)
> not src and dst (which are the values) or this will be a documentation
> regression.
>
> Right now I think this is a regression since if I understand right, with this
> patch, "src" and "dst" now refer to both in different places which is
> confusing.
>
> Dave
>
>>  **imm**
>>    signed integer immediate value
>> @@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
>>  **offset**
>>    signed integer offset used with pointer arithmetic
>> 
>> -**src_reg**
>> +**src**
>>    the source register number (0-10), except where otherwise specified
>>    (`64-bit immediate instructions`_ reuse this field for other purposes)
>> 
>> -**dst_reg**
>> +**dst**
>>    destination register number (0-10)
>> 
>>  **opcode**
>>    operation to perform
>> 
>> -and as follows for a big-endian processor:
>> +Note that the contents of multi-byte fields ('imm' and 'offset') are
>> +stored using big-endian byte ordering in big-endian BPF and
>> +little-endian byte ordering in little-endian BPF.
>> 
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   dst_reg  src_reg  opcode
>> -=============  =======  =======  =======  ============
>> +For example::
>> 
>> -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
>> the processor.
>> +  opcode         offset imm          assembly
>> +         src dst
>> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>> +         dst src
>> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>> 
>>  Note that most instructions do not use all of the fields.
>>  Unused fields shall be cleared to zero.
>> 
>> -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate -
>> instruction uses a 64-bit immediate value that is constructed as follows.
>> -The 64 bits following the basic instruction contain a pseudo instruction -
>> using the same format but with opcode, dst_reg, src_reg, and offset all set to
>> zero, -and imm containing the high 32 bits of the immediate value.
>> +As discussed below in `64-bit immediate instructions`_, a 64-bit
>> +immediate instruction uses a 64-bit immediate value that is constructed
>> +as follows.  The 64 bits following the basic instruction contain a
>> +pseudo instruction using the same format but with opcode, dst, src, and
>> +offset all set to zero, and imm containing the high 32 bits of the
>> +immediate value.
>> 
>> -=================  ==================
>> -64 bits (MSB)      64 bits (LSB)
>> -=================  ==================
>> -basic instruction  pseudo instruction
>> -=================  ==================
>> +This is depicted in the following figure::
>> +
>> +        basic_instruction
>> +  .-----------------------------.
>> +  |                             |
>> +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
>> +                                  |              |
>> +                                  '--------------'
>> +                                 pseudo instruction
>> 
>>  Thus the 64-bit immediate value is constructed as follows:
>> 
>>    imm64 = (next_imm << 32) | imm
>> 
>>  where 'next_imm' refers to the imm value of the pseudo instruction -
>> following the basic instruction.
>> +following the basic instruction.  The unused bytes in the pseudo
>> +instruction are reserved and shall be cleared to zero.
>> 
>>  Instruction classes
>>  -------------------
>> @@ -137,7 +140,7 @@ code            source  instruction class
>>    source  value  description
>>    ======  =====  ==============================================
>>    BPF_K   0x00   use 32-bit 'imm' value as source operand
>> -  BPF_X   0x08   use 'src_reg' register value as source operand
>> +  BPF_X   0x08   use 'src' register value as source operand
>>    ======  =====  ==============================================
>> 
>>  **instruction class**
>> --
>> 2.30.2
>> 
>> --
>> Bpf mailing list
>> Bpf@ietf.org
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
>> .ietf.org%2Fmailman%2Flistinfo%2Fbpf&data=05%7C01%7Cdthaler%40micro
>> soft.com%7C65d83bf2fe834f73f84908db19067400%7C72f988bf86f141af91ab
>> 2d7cd011db47%7C1%7C0%7C638131287757978381%7CUnknown%7CTWFpb
>> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
>> Mn0%3D%7C3000%7C%7C%7C&sdata=8il1%2B8I1T8GBqn3U%2B7YJehIKjS6s
>> gvxTRWS2CTpg%2FZY%3D&reserved=0

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-02-28  1:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-27 21:05 [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes Jose E. Marchesi
2023-02-27 21:18 ` David Vernet
2023-02-27 21:49 ` [Bpf] " Dave Thaler
2023-02-27 23:42   ` David Vernet
2023-02-28  1:05   ` Jose E. Marchesi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox