[PATCH] bpf, docs: Use consistent names for the same field

* [PATCH] bpf, docs: Use consistent names for the same field
@ 2023-01-25 18:58 dthaler1968
  2023-01-25 20:18 ` David Vernet
  0 siblings, 1 reply; 6+ messages in thread
From: dthaler1968 @ 2023-01-25 18:58 UTC (permalink / raw)
  To: bpf; +Cc: bpf, Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'.
Previously a mix of terms was used for the same thing in various cases.

Changes since last submission: addressed comments from Alexei and Stanislav

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 105 ++++++++++++++++++--------
 1 file changed, 74 insertions(+), 31 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 2d3fe59bd26..3778c807cbb 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -30,20 +30,59 @@ Instruction encoding
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
-* the wide instruction encoding, which appends a second 64-bit immediate value
-  (imm64) after the basic instruction for a total of 128 bits.
+* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
+  constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding looks as follows:
+The basic instruction encoding is as follows, where MSB and LSB mean the most significant
+bits and least significant bits, respectively:
 
 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
-immediate      offset   source register  destination register  opcode
+imm            offset   src              dst                   opcode
 =============  =======  ===============  ====================  ============
 
+imm
+  signed integer immediate value
+
+offset
+  signed integer offset used with pointer arithmetic
+
+src
+  the source register number (0-10), except where otherwise specified
+  (`64-bit immediate instructions`_ reuse this field for other purposes)
+
+dst
+  destination register number (0-10)
+
+opcode
+  operation to perform
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
 
+As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
+instruction uses a 64-bit immediate value that is constructed as follows.
+The 64 bits following the basic instruction contain a pseudo instruction
+using the same format but with opcode, dst, src, and offset all set to zero,
+and imm containing the high 32 bits of the immediate value.
+
+=================  ==================
+64 bits (MSB)      64 bits (LSB)
+=================  ==================
+basic instruction  pseudo instruction
+=================  ==================
+
+Thus the 64-bit immediate value is constructed as follows:
+
+  imm64 = (next_imm << 32) | imm
+
+where 'next_imm' refers to the imm value of the pseudo instruction
+following the basic instruction.
+
+In the remainder of this document 'src' and 'dst' refer to the values of the source
+and destination registers, respectively, rather than the register number.
+
 Instruction classes
 -------------------
 
@@ -71,20 +110,24 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
 ==============  ======  =================
 4 bits (MSB)    1 bit   3 bits (LSB)
 ==============  ======  =================
-operation code  source  instruction class
+code            source  instruction class
 ==============  ======  =================
 
-The 4th bit encodes the source operand:
+code
+  the operation code, whose meaning varies by instruction class
 
-  ======  =====  ========================================
-  source  value  description
-  ======  =====  ========================================
-  BPF_K   0x00   use 32-bit immediate as source operand
-  BPF_X   0x08   use 'src_reg' register as source operand
-  ======  =====  ========================================
+source
+  the source operand location, which unless otherwise specified is one of:
 
-The four MSB bits store the operation code.
+  ======  =====  ==========================================
+  source  value  description
+  ======  =====  ==========================================
+  BPF_K   0x00   use 32-bit 'imm' value as source operand
+  BPF_X   0x08   use 'src' register value as source operand
+  ======  =====  ==========================================
 
+instruction class
+  the instruction class (see `Instruction classes`_)
 
 Arithmetic instructions
 -----------------------
@@ -121,19 +164,19 @@ the destination register is unchanged whereas for ``BPF_ALU`` the upper
 
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
-  dst_reg = (u32) dst_reg + (u32) src_reg;
+  dst = (u32) ((u32) dst + (u32) src)
 
 ``BPF_ADD | BPF_X | BPF_ALU64`` means::
 
-  dst_reg = dst_reg + src_reg
+  dst = dst + src
 
 ``BPF_XOR | BPF_K | BPF_ALU`` means::
 
-  dst_reg = (u32) dst_reg ^ (u32) imm32
+  dst = (u32) dst ^ (u32) imm32
 
 ``BPF_XOR | BPF_K | BPF_ALU64`` means::
 
-  dst_reg = dst_reg ^ imm32
+  dst = dst ^ imm32
 
 Also note that the division and modulo operations are unsigned. Thus, for
 ``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
@@ -167,11 +210,11 @@ Examples:
 
 ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
 
-  dst_reg = htole16(dst_reg)
+  dst = htole16(dst)
 
 ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
 
-  dst_reg = htobe64(dst_reg)
+  dst = htobe64(dst)
 
 Jump instructions
 -----------------
@@ -246,15 +289,15 @@ instructions that transfer data between a register and memory.
 
 ``BPF_MEM | <size> | BPF_STX`` means::
 
-  *(size *) (dst_reg + off) = src_reg
+  *(size *) (dst + offset) = src_reg
 
 ``BPF_MEM | <size> | BPF_ST`` means::
 
-  *(size *) (dst_reg + off) = imm32
+  *(size *) (dst + offset) = imm32
 
 ``BPF_MEM | <size> | BPF_LDX`` means::
 
-  dst_reg = *(size *) (src_reg + off)
+  dst = *(size *) (src + offset)
 
 Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
 
@@ -288,11 +331,11 @@ BPF_XOR   0xa0   atomic xor
 
 ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
 
-  *(u32 *)(dst_reg + off16) += src_reg
+  *(u32 *)(dst + offset) += src
 
 ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
 
-  *(u64 *)(dst_reg + off16) += src_reg
+  *(u64 *)(dst + offset) += src
 
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
@@ -307,16 +350,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
 
 The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
 always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
-is set, then the operation also overwrites ``src_reg`` with the value that
+is set, then the operation also overwrites ``src`` with the value that
 was in memory before it was modified.
 
-The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
-addressed by ``dst_reg + off``.
+The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
+addressed by ``dst + offset``.
 
 The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
-``dst_reg + off`` with ``R0``. If they match, the value addressed by
-``dst_reg + off`` is replaced with ``src_reg``. In either case, the
-value that was at ``dst_reg + off`` before the operation is zero-extended
+``dst + offset`` with ``R0``. If they match, the value addressed by
+``dst + offset`` is replaced with ``src``. In either case, the
+value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
 64-bit immediate instructions
@@ -329,7 +372,7 @@ There is currently only one such instruction.
 
 ``BPF_LD | BPF_DW | BPF_IMM`` means::
 
-  dst_reg = imm64
+  dst = imm64
 
 
 Legacy BPF Packet access instructions
-- 
2.33.4
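
For readers following the thread, here is a minimal sketch in C of how the
five fields in the basic-encoding table could be pulled out of one
instruction. It assumes the instruction is handled as a single 64-bit
integer laid out exactly as the table reads (imm in the most significant
32 bits, opcode in the least significant 8 bits); the in-memory byte order
is not modeled. The struct and function names are hypothetical and are not
taken from the kernel sources.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one basic instruction. */
struct insn_fields {
    int32_t imm;      /* signed integer immediate value */
    int16_t offset;   /* signed integer offset */
    uint8_t src_reg;  /* source register number (0-10) */
    uint8_t dst_reg;  /* destination register number (0-10) */
    uint8_t opcode;   /* operation to perform */
};

/* Split a 64-bit value into fields, reading the table from LSB to MSB. */
static struct insn_fields decode_basic(uint64_t insn)
{
    struct insn_fields f;

    f.opcode  = (uint8_t)(insn & 0xff);
    f.dst_reg = (uint8_t)((insn >> 8) & 0x0f);
    f.src_reg = (uint8_t)((insn >> 12) & 0x0f);
    /* Reinterpret the raw bits as signed (two's complement assumed). */
    f.offset  = (int16_t)(uint16_t)((insn >> 16) & 0xffff);
    f.imm     = (int32_t)(uint32_t)(insn >> 32);
    return f;
}
```

The casts through the unsigned types keep the shifts well defined; only the
final conversion reinterprets the bits as a signed value.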



* Re: [PATCH] bpf, docs: Use consistent names for the same field
  2023-01-25 18:58 [PATCH] bpf, docs: Use consistent names for the same field dthaler1968
@ 2023-01-25 20:18 ` David Vernet
  2023-01-27  2:09   ` Dave Thaler
  2023-01-27  2:24   ` dthaler1968
  0 siblings, 2 replies; 6+ messages in thread
From: David Vernet @ 2023-01-25 20:18 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, bpf, Dave Thaler

On Wed, Jan 25, 2023 at 06:58:17PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'.
> Previously a mix of terms was used for the same thing in various cases.
> 
> Changes since last submission: addressed comments from Alexei and Stanislav

In the future, if sending subsequent iterations of a patch, could you
please follow the typical versioning and changelog convention described
in [0]?

[0]: https://www.kernel.org/doc/html/latest/process/submitting-patches.html

> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 105 ++++++++++++++++++--------
>  1 file changed, 74 insertions(+), 31 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 2d3fe59bd26..3778c807cbb 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -30,20 +30,59 @@ Instruction encoding
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> -* the wide instruction encoding, which appends a second 64-bit immediate value
> -  (imm64) after the basic instruction for a total of 128 bits.
> +* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
> +  constant) value after the basic instruction for a total of 128 bits.
>  
> -The basic instruction encoding looks as follows:
> +The basic instruction encoding is as follows, where MSB and LSB mean the most significant
> +bits and least significant bits, respectively:
>  
>  =============  =======  ===============  ====================  ============
>  32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
>  =============  =======  ===============  ====================  ============
> -immediate      offset   source register  destination register  opcode
> +imm            offset   src              dst                   opcode

What's the rationale for changing source register and destination
register to src and dst respectively here? Below you clarify that they
mean something other than register number after this section in the
document, so why not just leave them as is here to avoid any confusion?

>  =============  =======  ===============  ====================  ============
>  
> +imm

Can we make all of these bold, just to slightly improve readability?
E.g.:

**imm**

> +  signed integer immediate value
> +
> +offset
> +  signed integer offset used with pointer arithmetic
> +
> +src
> +  the source register number (0-10), except where otherwise specified
> +  (`64-bit immediate instructions`_ reuse this field for other purposes)
> +
> +dst
> +  destination register number (0-10)
> +
> +opcode
> +  operation to perform
> +
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
>  
> +As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
> +instruction uses a 64-bit immediate value that is constructed as follows.

FWIW, I'd consider moving this description of how imm64 is encoded into
the 64-bit immediate instructions section, as it only has relevance in
that context anyways.

What do you think?

> +The 64 bits following the basic instruction contain a pseudo instruction
> +using the same format but with opcode, dst, src, and offset all set to zero,
> +and imm containing the high 32 bits of the immediate value.
> +
> +=================  ==================
> +64 bits (MSB)      64 bits (LSB)
> +=================  ==================
> +basic instruction  pseudo instruction
> +=================  ==================
> +
> +Thus the 64-bit immediate value is constructed as follows:
> +
> +  imm64 = (next_imm << 32) | imm
> +
> +where 'next_imm' refers to the imm value of the pseudo instruction
> +following the basic instruction.
> +
> +In the remainder of this document 'src' and 'dst' refer to the values of the source
> +and destination registers, respectively, rather than the register number.
> +
>  Instruction classes
>  -------------------
>  
> @@ -71,20 +110,24 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
>  ==============  ======  =================
>  4 bits (MSB)    1 bit   3 bits (LSB)
>  ==============  ======  =================
> -operation code  source  instruction class
> +code            source  instruction class
>  ==============  ======  =================
>  
> -The 4th bit encodes the source operand:
> +code
> +  the operation code, whose meaning varies by instruction class
>  
> -  ======  =====  ========================================
> -  source  value  description
> -  ======  =====  ========================================
> -  BPF_K   0x00   use 32-bit immediate as source operand
> -  BPF_X   0x08   use 'src_reg' register as source operand
> -  ======  =====  ========================================
> +source
> +  the source operand location, which unless otherwise specified is one of:
>  
> -The four MSB bits store the operation code.
> +  ======  =====  ==========================================
> +  source  value  description
> +  ======  =====  ==========================================
> +  BPF_K   0x00   use 32-bit 'imm' value as source operand
> +  BPF_X   0x08   use 'src' register value as source operand
> +  ======  =====  ==========================================
>  
> +instruction class
> +  the instruction class (see `Instruction classes`_)
>  
>  Arithmetic instructions
>  -----------------------
> @@ -121,19 +164,19 @@ the destination register is unchanged whereas for ``BPF_ALU`` the upper
>  
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
> -  dst_reg = (u32) dst_reg + (u32) src_reg;
> +  dst = (u32) ((u32) dst + (u32) src)
>  
>  ``BPF_ADD | BPF_X | BPF_ALU64`` means::
>  
> -  dst_reg = dst_reg + src_reg
> +  dst = dst + src
>  
>  ``BPF_XOR | BPF_K | BPF_ALU`` means::
>  
> -  dst_reg = (u32) dst_reg ^ (u32) imm32
> +  dst = (u32) dst ^ (u32) imm32
>  
>  ``BPF_XOR | BPF_K | BPF_ALU64`` means::
>  
> -  dst_reg = dst_reg ^ imm32
> +  dst = dst ^ imm32
>  
>  Also note that the division and modulo operations are unsigned. Thus, for
>  ``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
> @@ -167,11 +210,11 @@ Examples:
>  
>  ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
>  
> -  dst_reg = htole16(dst_reg)
> +  dst = htole16(dst)
>  
>  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
>  
> -  dst_reg = htobe64(dst_reg)
> +  dst = htobe64(dst)
>  
>  Jump instructions
>  -----------------
> @@ -246,15 +289,15 @@ instructions that transfer data between a register and memory.
>  
>  ``BPF_MEM | <size> | BPF_STX`` means::
>  
> -  *(size *) (dst_reg + off) = src_reg
> +  *(size *) (dst + offset) = src_reg

s/src_reg/src

>  
>  ``BPF_MEM | <size> | BPF_ST`` means::
>  
> -  *(size *) (dst_reg + off) = imm32
> +  *(size *) (dst + offset) = imm32
>  
>  ``BPF_MEM | <size> | BPF_LDX`` means::
>  
> -  dst_reg = *(size *) (src_reg + off)
> +  dst = *(size *) (src + offset)
>  
>  Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
>  
> @@ -288,11 +331,11 @@ BPF_XOR   0xa0   atomic xor
>  
>  ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
>  
> -  *(u32 *)(dst_reg + off16) += src_reg
> +  *(u32 *)(dst + offset) += src
>  
>  ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
>  
> -  *(u64 *)(dst_reg + off16) += src_reg
> +  *(u64 *)(dst + offset) += src
>  
>  In addition to the simple atomic operations, there also is a modifier and
>  two complex atomic operations:
> @@ -307,16 +350,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
>  
>  The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
>  always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
> -is set, then the operation also overwrites ``src_reg`` with the value that
> +is set, then the operation also overwrites ``src`` with the value that
>  was in memory before it was modified.
>  
> -The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
> -addressed by ``dst_reg + off``.
> +The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
> +addressed by ``dst + offset``.
>  
>  The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
> -``dst_reg + off`` with ``R0``. If they match, the value addressed by
> -``dst_reg + off`` is replaced with ``src_reg``. In either case, the
> -value that was at ``dst_reg + off`` before the operation is zero-extended
> +``dst + offset`` with ``R0``. If they match, the value addressed by
> +``dst + offset`` is replaced with ``src``. In either case, the
> +value that was at ``dst + offset`` before the operation is zero-extended
>  and loaded back to ``R0``.
>  
>  64-bit immediate instructions
> @@ -329,7 +372,7 @@ There is currently only one such instruction.
>  
>  ``BPF_LD | BPF_DW | BPF_IMM`` means::
>  
> -  dst_reg = imm64
> +  dst = imm64
>  
>  
>  Legacy BPF Packet access instructions
> -- 
> 2.33.4
> 
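
The imm64 construction quoted above, imm64 = (next_imm << 32) | imm, can be
sketched in C as follows. This is an illustration under one stated
assumption: each 32-bit half is treated as an unsigned bit pattern before
combining, so that a negative 'imm' does not sign-extend into the upper
half. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Combine the imm field of the basic instruction (low half) with the imm
 * field of the following pseudo instruction (high half).
 */
static uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
    /* Cast through uint32_t so neither half is sign-extended. */
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint64_t)(uint32_t)imm;
}
```

Written literally with signed 32-bit operands, the formula would let a
negative 'imm' set every bit of the upper half, which is why the sketch
makes the unsigned interpretation explicit.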


* RE: [PATCH] bpf, docs: Use consistent names for the same field
  2023-01-25 20:18 ` David Vernet
@ 2023-01-27  2:09   ` Dave Thaler
  2023-01-27  5:36     ` David Vernet
  2023-01-27  2:24   ` dthaler1968
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Thaler @ 2023-01-27  2:09 UTC (permalink / raw)
  To: David Vernet, dthaler1968@googlemail.com
  Cc: bpf@vger.kernel.org, bpf@ietf.org

David Vernet <void@manifault.com> wrote: 
> In the future, if sending subsequent iterations of a patch, could you please
> follow the typical versioning and changelog convention described in [0]?

Thanks for being patient with a newcomer to this particular process :)

> >  =============  =======  ===============  ====================  ============
> >  32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
> >  =============  =======  ===============  ====================  ============
> > -immediate      offset   source register  destination register  opcode
> > +imm            offset   src              dst                   opcode
> 
> What's the rationale for changing source register and destination register to
> src and dst respectively here? Below you clarify that they mean something
> other than register number after this section in the document, so why not
> just leave them as is here to avoid any confusion?

Fair point, will update.

> Can we make all of these bold, just to slightly improve readability?
> E.g.:
> 
> **imm**

My view was that it was up to the RST renderer to do so. For example,
if you look at https://github.com/ebpffoundation/ebpf-docs/blob/update/rst/instruction-set.rst , which is what I used
to validate the look of this patch plus other patches, it is already
bolded because the github RST renderer bolds definition list terms.

On the other hand, https://htmlpreview.github.io/?https://raw.githubusercontent.com/ebpffoundation/ebpf-docs/pdf/draft-thaler-bpf-isa.html#section-3 , which is the output of RST -> xml2rfcv3 -> HTML,
doesn't do so.  That could be addressed either by me updating the
RST -> xml2rfcv3 converter to automatically bold (i.e., add <strong> to the XML)
or by adding an explicit bolding as you suggest.

I guess the benefit of adding the bolding into the RST itself is if there
are other RST renderers that don't automatically bold definition list terms but
we want them to.  I see other RST files in the Documentation/bpf directory
vary in terms of whether any explicit bolding is used, but I see maps.rst
does so, so I will go ahead and do this and make the RST -> xml2rfcv3
converter map bolding correctly to xml.

Dave


* Re: [PATCH] bpf, docs: Use consistent names for the same field
  2023-01-27  2:09   ` Dave Thaler
@ 2023-01-27  5:36     ` David Vernet
  0 siblings, 0 replies; 6+ messages in thread
From: David Vernet @ 2023-01-27  5:36 UTC (permalink / raw)
  To: Dave Thaler; +Cc: dthaler1968@googlemail.com, bpf@vger.kernel.org, bpf@ietf.org

On Fri, Jan 27, 2023 at 02:09:28AM +0000, Dave Thaler wrote:
> David Vernet <void@manifault.com> wrote: 
> > In the future, if sending subsequent iterations of a patch, could you please
> > follow the typical versioning and changelog convention described in [0]?
> 
> Thanks for being patient with a newcomer to this particular process :)

No problem, the process can be a bit arcane :-)

> 
> > >  =============  =======  ===============  ====================  ============
> > >  32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
> > >  =============  =======  ===============  ====================  ============
> > > -immediate      offset   source register  destination register  opcode
> > > +imm            offset   src              dst                   opcode
> > 
> > What's the rationale for changing source register and destination register to
> > src and dst respectively here? Below you clarify that they mean something
> > other than register number after this section in the document, so why not
> > just leave them as is here to avoid any confusion?
> 
> Fair point, will update.
> 
> > Can we make all of these bold, just to slightly improve readability?
> > E.g.:
> > 
> > **imm**
> 
> My view was that it was up to the RST renderer to do so. For example,
> if you look at https://github.com/ebpffoundation/ebpf-docs/blob/update/rst/instruction-set.rst , which is what I used
> to validate the look of this patch plus other patches, it is already
> bolded because the github RST renderer bolds definition list terms.
> 
> On the other hand, https://htmlpreview.github.io/?https://raw.githubusercontent.com/ebpffoundation/ebpf-docs/pdf/draft-thaler-bpf-isa.html#section-3 , which is the output of RST -> xml2rfcv3 -> HTML,
> doesn't do so.  That could be addressed either by me updating the
> RST -> xml2rfcv3 converter to automatically bold (i.e., add <strong> to the XML)
> or by adding an explicit bolding as you suggest.
> 
> I guess the benefit of adding the bolding into the RST itself is if there
> are other RST renderers that don't automatically bold definition list terms but
> we want them to.  I see other RST files in the Documentation/bpf directory
> vary in terms of whether any explicit bolding is used, but I see maps.rst
> does so, so I will go ahead and do this and make the RST -> xml2rfcv3
> converter map bolding correctly to xml.

Yeah, definition list items are weird. Not a huge deal either way, but
my preference would be to just force the issue by using the ** ... **
syntax to make it bold. Sounds like we're in agreement.

Thanks,
David


* [PATCH] bpf, docs: Use consistent names for the same field
  2023-01-25 20:18 ` David Vernet
  2023-01-27  2:09   ` Dave Thaler
@ 2023-01-27  2:24   ` dthaler1968
  2023-01-27  5:55     ` David Vernet
  1 sibling, 1 reply; 6+ messages in thread
From: dthaler1968 @ 2023-01-27  2:24 UTC (permalink / raw)
  To: bpf; +Cc: bpf, Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'.
Previously a mix of terms was used for the same thing in various cases.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
V2 -> V3: per David Vernet, added bolding and updated terms for reg numbers

V1 -> V2: addressed comments from Alexei and Stanislav
---
 Documentation/bpf/instruction-set.rst | 105 ++++++++++++++++++--------
 1 file changed, 73 insertions(+), 32 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 2d3fe59bd26..2da47fe4ef8 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -30,20 +30,56 @@ Instruction encoding
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
-* the wide instruction encoding, which appends a second 64-bit immediate value
-  (imm64) after the basic instruction for a total of 128 bits.
+* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
+  constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding looks as follows:
+The basic instruction encoding is as follows, where MSB and LSB mean the most significant
+bits and least significant bits, respectively:
 
 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
-immediate      offset   source register  destination register  opcode
+imm            offset   src_reg          dst_reg               opcode
 =============  =======  ===============  ====================  ============
 
+**imm**
+  signed integer immediate value
+
+**offset**
+  signed integer offset used with pointer arithmetic
+
+**src_reg**
+  the source register number (0-10), except where otherwise specified
+  (`64-bit immediate instructions`_ reuse this field for other purposes)
+
+**dst_reg**
+  destination register number (0-10)
+
+**opcode**
+  operation to perform
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
 
+As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
+instruction uses a 64-bit immediate value that is constructed as follows.
+The 64 bits following the basic instruction contain a pseudo instruction
+using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
+and imm containing the high 32 bits of the immediate value.
+
+=================  ==================
+64 bits (MSB)      64 bits (LSB)
+=================  ==================
+basic instruction  pseudo instruction
+=================  ==================
+
+Thus the 64-bit immediate value is constructed as follows:
+
+  imm64 = (next_imm << 32) | imm
+
+where 'next_imm' refers to the imm value of the pseudo instruction
+following the basic instruction.
+
 Instruction classes
 -------------------
 
@@ -71,27 +107,32 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
 ==============  ======  =================
 4 bits (MSB)    1 bit   3 bits (LSB)
 ==============  ======  =================
-operation code  source  instruction class
+code            source  instruction class
 ==============  ======  =================
 
-The 4th bit encodes the source operand:
+**code**
+  the operation code, whose meaning varies by instruction class
 
-  ======  =====  ========================================
-  source  value  description
-  ======  =====  ========================================
-  BPF_K   0x00   use 32-bit immediate as source operand
-  BPF_X   0x08   use 'src_reg' register as source operand
-  ======  =====  ========================================
+**source**
+  the source operand location, which unless otherwise specified is one of:
 
-The four MSB bits store the operation code.
+  ======  =====  ==============================================
+  source  value  description
+  ======  =====  ==============================================
+  BPF_K   0x00   use 32-bit 'imm' value as source operand
+  BPF_X   0x08   use 'src_reg' register value as source operand
+  ======  =====  ==============================================
 
+**instruction class**
+  the instruction class (see `Instruction classes`_)
 
 Arithmetic instructions
 -----------------------
 
 ``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
 otherwise identical operations.
-The 'code' field encodes the operation as below:
+The 'code' field encodes the operation as below, where 'src' and 'dst' refer
+to the values of the source and destination registers, respectively.
 
 ========  =====  ==========================================================
 code      value  description
@@ -121,19 +162,19 @@ the destination register is unchanged whereas for ``BPF_ALU`` the upper
 
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
-  dst_reg = (u32) dst_reg + (u32) src_reg;
+  dst = (u32) ((u32) dst + (u32) src)
 
 ``BPF_ADD | BPF_X | BPF_ALU64`` means::
 
-  dst_reg = dst_reg + src_reg
+  dst = dst + src
 
 ``BPF_XOR | BPF_K | BPF_ALU`` means::
 
-  dst_reg = (u32) dst_reg ^ (u32) imm32
+  dst = (u32) dst ^ (u32) imm32
 
 ``BPF_XOR | BPF_K | BPF_ALU64`` means::
 
-  dst_reg = dst_reg ^ imm32
+  dst = dst ^ imm32
 
 Also note that the division and modulo operations are unsigned. Thus, for
 ``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
@@ -167,11 +208,11 @@ Examples:
 
 ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
 
-  dst_reg = htole16(dst_reg)
+  dst = htole16(dst)
 
 ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
 
-  dst_reg = htobe64(dst_reg)
+  dst = htobe64(dst)
 
 Jump instructions
 -----------------
@@ -246,15 +287,15 @@ instructions that transfer data between a register and memory.
 
 ``BPF_MEM | <size> | BPF_STX`` means::
 
-  *(size *) (dst_reg + off) = src_reg
+  *(size *) (dst + offset) = src
 
 ``BPF_MEM | <size> | BPF_ST`` means::
 
-  *(size *) (dst_reg + off) = imm32
+  *(size *) (dst + offset) = imm32
 
 ``BPF_MEM | <size> | BPF_LDX`` means::
 
-  dst_reg = *(size *) (src_reg + off)
+  dst = *(size *) (src + offset)
 
 Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
 
@@ -288,11 +329,11 @@ BPF_XOR   0xa0   atomic xor
 
 ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
 
-  *(u32 *)(dst_reg + off16) += src_reg
+  *(u32 *)(dst + offset) += src
 
 ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
 
-  *(u64 *)(dst_reg + off16) += src_reg
+  *(u64 *)(dst + offset) += src
 
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
@@ -307,16 +348,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
 
 The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
 always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
-is set, then the operation also overwrites ``src_reg`` with the value that
+is set, then the operation also overwrites ``src`` with the value that
 was in memory before it was modified.
 
-The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
-addressed by ``dst_reg + off``.
+The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
+addressed by ``dst + offset``.
 
 The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
-``dst_reg + off`` with ``R0``. If they match, the value addressed by
-``dst_reg + off`` is replaced with ``src_reg``. In either case, the
-value that was at ``dst_reg + off`` before the operation is zero-extended
+``dst + offset`` with ``R0``. If they match, the value addressed by
+``dst + offset`` is replaced with ``src``. In either case, the
+value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
 64-bit immediate instructions
@@ -329,7 +370,7 @@ There is currently only one such instruction.
 
 ``BPF_LD | BPF_DW | BPF_IMM`` means::
 
-  dst_reg = imm64
+  dst = imm64
 
 
 Legacy BPF Packet access instructions
-- 
2.33.4

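For readers following the atomic-operation hunks above, here is a minimal sketch of the documented semantics using C11 atomics. It is only a model, not how the kernel or any JIT implements these instructions. The helper names (model_fetch_add, model_xchg, model_cmpxchg) are hypothetical, 'mem' stands for the location addressed by dst + offset, and only the 64-bit (BPF_DW) case is shown.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model helpers; 'mem' is the location at dst + offset. */

/* BPF_ADD with the BPF_FETCH modifier: add src to memory and hand back
 * the value that was in memory before the add. Per the text, that old
 * value is what overwrites src. */
static uint64_t model_fetch_add(_Atomic uint64_t *mem, uint64_t src)
{
    return atomic_fetch_add(mem, src);
}

/* BPF_XCHG: store src to memory and hand back the previous contents,
 * which become the new value of src. */
static uint64_t model_xchg(_Atomic uint64_t *mem, uint64_t src)
{
    return atomic_exchange(mem, src);
}

/* BPF_CMPXCHG: compare memory with R0; on a match, store src.
 * In either case the value that was in memory is returned as the new R0.
 * atomic_compare_exchange_strong() writes the current memory value into
 * r0 on failure and leaves r0 (equal to the old value) alone on success. */
static uint64_t model_cmpxchg(_Atomic uint64_t *mem, uint64_t r0, uint64_t src)
{
    atomic_compare_exchange_strong(mem, &r0, src);
    return r0;
}
```

With memory holding 10, model_fetch_add(&mem, 5) returns 10 and leaves 15 in memory. A later model_cmpxchg(&mem, 15, 99) returns 15 and stores 99, whereas model_cmpxchg(&mem, 1, 5) on the same memory returns 99 and stores nothing.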


* Re: [PATCH] bpf, docs: Use consistent names for the same field
  2023-01-27  2:24   ` dthaler1968
@ 2023-01-27  5:55     ` David Vernet
  0 siblings, 0 replies; 6+ messages in thread
From: David Vernet @ 2023-01-27  5:55 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, bpf, Dave Thaler

On Fri, Jan 27, 2023 at 02:24:16AM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'.
> Previously a mix of terms were used for the same thing in various cases.
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
> V2 -> V3: per David Vernet, added bolding and updated terms for reg numbers
> 
> V1 -> V2: addressed comments from Alexei and Stanislav
> ---
>  Documentation/bpf/instruction-set.rst | 105 ++++++++++++++++++--------
>  1 file changed, 73 insertions(+), 32 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 2d3fe59bd26..2da47fe4ef8 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -30,20 +30,56 @@ Instruction encoding
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> -* the wide instruction encoding, which appends a second 64-bit immediate value
> -  (imm64) after the basic instruction for a total of 128 bits.
> +* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
> +  constant) value after the basic instruction for a total of 128 bits.
>  
> -The basic instruction encoding looks as follows:
> +The basic instruction encoding is as follows, where MSB and LSB mean the most significant
> +bits and least significant bits, respectively:
>  
>  =============  =======  ===============  ====================  ============
>  32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
>  =============  =======  ===============  ====================  ============
> -immediate      offset   source register  destination register  opcode
> +imm            offset   src_reg          dst_reg               opcode
>  =============  =======  ===============  ====================  ============

Can you adjust the width of the src_reg and dst_reg columns to be
minimized, as the others are?

Looks great otherwise.

>  
> +**imm**
> +  signed integer immediate value
> +
> +**offset**
> +  signed integer offset used with pointer arithmetic
> +
> +**src_reg**
> +  the source register number (0-10), except where otherwise specified
> +  (`64-bit immediate instructions`_ reuse this field for other purposes)
> +
> +**dst_reg**
> +  destination register number (0-10)
> +
> +**opcode**
> +  operation to perform
> +
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
>  
> +As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
> +instruction uses a 64-bit immediate value that is constructed as follows.
> +The 64 bits following the basic instruction contain a pseudo instruction
> +using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
> +and imm containing the high 32 bits of the immediate value.
> +
> +=================  ==================
> +64 bits (MSB)      64 bits (LSB)
> +=================  ==================
> +basic instruction  pseudo instruction
> +=================  ==================
> +
> +Thus the 64-bit immediate value is constructed as follows:
> +
> +  imm64 = (next_imm << 32) | imm
> +
> +where 'next_imm' refers to the imm value of the pseudo instruction
> +following the basic instruction.
> +
>  Instruction classes
>  -------------------
>  
> @@ -71,27 +107,32 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
>  ==============  ======  =================
>  4 bits (MSB)    1 bit   3 bits (LSB)
>  ==============  ======  =================
> -operation code  source  instruction class
> +code            source  instruction class
>  ==============  ======  =================
>  
> -The 4th bit encodes the source operand:
> +**code**
> +  the operation code, whose meaning varies by instruction class
>  
> -  ======  =====  ========================================
> -  source  value  description
> -  ======  =====  ========================================
> -  BPF_K   0x00   use 32-bit immediate as source operand
> -  BPF_X   0x08   use 'src_reg' register as source operand
> -  ======  =====  ========================================
> +**source**
> +  the source operand location, which unless otherwise specified is one of:
>  
> -The four MSB bits store the operation code.
> +  ======  =====  ==============================================
> +  source  value  description
> +  ======  =====  ==============================================
> +  BPF_K   0x00   use 32-bit 'imm' value as source operand
> +  BPF_X   0x08   use 'src_reg' register value as source operand
> +  ======  =====  ==============================================
>  
> +**instruction class**
> +  the instruction class (see `Instruction classes`_)
>  
>  Arithmetic instructions
>  -----------------------
>  
>  ``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
>  otherwise identical operations.
> -The 'code' field encodes the operation as below:
> +The 'code' field encodes the operation as below, where 'src' and 'dst' refer
> +to the values of the source and destination registers, respectively.
>  
>  ========  =====  ==========================================================
>  code      value  description
> @@ -121,19 +162,19 @@ the destination register is unchanged whereas for ``BPF_ALU`` the upper
>  
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
> -  dst_reg = (u32) dst_reg + (u32) src_reg;
> +  dst = (u32) ((u32) dst + (u32) src)
>  
>  ``BPF_ADD | BPF_X | BPF_ALU64`` means::
>  
> -  dst_reg = dst_reg + src_reg
> +  dst = dst + src
>  
>  ``BPF_XOR | BPF_K | BPF_ALU`` means::
>  
> -  dst_reg = (u32) dst_reg ^ (u32) imm32
> +  dst = (u32) dst ^ (u32) imm32
>  
>  ``BPF_XOR | BPF_K | BPF_ALU64`` means::
>  
> -  dst_reg = dst_reg ^ imm32
> +  dst = dst ^ imm32
>  
>  Also note that the division and modulo operations are unsigned. Thus, for
>  ``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
> @@ -167,11 +208,11 @@ Examples:
>  
>  ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
>  
> -  dst_reg = htole16(dst_reg)
> +  dst = htole16(dst)
>  
>  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
>  
> -  dst_reg = htobe64(dst_reg)
> +  dst = htobe64(dst)
>  
>  Jump instructions
>  -----------------
> @@ -246,15 +287,15 @@ instructions that transfer data between a register and memory.
>  
>  ``BPF_MEM | <size> | BPF_STX`` means::
>  
> -  *(size *) (dst_reg + off) = src_reg
> +  *(size *) (dst + offset) = src
>  
>  ``BPF_MEM | <size> | BPF_ST`` means::
>  
> -  *(size *) (dst_reg + off) = imm32
> +  *(size *) (dst + offset) = imm32
>  
>  ``BPF_MEM | <size> | BPF_LDX`` means::
>  
> -  dst_reg = *(size *) (src_reg + off)
> +  dst = *(size *) (src + offset)
>  
>  Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
>  
> @@ -288,11 +329,11 @@ BPF_XOR   0xa0   atomic xor
>  
>  ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
>  
> -  *(u32 *)(dst_reg + off16) += src_reg
> +  *(u32 *)(dst + offset) += src
>  
>  ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
>  
> -  *(u64 *)(dst_reg + off16) += src_reg
> +  *(u64 *)(dst + offset) += src
>  
>  In addition to the simple atomic operations, there also is a modifier and
>  two complex atomic operations:
> @@ -307,16 +348,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
>  
>  The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
>  always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
> -is set, then the operation also overwrites ``src_reg`` with the value that
> +is set, then the operation also overwrites ``src`` with the value that
>  was in memory before it was modified.
>  
> -The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
> -addressed by ``dst_reg + off``.
> +The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
> +addressed by ``dst + offset``.
>  
>  The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
> -``dst_reg + off`` with ``R0``. If they match, the value addressed by
> -``dst_reg + off`` is replaced with ``src_reg``. In either case, the
> -value that was at ``dst_reg + off`` before the operation is zero-extended
> +``dst + offset`` with ``R0``. If they match, the value addressed by
> +``dst + offset`` is replaced with ``src``. In either case, the
> +value that was at ``dst + offset`` before the operation is zero-extended
>  and loaded back to ``R0``.
>  
>  64-bit immediate instructions
> @@ -329,7 +370,7 @@ There is currently only one such instruction.
>  
>  ``BPF_LD | BPF_DW | BPF_IMM`` means::
>  
> -  dst_reg = imm64
> +  dst = imm64
>  
>  
>  Legacy BPF Packet access instructions
> -- 
> 2.33.4
> 

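As an illustration of the encoding described in the quoted patch, here is a minimal sketch that splits a basic instruction into its fields and builds imm64 from the two imm halves. It assumes the instruction is viewed as a single 64-bit integer laid out as in the table (imm in the most significant bits, opcode in the least significant). The names insn_fields, decode_insn and make_imm64 are hypothetical, and the kernel's own struct bpf_insn is laid out in memory differently depending on byte order.

```c
#include <stdint.h>

/* Hypothetical decoded view of one 64-bit basic instruction. */
struct insn_fields {
    uint8_t opcode;   /* 8 bits (LSB) */
    uint8_t dst_reg;  /* 4 bits */
    uint8_t src_reg;  /* 4 bits */
    int16_t offset;   /* 16 bits, signed */
    int32_t imm;      /* 32 bits (MSB), signed */
};

/* Split the 64-bit value following the table's MSB-to-LSB order. */
static struct insn_fields decode_insn(uint64_t raw)
{
    struct insn_fields f;

    f.opcode  = (uint8_t)(raw & 0xff);
    f.dst_reg = (uint8_t)((raw >> 8) & 0xf);
    f.src_reg = (uint8_t)((raw >> 12) & 0xf);
    f.offset  = (int16_t)(uint16_t)(raw >> 16);
    f.imm     = (int32_t)(uint32_t)(raw >> 32);
    return f;
}

/* imm64 = (next_imm << 32) | imm
 * Both halves are widened through uint32_t first, so a negative 'imm'
 * does not sign-extend into the upper 32 bits. */
static uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}
```

For example, decode_insn(0x12345678FFF82118) yields opcode 0x18, dst_reg 1, src_reg 2, offset -8 and imm 0x12345678, and make_imm64(-1, 1) yields 0x1FFFFFFFF. Without the unsigned casts, the second result would have all upper bits set.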

Thread overview: 6+ messages
2023-01-25 18:58 [PATCH] bpf, docs: Use consistent names for the same field dthaler1968
2023-01-25 20:18 ` David Vernet
2023-01-27  2:09   ` Dave Thaler
2023-01-27  5:36     ` David Vernet
2023-01-27  2:24   ` dthaler1968
2023-01-27  5:55     ` David Vernet
