Re: [PATCH bpf-next] The original document has some inconsistency.

* Re: [PATCH bpf-next] The original document has some inconsistency.
       [not found] <20240105031450.57681-2-aoyangfang@link.cuhk.edu.cn>
@ 2024-01-09 17:32 ` David Vernet
  2024-01-09 17:32   ` [Bpf] " David Vernet
  2024-01-09 18:06   ` dthaler1968
  0 siblings, 2 replies; 10+ messages in thread
From: David Vernet @ 2024-01-09 17:32 UTC (permalink / raw)
  To: Aoyang Fang; +Cc: bpf, bpf, dthaler1968

[-- Attachment #1: Type: text/plain, Size: 14538 bytes --]

On Fri, Jan 05, 2024 at 11:14:51AM +0800, Aoyang Fang wrote:

Hi Aoyang,

Thanks a lot for your contribution. I agree that we need to fix the
document to be consistent, though I'm afraid that I think this patch
goes in the wrong direction by making everything match the jump
instruction class. More below.

nit: Could you please update the patch subject to be more
self-describing. For example, something like:

Use consistent numerical widths in instructions.rst encodings

> For example:
> 1. 1.3.1 Arithmetic instructions use '8 bits length' encoding to
>    express the 'code' value, e.g., BPF_ADD=0x00, BPF_SUB=0x10,
>    BPF_MUL=0x20. However the length of the 'code' is 4 bits. On the
>    other hand, 1.3.3 Jump instructions use '4 bits length' encoding,
>    e.g., BPF_JEQ=0x1 and BPF_JGT=0x2.
> 2. There are also many places that use '8 bits length' encoding to
>    express the corresponding contents, e.g., 1.4 Load and store
>    instructions, BPF_ABS=0x20, BPF_IND=0x40. However, the length of
>    'mode modifier' is 3 bits.
> 
> To summarize, the only place that has inconsistent encoding is Jump
> instructions. After discussing with Dave, dthaler1968@googlemail.com,
> we agree that the document should be more clear.
> 
> Signed-off-by: Aoyang Fang <aoyangfang@link.cuhk.edu.cn>
>
> ---
>  .../bpf/standardization/instruction-set.rst   | 170 +++++++++---------
>  1 file changed, 85 insertions(+), 85 deletions(-)
> 
> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> index 245b6defc..57dd1fa00 100644
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst
> @@ -172,18 +172,18 @@ Instruction classes
>  
>  The three LSB bits of the 'opcode' field store the instruction class:
>  
> -=========  =====  ===============================  ===================================
> -class      value  description                      reference
> -=========  =====  ===============================  ===================================
> -BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
> -BPF_LDX    0x01   load into register operations    `Load and store instructions`_
> -BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
> -BPF_STX    0x03   store from register operations   `Load and store instructions`_
> -BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
> -BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
> -BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
> -BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
> -=========  =====  ===============================  ===================================
> +=========  =============  ===============================  ===================================
> +class      value(3 bits)  description                      reference
> +=========  =============  ===============================  ===================================
> +BPF_LD     0x0            non-standard load operations     `Load and store instructions`_
> +BPF_LDX    0x1            load into register operations    `Load and store instructions`_
> +BPF_ST     0x2            store from immediate operations  `Load and store instructions`_
> +BPF_STX    0x3            store from register operations   `Load and store instructions`_
> +BPF_ALU    0x4            32-bit arithmetic operations     `Arithmetic and jump instructions`_
> +BPF_JMP    0x5            64-bit jump operations           `Arithmetic and jump instructions`_
> +BPF_JMP32  0x6            32-bit jump operations           `Arithmetic and jump instructions`_
> +BPF_ALU64  0x7            64-bit arithmetic operations     `Arithmetic and jump instructions`_
> +=========  =============  ===============================  ===================================

Hmm, I presonally think this is more confusing. The opcode field is 8
bits. We already specify that the value is the three LSB of the opcode
field. It's certainly subjective, but I think we should have the value
reflect the actual value in the field it's embedded in. In my opinion,
changing the value to not reflect its place in the actual opcode in my
opinion imposes a burden on the reader to go back and reference where
the field actually belongs in the full opcode. It's a tradeoff, but I
think we're already on the winning end of that tradeoff.

>  Arithmetic and jump instructions
>  ================================
> @@ -203,12 +203,12 @@ code            source  instruction class
>  **source**
>    the source operand location, which unless otherwise specified is one of:
>  
> -  ======  =====  ==============================================
> -  source  value  description
> -  ======  =====  ==============================================
> -  BPF_K   0x00   use 32-bit 'imm' value as source operand
> -  BPF_X   0x08   use 'src_reg' register value as source operand
> -  ======  =====  ==============================================
> +  ======  ============  ==============================================
> +  source  value(1 bit)  description
> +  ======  ============  ==============================================
> +  BPF_K   0x0           use 32-bit 'imm' value as source operand
> +  BPF_X   0x1           use 'src_reg' register value as source operand
> +  ======  ============  ==============================================

Same here as well. The value isn't really 0x1, it's 0x8. And 0x08 is
even more clear yet, given that we're representing the value of the bit
in the 8 bit opcode field.

>  **instruction class**
>    the instruction class (see `Instruction classes`_)
> @@ -221,27 +221,27 @@ otherwise identical operations.
>  The 'code' field encodes the operation as below, where 'src' and 'dst' refer
>  to the values of the source and destination registers, respectively.
>  
> -=========  =====  =======  ==========================================================
> -code       value  offset   description
> -=========  =====  =======  ==========================================================
> -BPF_ADD    0x00   0        dst += src
> -BPF_SUB    0x10   0        dst -= src
> -BPF_MUL    0x20   0        dst \*= src
> -BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
> -BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
> -BPF_OR     0x40   0        dst \|= src
> -BPF_AND    0x50   0        dst &= src
> -BPF_LSH    0x60   0        dst <<= (src & mask)
> -BPF_RSH    0x70   0        dst >>= (src & mask)
> -BPF_NEG    0x80   0        dst = -dst
> -BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
> -BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
> -BPF_XOR    0xa0   0        dst ^= src
> -BPF_MOV    0xb0   0        dst = src
> -BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
> -BPF_ARSH   0xc0   0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
> -BPF_END    0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
>
> -=========  =====  =======  ==========================================================
> +=========  =============  =======  ==========================================================
> +code       value(4 bits)  offset   description
> +=========  =============  =======  ==========================================================
> +BPF_ADD    0x0            0        dst += src
> +BPF_SUB    0x1            0        dst -= src
> +BPF_MUL    0x2            0        dst \*= src
> +BPF_DIV    0x3            0        dst = (src != 0) ? (dst / src) : 0
> +BPF_SDIV   0x3            1        dst = (src != 0) ? (dst s/ src) : 0
> +BPF_OR     0x4            0        dst \|= src
> +BPF_AND    0x5            0        dst &= src
> +BPF_LSH    0x6            0        dst <<= (src & mask)
> +BPF_RSH    0x7            0        dst >>= (src & mask)
> +BPF_NEG    0x8            0        dst = -dst
> +BPF_MOD    0x9            0        dst = (src != 0) ? (dst % src) : dst
> +BPF_SMOD   0x9            1        dst = (src != 0) ? (dst s% src) : dst
> +BPF_XOR    0xa            0        dst ^= src
> +BPF_MOV    0xb            0        dst = src
> +BPF_MOVSX  0xb            8/16/32  dst = (s8,s16,s32)src
> +BPF_ARSH   0xc            0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
> +BPF_END    0xd            0        byte swap operations (see `Byte swap instructions`_ below)
> +=========  =============  =======  ==========================================================

Same here.

>  Underflow and overflow are allowed during arithmetic operations, meaning
>  the 64-bit or 32-bit value will wrap. If BPF program execution would
> @@ -314,13 +314,13 @@ select what byte order the operation converts from or to. For
>  ``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
>  and must be set to 0.
>  
> -=========  =========  =====  =================================================
> -class      source     value  description
> -=========  =========  =====  =================================================
> -BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
> -BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
> -BPF_ALU64  Reserved   0x00   do byte swap unconditionally
> -=========  =========  =====  =================================================
> +=========  =========  ============  =================================================
> +class      source     value(1 bit)  description
> +=========  =========  ============  =================================================
> +BPF_ALU    BPF_TO_LE  0x0           convert between host byte order and little endian
> +BPF_ALU    BPF_TO_BE  0x1           convert between host byte order and big endian
> +BPF_ALU64  Reserved   0x0           do byte swap unconditionally
> +=========  =========  ============  =================================================

Same here. Which bit does the 0x1 actually correspond to? It's
self-evident in the former, not the latter.

>  
>  The 'imm' field encodes the width of the swap operations.  The following widths
>  are supported: 16, 32 and 64.
> @@ -352,27 +352,27 @@ Jump instructions
>  otherwise identical operations.
>  The 'code' field encodes the operation as below:
>  
> -========  =====  ===  ===========================================  =========================================
> -code      value  src  description                                  notes
> -========  =====  ===  ===========================================  =========================================
> -BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP class
> -BPF_JA    0x0    0x0  PC += imm                                    BPF_JMP32 class
> -BPF_JEQ   0x1    any  PC += offset if dst == src
> -BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
> -BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
> -BPF_JSET  0x4    any  PC += offset if dst & src
> -BPF_JNE   0x5    any  PC += offset if dst != src
> -BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
> -BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
> -BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
> -BPF_CALL  0x8    0x1  call PC += imm                               see `Program-local functions`_
> -BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
> -BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
> -BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
> -BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
> -BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
> -BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
> -========  =====  ===  ===========================================  =========================================
> +========  =============  ===  ===========================================  =========================================
> +code      value(4 bits)  src  description                                  notes
> +========  =============  ===  ===========================================  =========================================
> +BPF_JA    0x0            0x0  PC += offset                                 BPF_JMP class
> +BPF_JA    0x0            0x0  PC += imm                                    BPF_JMP32 class
> +BPF_JEQ   0x1            any  PC += offset if dst == src
> +BPF_JGT   0x2            any  PC += offset if dst > src                    unsigned
> +BPF_JGE   0x3            any  PC += offset if dst >= src                   unsigned
> +BPF_JSET  0x4            any  PC += offset if dst & src
> +BPF_JNE   0x5            any  PC += offset if dst != src
> +BPF_JSGT  0x6            any  PC += offset if dst > src                    signed
> +BPF_JSGE  0x7            any  PC += offset if dst >= src                   signed
> +BPF_CALL  0x8            0x0  call helper function by address              see `Helper functions`_
> +BPF_CALL  0x8            0x1  call PC += imm                               see `Program-local functions`_
> +BPF_CALL  0x8            0x2  call helper function by BTF ID               see `Helper functions`_
> +BPF_EXIT  0x9            0x0  return                                       BPF_JMP only
> +BPF_JLT   0xa            any  PC += offset if dst < src                    unsigned
> +BPF_JLE   0xb            any  PC += offset if dst <= src                   unsigned
> +BPF_JSLT  0xc            any  PC += offset if dst < src                    signed
> +BPF_JSLE  0xd            any  PC += offset if dst <= src                   signed
> +========  =============  ===  ===========================================  =========================================

Good catch, this definitely needed to be fixed. I personally would vote
to update it to match the other instruction classes, rather than
updating the other ones to match jump instructions. What do you think?

Thanks,
David

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread