[PATCH 0/2] BPF documentation improvements

public inbox for bpf@vger.kernel.org

* [PATCH 0/2] BPF documentation improvements
@ 2026-04-01 17:01 Vineet Gupta
  2026-04-01 17:01 ` [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage Vineet Gupta
  2026-04-01 17:01 ` [PATCH 2/2] bpf, doc: Improve MOV* documentation Vineet Gupta
  0 siblings, 2 replies; 7+ messages in thread
From: Vineet Gupta @ 2026-04-01 17:01 UTC (permalink / raw)
  To: bpf; +Cc: bpf, jose.marchesi, ast, Eduard Zingerman, Yonghong Song,
	Vineet Gupta

Hi,

A couple of documentation patches as I start to hack on the gcc-bpf backend.

Thx,
-Vineet

Vineet Gupta (2):
  bpf, doc: Clarify Pseudo-C notation and w vs r register usage
  bpf, doc: Improve MOV* documentation ...

 .../bpf/standardization/instruction-set.rst   | 49 +++++++++++++------
 1 file changed, 35 insertions(+), 14 deletions(-)

-- 
2.53.0



* [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage
  2026-04-01 17:01 [PATCH 0/2] BPF documentation improvements Vineet Gupta
@ 2026-04-01 17:01 ` Vineet Gupta
  2026-04-01 17:59   ` bot+bpf-ci
  2026-04-01 17:01 ` [PATCH 2/2] bpf, doc: Improve MOV* documentation Vineet Gupta
  1 sibling, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2026-04-01 17:01 UTC (permalink / raw)
  To: bpf; +Cc: bpf, jose.marchesi, ast, Eduard Zingerman, Yonghong Song,
	Vineet Gupta

As a newcomer to the BPF ecosystem, I was confused by Pseudo-C being the
actual assembly. And while it's obvious now that the w and r forms represent
32-bit and 64-bit registers respectively, it's better to call this out
explicitly in the documentation and make it more newbie-proof.

Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
---
 .../bpf/standardization/instruction-set.rst      | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
index 39c74611752b..fd688c5d3f04 100644
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@@ -315,13 +315,21 @@ For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and
 Arithmetic instructions
 -----------------------
 
-``ALU`` uses 32-bit wide operands while ``ALU64`` uses 64-bit wide operands for
-otherwise identical operations. ``ALU64`` instructions belong to the
-base64 conformance group unless noted otherwise.
-The 'code' field encodes the operation as below, where 'src' refers to the
+``ALU`` uses 32-bit wide operands ('w' registers in assembly) while
+``ALU64`` uses 64-bit wide operands ('r' registers) for otherwise
+identical operations. ``ALU64`` instructions belong to the base64
+conformance group unless noted otherwise.
+The 'code' field encodes the operation as below, where 'src' refers to
 the source operand and 'dst' refers to the value of the destination
 register.
 
+Note: BPF ISA is unique as it uses "Pseudo-C" notation for the assembly
+      instructions. In the table below, the column name actually specifies
+      the encodings. Assembly instructions (as generated by compilers) are
+      specified in the description column for some cases. Description of
+      ``?DIV``, ``?MOD`` includes additional logic part of semantics not
+      actual assembly.
+
 .. table:: Arithmetic instructions
 
   =====  =====  =======  ===================================================================================
-- 
2.53.0
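
The w vs r distinction that this patch calls out can be sketched in C. This is
only an illustrative model, not text from the patch: the helper names are
hypothetical, and the 32-bit form follows the ``dst = (u32)((u32)dst + (u32)src)``
definition that the instruction-set document gives for ``{ADD, X, ALU}``.

```c
#include <stdint.h>

/* Hypothetical helpers modelling register values as 64-bit integers. */

/* {ADD, X, ALU64}: 'r' register form, e.g. "r1 += r2". */
static inline uint64_t alu64_add(uint64_t dst, uint64_t src)
{
	return dst + src;
}

/*
 * {ADD, X, ALU}: 'w' register form, e.g. "w1 += w2".
 * Only the low 32 bits take part; the result is zero-extended to 64 bits.
 */
static inline uint64_t alu32_add(uint64_t dst, uint64_t src)
{
	return (uint32_t)((uint32_t)dst + (uint32_t)src);
}
```

With these helpers, adding 1 to 0xffffffff carries into bit 32 in the ALU64
form, while the ALU form wraps to 0 and leaves the upper half cleared.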



* [PATCH 2/2] bpf, doc: Improve MOV* documentation ...
  2026-04-01 17:01 [PATCH 0/2] BPF documentation improvements Vineet Gupta
  2026-04-01 17:01 ` [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage Vineet Gupta
@ 2026-04-01 17:01 ` Vineet Gupta
  2026-04-01 17:59   ` bot+bpf-ci
  1 sibling, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2026-04-01 17:01 UTC (permalink / raw)
  To: bpf; +Cc: bpf, jose.marchesi, ast, Eduard Zingerman, Yonghong Song,
	Vineet Gupta

 - Add some assembly (pseudo-C) snippets.

 - Rearrange: MOV content comes before MOVSX.

 - Rearrange the MOVSX content itself: the canonical sign extension
   variant for {8,16,32} -> 64 moves ahead of the special variant, which
   only sign extends to 32 bits and zeroes out the upper bits.

 - Remove the hyphen '-' in "sign-extension" so that grep hits all
   instances with one pattern.

Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
---
 .../bpf/standardization/instruction-set.rst   | 33 +++++++++++++------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
index fd688c5d3f04..ac61d3be7af2 100644
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@@ -414,25 +414,38 @@ etc. This specification requires that signed modulo MUST use truncated division
 
    a % n = a - n * trunc(a / n)
 
-The ``MOVSX`` instruction does a move operation with sign extension.
-``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
-32-bit operands, and zeroes the remaining upper 32 bits.
-``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
-operands into 64-bit operands.  Unlike other arithmetic instructions,
-``MOVSX`` is only defined for register source operands (``X``).
+For move operations, the ``MOV`` instruction has a few different forms.
+
+``{MOV, X, ALU64}`` means::
+
+  dst = src  (e.g. r1 = r2)
 
 ``{MOV, K, ALU64}`` means::
 
   dst = (s64)imm
 
-``{MOV, X, ALU}`` means::
+e.g. r1 = -4; r5 = 9282009
+
+``{MOV, X, ALU}`` has zero extension semantics (upper 32 bits are zeroed)::
 
   dst = (u32)src
 
+e.g. w5 = w9
+
+The ``MOVSX`` instruction does a move operation with sign extension.
+``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
+operands into 64-bit operands.
+
+The ``{MOVSX, X, ALU}`` form has slightly different semantics: it
+:term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
+32-bit operands, and zeroes the remaining upper 32 bits (similar to ``MOV``).
+
 ``{MOVSX, X, ALU}`` with 'offset' 8 means::
 
   dst = (u32)(s32)(s8)src
 
+Unlike other arithmetic instructions,
+``MOVSX`` is only defined for register source operands (``X``).
 
 The ``NEG`` instruction is only defined when the source bit is clear
 (``K``).
@@ -605,7 +618,7 @@ For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
     ABS            1      legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
     IND            2      legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
     MEM            3      regular load and store operations     `Regular load and store operations`_
-    MEMSX          4      sign-extension load operations        `Sign-extension load operations`_
+    MEMSX          4      sign extension load operations        `Sign extension load operations`_
     ATOMIC         6      atomic operations                     `Atomic operations`_
     =============  =====  ====================================  =============
 
@@ -649,10 +662,10 @@ instructions that transfer data between a register and memory.
 Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and
 'unsigned size' is one of: u8, u16, u32, or u64.
 
-Sign-extension load operations
+Sign extension load operations
 ------------------------------
 
-The ``MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load
+The ``MEMSX`` mode modifier is used to encode :term:`sign extension<Sign Extend>` load
 instructions that transfer data between a register and memory.
 
 ``{MEMSX, <size>, LDX}`` means::
-- 
2.53.0
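
The MOV and MOVSX forms described in this patch can be modelled in C roughly as
follows. This is a sketch under assumptions: the function names are
hypothetical, the 64-bit MOVSX line is my rendering of "sign extends 8-bit
operands into 64-bit operands", and the narrowing casts rely on the usual
two's-complement wraparound that mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical helpers; each returns the new 64-bit value of dst. */

/* {MOV, X, ALU64}: dst = src, e.g. "r1 = r2". */
static inline uint64_t mov64_reg(uint64_t src)
{
	return src;
}

/* {MOV, K, ALU64}: dst = (s64)imm, e.g. "r1 = -4". */
static inline uint64_t mov64_imm(int32_t imm)
{
	return (uint64_t)(int64_t)imm;
}

/* {MOV, X, ALU}: dst = (u32)src, e.g. "w5 = w9"; upper 32 bits are zeroed. */
static inline uint64_t mov32_reg(uint64_t src)
{
	return (uint32_t)src;
}

/* {MOVSX, X, ALU64} with 'offset' 8: sign extend the low byte to 64 bits. */
static inline uint64_t movsx64_s8(uint64_t src)
{
	return (uint64_t)(int64_t)(int8_t)src;
}

/* {MOVSX, X, ALU} with 'offset' 8: dst = (u32)(s32)(s8)src. */
static inline uint64_t movsx32_s8(uint64_t src)
{
	return (uint32_t)(int32_t)(int8_t)src;
}
```

For a source whose low byte is 0x80, the ALU64 form yields
0xffffffffffffff80 while the ALU form yields 0x00000000ffffff80, which is the
difference the rearranged MOVSX text is meant to highlight.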



* Re: [PATCH 2/2] bpf, doc: Improve MOV* documentation ...
  2026-04-01 17:01 ` [PATCH 2/2] bpf, doc: Improve MOV* documentation Vineet Gupta
@ 2026-04-01 17:59   ` bot+bpf-ci
  2026-04-01 18:28     ` Vineet Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: bot+bpf-ci @ 2026-04-01 17:59 UTC (permalink / raw)
  To: vineet.gupta, bpf
  Cc: bpf, jose.marchesi, ast, eddyz87, yonghong.song, vineet.gupta,
	ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1350 bytes --]

> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> index fd688c5d3f04d..ac61d3be7af21 100644
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst

[ ... ]

> +``{MOV, X, ALU64}`` means::
> +
> +  dst = src  (e.g. r1 = r2)
> +
> +``{MOV, K, ALU64}`` means::
> +
> +  dst = (s64)imm
> +
> +e.g. r1 = -4; r5 = 9282009
> +
> +``{MOV, X, ALU}`` has zero extension semantics (upper 32 bits are zeroed)::
> +
> +  dst = (u32)src
> +
> +e.g. w5 = w9

Should the example for {MOV, X, ALU64} be placed outside the code block,
matching the style used by {MOV, K, ALU64} and {MOV, X, ALU} just below?

As written, the RST literal block renders "(e.g. r1 = r2)" as part of
the formal pseudo-C definition.  The other two entries place their
examples as separate paragraph text after the code block, which keeps
the spec definition distinct from the illustrative examples.

Something like this would be consistent:

  ``{MOV, X, ALU64}`` means::

    dst = src

  e.g. r1 = r2


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23861455083


* Re: [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage
  2026-04-01 17:01 ` [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage Vineet Gupta
@ 2026-04-01 17:59   ` bot+bpf-ci
  2026-04-01 18:10     ` Vineet Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: bot+bpf-ci @ 2026-04-01 17:59 UTC (permalink / raw)
  To: vineet.gupta, bpf
  Cc: bpf, jose.marchesi, ast, eddyz87, yonghong.song, vineet.gupta,
	ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> index 39c74611752b6..fd688c5d3f04d 100644
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst
> @@ -315,13 +315,21 @@ For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and

[ ... ]

> +Note: BPF ISA is unique as it uses "Pseudo-C" notation for the assembly
> +      instructions. In the table below, the column name actually specifies
> +      the encodings. Assembly instructions (as generated by compilers) are

Looking at the arithmetic instructions table just below, the columns are
"name", "code", "offset", and "description".  The "name" column contains
mnemonics (ADD, SUB, MUL, etc.) while the "code" column has the actual
numeric encodings (0x0, 0x1, 0x2, etc.).

Does "the column name actually specifies the encodings" refer to the
column titled "name"?  If so, those are mnemonic names rather than
encodings.  The numeric encodings are in the "code" column.  Could this
be reworded to avoid conflating mnemonics with encodings?
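
To make the mnemonic vs encoding distinction concrete, here is a small C sketch
of how a "code" value combines with the source bit and instruction class into
the 8-bit opcode. The identifiers are hypothetical (they are not the kernel's
``BPF_*`` macros); the bit layout and numeric values are taken from my reading
of the instruction-set document (4-bit code, 1-bit source, 3-bit class).

```c
#include <stdint.h>

/* Hypothetical names; values as listed in the instruction-set tables. */
enum { CLASS_ALU = 0x4, CLASS_ALU64 = 0x7 };	/* 3-bit instruction class */
enum { SRC_K = 0x0, SRC_X = 0x1 };		/* 1-bit source: imm or reg */
enum { CODE_ADD = 0x0, CODE_SUB = 0x1, CODE_MUL = 0x2, CODE_MOV = 0xb };

/* Pack code (bits 7..4), source (bit 3) and class (bits 2..0). */
static inline uint8_t alu_opcode(unsigned int code, unsigned int source,
				 unsigned int insn_class)
{
	return (uint8_t)((code << 4) | (source << 3) | insn_class);
}
```

Under this layout the mnemonic ``ADD`` is just a label for code 0x0, and
``{ADD, X, ALU}`` packs to opcode 0x0c, while ``{MOV, X, ALU64}`` packs to 0xbf.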

> +      specified in the description column for some cases. Description of
> +      ``?DIV``, ``?MOD`` includes additional logic part of semantics not
> +      actual assembly.

The "?DIV" and "?MOD" notation appears to use '?' as a wildcard to
match both DIV/SDIV and MOD/SMOD, but this convention is not defined
anywhere in the document.  Would it be clearer to spell these out
explicitly, e.g. "DIV/SDIV" and "MOD/SMOD"?

Also, "includes additional logic part of semantics not actual assembly"
reads a bit rough.  Perhaps something like "includes additional semantic
logic, not actual assembly" would be clearer.

> +
> +.. table:: Arithmetic instructions

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23861455083


* Re: [PATCH 1/2] bpf, doc: Clarify Pseudo-C notation and w vs r register usage
  2026-04-01 17:59   ` bot+bpf-ci
@ 2026-04-01 18:10     ` Vineet Gupta
  0 siblings, 0 replies; 7+ messages in thread
From: Vineet Gupta @ 2026-04-01 18:10 UTC (permalink / raw)
  To: bot+bpf-ci, bpf
  Cc: bpf, jose.marchesi, ast, eddyz87, yonghong.song, andrii, daniel,
	martin.lau, clm, ihor.solodrai

On 4/1/26 10:59 AM, bot+bpf-ci@kernel.org wrote:
>> +Note: BPF ISA is unique as it uses "Pseudo-C" notation for the assembly
>> +      instructions. In the table below, the column name actually specifies
>> +      the encodings. Assembly instructions (as generated by compilers) are
> Looking at the arithmetic instructions table just below, the columns are
> "name", "code", "offset", and "description".  The "name" column contains
> mnemonics (ADD, SUB, MUL, etc.) while the "code" column has the actual
> numeric encodings (0x0, 0x1, 0x2, etc.).

My understanding is that the ISA itself is specified in Pseudo-C
assembly and formally lacks mnemonics, which in a conventional ISA
would refer to the "operator" of an instruction, spelled out in English.

> Does "the column name actually specifies the encodings" refer to the
> column titled "name"?

Yes that's what I meant.

> If so, those are mnemonic names rather than
> encodings.  The numeric encodings are in the "code" column.  Could this
> be reworded to avoid conflating mnemonics with encodings?

I can omit this part if it is controversial or not agreeable.


>> +      specified in the description column for some cases. Description of
>> +      ``?DIV``, ``?MOD`` includes additional logic part of semantics not
>> +      actual assembly.
> The "?DIV" and "?MOD" notation appears to use '?' as a wildcard to
> match both DIV/SDIV and MOD/SMOD, but this convention is not defined
> anywhere in the document.  Would it be clearer to spell these out
> explicitly, e.g. "DIV/SDIV" and "MOD/SMOD"?

OK.

> Also, "includes additional logic part of semantics not actual assembly"
> reads a bit rough.  Perhaps something like "includes additional semantic
> logic, not actual assembly" would be clearer.

OK.

Thx,
-Vineet


* Re: [PATCH 2/2] bpf, doc: Improve MOV* documentation ...
  2026-04-01 17:59   ` bot+bpf-ci
@ 2026-04-01 18:28     ` Vineet Gupta
  0 siblings, 0 replies; 7+ messages in thread
From: Vineet Gupta @ 2026-04-01 18:28 UTC (permalink / raw)
  To: bot+bpf-ci, bpf
  Cc: bpf, jose.marchesi, ast, eddyz87, yonghong.song, andrii, daniel,
	martin.lau, clm, ihor.solodrai

On 4/1/26 10:59 AM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
>> index fd688c5d3f04d..ac61d3be7af21 100644
>> --- a/Documentation/bpf/standardization/instruction-set.rst
>> +++ b/Documentation/bpf/standardization/instruction-set.rst
> [ ... ]
>
>> +``{MOV, X, ALU64}`` means::
>> +
>> +  dst = src  (e.g. r1 = r2)
>> +
>> +``{MOV, K, ALU64}`` means::
>> +
>> +  dst = (s64)imm
>> +
>> +e.g. r1 = -4; r5 = 9282009
>> +
>> +``{MOV, X, ALU}`` has zero extension semantics (upper 32 bits are zeroed)::
>> +
>> +  dst = (u32)src
>> +
>> +e.g. w5 = w9
> Should the example for {MOV, X, ALU64} be placed outside the code block,
> matching the style used by {MOV, K, ALU64} and {MOV, X, ALU} just below?
>
> As written, the RST literal block renders "(e.g. r1 = r2)" as part of
> the formal pseudo-C definition.  The other two entries place their
> examples as separate paragraph text after the code block, which keeps
> the spec definition distinct from the illustrative examples.
>
> Something like this would be consistent:
>
>    ``{MOV, X, ALU64}`` means::
>
>      dst = src
>
>    e.g. r1 = r2

OK.

Thx,
-Vineet
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23861455083


