Re: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes

From: "Jose E. Marchesi" <jemarch@gnu.org>
To: David Vernet <void@manifault.com>
Cc: "Jose E. Marchesi" <jose.marchesi@oracle.com>,
	bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf@ietf.org
Subject: Re: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes
Date: Mon, 27 Feb 2023 19:05:05 +0100	[thread overview]
Message-ID: <877cw39e7i.fsf@gnu.org> (raw)
In-Reply-To: <Y/zftLx9nDF5tb9G@maniforge> (David Vernet's message of "Mon, 27 Feb 2023 10:52:04 -0600")

Hi David.

Thanks for the comments, and sorry for the rst formatting mistakes.  I
forgot to add the :: before the indented blocks.  I'm resending with
these.

> On Fri, Feb 24, 2023 at 09:04:07PM +0100, Jose E. Marchesi wrote:
>> 
>> This patch modifies instruction-set.rst so it documents the encoding
>> of BPF instructions in terms of how the bytes are stored (be it in an
>> ELF file or as bytes in a memory buffer to be loaded into the kernel
>> or some other BPF consumer) as opposed to how the instruction looks
>> like once loaded.
>> 
>> This is hopefully easier to understand by implementors looking to
>> generate and/or consume bytes conforming BPF instructions.
>> 
>> The patch also clarifies that the unused bytes in a pseudo-instruction
>> shall be cleared with zeros.
>> 
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
>
> Hi Jose,
>
> Thanks for writing this up.
>
>> ---
>>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>>  1 file changed, 21 insertions(+), 22 deletions(-)
>> 
>> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
>> index 01802ed9b29b..9b28c0e15bb6 100644
>> --- a/Documentation/bpf/instruction-set.rst
>> +++ b/Documentation/bpf/instruction-set.rst
>> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>>    constant) value after the basic instruction for a total of 128 bits.
>>  
>> -The basic instruction encoding looks as follows for a little-endian processor,
>> -where MSB and LSB mean the most significant bits and least significant bits,
>> -respectively:
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order:
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   src_reg  dst_reg  opcode
>> -=============  =======  =======  =======  ============
>> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
>> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> Unfortunately this won't render correctly. It'll look something like
> this:
>
> The fields conforming an encoded basic instruction are stored in the following order:
>
> opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian
> BPF. opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> You'll have to add some extra newlines. You can test out your changes
> with:
>
> make SPHINXDIRS=bpf htmldocs
>
> And then the output is put in Documentation/output/bpf
>
> In general, this is sort of the problem we have with rst. We want to
> strike a balance between readable in a text editor, and readable when
> rendered in a web browser. I think we can strike such a balance here,
> but it'll probably require a bit of rst-fu. As described below, I think
> we can fix this with a literal code block by just adding a : to order:
>
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order::
>
>
>> +
>> +Where,
>>  
>>  **imm**
>>    signed integer immediate value
>> @@ -64,16 +62,17 @@ imm            offset   src_reg  dst_reg  opcode
>>  **opcode**
>>    operation to perform
>>  
>> -and as follows for a big-endian processor:
>> +Note that the contents of multi-byte fields ('imm' and 'offset') are
>> +stored using big-endian byte ordering in big-endian BPF and
>> +little-endian byte ordering in little-endian BPF.
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   dst_reg  src_reg  opcode
>> -=============  =======  =======  =======  ============
>> +For example:
>>  
>> -Multi-byte fields ('imm' and 'offset') are similarly stored in
>> -the byte order of the processor.
>> +  opcode         offset imm          assembly
>> +         src dst
>> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>> +         dst src
>> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>
> This also won't render. rst will think it's a "definition list" (see
> [0]), so it's interpreting the line with '// big' as a term that will be
> defined on the next line.
>
> [0]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks
>
> If you do "For example::" it should render correctly. This applies
> elsewhere to the patch. Let's just make all of these literal code
> blocks.
>
> Thanks,
> David
>
>>  
>>  Note that most instructions do not use all of the fields.
>>  Unused fields shall be cleared to zero.
>> @@ -84,18 +83,18 @@ The 64 bits following the basic instruction contain a pseudo instruction
>>  using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
>>  and imm containing the high 32 bits of the immediate value.
>>  
>> -=================  ==================
>> -64 bits (MSB)      64 bits (LSB)
>> -=================  ==================
>> -basic instruction  pseudo instruction
>> -=================  ==================
>> +This is depicted in the following figure:
>> +
>> +  basic_instruction                 pseudo_instruction
>> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32
>>  
>>  Thus the 64-bit immediate value is constructed as follows:
>>  
>>    imm64 = (next_imm << 32) | imm
>>  
>>  where 'next_imm' refers to the imm value of the pseudo instruction
>> -following the basic instruction.
>> +following the basic instruction.  The unused bytes in the pseudo
>> +instruction shall be cleared to zero.
>>  
>>  Instruction classes
>>  -------------------
>> -- 
>> 2.30.2
>> 
>> -- 
>> Bpf mailing list
>> Bpf@ietf.org
>> https://www.ietf.org/mailman/listinfo/bpf