From: David Vernet <void@manifault.com>
To: dthaler1968@googlemail.com
Cc: 'Aoyang Fang' <aoyangfang@link.cuhk.edu.cn>,
bpf@vger.kernel.org, bpf@ietf.org
Subject: Re: [PATCH bpf-next] The original document has some inconsistency.
Date: Tue, 9 Jan 2024 13:10:37 -0600 [thread overview]
Message-ID: <20240109191037.GC79024@maniforge> (raw)
In-Reply-To: <016101da4326$8dbad1a0$a93074e0$@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 20323 bytes --]
On Tue, Jan 09, 2024 at 10:06:22AM -0800, dthaler1968@googlemail.com wrote:
> David Vernet <void@manifault.com> writes:
> > Hi Aoyang,
> >
> > Thanks a lot for your contribution. I agree that we need to fix the
> document
> > to be consistent, though I'm afraid that I think this patch goes in the
> wrong
> > direction by making everything match the jump instruction class. More
> below.
>
> I disagree, and I agree with Aoyang's direction.
>
> > nit: Could you please update the patch subject to be more self-describing.
> For
> > example, something like:
> >
> > Use consistent numerical widths in instructions.rst encodings
>
> I agree with that subject.
>
> > > For example:
> > > 1. 1.3.1 Arithmetic instructions use '8 bits length' encoding to
> > > express the 'code' value, e.g., BPF_ADD=0x00, BPF_SUB=0x10,
> > > BPF_MUL=0x20. However the length of the 'code' is 4 bits. On the
> > > other hand, 1.3.3 Jump instructions use '4 bits length' encoding,
> > > e.g., BPF_JEQ=0x1 and BPF_JGT=0x2.
> > > 2. There are also many places that use '8 bits length' encoding to
> > > express the corresponding contents, e.g., 1.4 Load and store
> > > instructions, BPF_ABS=0x20, BPF_IND=0x40. However, the length of
> > > 'mode modifier' is 3 bits.
> > >
> > > To summarize, the only place that has inconsistent encoding is Jump
> > > instructions. After discussing with Dave, dthaler1968@googlemail.com,
> > > we agree that the document should be more clear.
> > >
> > > Signed-off-by: Aoyang Fang <aoyangfang@link.cuhk.edu.cn>
> > >
> > > ---
> > > .../bpf/standardization/instruction-set.rst | 170 +++++++++---------
> > > 1 file changed, 85 insertions(+), 85 deletions(-)
> > >
> > > diff --git a/Documentation/bpf/standardization/instruction-set.rst
> > > b/Documentation/bpf/standardization/instruction-set.rst
> > > index 245b6defc..57dd1fa00 100644
> > > --- a/Documentation/bpf/standardization/instruction-set.rst
> > > +++ b/Documentation/bpf/standardization/instruction-set.rst
> > > @@ -172,18 +172,18 @@ Instruction classes
> > >
> > > The three LSB bits of the 'opcode' field store the instruction class:
> > >
> > > -========= ===== ===============================
> > ===================================
> > > -class value description reference
> > > -========= ===== ===============================
> > ===================================
> > > -BPF_LD 0x00 non-standard load operations `Load and store
> > instructions`_
> > > -BPF_LDX 0x01 load into register operations `Load and store
> > instructions`_
> > > -BPF_ST 0x02 store from immediate operations `Load and store
> > instructions`_
> > > -BPF_STX 0x03 store from register operations `Load and store
> > instructions`_
> > > -BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump
> > instructions`_
> > > -BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump
> > instructions`_
> > > -BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump
> > instructions`_
> > > -BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump
> > instructions`_
> > > -========= ===== ===============================
> > > ===================================
> > > +========= ============= ===============================
> > ===================================
> > > +class value(3 bits) description reference
> > > +========= ============= ===============================
> > ===================================
> > > +BPF_LD 0x0 non-standard load operations `Load and
> store
> > instructions`_
> > > +BPF_LDX 0x1 load into register operations `Load and
> store
> > instructions`_
> > > +BPF_ST 0x2 store from immediate operations `Load and
> store
> > instructions`_
> > > +BPF_STX 0x3 store from register operations `Load and
> store
> > instructions`_
> > > +BPF_ALU 0x4 32-bit arithmetic operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_JMP 0x5 64-bit jump operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_JMP32 0x6 32-bit jump operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_ALU64 0x7 64-bit arithmetic operations `Arithmetic
> and jump
> > instructions`_
> > > +========= ============= ===============================
> > > +===================================
> >
> > Hmm, I presonally think this is more confusing. The opcode field is 8
> bits. We
> > already specify that the value is the three LSB of the opcode field. It's
> > certainly subjective, but I think we should have the value reflect the
> actual
> > value in the field it's embedded in. In my opinion, changing the value to
> not
> > reflect its place in the actual opcode in my opinion imposes a burden on
> the
> > reader to go back and reference where the field actually belongs in the
> full
> > opcode. It's a tradeoff, but I think we're already on the winning end of
> that
> > tradeoff.
>
> This document is an IETF standards specification so it's worth looking at
> what
> typical RFC conventions are.
>
> * RFC 791 section 3.1 defines the IPv4 header, where the Version field is
> the high
> 4 bits of a byte. It defines the value as 4, not 0x40.
> It also defines the Type of Service bits which are 1 bit fields with
> value 0 or 1
> (not, say 0x40).
> * RFC 8200 section 3 defines the IPv6 header, where the Version field is the
> high
> 4 bits of a byte. It defines the value as 6, not 0x60.
I definitely believe we should follow IETF conventions, but from looking
at [0] it looks like we're already deviating.
[0]: https://www.rfc-editor.org/rfc/rfc791.txt
Here are some excerpts from what's there:
-------------------------------------------------------------------------------
3.1. Internet Header Format
A summary of the contents of the internet header follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header
Figure 4.
Note that each tick mark represents one bit position.
Version: 4 bits
The Version field indicates the format of the internet header. This
document describes version 4.
[...]
Type of Service: 8 bits
The Type of Service provides an indication of the abstract
parameters of the quality of service desired. These parameters are
to be used to guide the selection of the actual service parameters
when transmitting a datagram through a particular network. Several
networks offer service precedence, which somehow treats high
precedence traffic as more important than other traffic (generally
by accepting only traffic above a certain precedence at time of high
load). The major choice is a three way tradeoff between low-delay,
high-reliability, and high-throughput.
Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bits 4: 0 = Normal Throughput, 1 = High Throughput.
Bits 5: 0 = Normal Relibility, 1 = High Relibility.
Bit 6-7: Reserved for Future Use.
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | |
| PRECEDENCE | D | T | R | 0 | 0 |
| | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
Precedence
111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine
[...]
APPENDIX A: Examples & Scenarios
Example 1:
This is an example of the minimal data carrying internet datagram:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 21 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 123 | Protocol = 1 | header checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+
-------------------------------------------------------------------------------
This is already pretty different from how we're visualizing and
enumerating the instructions in our document. Consider:
1. They're not even using numerical values to define some fields, such
as with Type of Service. They're specifying the exact values of
individual bits within the field (e.g. with Precedence).
2. They're using decimal instead of hexadecimal.
Unless I'm missing something, it seems like the deviation in terms of
using 0x40 vs. 0x4 is specific to how they present examples in the
appendices (though even the appendices are using base 10).
So while I certainly agree that we should follow conventions, I think
I'd prefer that we either follow them completely, or not sacrifice
readability by following them in specific ways which don't necessarily
match the chosen format for our document.
> Etc. Offhand I am not aware of any RFC that uses the convention you
> suggest,
> though perhaps others are?
I am not, no.
> > > Arithmetic and jump instructions
> > > ================================
> > > @@ -203,12 +203,12 @@ code source instruction class
> > > **source**
> > > the source operand location, which unless otherwise specified is one
> of:
> > >
> > > - ====== ===== ==============================================
> > > - source value description
> > > - ====== ===== ==============================================
> > > - BPF_K 0x00 use 32-bit 'imm' value as source operand
> > > - BPF_X 0x08 use 'src_reg' register value as source operand
> > > - ====== ===== ==============================================
> > > + ====== ============
> > > + ==============================================
> > > + source value(1 bit) description
> > > + ====== ============
> > ==============================================
> > > + BPF_K 0x0 use 32-bit 'imm' value as source operand
> > > + BPF_X 0x1 use 'src_reg' register value as source operand
> > > + ====== ============
> > > + ==============================================
> >
> > Same here as well. The value isn't really 0x1, it's 0x8. And 0x08 is even
> more
> > clear yet, given that we're representing the value of the bit in the 8 bit
> opcode
> > field.
>
> Its 1, in the same sense as the TOS bits in RFC 791 are 1.
>
> > > **instruction class**
> > > the instruction class (see `Instruction classes`_) @@ -221,27
> > > +221,27 @@ otherwise identical operations.
> > > The 'code' field encodes the operation as below, where 'src' and
> > > 'dst' refer to the values of the source and destination registers,
> respectively.
> > >
> > > -========= ===== =======
> > ==========================================================
> > > -code value offset description
> > > -========= ===== =======
> > ==========================================================
> > > -BPF_ADD 0x00 0 dst += src
> > > -BPF_SUB 0x10 0 dst -= src
> > > -BPF_MUL 0x20 0 dst \*= src
> > > -BPF_DIV 0x30 0 dst = (src != 0) ? (dst / src) : 0
> > > -BPF_SDIV 0x30 1 dst = (src != 0) ? (dst s/ src) : 0
> > > -BPF_OR 0x40 0 dst \|= src
> > > -BPF_AND 0x50 0 dst &= src
> > > -BPF_LSH 0x60 0 dst <<= (src & mask)
> > > -BPF_RSH 0x70 0 dst >>= (src & mask)
> > > -BPF_NEG 0x80 0 dst = -dst
> > > -BPF_MOD 0x90 0 dst = (src != 0) ? (dst % src) : dst
> > > -BPF_SMOD 0x90 1 dst = (src != 0) ? (dst s% src) : dst
> > > -BPF_XOR 0xa0 0 dst ^= src
> > > -BPF_MOV 0xb0 0 dst = src
> > > -BPF_MOVSX 0xb0 8/16/32 dst = (s8,s16,s32)src
> > > -BPF_ARSH 0xc0 0 :term:`sign extending<Sign Extend>` dst >>=
> (src &
> > mask)
> > > -BPF_END 0xd0 0 byte swap operations (see `Byte swap
> instructions`_
> > below)
> > >
> > > -========= ===== =======
> > > ==========================================================
> > > +========= ============= =======
> > ==========================================================
> > > +code value(4 bits) offset description
> > > +========= ============= =======
> > ==========================================================
> > > +BPF_ADD 0x0 0 dst += src
> > > +BPF_SUB 0x1 0 dst -= src
> > > +BPF_MUL 0x2 0 dst \*= src
> > > +BPF_DIV 0x3 0 dst = (src != 0) ? (dst / src) : 0
> > > +BPF_SDIV 0x3 1 dst = (src != 0) ? (dst s/ src) : 0
> > > +BPF_OR 0x4 0 dst \|= src
> > > +BPF_AND 0x5 0 dst &= src
> > > +BPF_LSH 0x6 0 dst <<= (src & mask)
> > > +BPF_RSH 0x7 0 dst >>= (src & mask)
> > > +BPF_NEG 0x8 0 dst = -dst
> > > +BPF_MOD 0x9 0 dst = (src != 0) ? (dst % src) : dst
> > > +BPF_SMOD 0x9 1 dst = (src != 0) ? (dst s% src) :
> dst
> > > +BPF_XOR 0xa 0 dst ^= src
> > > +BPF_MOV 0xb 0 dst = src
> > > +BPF_MOVSX 0xb 8/16/32 dst = (s8,s16,s32)src
> > > +BPF_ARSH 0xc 0 :term:`sign extending<Sign Extend>`
> dst >>=
> > (src & mask)
> > > +BPF_END 0xd 0 byte swap operations (see `Byte swap
> > instructions`_ below)
> > > +========= ============= =======
> > > +==========================================================
> >
> > Same here.
> >
> > > Underflow and overflow are allowed during arithmetic operations,
> > > meaning the 64-bit or 32-bit value will wrap. If BPF program
> > > execution would @@ -314,13 +314,13 @@ select what byte order the
> > > operation converts from or to. For ``BPF_ALU64``, the 1-bit source
> > > operand field in the opcode is reserved and must be set to 0.
> > >
> > > -========= ========= =====
> > =================================================
> > > -class source value description
> > > -========= ========= =====
> > =================================================
> > > -BPF_ALU BPF_TO_LE 0x00 convert between host byte order and little
> > endian
> > > -BPF_ALU BPF_TO_BE 0x08 convert between host byte order and big
> > endian
> > > -BPF_ALU64 Reserved 0x00 do byte swap unconditionally
> > > -========= ========= =====
> > > =================================================
> > > +========= ========= ============
> > =================================================
> > > +class source value(1 bit) description
> > > +========= ========= ============
> > =================================================
> > > +BPF_ALU BPF_TO_LE 0x0 convert between host byte order and
> little
> > endian
> > > +BPF_ALU BPF_TO_BE 0x1 convert between host byte order and
> big
> > endian
> > > +BPF_ALU64 Reserved 0x0 do byte swap unconditionally
> > > +========= ========= ============
> > > +=================================================
> >
> > Same here. Which bit does the 0x1 actually correspond to? It's
> self-evident in
> > the former, not the latter.
>
> Would you then say that RFC 791 (and many RFCs since) is not self-evident?
Not at all, I would say that they're presenting their information in a
very different way. Imagine if they presented Type of Service in this
way:
Type of Service: 8 bits
============ ===== ========== =========== ========
3 bits (MSB) 1 bit 1 bit 1 bit 2 bits
============ ===== ========== =========== ========
Precedence Delay Throughput Reliability Reserved
============ ===== ========== =========== ========
**Precedence**
The precedence with which traffic should be served.
**Delay**
The quality of service requirement for delay.
**Throughput**
The quality of service requirement for throughput.
**Reliability**
The quality of service requirement for reliability.
**Reserved**
Bits reserved for future use
Precedence types
----------------
...
========== =====
Precedence value
========== =====
Routine 0x0
Priority 0x1
Immediate 0x2
Flash 0x3
Flash Override 0x4
CRITIC/ECP 0x5
Internetwork Control 0x6
Network Control 0x7
Delay
-----
===== =====
Delay value
===== =====
Normal Delay 0x0
Low Delay 0x1
Throughput
----------
========== =====
Throughput value
========== =====
Normal Throughput 0x0
High Throughput 0x1
Reliability
-----
=========== =====
Reliability value
=========== =====
Normal Reliability 0x0
High Reliability 0x1
Reserved
--------
============== =====
Resrved value
============== =====
Reserved bit 0 0x0
Reserved bit 1 0x1
Reserved bit 2 0x2
Reserved bit 3 0x3
-----
This is of course in contrast to the much more concise way they defined
it above. If they did end up defining it this way, I think it would be
less clear to use the actual bit values rather than just the actual
value in the Type of Service mask. It's subjective, but I just don't
really see them as equivalent.
> If the WG chooses to diverge from the most common ways the IETF defines
> bit formats, that might be ok but may need a section explaining the
> divergent convention. My personal preference though is to stay consistent
> with the normal IETF convention, which part of the ISA doc already did.
I agree with you that we should stay consistent, but it seems like we're
being selective about it. Could you help me understand why the
deviations we have already wouldn't have required a separate section?
Thanks,
David
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
WARNING: multiple messages have this Message-ID (diff)
From: David Vernet <void@manifault.com>
To: dthaler1968@googlemail.com
Cc: 'Aoyang Fang' <aoyangfang@link.cuhk.edu.cn>,
bpf@vger.kernel.org, bpf@ietf.org
Subject: Re: [Bpf] [PATCH bpf-next] The original document has some inconsistency.
Date: Tue, 9 Jan 2024 13:10:37 -0600 [thread overview]
Message-ID: <20240109191037.GC79024@maniforge> (raw)
Message-ID: <20240109191037.Svr_0oNlz3RlzVjAapIeU-gmn0hptvNqaE-1DWwRk58@z> (raw)
In-Reply-To: <016101da4326$8dbad1a0$a93074e0$@gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 20323 bytes --]
On Tue, Jan 09, 2024 at 10:06:22AM -0800, dthaler1968@googlemail.com wrote:
> David Vernet <void@manifault.com> writes:
> > Hi Aoyang,
> >
> > Thanks a lot for your contribution. I agree that we need to fix the
> document
> > to be consistent, though I'm afraid that I think this patch goes in the
> wrong
> > direction by making everything match the jump instruction class. More
> below.
>
> I disagree, and I agree with Aoyang's direction.
>
> > nit: Could you please update the patch subject to be more self-describing.
> For
> > example, something like:
> >
> > Use consistent numerical widths in instructions.rst encodings
>
> I agree with that subject.
>
> > > For example:
> > > 1. 1.3.1 Arithmetic instructions use '8 bits length' encoding to
> > > express the 'code' value, e.g., BPF_ADD=0x00, BPF_SUB=0x10,
> > > BPF_MUL=0x20. However the length of the 'code' is 4 bits. On the
> > > other hand, 1.3.3 Jump instructions use '4 bits length' encoding,
> > > e.g., BPF_JEQ=0x1 and BPF_JGT=0x2.
> > > 2. There are also many places that use '8 bits length' encoding to
> > > express the corresponding contents, e.g., 1.4 Load and store
> > > instructions, BPF_ABS=0x20, BPF_IND=0x40. However, the length of
> > > 'mode modifier' is 3 bits.
> > >
> > > To summarize, the only place that has inconsistent encoding is Jump
> > > instructions. After discussing with Dave, dthaler1968@googlemail.com,
> > > we agree that the document should be more clear.
> > >
> > > Signed-off-by: Aoyang Fang <aoyangfang@link.cuhk.edu.cn>
> > >
> > > ---
> > > .../bpf/standardization/instruction-set.rst | 170 +++++++++---------
> > > 1 file changed, 85 insertions(+), 85 deletions(-)
> > >
> > > diff --git a/Documentation/bpf/standardization/instruction-set.rst
> > > b/Documentation/bpf/standardization/instruction-set.rst
> > > index 245b6defc..57dd1fa00 100644
> > > --- a/Documentation/bpf/standardization/instruction-set.rst
> > > +++ b/Documentation/bpf/standardization/instruction-set.rst
> > > @@ -172,18 +172,18 @@ Instruction classes
> > >
> > > The three LSB bits of the 'opcode' field store the instruction class:
> > >
> > > -========= ===== ===============================
> > ===================================
> > > -class value description reference
> > > -========= ===== ===============================
> > ===================================
> > > -BPF_LD 0x00 non-standard load operations `Load and store
> > instructions`_
> > > -BPF_LDX 0x01 load into register operations `Load and store
> > instructions`_
> > > -BPF_ST 0x02 store from immediate operations `Load and store
> > instructions`_
> > > -BPF_STX 0x03 store from register operations `Load and store
> > instructions`_
> > > -BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump
> > instructions`_
> > > -BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump
> > instructions`_
> > > -BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump
> > instructions`_
> > > -BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump
> > instructions`_
> > > -========= ===== ===============================
> > > ===================================
> > > +========= ============= ===============================
> > ===================================
> > > +class value(3 bits) description reference
> > > +========= ============= ===============================
> > ===================================
> > > +BPF_LD 0x0 non-standard load operations `Load and
> store
> > instructions`_
> > > +BPF_LDX 0x1 load into register operations `Load and
> store
> > instructions`_
> > > +BPF_ST 0x2 store from immediate operations `Load and
> store
> > instructions`_
> > > +BPF_STX 0x3 store from register operations `Load and
> store
> > instructions`_
> > > +BPF_ALU 0x4 32-bit arithmetic operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_JMP 0x5 64-bit jump operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_JMP32 0x6 32-bit jump operations `Arithmetic
> and jump
> > instructions`_
> > > +BPF_ALU64 0x7 64-bit arithmetic operations `Arithmetic
> and jump
> > instructions`_
> > > +========= ============= ===============================
> > > +===================================
> >
> > Hmm, I presonally think this is more confusing. The opcode field is 8
> bits. We
> > already specify that the value is the three LSB of the opcode field. It's
> > certainly subjective, but I think we should have the value reflect the
> actual
> > value in the field it's embedded in. In my opinion, changing the value to
> not
> > reflect its place in the actual opcode in my opinion imposes a burden on
> the
> > reader to go back and reference where the field actually belongs in the
> full
> > opcode. It's a tradeoff, but I think we're already on the winning end of
> that
> > tradeoff.
>
> This document is an IETF standards specification so it's worth looking at
> what
> typical RFC conventions are.
>
> * RFC 791 section 3.1 defines the IPv4 header, where the Version field is
> the high
> 4 bits of a byte. It defines the value as 4, not 0x40.
> It also defines the Type of Service bits which are 1 bit fields with
> value 0 or 1
> (not, say 0x40).
> * RFC 8200 section 3 defines the IPv6 header, where the Version field is the
> high
> 4 bits of a byte. It defines the value as 6, not 0x60.
I definitely believe we should follow IETF conventions, but from looking
at [0] it looks like we're already deviating.
[0]: https://www.rfc-editor.org/rfc/rfc791.txt
Here are some excerpts from what's there:
-------------------------------------------------------------------------------
3.1. Internet Header Format
A summary of the contents of the internet header follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header
Figure 4.
Note that each tick mark represents one bit position.
Version: 4 bits
The Version field indicates the format of the internet header. This
document describes version 4.
[...]
Type of Service: 8 bits
The Type of Service provides an indication of the abstract
parameters of the quality of service desired. These parameters are
to be used to guide the selection of the actual service parameters
when transmitting a datagram through a particular network. Several
networks offer service precedence, which somehow treats high
precedence traffic as more important than other traffic (generally
by accepting only traffic above a certain precedence at time of high
load). The major choice is a three way tradeoff between low-delay,
high-reliability, and high-throughput.
Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bits 4: 0 = Normal Throughput, 1 = High Throughput.
Bits 5: 0 = Normal Relibility, 1 = High Relibility.
Bit 6-7: Reserved for Future Use.
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | |
| PRECEDENCE | D | T | R | 0 | 0 |
| | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
Precedence
111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine
[...]
APPENDIX A: Examples & Scenarios
Example 1:
This is an example of the minimal data carrying internet datagram:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 21 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 123 | Protocol = 1 | header checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+
-------------------------------------------------------------------------------
This is already pretty different from how we're visualizing and
enumerating the instructions in our document. Consider:
1. They're not even using numerical values to define some fields, such
as with Type of Service. They're specifying the exact values of
individual bits within the field (e.g. with Precedence).
2. They're using decimal instead of hexadecimal.
Unless I'm missing something, it seems like the deviation in terms of
using 0x40 vs. 0x4 is specific to how they present examples in the
appendices (though even the appendices are using base 10).
So while I certainly agree that we should follow conventions, I think
I'd prefer that we either follow them completely, or not sacrifice
readability by following them in specific ways which don't necessarily
match the chosen format for our document.
> Etc. Offhand I am not aware of any RFC that uses the convention you
> suggest,
> though perhaps others are?
I am not, no.
> > > Arithmetic and jump instructions
> > > ================================
> > > @@ -203,12 +203,12 @@ code source instruction class
> > > **source**
> > > the source operand location, which unless otherwise specified is one
> of:
> > >
> > > - ====== ===== ==============================================
> > > - source value description
> > > - ====== ===== ==============================================
> > > - BPF_K 0x00 use 32-bit 'imm' value as source operand
> > > - BPF_X 0x08 use 'src_reg' register value as source operand
> > > - ====== ===== ==============================================
> > > + ====== ============
> > > + ==============================================
> > > + source value(1 bit) description
> > > + ====== ============
> > ==============================================
> > > + BPF_K 0x0 use 32-bit 'imm' value as source operand
> > > + BPF_X 0x1 use 'src_reg' register value as source operand
> > > + ====== ============
> > > + ==============================================
> >
> > Same here as well. The value isn't really 0x1, it's 0x8. And 0x08 is even
> more
> > clear yet, given that we're representing the value of the bit in the 8 bit
> opcode
> > field.
>
> Its 1, in the same sense as the TOS bits in RFC 791 are 1.
>
> > > **instruction class**
> > > the instruction class (see `Instruction classes`_) @@ -221,27
> > > +221,27 @@ otherwise identical operations.
> > > The 'code' field encodes the operation as below, where 'src' and
> > > 'dst' refer to the values of the source and destination registers,
> respectively.
> > >
> > > -========= ===== =======
> > ==========================================================
> > > -code value offset description
> > > -========= ===== =======
> > ==========================================================
> > > -BPF_ADD 0x00 0 dst += src
> > > -BPF_SUB 0x10 0 dst -= src
> > > -BPF_MUL 0x20 0 dst \*= src
> > > -BPF_DIV 0x30 0 dst = (src != 0) ? (dst / src) : 0
> > > -BPF_SDIV 0x30 1 dst = (src != 0) ? (dst s/ src) : 0
> > > -BPF_OR 0x40 0 dst \|= src
> > > -BPF_AND 0x50 0 dst &= src
> > > -BPF_LSH 0x60 0 dst <<= (src & mask)
> > > -BPF_RSH 0x70 0 dst >>= (src & mask)
> > > -BPF_NEG 0x80 0 dst = -dst
> > > -BPF_MOD 0x90 0 dst = (src != 0) ? (dst % src) : dst
> > > -BPF_SMOD 0x90 1 dst = (src != 0) ? (dst s% src) : dst
> > > -BPF_XOR 0xa0 0 dst ^= src
> > > -BPF_MOV 0xb0 0 dst = src
> > > -BPF_MOVSX 0xb0 8/16/32 dst = (s8,s16,s32)src
> > > -BPF_ARSH 0xc0 0 :term:`sign extending<Sign Extend>` dst >>=
> (src &
> > mask)
> > > -BPF_END 0xd0 0 byte swap operations (see `Byte swap
> instructions`_
> > below)
> > >
> > > -========= ===== =======
> > > ==========================================================
> > > +========= ============= =======
> > ==========================================================
> > > +code value(4 bits) offset description
> > > +========= ============= =======
> > ==========================================================
> > > +BPF_ADD 0x0 0 dst += src
> > > +BPF_SUB 0x1 0 dst -= src
> > > +BPF_MUL 0x2 0 dst \*= src
> > > +BPF_DIV 0x3 0 dst = (src != 0) ? (dst / src) : 0
> > > +BPF_SDIV 0x3 1 dst = (src != 0) ? (dst s/ src) : 0
> > > +BPF_OR 0x4 0 dst \|= src
> > > +BPF_AND 0x5 0 dst &= src
> > > +BPF_LSH 0x6 0 dst <<= (src & mask)
> > > +BPF_RSH 0x7 0 dst >>= (src & mask)
> > > +BPF_NEG 0x8 0 dst = -dst
> > > +BPF_MOD 0x9 0 dst = (src != 0) ? (dst % src) : dst
> > > +BPF_SMOD 0x9 1 dst = (src != 0) ? (dst s% src) :
> dst
> > > +BPF_XOR 0xa 0 dst ^= src
> > > +BPF_MOV 0xb 0 dst = src
> > > +BPF_MOVSX 0xb 8/16/32 dst = (s8,s16,s32)src
> > > +BPF_ARSH 0xc 0 :term:`sign extending<Sign Extend>`
> dst >>=
> > (src & mask)
> > > +BPF_END 0xd 0 byte swap operations (see `Byte swap
> > instructions`_ below)
> > > +========= ============= =======
> > > +==========================================================
> >
> > Same here.
> >
> > > Underflow and overflow are allowed during arithmetic operations,
> > > meaning the 64-bit or 32-bit value will wrap. If BPF program
> > > execution would @@ -314,13 +314,13 @@ select what byte order the
> > > operation converts from or to. For ``BPF_ALU64``, the 1-bit source
> > > operand field in the opcode is reserved and must be set to 0.
> > >
> > > -========= ========= =====
> > =================================================
> > > -class source value description
> > > -========= ========= =====
> > =================================================
> > > -BPF_ALU BPF_TO_LE 0x00 convert between host byte order and little
> > endian
> > > -BPF_ALU BPF_TO_BE 0x08 convert between host byte order and big
> > endian
> > > -BPF_ALU64 Reserved 0x00 do byte swap unconditionally
> > > -========= ========= =====
> > > =================================================
> > > +========= ========= ============
> > =================================================
> > > +class source value(1 bit) description
> > > +========= ========= ============
> > =================================================
> > > +BPF_ALU BPF_TO_LE 0x0 convert between host byte order and
> little
> > endian
> > > +BPF_ALU BPF_TO_BE 0x1 convert between host byte order and
> big
> > endian
> > > +BPF_ALU64 Reserved 0x0 do byte swap unconditionally
> > > +========= ========= ============
> > > +=================================================
> >
> > Same here. Which bit does the 0x1 actually correspond to? It's
> self-evident in
> > the former, not the latter.
>
> Would you then say that RFC 791 (and many RFCs since) is not self-evident?
Not at all, I would say that they're presenting their information in a
very different way. Imagine if they presented Type of Service in this
way:
Type of Service: 8 bits
============ ===== ========== =========== ========
3 bits (MSB) 1 bit 1 bit 1 bit 2 bits
============ ===== ========== =========== ========
Precedence Delay Throughput Reliability Reserved
============ ===== ========== =========== ========
**Precedence**
The precedence with which traffic should be served.
**Delay**
The quality of service requirement for delay.
**Throughput**
The quality of service requirement for throughput.
**Reliability**
The quality of service requirement for reliability.
**Reserved**
Bits reserved for future use
Precedence types
----------------
...
========== =====
Precedence value
========== =====
Routine 0x0
Priority 0x1
Immediate 0x2
Flash 0x3
Flash Override 0x4
CRITIC/ECP 0x5
Internetwork Control 0x6
Network Control 0x7
Delay
-----
===== =====
Delay value
===== =====
Normal Delay 0x0
Low Delay 0x1
Throughput
----------
========== =====
Throughput value
========== =====
Normal Throughput 0x0
High Throughput 0x1
Reliability
-----
=========== =====
Reliability value
=========== =====
Normal Reliability 0x0
High Reliability 0x1
Reserved
--------
============== =====
Resrved value
============== =====
Reserved bit 0 0x0
Reserved bit 1 0x1
Reserved bit 2 0x2
Reserved bit 3 0x3
-----
This is of course in contrast to the much more concise way they defined
it above. If they did end up defining it this way, I think it would be
less clear to use the actual bit values rather than just the actual
value in the Type of Service mask. It's subjective, but I just don't
really see them as equivalent.
> If the WG chooses to diverge from the most common ways the IETF defines
> bit formats, that might be ok but may need a section explaining the
> divergent convention. My personal preference though is to stay consistent
> with the normal IETF convention, which part of the ISA doc already did.
I agree with you that we should stay consistent, but it seems like we're
being selective about it. Could you help me understand why the
deviations we have already wouldn't have required a separate section?
Thanks,
David
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
[-- Attachment #2: Type: text/plain, Size: 76 bytes --]
--
Bpf mailing list
Bpf@ietf.org
https://www.ietf.org/mailman/listinfo/bpf
next prev parent reply other threads:[~2024-01-09 19:10 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240105031450.57681-2-aoyangfang@link.cuhk.edu.cn>
2024-01-09 17:32 ` [PATCH bpf-next] The original document has some inconsistency David Vernet
2024-01-09 17:32 ` [Bpf] " David Vernet
2024-01-09 18:06 ` dthaler1968
2024-01-09 18:06 ` [Bpf] " dthaler1968=40googlemail.com
2024-01-09 19:10 ` David Vernet [this message]
2024-01-09 19:10 ` David Vernet
2024-01-12 20:16 ` dthaler1968
2024-01-12 20:16 ` [Bpf] " dthaler1968=40googlemail.com
2024-01-18 22:54 ` David Vernet
2024-01-18 22:54 ` [Bpf] " David Vernet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240109191037.GC79024@maniforge \
--to=void@manifault.com \
--cc=aoyangfang@link.cuhk.edu.cn \
--cc=bpf@ietf.org \
--cc=bpf@vger.kernel.org \
--cc=dthaler1968@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox