[PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data
@ 2024-08-28 20:57 Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 01/13] lib: packing: refuse operating on bit indices which exceed size of buffer Jacob Keller
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

This series refactors the Tx and Rx queue context data packing in the ice
driver to use the <linux/packing.h> API. This is intended to aid in the
future implementation of unpacking context data for live migration. While
it strictly does save a few bytes in the module according to bloat-o-meter,
the savings is quite minimal (~200 bytes).

While initially implementing the logic, it was discovered that the packing
code does not work properly if the provided buffer size is not a multiple
of 4 bytes. This is also fixed by this series, along with some general
improvements to the API including the addition of pack() and unpack().

First, several improvements and cleanups to the packing API are made:

An additional sanity check is added to packing() to ensure that the
bit indices do not point outside the bounds of the provided buffer.

The packing() implementation is adjusted to work correctly when given
an arbitrary buffer size.

The duplicate kernel-doc for packing() is removed from the header
file.

The packing API gaves the pack() and unpack() wrappers which simplify
the use of packing, and enable better const correctness checking.

The pack() and unpack() implementations are separated and packing()
becomes a simple wrapper that selects between the two operations. This is
intended to improve the logic as several conditionals can be dropped.

A suite of KUnit tests for the packing API have been added, based on
simple selftests originally developed by Vladimir.

Additional unit tests for the packing library are added.

A bug in the QUIRK_MSB_ON_THE_RIGHT of the packing API is fixed, along with
additional unit tests covering this behavior.

Following the improvements to the packing API, the ice driver is
refactored:

The int_q_state field is removed from the ice_tlan_ctx structure and
packing definitions. This field is not used, and can't even be properly
packed or unpacked as its size exceeds 8 bytes.

The existing packing code based on ice_set_ctx() is ripped out and replaced
with an implementation based on pack().

The sizes of fields in ice_tlan_ctx and ice_rlan_ctx are updated to better
reflect their actual size.

The setting of the prefetch enable bit in the Rx context is moved into
ice_setup_rx_ctx() where the rest of the Rx context is configured. This is
less surprising than having the ice_write_rxq_ctx() always overwrite that
field of the context.

The copy_rxq_ctx_to_hw() and ice_write_rxq_ctx() are updated to align with
modern kernel style guidelines. This includes removing an unnecessary NULL
check, updating the kdoc descriptions to align with the check-kdoc, and
making ice_copy_rxq_ctx_to_hw() void. This ensures the style for these
functions will match the style for the similar functions needed to enable
live migration in a future series.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
Changes in v2:
- Fix netdev address
- Drop unnecessary fixes tag in prefetch enable patch
- Add a module description to the KUnit tests
- Mark the test cases array in packing_test.c static
- Prefer 'Return:' over 'Returns:' for kdoc
- Link to v1: https://lore.kernel.org/r/20240827-e810-live-migration-jk-prep-ctx-functions-v1-0-0442e2e42d32@intel.com

I opted to include the packing changes in this series to Intel Wired LAN,
because the changes are only required due to the new use with the ice
driver. I also do not believe any existing driver uses
QUIRK_MSB_ON_THE_RIGHT, so I don't think the fix for that quirk needs to go
through net or stable either.

The packing library changes were discussed here:
https://lore.kernel.org/netdev/a0338310-e66c-497c-bc1f-a597e50aa3ff@intel.com/

In addition, several of the changes are taken from Vladimir's github:
https://github.com/vladimiroltean/linux/tree/packing-selftests

Strictly, the changes for pack() and unpack() are not required by ice, but
Vladimir requested that we try to use the new API so that I am not adding
another driver which needs updating in the future. In addition, pack() and
unpack() enable const checking.

---
Jacob Keller (8):
      lib: packing: add KUnit tests adapted from selftests
      lib: packing: add additional KUnit tests
      lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior
      ice: remove int_q_state from ice_tlan_ctx
      ice: use <linux/packing.h> for Tx and Rx queue context data
      ice: reduce size of queue context fields
      ice: move prefetch enable to ice_setup_rx_ctx
      ice: cleanup Rx queue context programming functions

Vladimir Oltean (5):
      lib: packing: refuse operating on bit indices which exceed size of buffer
      lib: packing: adjust definitions and implementation for arbitrary buffer lengths
      lib: packing: remove kernel-doc from header file
      lib: packing: add pack() and unpack() wrappers over packing()
      lib: packing: duplicate pack() and unpack() implementations

 drivers/net/ethernet/intel/ice/ice_common.h    |  13 +-
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h |  33 +-
 include/linux/packing.h                        |  32 +-
 drivers/net/ethernet/intel/ice/ice_base.c      |   6 +-
 drivers/net/ethernet/intel/ice/ice_common.c    | 333 ++++++--------------
 lib/packing.c                                  | 304 ++++++++++++------
 lib/packing_test.c                             | 412 +++++++++++++++++++++++++
 Documentation/core-api/packing.rst             |  71 +++++
 MAINTAINERS                                    |   1 +
 lib/Kconfig                                    |  12 +
 lib/Makefile                                   |   1 +
 11 files changed, 840 insertions(+), 378 deletions(-)
---
base-commit: 2c163922de69983e6ccedeb5c00dec85b6a17283
change-id: 20240812-e810-live-migration-jk-prep-ctx-functions-d46db279b2c6

Best regards,
-- 
Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 01/13] lib: packing: refuse operating on bit indices which exceed size of buffer
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 02/13] lib: packing: adjust definitions and implementation for arbitrary buffer lengths Jacob Keller
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

While reworking the implementation, it became apparent that this check
does not exist.

There is no functional issue yet, because at call sites, "startbit" and
"endbit" are always hardcoded to correct values, and never come from the
user.

Even with the upcoming support of arbitrary buffer lengths, the
"startbit >= 8 * pbuflen" check will remain correct. This is because
we intend to always interpret the packed buffer in a way that avoids
discontinuities in the available bit indices.

Fixes: 554aae35007e ("lib: Add support for generic packing operations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 lib/packing.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/packing.c b/lib/packing.c
index 3f656167c17e..439125286d2b 100644
--- a/lib/packing.c
+++ b/lib/packing.c
@@ -86,8 +86,10 @@ int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 	 */
 	int plogical_first_u8, plogical_last_u8, box;
 
-	/* startbit is expected to be larger than endbit */
-	if (startbit < endbit)
+	/* startbit is expected to be larger than endbit, and both are
+	 * expected to be within the logically addressable range of the buffer.
+	 */
+	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen || endbit < 0))
 		/* Invalid function call */
 		return -EINVAL;
 

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 02/13] lib: packing: adjust definitions and implementation for arbitrary buffer lengths
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 01/13] lib: packing: refuse operating on bit indices which exceed size of buffer Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 03/13] lib: packing: remove kernel-doc from header file Jacob Keller
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Jacob Keller has a use case for packing() in the intel/ice networking
driver, but it cannot be used as-is.

Simply put, the API quirks for LSW32_IS_FIRST and LITTLE_ENDIAN are
naively implemented with the undocumented assumption that the buffer
length must be a multiple of 4. All calculations of group offsets and
offsets of bytes within groups assume that this is the case. But in the
ice case, this does not hold true. For example, packing into a buffer
of 22 bytes would yield wrong results, but pretending it was a 24 byte
buffer would work.

Rather than requiring such hacks, and leaving a big question mark when
it comes to discontinuities in the accessible bit fields of such buffer,
we should extend the packing API to support this use case.

It turns out that we can keep the design in terms of groups of 4 bytes,
but also make it work if the total length is not a multiple of 4.
Just like before, imagine the buffer as a big number, and its most
significant bytes (the ones that would make up to a multiple of 4) are
missing. Thus, with a big endian (no quirks) interpretation of the
buffer, those most significant bytes would be absent from the beginning
of the buffer, and with a LSW32_IS_FIRST interpretation, they would be
absent from the end of the buffer. The LITTLE_ENDIAN quirk, in the
packing() API world, only affects byte ordering within groups of 4.
Thus, it does not change which bytes are missing. Only the significance
of the remaining bytes within the (smaller) group.

No change intended for buffer sizes which are multiples of 4. Tested
with the sja1105 driver and with downstream unit tests.

Fixes: 554aae35007e ("lib: Add support for generic packing operations")
Link: https://lore.kernel.org/netdev/a0338310-e66c-497c-bc1f-a597e50aa3ff@intel.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 lib/packing.c                      | 70 ++++++++++++++++++++++---------------
 Documentation/core-api/packing.rst | 71 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+), 27 deletions(-)

diff --git a/lib/packing.c b/lib/packing.c
index 439125286d2b..435236a914fe 100644
--- a/lib/packing.c
+++ b/lib/packing.c
@@ -9,27 +9,6 @@
 #include <linux/types.h>
 #include <linux/bitrev.h>
 
-static int get_le_offset(int offset)
-{
-	int closest_multiple_of_4;
-
-	closest_multiple_of_4 = (offset / 4) * 4;
-	offset -= closest_multiple_of_4;
-	return closest_multiple_of_4 + (3 - offset);
-}
-
-static int get_reverse_lsw32_offset(int offset, size_t len)
-{
-	int closest_multiple_of_4;
-	int word_index;
-
-	word_index = offset / 4;
-	closest_multiple_of_4 = word_index * 4;
-	offset -= closest_multiple_of_4;
-	word_index = (len / 4) - word_index - 1;
-	return word_index * 4 + offset;
-}
-
 static void adjust_for_msb_right_quirk(u64 *to_write, int *box_start_bit,
 				       int *box_end_bit, u8 *box_mask)
 {
@@ -47,6 +26,48 @@ static void adjust_for_msb_right_quirk(u64 *to_write, int *box_start_bit,
 	*box_end_bit   = new_box_end_bit;
 }
 
+/**
+ * calculate_box_addr - Determine physical location of byte in buffer
+ * @box: Index of byte within buffer seen as a logical big-endian big number
+ * @len: Size of buffer in bytes
+ * @quirks: mask of QUIRK_LSW32_IS_FIRST and QUIRK_LITTLE_ENDIAN
+ *
+ * Function interprets the buffer as a @len byte sized big number, and returns
+ * the physical offset of the @box logical octet within it. Internally, it
+ * treats the big number as groups of 4 bytes. If @len is not a multiple of 4,
+ * the last group may be shorter.
+ *
+ * @QUIRK_LSW32_IS_FIRST gives the ordering of groups of 4 octets relative to
+ * each other. If set, the most significant group of 4 octets is last in the
+ * buffer (and may be truncated if @len is not a multiple of 4).
+ *
+ * @QUIRK_LITTLE_ENDIAN gives the ordering of bytes within each group of 4.
+ * If set, the most significant byte is last in the group. If @len takes the
+ * form of 4k+3, the last group will only be able to represent 24 bits, and its
+ * most significant octet is byte 2.
+ *
+ * Return: the physical offset into the buffer corresponding to the logical box.
+ */
+static int calculate_box_addr(int box, size_t len, u8 quirks)
+{
+	size_t offset_of_group, offset_in_group, this_group = box / 4;
+	size_t group_size;
+
+	if (quirks & QUIRK_LSW32_IS_FIRST)
+		offset_of_group = this_group * 4;
+	else
+		offset_of_group = len - ((this_group + 1) * 4);
+
+	group_size = min(4, len - offset_of_group);
+
+	if (quirks & QUIRK_LITTLE_ENDIAN)
+		offset_in_group = box - this_group * 4;
+	else
+		offset_in_group = group_size - (box - this_group * 4) - 1;
+
+	return offset_of_group + offset_in_group;
+}
+
 /**
  * packing - Convert numbers (currently u64) between a packed and an unpacked
  *	     format. Unpacked means laid out in memory in the CPU's native
@@ -157,12 +178,7 @@ int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 		 * effective addressing inside the pbuf (so it's not
 		 * logical any longer).
 		 */
-		box_addr = pbuflen - box - 1;
-		if (quirks & QUIRK_LITTLE_ENDIAN)
-			box_addr = get_le_offset(box_addr);
-		if (quirks & QUIRK_LSW32_IS_FIRST)
-			box_addr = get_reverse_lsw32_offset(box_addr,
-							    pbuflen);
+		box_addr = calculate_box_addr(box, pbuflen, quirks);
 
 		if (op == UNPACK) {
 			u64 pval;
diff --git a/Documentation/core-api/packing.rst b/Documentation/core-api/packing.rst
index 3ed13bc9a195..821691f23c54 100644
--- a/Documentation/core-api/packing.rst
+++ b/Documentation/core-api/packing.rst
@@ -151,6 +151,77 @@ the more significant 4-byte word.
 We always think of our offsets as if there were no quirk, and we translate
 them afterwards, before accessing the memory region.
 
+Note on buffer lengths not multiple of 4
+----------------------------------------
+
+To deal with memory layout quirks where groups of 4 bytes are laid out "little
+endian" relative to each other, but "big endian" within the group itself, the
+concept of groups of 4 bytes is intrinsic to the packing API (not to be
+confused with the memory access, which is performed byte by byte, though).
+
+With buffer lengths not multiple of 4, this means one group will be incomplete.
+Depending on the quirks, this may lead to discontinuities in the bit fields
+accessible through the buffer. The packing API assumes discontinuities were not
+the intention of the memory layout, so it avoids them by effectively logically
+shortening the most significant group of 4 octets to the number of octets
+actually available.
+
+Example with a 31 byte sized buffer given below. Physical buffer offsets are
+implicit, and increase from left to right within a group, and from top to
+bottom within a column.
+
+No quirks:
+
+::
+
+            31         29         28        |   Group 7 (most significant)
+ 27         26         25         24        |   Group 6
+ 23         22         21         20        |   Group 5
+ 19         18         17         16        |   Group 4
+ 15         14         13         12        |   Group 3
+ 11         10          9          8        |   Group 2
+  7          6          5          4        |   Group 1
+  3          2          1          0        |   Group 0 (least significant)
+
+QUIRK_LSW32_IS_FIRST:
+
+::
+
+  3          2          1          0        |   Group 0 (least significant)
+  7          6          5          4        |   Group 1
+ 11         10          9          8        |   Group 2
+ 15         14         13         12        |   Group 3
+ 19         18         17         16        |   Group 4
+ 23         22         21         20        |   Group 5
+ 27         26         25         24        |   Group 6
+ 30         29         28                   |   Group 7 (most significant)
+
+QUIRK_LITTLE_ENDIAN:
+
+::
+
+            30         28         29        |   Group 7 (most significant)
+ 24         25         26         27        |   Group 6
+ 20         21         22         23        |   Group 5
+ 16         17         18         19        |   Group 4
+ 12         13         14         15        |   Group 3
+  8          9         10         11        |   Group 2
+  4          5          6          7        |   Group 1
+  0          1          2          3        |   Group 0 (least significant)
+
+QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST:
+
+::
+
+  0          1          2          3        |   Group 0 (least significant)
+  4          5          6          7        |   Group 1
+  8          9         10         11        |   Group 2
+ 12         13         14         15        |   Group 3
+ 16         17         18         19        |   Group 4
+ 20         21         22         23        |   Group 5
+ 24         25         26         27        |   Group 6
+ 28         29         30                   |   Group 7 (most significant)
+
 Intended use
 ------------
 

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 03/13] lib: packing: remove kernel-doc from header file
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 01/13] lib: packing: refuse operating on bit indices which exceed size of buffer Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 02/13] lib: packing: adjust definitions and implementation for arbitrary buffer lengths Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 04/13] lib: packing: add pack() and unpack() wrappers over packing() Jacob Keller
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

It is not necessary to have the kernel-doc duplicated both in the
header and in the implementation. It is better to have it near the
implementation of the function, since in C, a function can have N
declarations, but only one definition.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 include/linux/packing.h | 26 --------------------------
 1 file changed, 26 deletions(-)

diff --git a/include/linux/packing.h b/include/linux/packing.h
index 8d6571feb95d..69baefebcd02 100644
--- a/include/linux/packing.h
+++ b/include/linux/packing.h
@@ -17,32 +17,6 @@ enum packing_op {
 	UNPACK,
 };
 
-/**
- * packing - Convert numbers (currently u64) between a packed and an unpacked
- *	     format. Unpacked means laid out in memory in the CPU's native
- *	     understanding of integers, while packed means anything else that
- *	     requires translation.
- *
- * @pbuf: Pointer to a buffer holding the packed value.
- * @uval: Pointer to an u64 holding the unpacked value.
- * @startbit: The index (in logical notation, compensated for quirks) where
- *	      the packed value starts within pbuf. Must be larger than, or
- *	      equal to, endbit.
- * @endbit: The index (in logical notation, compensated for quirks) where
- *	    the packed value ends within pbuf. Must be smaller than, or equal
- *	    to, startbit.
- * @op: If PACK, then uval will be treated as const pointer and copied (packed)
- *	into pbuf, between startbit and endbit.
- *	If UNPACK, then pbuf will be treated as const pointer and the logical
- *	value between startbit and endbit will be copied (unpacked) to uval.
- * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
- *	    QUIRK_MSB_ON_THE_RIGHT.
- *
- * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
- *	   correct usage, return code may be discarded.
- *	   If op is PACK, pbuf is modified.
- *	   If op is UNPACK, uval is modified.
- */
 int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 	    enum packing_op op, u8 quirks);
 

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 04/13] lib: packing: add pack() and unpack() wrappers over packing()
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (2 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 03/13] lib: packing: remove kernel-doc from header file Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 05/13] lib: packing: duplicate pack() and unpack() implementations Jacob Keller
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Geert Uytterhoeven described packing() as "really bad API" because of
not being able to enforce const correctness. The same function is used
both when "pbuf" is input and "uval" is output, as in the other way
around.

Create 2 wrapper functions where const correctness can be ensured.
Do ugly type casts inside, to be able to reuse packing() as currently
implemented - which will _not_ modify the input argument.

Also, take the opportunity to change the type of startbit and endbit to
size_t - an unsigned type - in these new function prototypes. When int,
an extra check for negative values is necessary. Hopefully, when
packing() goes away completely, that check can be dropped.

My concern is that code which does rely on the conditional directionality
of packing() is harder to refactor without blowing up in size. So it may
take a while to completely eliminate packing(). But let's make alternatives
available for those who do not need that.

Link: https://lore.kernel.org/netdev/20210223112003.2223332-1-geert+renesas@glider.be/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 include/linux/packing.h |  6 ++++++
 lib/packing.c           | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/include/linux/packing.h b/include/linux/packing.h
index 69baefebcd02..ea25cb93cc70 100644
--- a/include/linux/packing.h
+++ b/include/linux/packing.h
@@ -20,4 +20,10 @@ enum packing_op {
 int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 	    enum packing_op op, u8 quirks);
 
+int pack(void *pbuf, const u64 *uval, size_t startbit, size_t endbit,
+	 size_t pbuflen, u8 quirks);
+
+int unpack(const void *pbuf, u64 *uval, size_t startbit, size_t endbit,
+	   size_t pbuflen, u8 quirks);
+
 #endif
diff --git a/lib/packing.c b/lib/packing.c
index 435236a914fe..2922db8a528c 100644
--- a/lib/packing.c
+++ b/lib/packing.c
@@ -90,6 +90,8 @@ static int calculate_box_addr(int box, size_t len, u8 quirks)
  * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
  *	    QUIRK_MSB_ON_THE_RIGHT.
  *
+ * Note: this is deprecated, prefer to use pack() or unpack() in new code.
+ *
  * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
  *	   correct usage, return code may be discarded.
  *	   If op is PACK, pbuf is modified.
@@ -216,4 +218,56 @@ int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 }
 EXPORT_SYMBOL(packing);
 
+/**
+ * pack - Pack u64 number into bitfield of buffer.
+ *
+ * @pbuf: Pointer to a buffer holding the packed value.
+ * @uval: Pointer to an u64 holding the unpacked value.
+ * @startbit: The index (in logical notation, compensated for quirks) where
+ *	      the packed value starts within pbuf. Must be larger than, or
+ *	      equal to, endbit.
+ * @endbit: The index (in logical notation, compensated for quirks) where
+ *	    the packed value ends within pbuf. Must be smaller than, or equal
+ *	    to, startbit.
+ * @pbuflen: The length in bytes of the packed buffer pointed to by @pbuf.
+ * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
+ *	    QUIRK_MSB_ON_THE_RIGHT.
+ *
+ * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
+ *	   correct usage, return code may be discarded. The @pbuf memory will
+ *	   be modified on success.
+ */
+int pack(void *pbuf, const u64 *uval, size_t startbit, size_t endbit,
+	 size_t pbuflen, u8 quirks)
+{
+	return packing(pbuf, (u64 *)uval, startbit, endbit, pbuflen, PACK, quirks);
+}
+EXPORT_SYMBOL(pack);
+
+/**
+ * unpack - Unpack u64 number from packed buffer.
+ *
+ * @pbuf: Pointer to a buffer holding the packed value.
+ * @uval: Pointer to an u64 holding the unpacked value.
+ * @startbit: The index (in logical notation, compensated for quirks) where
+ *	      the packed value starts within pbuf. Must be larger than, or
+ *	      equal to, endbit.
+ * @endbit: The index (in logical notation, compensated for quirks) where
+ *	    the packed value ends within pbuf. Must be smaller than, or equal
+ *	    to, startbit.
+ * @pbuflen: The length in bytes of the packed buffer pointed to by @pbuf.
+ * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
+ *	    QUIRK_MSB_ON_THE_RIGHT.
+ *
+ * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
+ *	   correct usage, return code may be discarded. The @uval will be
+ *	   modified on success.
+ */
+int unpack(const void *pbuf, u64 *uval, size_t startbit, size_t endbit,
+	   size_t pbuflen, u8 quirks)
+{
+	return packing((void *)pbuf, uval, startbit, endbit, pbuflen, UNPACK, quirks);
+}
+EXPORT_SYMBOL(unpack);
+
 MODULE_DESCRIPTION("Generic bitfield packing and unpacking");

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 05/13] lib: packing: duplicate pack() and unpack() implementations
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (3 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 04/13] lib: packing: add pack() and unpack() wrappers over packing() Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 06/13] lib: packing: add KUnit tests adapted from selftests Jacob Keller
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

packing() is now used in some hot paths, and it would be good to get rid
of some ifs and buts that depend on "op", to speed things up a little bit.

With the main implementations now taking size_t endbit, we no longer
have to check for negative values. Update the local integer variables to
also be size_t to match.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 include/linux/packing.h |   4 +-
 lib/packing.c           | 243 ++++++++++++++++++++++++++++++------------------
 2 files changed, 154 insertions(+), 93 deletions(-)

diff --git a/include/linux/packing.h b/include/linux/packing.h
index ea25cb93cc70..5d36dcd06f60 100644
--- a/include/linux/packing.h
+++ b/include/linux/packing.h
@@ -20,8 +20,8 @@ enum packing_op {
 int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 	    enum packing_op op, u8 quirks);
 
-int pack(void *pbuf, const u64 *uval, size_t startbit, size_t endbit,
-	 size_t pbuflen, u8 quirks);
+int pack(void *pbuf, u64 uval, size_t startbit, size_t endbit, size_t pbuflen,
+	 u8 quirks);
 
 int unpack(const void *pbuf, u64 *uval, size_t startbit, size_t endbit,
 	   size_t pbuflen, u8 quirks);
diff --git a/lib/packing.c b/lib/packing.c
index 2922db8a528c..500184530d07 100644
--- a/lib/packing.c
+++ b/lib/packing.c
@@ -9,11 +9,11 @@
 #include <linux/types.h>
 #include <linux/bitrev.h>
 
-static void adjust_for_msb_right_quirk(u64 *to_write, int *box_start_bit,
-				       int *box_end_bit, u8 *box_mask)
+static void adjust_for_msb_right_quirk(u64 *to_write, size_t *box_start_bit,
+				       size_t *box_end_bit, u8 *box_mask)
 {
-	int box_bit_width = *box_start_bit - *box_end_bit + 1;
-	int new_box_start_bit, new_box_end_bit;
+	size_t box_bit_width = *box_start_bit - *box_end_bit + 1;
+	size_t new_box_start_bit, new_box_end_bit;
 
 	*to_write >>= *box_end_bit;
 	*to_write = bitrev8(*to_write) >> (8 - box_bit_width);
@@ -48,7 +48,7 @@ static void adjust_for_msb_right_quirk(u64 *to_write, int *box_start_bit,
  *
  * Return: the physical offset into the buffer corresponding to the logical box.
  */
-static int calculate_box_addr(int box, size_t len, u8 quirks)
+static size_t calculate_box_addr(size_t box, size_t len, u8 quirks)
 {
 	size_t offset_of_group, offset_in_group, this_group = box / 4;
 	size_t group_size;
@@ -69,13 +69,10 @@ static int calculate_box_addr(int box, size_t len, u8 quirks)
 }
 
 /**
- * packing - Convert numbers (currently u64) between a packed and an unpacked
- *	     format. Unpacked means laid out in memory in the CPU's native
- *	     understanding of integers, while packed means anything else that
- *	     requires translation.
+ * pack - Pack u64 number into bitfield of buffer.
  *
  * @pbuf: Pointer to a buffer holding the packed value.
- * @uval: Pointer to an u64 holding the unpacked value.
+ * @uval: CPU-readable unpacked value to pack.
  * @startbit: The index (in logical notation, compensated for quirks) where
  *	      the packed value starts within pbuf. Must be larger than, or
  *	      equal to, endbit.
@@ -83,58 +80,45 @@ static int calculate_box_addr(int box, size_t len, u8 quirks)
  *	    the packed value ends within pbuf. Must be smaller than, or equal
  *	    to, startbit.
  * @pbuflen: The length in bytes of the packed buffer pointed to by @pbuf.
- * @op: If PACK, then uval will be treated as const pointer and copied (packed)
- *	into pbuf, between startbit and endbit.
- *	If UNPACK, then pbuf will be treated as const pointer and the logical
- *	value between startbit and endbit will be copied (unpacked) to uval.
  * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
  *	    QUIRK_MSB_ON_THE_RIGHT.
  *
- * Note: this is deprecated, prefer to use pack() or unpack() in new code.
- *
  * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
- *	   correct usage, return code may be discarded.
- *	   If op is PACK, pbuf is modified.
- *	   If op is UNPACK, uval is modified.
+ *	   correct usage, return code may be discarded. The @pbuf memory will
+ *	   be modified on success.
  */
-int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
-	    enum packing_op op, u8 quirks)
+int pack(void *pbuf, u64 uval, size_t startbit, size_t endbit, size_t pbuflen,
+	 u8 quirks)
 {
-	/* Number of bits for storing "uval"
-	 * also width of the field to access in the pbuf
-	 */
-	u64 value_width;
 	/* Logical byte indices corresponding to the
 	 * start and end of the field.
 	 */
 	int plogical_first_u8, plogical_last_u8, box;
+	/* width of the field to access in the pbuf */
+	u64 value_width;
 
 	/* startbit is expected to be larger than endbit, and both are
 	 * expected to be within the logically addressable range of the buffer.
 	 */
-	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen || endbit < 0))
+	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
 		/* Invalid function call */
 		return -EINVAL;
 
 	value_width = startbit - endbit + 1;
-	if (value_width > 64)
+	if (unlikely(value_width > 64))
 		return -ERANGE;
 
 	/* Check if "uval" fits in "value_width" bits.
 	 * If value_width is 64, the check will fail, but any
 	 * 64-bit uval will surely fit.
 	 */
-	if (op == PACK && value_width < 64 && (*uval >= (1ull << value_width)))
+	if (unlikely(value_width < 64 && uval >= (1ull << value_width)))
 		/* Cannot store "uval" inside "value_width" bits.
 		 * Truncating "uval" is most certainly not desirable,
 		 * so simply erroring out is appropriate.
 		 */
 		return -ERANGE;
 
-	/* Initialize parameter */
-	if (op == UNPACK)
-		*uval = 0;
-
 	/* Iterate through an idealistic view of the pbuf as an u64 with
 	 * no quirks, u8 by u8 (aligned at u8 boundaries), from high to low
 	 * logical bit significance. "box" denotes the current logical u8.
@@ -144,11 +128,12 @@ int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 
 	for (box = plogical_first_u8; box >= plogical_last_u8; box--) {
 		/* Bit indices into the currently accessed 8-bit box */
-		int box_start_bit, box_end_bit, box_addr;
+		size_t box_start_bit, box_end_bit, box_addr;
 		u8  box_mask;
 		/* Corresponding bits from the unpacked u64 parameter */
-		int proj_start_bit, proj_end_bit;
+		size_t proj_start_bit, proj_end_bit;
 		u64 proj_mask;
+		u64 pval;
 
 		/* This u8 may need to be accessed in its entirety
 		 * (from bit 7 to bit 0), or not, depending on the
@@ -182,66 +167,21 @@ int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
 		 */
 		box_addr = calculate_box_addr(box, pbuflen, quirks);
 
-		if (op == UNPACK) {
-			u64 pval;
+		/* Write to pbuf, read from uval */
+		pval = uval & proj_mask;
+		pval >>= proj_end_bit;
+		if (quirks & QUIRK_MSB_ON_THE_RIGHT)
+			adjust_for_msb_right_quirk(&pval,
+						   &box_start_bit,
+						   &box_end_bit,
+						   &box_mask);
 
-			/* Read from pbuf, write to uval */
-			pval = ((u8 *)pbuf)[box_addr] & box_mask;
-			if (quirks & QUIRK_MSB_ON_THE_RIGHT)
-				adjust_for_msb_right_quirk(&pval,
-							   &box_start_bit,
-							   &box_end_bit,
-							   &box_mask);
-
-			pval >>= box_end_bit;
-			pval <<= proj_end_bit;
-			*uval &= ~proj_mask;
-			*uval |= pval;
-		} else {
-			u64 pval;
-
-			/* Write to pbuf, read from uval */
-			pval = (*uval) & proj_mask;
-			pval >>= proj_end_bit;
-			if (quirks & QUIRK_MSB_ON_THE_RIGHT)
-				adjust_for_msb_right_quirk(&pval,
-							   &box_start_bit,
-							   &box_end_bit,
-							   &box_mask);
-
-			pval <<= box_end_bit;
-			((u8 *)pbuf)[box_addr] &= ~box_mask;
-			((u8 *)pbuf)[box_addr] |= pval;
-		}
+		pval <<= box_end_bit;
+		((u8 *)pbuf)[box_addr] &= ~box_mask;
+		((u8 *)pbuf)[box_addr] |= pval;
 	}
 	return 0;
 }
-EXPORT_SYMBOL(packing);
-
-/**
- * pack - Pack u64 number into bitfield of buffer.
- *
- * @pbuf: Pointer to a buffer holding the packed value.
- * @uval: Pointer to an u64 holding the unpacked value.
- * @startbit: The index (in logical notation, compensated for quirks) where
- *	      the packed value starts within pbuf. Must be larger than, or
- *	      equal to, endbit.
- * @endbit: The index (in logical notation, compensated for quirks) where
- *	    the packed value ends within pbuf. Must be smaller than, or equal
- *	    to, startbit.
- * @pbuflen: The length in bytes of the packed buffer pointed to by @pbuf.
- * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
- *	    QUIRK_MSB_ON_THE_RIGHT.
- *
- * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
- *	   correct usage, return code may be discarded. The @pbuf memory will
- *	   be modified on success.
- */
-int pack(void *pbuf, const u64 *uval, size_t startbit, size_t endbit,
-	 size_t pbuflen, u8 quirks)
-{
-	return packing(pbuf, (u64 *)uval, startbit, endbit, pbuflen, PACK, quirks);
-}
 EXPORT_SYMBOL(pack);
 
 /**
@@ -266,8 +206,129 @@ EXPORT_SYMBOL(pack);
 int unpack(const void *pbuf, u64 *uval, size_t startbit, size_t endbit,
 	   size_t pbuflen, u8 quirks)
 {
-	return packing((void *)pbuf, uval, startbit, endbit, pbuflen, UNPACK, quirks);
+	/* Logical byte indices corresponding to the
+	 * start and end of the field.
+	 */
+	int plogical_first_u8, plogical_last_u8, box;
+	/* width of the field to access in the pbuf */
+	u64 value_width;
+
+	/* startbit is expected to be larger than endbit, and both are
+	 * expected to be within the logically addressable range of the buffer.
+	 */
+	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
+		/* Invalid function call */
+		return -EINVAL;
+
+	value_width = startbit - endbit + 1;
+	if (unlikely(value_width > 64))
+		return -ERANGE;
+
+	/* Initialize parameter */
+	*uval = 0;
+
+	/* Iterate through an idealistic view of the pbuf as an u64 with
+	 * no quirks, u8 by u8 (aligned at u8 boundaries), from high to low
+	 * logical bit significance. "box" denotes the current logical u8.
+	 */
+	plogical_first_u8 = startbit / 8;
+	plogical_last_u8  = endbit / 8;
+
+	for (box = plogical_first_u8; box >= plogical_last_u8; box--) {
+		/* Bit indices into the currently accessed 8-bit box */
+		size_t box_start_bit, box_end_bit, box_addr;
+		u8  box_mask;
+		/* Corresponding bits from the unpacked u64 parameter */
+		size_t proj_start_bit, proj_end_bit;
+		u64 proj_mask;
+		u64 pval;
+
+		/* This u8 may need to be accessed in its entirety
+		 * (from bit 7 to bit 0), or not, depending on the
+		 * input arguments startbit and endbit.
+		 */
+		if (box == plogical_first_u8)
+			box_start_bit = startbit % 8;
+		else
+			box_start_bit = 7;
+		if (box == plogical_last_u8)
+			box_end_bit = endbit % 8;
+		else
+			box_end_bit = 0;
+
+		/* We have determined the box bit start and end.
+		 * Now we calculate where this (masked) u8 box would fit
+		 * in the unpacked (CPU-readable) u64 - the u8 box's
+		 * projection onto the unpacked u64. Though the
+		 * box is u8, the projection is u64 because it may fall
+		 * anywhere within the unpacked u64.
+		 */
+		proj_start_bit = ((box * 8) + box_start_bit) - endbit;
+		proj_end_bit   = ((box * 8) + box_end_bit) - endbit;
+		proj_mask = GENMASK_ULL(proj_start_bit, proj_end_bit);
+		box_mask  = GENMASK_ULL(box_start_bit, box_end_bit);
+
+		/* Determine the offset of the u8 box inside the pbuf,
+		 * adjusted for quirks. The adjusted box_addr will be used for
+		 * effective addressing inside the pbuf (so it's not
+		 * logical any longer).
+		 */
+		box_addr = calculate_box_addr(box, pbuflen, quirks);
+
+		/* Read from pbuf, write to uval */
+		pval = ((u8 *)pbuf)[box_addr] & box_mask;
+		if (quirks & QUIRK_MSB_ON_THE_RIGHT)
+			adjust_for_msb_right_quirk(&pval,
+						   &box_start_bit,
+						   &box_end_bit,
+						   &box_mask);
+
+		pval >>= box_end_bit;
+		pval <<= proj_end_bit;
+		*uval &= ~proj_mask;
+		*uval |= pval;
+	}
+	return 0;
 }
 EXPORT_SYMBOL(unpack);
 
+/**
+ * packing - Convert numbers (currently u64) between a packed and an unpacked
+ *	     format. Unpacked means laid out in memory in the CPU's native
+ *	     understanding of integers, while packed means anything else that
+ *	     requires translation.
+ *
+ * @pbuf: Pointer to a buffer holding the packed value.
+ * @uval: Pointer to an u64 holding the unpacked value.
+ * @startbit: The index (in logical notation, compensated for quirks) where
+ *	      the packed value starts within pbuf. Must be larger than, or
+ *	      equal to, endbit.
+ * @endbit: The index (in logical notation, compensated for quirks) where
+ *	    the packed value ends within pbuf. Must be smaller than, or equal
+ *	    to, startbit.
+ * @pbuflen: The length in bytes of the packed buffer pointed to by @pbuf.
+ * @op: If PACK, then uval will be treated as const pointer and copied (packed)
+ *	into pbuf, between startbit and endbit.
+ *	If UNPACK, then pbuf will be treated as const pointer and the logical
+ *	value between startbit and endbit will be copied (unpacked) to uval.
+ * @quirks: A bit mask of QUIRK_LITTLE_ENDIAN, QUIRK_LSW32_IS_FIRST and
+ *	    QUIRK_MSB_ON_THE_RIGHT.
+ *
+ * Note: this is deprecated, prefer to use pack() or unpack() in new code.
+ *
+ * Return: 0 on success, EINVAL or ERANGE if called incorrectly. Assuming
+ *	   correct usage, return code may be discarded.
+ *	   If op is PACK, pbuf is modified.
+ *	   If op is UNPACK, uval is modified.
+ */
+int packing(void *pbuf, u64 *uval, int startbit, int endbit, size_t pbuflen,
+	    enum packing_op op, u8 quirks)
+{
+	if (op == PACK)
+		return pack(pbuf, *uval, startbit, endbit, pbuflen, quirks);
+
+	return unpack(pbuf, uval, startbit, endbit, pbuflen, quirks);
+}
+EXPORT_SYMBOL(packing);
+
 MODULE_DESCRIPTION("Generic bitfield packing and unpacking");

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 06/13] lib: packing: add KUnit tests adapted from selftests
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (4 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 05/13] lib: packing: duplicate pack() and unpack() implementations Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 07/13] lib: packing: add additional KUnit tests Jacob Keller
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

Add 24 simple KUnit tests for the lib/packing.c pack() and unpack() APIs.

The first 16 tests exercise all combinations of quirks with a simple magic
number value on a 16-byte buffer. The remaining 8 tests cover
non-multiple-of-4 buffer sizes.

These tests were originally written by Vladimir as simple selftest
functions. I adapted them to KUnit, refactoring them into a table driven
approach. This will aid in adding additional tests in the future.

Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 lib/packing_test.c | 258 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS        |   1 +
 lib/Kconfig        |  12 +++
 lib/Makefile       |   1 +
 4 files changed, 272 insertions(+)

diff --git a/lib/packing_test.c b/lib/packing_test.c
new file mode 100644
index 000000000000..4d07523dba29
--- /dev/null
+++ b/lib/packing_test.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Vladimir Oltean <olteanv@gmail.com>
+ * Copyright (c) 2024, Intel Corporation.
+ */
+#include <kunit/test.h>
+#include <linux/packing.h>
+
+struct packing_test_case {
+	const char *desc;
+	const u8 *pbuf;
+	size_t pbuf_size;
+	u64 uval;
+	size_t start_bit;
+	size_t end_bit;
+	u8 quirks;
+};
+
+#define NO_QUIRKS	0
+
+/**
+ * PBUF - Initialize .pbuf and .pbuf_size
+ * @array: elements of constant physical buffer
+ *
+ * Initializes the .pbuf and .pbuf_size fields of a struct packing_test_case
+ * with a constant array of the specified elements.
+ */
+#define PBUF(array...)					\
+	.pbuf = (const u8[]){ array },			\
+	.pbuf_size = sizeof((const u8 []){ array })
+
+static const struct packing_test_case cases[] = {
+	/* These tests pack and unpack a magic 64-bit value
+	 * (0xcafedeadbeefcafe) at a fixed logical offset (32) within an
+	 * otherwise zero array of 128 bits (16 bytes). They test all possible
+	 * bit layouts of the 128 bit buffer.
+	 */
+	{
+		.desc = "no quirks, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xca, 0xfe, 0xde, 0xad,
+		     0xbe, 0xef, 0xca, 0xfe, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "lsw32 first, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xbe, 0xef, 0xca, 0xfe,
+		     0xca, 0xfe, 0xde, 0xad, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "little endian words, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xad, 0xde, 0xfe, 0xca,
+		     0xfe, 0xca, 0xef, 0xbe, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "msb right, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x53, 0x7f, 0x7b, 0xb5,
+		     0x7d, 0xf7, 0x53, 0x7f, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT,
+	},
+	{
+		.desc = "msb right + lsw32 first, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x7d, 0xf7, 0x53, 0x7f,
+		     0x53, 0x7f, 0x7b, 0xb5, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "msb right + little endian words, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xb5, 0x7b, 0x7f, 0x53,
+		     0x7f, 0x53, 0xf7, 0x7d, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "msb right + lsw32 first + little endian words, 16 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x7f, 0x53, 0xf7, 0x7d,
+		     0xb5, 0x7b, 0x7f, 0x53, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	/* These tests pack and unpack a magic 64-bit value
+	 * (0xcafedeadbeefcafe) at a fixed logical offset (32) within an
+	 * otherwise zero array of varying size from 18 bytes to 24 bytes.
+	 */
+	{
+		.desc = "no quirks, 18 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xca, 0xfe,
+		     0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0x00, 0x00,
+		     0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "no quirks, 19 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xca,
+		     0xfe, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0x00,
+		     0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "no quirks, 20 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		     0xca, 0xfe, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe,
+		     0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "no quirks, 22 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0xca, 0xfe, 0xde, 0xad, 0xbe, 0xef,
+		     0xca, 0xfe, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "no quirks, 24 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0x00, 0x00, 0xca, 0xfe, 0xde, 0xad,
+		     0xbe, 0xef, 0xca, 0xfe, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 18 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 19 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 20 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 22 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 24 bytes",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xfe, 0xca, 0xef, 0xbe,
+		     0xad, 0xde, 0xfe, 0xca, 0x00, 0x00, 0x00, 0x00,
+		     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xcafedeadbeefcafe,
+		.start_bit = 95,
+		.end_bit = 32,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+};
+
+KUNIT_ARRAY_PARAM_DESC(packing, cases, desc);
+
+static void packing_test_pack(struct kunit *test)
+{
+	const struct packing_test_case *params = test->param_value;
+	u8 *pbuf;
+	int err;
+
+	pbuf = kunit_kzalloc(test, params->pbuf_size, GFP_KERNEL);
+
+	err = pack(pbuf, params->uval, params->start_bit, params->end_bit,
+		   params->pbuf_size, params->quirks);
+
+	KUNIT_EXPECT_EQ_MSG(test, err, 0, "pack() returned %pe\n", ERR_PTR(err));
+	KUNIT_EXPECT_MEMEQ(test, pbuf, params->pbuf, params->pbuf_size);
+}
+
+static void packing_test_unpack(struct kunit *test)
+{
+	const struct packing_test_case *params = test->param_value;
+	u64 uval;
+	int err;
+
+	err = unpack(params->pbuf, &uval, params->start_bit, params->end_bit,
+		     params->pbuf_size, params->quirks);
+	KUNIT_EXPECT_EQ_MSG(test, err, 0, "unpack() returned %pe\n", ERR_PTR(err));
+	KUNIT_EXPECT_EQ(test, uval, params->uval);
+}
+
+static struct kunit_case packing_test_cases[] = {
+	KUNIT_CASE_PARAM(packing_test_pack, packing_gen_params),
+	KUNIT_CASE_PARAM(packing_test_unpack, packing_gen_params),
+	{},
+};
+
+static struct kunit_suite packing_test_suite = {
+	.name = "packing",
+	.test_cases = packing_test_cases,
+};
+
+kunit_test_suite(packing_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("KUnit tests for packing library");
diff --git a/MAINTAINERS b/MAINTAINERS
index 30a9b9450e11..8daa3b7d307a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17235,6 +17235,7 @@ S:	Supported
 F:	Documentation/core-api/packing.rst
 F:	include/linux/packing.h
 F:	lib/packing.c
+F:	lib/packing_test.c
 
 PADATA PARALLEL EXECUTION MECHANISM
 M:	Steffen Klassert <steffen.klassert@secunet.com>
diff --git a/lib/Kconfig b/lib/Kconfig
index b38849af6f13..50d85f38b569 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -40,6 +40,18 @@ config PACKING
 
 	  When in doubt, say N.
 
+config PACKING_KUNIT_TEST
+	tristate "KUnit tests for packing library" if !KUNIT_ALL_TESTS
+	depends on PACKING && KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  This builds KUnit tests for the packing library.
+
+	  For more information on KUnit and unit tests in general,
+	  please refer to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+	  When in doubt, say N.
+
 config BITREVERSE
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index 322bb127b4dc..69db48a89cf5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -153,6 +153,7 @@ obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 obj-$(CONFIG_BITREVERSE) += bitrev.o
 obj-$(CONFIG_LINEAR_RANGES) += linear_ranges.o
 obj-$(CONFIG_PACKING)	+= packing.o
+obj-$(CONFIG_PACKING_KUNIT_TEST) += packing_test.o
 obj-$(CONFIG_CRC_CCITT)	+= crc-ccitt.o
 obj-$(CONFIG_CRC16)	+= crc16.o
 obj-$(CONFIG_CRC_T10DIF)+= crc-t10dif.o

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 07/13] lib: packing: add additional KUnit tests
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (5 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 06/13] lib: packing: add KUnit tests adapted from selftests Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior Jacob Keller
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

While reviewing the initial KUnit tests for lib/packing, Przemek pointed
out that the test values have duplicate bytes in the input sequence.

In addition, I noticed that the unit tests pack and unpack on a byte
boundary, instead of crossing bytes. Thus, we lack good coverage of the
corner cases of the API.

Add additional unit tests to cover packing and unpacking byte buffers which
do not have duplicate bytes in the unpacked value, and which pack and
unpack to an unaligned offset.

A careful reviewer may note the lack tests for QUIRK_MSB_ON_THE_RIGHT. This
is because I found issues with that quirk during test implementation. This
quirk will be fixed and the tests will be included in a future change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 lib/packing_test.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/lib/packing_test.c b/lib/packing_test.c
index 4d07523dba29..5d533e633e06 100644
--- a/lib/packing_test.c
+++ b/lib/packing_test.c
@@ -210,6 +210,88 @@ static const struct packing_test_case cases[] = {
 		.end_bit = 32,
 		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
 	},
+	/* These tests pack and unpack a magic 64-bit value
+	 * (0x1122334455667788) at an odd starting bit (43) within an
+	 * otherwise zero array of 128 bits (16 bytes). They test all possible
+	 * bit layouts of the 128 bit buffer.
+	 */
+	{
+		.desc = "no quirks, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x89, 0x11, 0x9a, 0x22, 0xab,
+		     0x33, 0xbc, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "lsw32 first, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x33, 0xbc, 0x40, 0x00,
+		     0x11, 0x9a, 0x22, 0xab, 0x00, 0x00, 0x00, 0x89),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "little endian words, 16 bytes, non-aligned",
+		PBUF(0x89, 0x00, 0x00, 0x00, 0xab, 0x22, 0x9a, 0x11,
+		     0x00, 0x40, 0xbc, 0x33, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0xbc, 0x33,
+		     0xab, 0x22, 0x9a, 0x11, 0x89, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
+	/* These tests pack and unpack a u64 with all bits set
+	 * (0xffffffffffffffff) at an odd starting bit (43) within an
+	 * otherwise zero array of 128 bits (16 bytes). They test all possible
+	 * bit layouts of the 128 bit buffer.
+	 */
+	{
+		.desc = "no quirks, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff,
+		     0xff, 0xff, 0xf8, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = NO_QUIRKS,
+	},
+	{
+		.desc = "lsw32 first, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xf8, 0x00,
+		     0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x07, 0xff),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "little endian words, 16 bytes, non-aligned, 0xff",
+		PBUF(0xff, 0x07, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		     0x00, 0xf8, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "lsw32 first + little endian words, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0xf8, 0xff, 0xff,
+		     0xff, 0xff, 0xff, 0xff, 0xff, 0x07, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
 };
 
 KUNIT_ARRAY_PARAM_DESC(packing, cases, desc);

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (6 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 07/13] lib: packing: add additional KUnit tests Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-09-02 23:25   ` Vladimir Oltean
  2024-08-28 20:57 ` [PATCH iwl-next v2 09/13] ice: remove int_q_state from ice_tlan_ctx Jacob Keller
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The QUIRK_MSB_ON_THE_RIGHT quirk is intended to modify pack() and unpack()
so that the most significant bit of each byte in the packed layout is on
the right.

The way the quirk is currently implemented is broken whenever the packing
code packs or unpacks any value that is not exactly a full byte.

The broken behavior can occur when packing any values smaller than one
byte, when packing any value that is not exactly a whole number of bytes,
or when the packing is not aligned to a byte boundary.

This quirk is documented in the following way:

  1. Normally (no quirks), we would do it like this:

  ::

    63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
    7                       6                       5                        4
    31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    3                       2                       1                        0

  <snip>

  2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this:

  ::

    56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
    7                       6                        5                       4
    24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
    3                       2                        1                       0

  That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but
  inverts bit offsets inside a byte.

Essentially, the mapping for physical bit offsets should be reserved for a
given byte within the payload. This reversal should be fixed to the bytes
in the packing layout.

The logic to implement this quirk is handled within the
adjust_for_msb_right_quirk() function. This function does not work properly
when dealing with the bytes that contain only a partial amount of data.

In particular, consider trying to pack or unpack the range 53-44. We should
always be mapping the bits from the logical ordering to their physical
ordering in the same way, regardless of what sequence of bits we are
unpacking.

This, we should grab the following logical bits:

  Logical: 55 54 53 52 51 50 49 48 47 45 44 43 42 41 40 39
                  ^  ^  ^  ^  ^  ^  ^  ^  ^

And pack them into the physical bits:

   Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
    Logical: 48 49 50 51 52 53                   44 45 46 47
              ^  ^  ^  ^  ^  ^                    ^  ^  ^  ^

The current logic in adjust_for_msb_right_quirk is broken. I believe it is
intending to map according to the following:

  Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
   Logical:       48 49 50 51 52 53 44 45 46 47
                   ^  ^  ^  ^  ^  ^  ^  ^  ^  ^

That is, it tries to keep the bits at the start and end of a packing
together. This is wrong, as it makes the packing change what bit is being
mapped to what based on which bits you're currently packing or unpacking.

Worse, the actual calculations within adjust_for_msb_right_quirk don't make
sense.

Consider the case when packing the last byte of an unaligned packing. It
might have a start bit of 7 and an end bit of 5. This would have a width of
3 bits. The new_start_bit will be calculated as the width - the box_end_bit
- 1. This will underflow and produce a negative value, which will
ultimate result in generating a new box_mask of all 0s.

For any other values, the result of the calculations of the
new_box_end_bit, new_box_start_bit, and the new box_mask will result in the
exact same values for the box_end_bit, box_start_bit, and box_mask. This
makes the calculations completely irrelevant.

If box_end_bit is 0, and box_start_bit is 7, then the entire function of
adjust_for_msb_right_quirk will boil down to just:

    *to_write = bitrev8(*to_write)

The other adjustments are attempting (incorrectly) to keep the bits in the
same place but just reversed. This is not the right behavior even if
implemented correctly, as it leaves the mapping dependent on the bit values
being packed or unpacked.

Remove adjust_for_msb_right_quirk() and just use bitrev8 to reverse the
byte order when interacting with the packed data.

In particular, for packing, we need to reverse both the box_mask and the
physical value being packed. This is done after shifting the value by
box_end_bit so that the reversed mapping is always aligned to the physical
buffer byte boundary. The box_mask is reversed as we're about to use it to
clear any stale bits in the physical buffer at this block.

For unpacking, we need to reverse the contents of the physical buffer
*before* masking with the box_mask. This is critical, as the box_mask is a
logical mask of the bit layout before handling the QUIRK_MSB_ON_THE_RIGHT.

Add several new tests which cover this behavior. These tests will fail
without the fix and pass afterwards. Note that no current drivers make use
of QUIRK_MSB_ON_THE_RIGHT. I suspect this is why there have been no reports
of this inconsistency before.

Fixes: 554aae35007e ("lib: Add support for generic packing operations")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 lib/packing.c      | 39 +++++++++--------------------
 lib/packing_test.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/lib/packing.c b/lib/packing.c
index 500184530d07..9efe57d347c7 100644
--- a/lib/packing.c
+++ b/lib/packing.c
@@ -9,23 +9,6 @@
 #include <linux/types.h>
 #include <linux/bitrev.h>
 
-static void adjust_for_msb_right_quirk(u64 *to_write, size_t *box_start_bit,
-				       size_t *box_end_bit, u8 *box_mask)
-{
-	size_t box_bit_width = *box_start_bit - *box_end_bit + 1;
-	size_t new_box_start_bit, new_box_end_bit;
-
-	*to_write >>= *box_end_bit;
-	*to_write = bitrev8(*to_write) >> (8 - box_bit_width);
-	*to_write <<= *box_end_bit;
-
-	new_box_end_bit   = box_bit_width - *box_start_bit - 1;
-	new_box_start_bit = box_bit_width - *box_end_bit - 1;
-	*box_mask = GENMASK_ULL(new_box_start_bit, new_box_end_bit);
-	*box_start_bit = new_box_start_bit;
-	*box_end_bit   = new_box_end_bit;
-}
-
 /**
  * calculate_box_addr - Determine physical location of byte in buffer
  * @box: Index of byte within buffer seen as a logical big-endian big number
@@ -170,13 +153,13 @@ int pack(void *pbuf, u64 uval, size_t startbit, size_t endbit, size_t pbuflen,
 		/* Write to pbuf, read from uval */
 		pval = uval & proj_mask;
 		pval >>= proj_end_bit;
-		if (quirks & QUIRK_MSB_ON_THE_RIGHT)
-			adjust_for_msb_right_quirk(&pval,
-						   &box_start_bit,
-						   &box_end_bit,
-						   &box_mask);
-
 		pval <<= box_end_bit;
+
+		if (quirks & QUIRK_MSB_ON_THE_RIGHT) {
+			pval = bitrev8(pval);
+			box_mask = bitrev8(box_mask);
+		}
+
 		((u8 *)pbuf)[box_addr] &= ~box_mask;
 		((u8 *)pbuf)[box_addr] |= pval;
 	}
@@ -276,12 +259,12 @@ int unpack(const void *pbuf, u64 *uval, size_t startbit, size_t endbit,
 		box_addr = calculate_box_addr(box, pbuflen, quirks);
 
 		/* Read from pbuf, write to uval */
-		pval = ((u8 *)pbuf)[box_addr] & box_mask;
+		pval = ((u8 *)pbuf)[box_addr];
+
 		if (quirks & QUIRK_MSB_ON_THE_RIGHT)
-			adjust_for_msb_right_quirk(&pval,
-						   &box_start_bit,
-						   &box_end_bit,
-						   &box_mask);
+			pval = bitrev8(pval);
+
+		pval &= box_mask;
 
 		pval >>= box_end_bit;
 		pval <<= proj_end_bit;
diff --git a/lib/packing_test.c b/lib/packing_test.c
index 5d533e633e06..015ad1180d23 100644
--- a/lib/packing_test.c
+++ b/lib/packing_test.c
@@ -251,6 +251,42 @@ static const struct packing_test_case cases[] = {
 		.end_bit = 43,
 		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
 	},
+	{
+		.desc = "msb right, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x91, 0x88, 0x59, 0x44, 0xd5,
+		     0xcc, 0x3d, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT,
+	},
+	{
+		.desc = "msb right + lsw32 first, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xcc, 0x3d, 0x02, 0x00,
+		     0x88, 0x59, 0x44, 0xd5, 0x00, 0x00, 0x00, 0x91),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "msb right + little endian words, 16 bytes, non-aligned",
+		PBUF(0x91, 0x00, 0x00, 0x00, 0xd5, 0x44, 0x59, 0x88,
+		     0x00, 0x02, 0x3d, 0xcc, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "msb right + lsw32 first + little endian words, 16 bytes, non-aligned",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x3d, 0xcc,
+		     0xd5, 0x44, 0x59, 0x88, 0x91, 0x00, 0x00, 0x00),
+		.uval = 0x1122334455667788,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
 	/* These tests pack and unpack a u64 with all bits set
 	 * (0xffffffffffffffff) at an odd starting bit (43) within an
 	 * otherwise zero array of 128 bits (16 bytes). They test all possible
@@ -292,6 +328,42 @@ static const struct packing_test_case cases[] = {
 		.end_bit = 43,
 		.quirks = QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
 	},
+	{
+		.desc = "msb right, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0xe0, 0xff, 0xff, 0xff, 0xff, 0xff,
+		     0xff, 0xff, 0x1f, 0x00, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT,
+	},
+	{
+		.desc = "msb right + lsw32 first, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x1f, 0x00,
+		     0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0xe0, 0xff),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST,
+	},
+	{
+		.desc = "msb right + little endian words, 16 bytes, non-aligned, 0xff",
+		PBUF(0xff, 0xe0, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		     0x00, 0x1f, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LITTLE_ENDIAN,
+	},
+	{
+		.desc = "msb right + lsw32 first + little endian words, 16 bytes, non-aligned, 0xff",
+		PBUF(0x00, 0x00, 0x00, 0x00, 0x00, 0x1f, 0xff, 0xff,
+		     0xff, 0xff, 0xff, 0xff, 0xff, 0xe0, 0x00, 0x00),
+		.uval = 0xffffffffffffffff,
+		.start_bit = 106,
+		.end_bit = 43,
+		.quirks = QUIRK_MSB_ON_THE_RIGHT | QUIRK_LSW32_IS_FIRST | QUIRK_LITTLE_ENDIAN,
+	},
 };
 
 KUNIT_ARRAY_PARAM_DESC(packing, cases, desc);

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 09/13] ice: remove int_q_state from ice_tlan_ctx
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (7 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The int_q_state field of the ice_tlan_ctx structure represents the internal
queue state. However, we never actually need to assign this or read this
during normal operation. In fact, trying to unpack it would not be possible
as it is larger than a u64. Remove this field from the ice_tlan_ctx
structure, and remove its packing field from the ice_tlan_ctx_info array.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h | 1 -
 drivers/net/ethernet/intel/ice/ice_common.c    | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 611577ebc29d..0e8ed8c226e6 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -590,7 +590,6 @@ struct ice_tlan_ctx {
 	u8 drop_ena;
 	u8 cache_prof_idx;
 	u8 pkt_shaper_prof_idx;
-	u8 int_q_state;	/* width not needed - internal - DO NOT WRITE!!! */
 };
 
 #endif /* _ICE_LAN_TX_RX_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 009716a12a26..c4b24ba36b5e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1467,7 +1467,6 @@ const struct ice_ctx_ele ice_tlan_ctx_info[] = {
 	ICE_CTX_STORE(ice_tlan_ctx, drop_ena,			1,	165),
 	ICE_CTX_STORE(ice_tlan_ctx, cache_prof_idx,		2,	166),
 	ICE_CTX_STORE(ice_tlan_ctx, pkt_shaper_prof_idx,	3,	168),
-	ICE_CTX_STORE(ice_tlan_ctx, int_q_state,		122,	171),
 	{ 0 }
 };
 

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (8 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 09/13] ice: remove int_q_state from ice_tlan_ctx Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-09-03  0:08   ` Vladimir Oltean
  2024-08-28 20:57 ` [PATCH iwl-next v2 11/13] ice: reduce size of queue context fields Jacob Keller
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The ice driver needs to write the Tx and Rx queue context when programming
Tx and Rx queues. This is currently done using some bespoke custom logic
via the ice_set_ctx() and its helper functions, along with bit position
definitions in the ice_tlan_ctx_info and ice_rlan_ctx_info structures.

This logic does work, but is problematic for several reasons:

1) ice_set_ctx requires a helper function for each byte size being packed,
   as it uses a separate function to pack u8, u16, u32, and u64 fields.
   This requires 4 functions which contain near-duplicate logic with the
   types changed out.

2) The logic in the ice_pack_ctx_word, ice_pack_ctx_dword, and
   ice_pack_ctx_qword does not handle values which straddle alignment
   boundaries very well. This requires that several fields in the
   ice_tlan_ctx_info and ice_rlan_ctx_info be a size larger than their bit
   size should require.

3) Future support for live migration will require adding unpacking
   functions to take the packed hardware context and unpack it into the
   ice_rlan_ctx and ice_tlan_ctx structures. Implementing this would
   require implementing ice_get_ctx, and its associated helper functions,
   which essentially doubles the amount of code required.

Since Linux 5.2, commit 554aae35007e ("lib: Add support for generic packing
operations") has had a robust bit packing library function which can
correctly pack or unpack data between the CPU-useful unpacked structures
and bitpacked buffers used by hardware. This API was previously broken if
the buffer size was not aligned to 4 bytes. However, this has recently been
fixed.

The major difference with <linux/packing.h> is that it expects the unpacked
data will always be a u64. This is somewhat limiting, but can be worked
around by using a local temporary u64.

Replace the ice-specific ice_set_ctx() logic with a <linux/packing.h>
based implementation, ice_ctx_pack(). It takes pointers to the relevant
packed and unpacked storage, as well as the least significant bit and the
width. A temporary local u64 value is used to interact with the pack() API.

As with the other implementations, we handle the error codes from pack()
with a pr_err and a call to dump_stack. These are unexpected as they should
only happen due to programmer error.

Note that I initially tried implementing this as functions which just
repeatably called the ice_ctx_pack() function instead of using the
ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
a couple of downsides:

 1) it wastes a significant amount of bytes in the text section, vs the
    savings from removing the RO data of the arrays.

 2) this cost is made worse after implementing an unpack function, as we
    must duplicate the list of packings for the unpack function.

It does have the slight upside of allowing integer type promotion when
packing. However, the unpack() function still requires a pointer to the u64
value, so a temporary variable would still be required when unpacking.

Finally, I opted to use the pack() interface instead of the packing()
interface since it ends up producing code I found easier to read overall.
The unpack support will ultimately add a few extra bytes to implement
separate ice_ctx_unpack(), and the __ice_unpack_queue_ctx(). However, I
think the resulting code is still simpler and will avoid requiring the
packing() to stick around forever. The overall cost is not high since we
still store the actual packing tables as shared RO data.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.h |  13 +-
 drivers/net/ethernet/intel/ice/ice_base.c   |   3 +-
 drivers/net/ethernet/intel/ice/ice_common.c | 285 ++++++++--------------------
 3 files changed, 96 insertions(+), 205 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 27208a60cece..550fdd08e6aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -5,6 +5,7 @@
 #define _ICE_COMMON_H_
 
 #include <linux/bitfield.h>
+#include <linux/packing.h>
 
 #include "ice.h"
 #include "ice_type.h"
@@ -93,8 +94,16 @@ bool ice_check_sq_alive(struct ice_hw *hw, struct ice_ctl_q_info *cq);
 int ice_aq_q_shutdown(struct ice_hw *hw, bool unloading);
 void ice_fill_dflt_direct_cmd_desc(struct ice_aq_desc *desc, u16 opcode);
 extern const struct ice_ctx_ele ice_tlan_ctx_info[];
-int ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
-		const struct ice_ctx_ele *ce_info);
+extern const struct ice_ctx_ele ice_rlan_ctx_info[];
+
+void __ice_pack_queue_ctx(const void *ctx, void *buf, size_t len,
+			  const struct ice_ctx_ele *ctx_info);
+
+#define ice_pack_rxq_ctx(rlan_ctx, buf) \
+	__ice_pack_queue_ctx((rlan_ctx), (buf), sizeof(buf), ice_rlan_ctx_info)
+
+#define ice_pack_txq_ctx(tlan_ctx, buf) \
+	__ice_pack_queue_ctx((tlan_ctx), (buf), sizeof(buf), ice_tlan_ctx_info)
 
 extern struct mutex ice_global_cfg_lock_sw;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index f448d3a84564..1881ce8105ca 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -907,8 +907,7 @@ ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 	ice_setup_tx_ctx(ring, &tlan_ctx, pf_q);
 	/* copy context contents into the qg_buf */
 	qg_buf->txqs[0].txq_id = cpu_to_le16(pf_q);
-	ice_set_ctx(hw, (u8 *)&tlan_ctx, qg_buf->txqs[0].txq_ctx,
-		    ice_tlan_ctx_info);
+	ice_pack_txq_ctx(&tlan_ctx, qg_buf->txqs[0].txq_ctx);
 
 	/* init queue specific tail reg. It is referred as
 	 * transmit comm scheduler queue doorbell.
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index c4b24ba36b5e..09a94c20e16d 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1387,8 +1387,89 @@ ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index)
 	return 0;
 }
 
+/**
+ * ice_ctx_pack - Pack Tx/Rx queue context data
+ * @pbuf: pointer to the packed buffer
+ * @pbuflen: size of the packed buffer
+ * @val: pointer to storage for the unpacked data
+ * @len: size of the unpacked data
+ * @width: width of bits to pack
+ * @lsb: least significant bit in the packed buffer
+ *
+ * Pack the given field of the Tx or Rx queue context into the hardware data
+ * buffer. The packed contents are in full Little Endian ordering with the
+ * least significant 4-byte block first.
+ *
+ * The only time that pack() should produce an error is due to invalid bit
+ * offsets. Thus, errors are logged along with a stack dump.
+ */
+static void ice_ctx_pack(void *pbuf, size_t pbuflen, const void *val,
+			 size_t len, size_t width, size_t lsb)
+{
+	size_t msb = lsb + width - 1;
+	u64 uval;
+	int err;
+
+	switch (len) {
+	case sizeof(u8):
+		uval = *((u8 *)val);
+		break;
+	case sizeof(u16):
+		uval = *((u16 *)val);
+		break;
+	case sizeof(u32):
+		uval = *((u32 *)val);
+		break;
+	case sizeof(u64):
+		uval = *((u64 *)val);
+		break;
+	default:
+		WARN_ONCE(1, "Unexpected size %zd when packing queue context",
+			  len);
+		return;
+	}
+
+	err = pack(pbuf, uval, msb, lsb, pbuflen,
+		   QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST);
+	if (unlikely(err)) {
+		if (err == -EINVAL) {
+			pr_err("MSB (%zd) expected to be larger than LSB (%zd)\n",
+			       msb, lsb);
+		} else if (err == -ERANGE) {
+			if ((msb - lsb + 1) > 64)
+				pr_err("Field %zd-%zd too large for 64 bits!\n",
+				       lsb, msb);
+			else
+				pr_err("Cannot store %llx inside field %zd-%zd (would truncate)\n",
+				       uval, lsb, msb);
+		}
+		dump_stack();
+	}
+}
+
+/**
+ * __ice_pack_queue_ctx - Pack Tx or Rx queue context
+ * @ctx: Unpacked queue context structure
+ * @buf: packed buffer storage
+ * @len: size of the packed buffer
+ * @ctx_info: array describing the packing layout
+ */
+void __ice_pack_queue_ctx(const void *ctx, void *buf, size_t len,
+			  const struct ice_ctx_ele *ctx_info)
+{
+	for (int f = 0; ctx_info[f].width; f++) {
+		const struct ice_ctx_ele *field_info = &ctx_info[f];
+		const void *field;
+
+		field = (const void *)ctx + field_info->offset;
+
+		ice_ctx_pack(buf, len, field, field_info->size_of,
+			     field_info->width, field_info->lsb);
+	}
+}
+
 /* LAN Rx Queue Context */
-static const struct ice_ctx_ele ice_rlan_ctx_info[] = {
+const struct ice_ctx_ele ice_rlan_ctx_info[] = {
 	/* Field		Width	LSB */
 	ICE_CTX_STORE(ice_rlan_ctx, head,		13,	0),
 	ICE_CTX_STORE(ice_rlan_ctx, cpuid,		8,	13),
@@ -1433,7 +1514,8 @@ int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 
 	rlan_ctx->prefena = 1;
 
-	ice_set_ctx(hw, (u8 *)rlan_ctx, ctx_buf, ice_rlan_ctx_info);
+	ice_pack_rxq_ctx(rlan_ctx, ctx_buf);
+
 	return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
 }
 
@@ -4526,205 +4608,6 @@ ice_aq_add_rdma_qsets(struct ice_hw *hw, u8 num_qset_grps,
 
 /* End of FW Admin Queue command wrappers */
 
-/**
- * ice_pack_ctx_byte - write a byte to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_byte(u8 *src_ctx, u8 *dest_ctx,
-			      const struct ice_ctx_ele *ce_info)
-{
-	u8 src_byte, dest_byte, mask;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	src_byte = *from;
-	src_byte <<= shift_width;
-	src_byte &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_byte, dest, sizeof(dest_byte));
-
-	dest_byte &= ~mask;	/* get the bits not changing */
-	dest_byte |= src_byte;	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_byte, sizeof(dest_byte));
-}
-
-/**
- * ice_pack_ctx_word - write a word to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_word(u8 *src_ctx, u8 *dest_ctx,
-			      const struct ice_ctx_ele *ce_info)
-{
-	u16 src_word, mask;
-	__le16 dest_word;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_word = *(u16 *)from;
-	src_word <<= shift_width;
-	src_word &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_word, dest, sizeof(dest_word));
-
-	dest_word &= ~(cpu_to_le16(mask));	/* get the bits not changing */
-	dest_word |= cpu_to_le16(src_word);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_word, sizeof(dest_word));
-}
-
-/**
- * ice_pack_ctx_dword - write a dword to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_dword(u8 *src_ctx, u8 *dest_ctx,
-			       const struct ice_ctx_ele *ce_info)
-{
-	u32 src_dword, mask;
-	__le32 dest_dword;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_dword = *(u32 *)from;
-	src_dword <<= shift_width;
-	src_dword &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_dword, dest, sizeof(dest_dword));
-
-	dest_dword &= ~(cpu_to_le32(mask));	/* get the bits not changing */
-	dest_dword |= cpu_to_le32(src_dword);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_dword, sizeof(dest_dword));
-}
-
-/**
- * ice_pack_ctx_qword - write a qword to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_qword(u8 *src_ctx, u8 *dest_ctx,
-			       const struct ice_ctx_ele *ce_info)
-{
-	u64 src_qword, mask;
-	__le64 dest_qword;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK_ULL(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_qword = *(u64 *)from;
-	src_qword <<= shift_width;
-	src_qword &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_qword, dest, sizeof(dest_qword));
-
-	dest_qword &= ~(cpu_to_le64(mask));	/* get the bits not changing */
-	dest_qword |= cpu_to_le64(src_qword);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_qword, sizeof(dest_qword));
-}
-
-/**
- * ice_set_ctx - set context bits in packed structure
- * @hw: pointer to the hardware structure
- * @src_ctx:  pointer to a generic non-packed context structure
- * @dest_ctx: pointer to memory for the packed structure
- * @ce_info: List of Rx context elements
- */
-int ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
-		const struct ice_ctx_ele *ce_info)
-{
-	int f;
-
-	for (f = 0; ce_info[f].width; f++) {
-		/* We have to deal with each element of the FW response
-		 * using the correct size so that we are correct regardless
-		 * of the endianness of the machine.
-		 */
-		if (ce_info[f].width > (ce_info[f].size_of * BITS_PER_BYTE)) {
-			ice_debug(hw, ICE_DBG_QCTX, "Field %d width of %d bits larger than size of %d byte(s) ... skipping write\n",
-				  f, ce_info[f].width, ce_info[f].size_of);
-			continue;
-		}
-		switch (ce_info[f].size_of) {
-		case sizeof(u8):
-			ice_pack_ctx_byte(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u16):
-			ice_pack_ctx_word(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u32):
-			ice_pack_ctx_dword(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u64):
-			ice_pack_ctx_qword(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		default:
-			return -EINVAL;
-		}
-	}
-
-	return 0;
-}
-
 /**
  * ice_get_lan_q_ctx - get the LAN queue context for the given VSI and TC
  * @hw: pointer to the HW struct

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 11/13] ice: reduce size of queue context fields
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (9 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 12/13] ice: move prefetch enable to ice_setup_rx_ctx Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 13/13] ice: cleanup Rx queue context programming functions Jacob Keller
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The ice_rlan_ctx and ice_tlan_ctx structures have some fields which are
intentionally sized larger necessary relative to the packed sizes the data
must fit into. This was done because the original ice_set_ctx() function
and its helpers did not correctly handle packing when the packed bits
straddled a byte. This is no longer the case with the use of the
<linux/packing.h> implementation.

Save some bytes in these structures by sizing the variables to the number
of bytes the actual bitpacked fields fit into.

There are a couple of gaps left in the structure, which is a result of the
fields being in the order they appear in the packed bit layout, but where
alignment forces some extra gaps. We could fix this, saving ~8 bytes from
each structure. However, these structures are not used heavily, and the
resulting savings is minimal:

$ bloat-o-meter ice-before-reorder.ko ice-after-reorder.ko
add/remove: 0/0 grow/shrink: 1/1 up/down: 26/-70 (-44)
Function                                     old     new   delta
ice_vsi_cfg_txq                             1873    1899     +26
ice_setup_rx_ctx.constprop                  1529    1459     -70
Total: Before=1459555, After=1459511, chg -0.00%

Thus, the fields are left in the same order as the packed bit layout,
despite the gaps this causes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h | 32 ++++++++------------------
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 0e8ed8c226e6..4f68d2787d7d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -377,23 +377,17 @@ enum ice_rx_flex_desc_status_error_1_bits {
 #define ICE_TX_DRBELL_Q_CTX_SIZE_DWORDS	5
 #define GLTCLAN_CQ_CNTX(i, CQ)		(GLTCLAN_CQ_CNTX0(CQ) + ((i) * 0x0800))
 
-/* RLAN Rx queue context data
- *
- * The sizes of the variables may be larger than needed due to crossing byte
- * boundaries. If we do not have the width of the variable set to the correct
- * size then we could end up shifting bits off the top of the variable when the
- * variable is at the top of a byte and crosses over into the next byte.
- */
+/* RLAN Rx queue context data */
 struct ice_rlan_ctx {
 	u16 head;
-	u16 cpuid; /* bigger than needed, see above for reason */
+	u8 cpuid;
 #define ICE_RLAN_BASE_S 7
 	u64 base;
 	u16 qlen;
 #define ICE_RLAN_CTX_DBUF_S 7
-	u16 dbuf; /* bigger than needed, see above for reason */
+	u8 dbuf;
 #define ICE_RLAN_CTX_HBUF_S 6
-	u16 hbuf; /* bigger than needed, see above for reason */
+	u8 hbuf;
 	u8 dtype;
 	u8 dsize;
 	u8 crcstrip;
@@ -401,12 +395,12 @@ struct ice_rlan_ctx {
 	u8 hsplit_0;
 	u8 hsplit_1;
 	u8 showiv;
-	u32 rxmax; /* bigger than needed, see above for reason */
+	u16 rxmax;
 	u8 tphrdesc_ena;
 	u8 tphwdesc_ena;
 	u8 tphdata_ena;
 	u8 tphhead_ena;
-	u16 lrxqthresh; /* bigger than needed, see above for reason */
+	u8 lrxqthresh;
 	u8 prefena;	/* NOTE: normally must be set to 1 at init */
 };
 
@@ -551,18 +545,12 @@ enum ice_tx_ctx_desc_eipt_offload {
 #define ICE_LAN_TXQ_MAX_QGRPS	127
 #define ICE_LAN_TXQ_MAX_QDIS	1023
 
-/* Tx queue context data
- *
- * The sizes of the variables may be larger than needed due to crossing byte
- * boundaries. If we do not have the width of the variable set to the correct
- * size then we could end up shifting bits off the top of the variable when the
- * variable is at the top of a byte and crosses over into the next byte.
- */
+/* Tx queue context data */
 struct ice_tlan_ctx {
 #define ICE_TLAN_CTX_BASE_S	7
 	u64 base;		/* base is defined in 128-byte units */
 	u8 port_num;
-	u16 cgd_num;		/* bigger than needed, see above for reason */
+	u8 cgd_num;
 	u8 pf_num;
 	u16 vmvf_num;
 	u8 vmvf_type;
@@ -573,7 +561,7 @@ struct ice_tlan_ctx {
 	u8 tsyn_ena;
 	u8 internal_usage_flag;
 	u8 alt_vlan;
-	u16 cpuid;		/* bigger than needed, see above for reason */
+	u8 cpuid;
 	u8 wb_mode;
 	u8 tphrd_desc;
 	u8 tphrd;
@@ -582,7 +570,7 @@ struct ice_tlan_ctx {
 	u16 qnum_in_func;
 	u8 itr_notification_mode;
 	u8 adjust_prof_id;
-	u32 qlen;		/* bigger than needed, see above for reason */
+	u16 qlen;
 	u8 quanta_prof_idx;
 	u8 tso_ena;
 	u16 tso_qnum;

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 12/13] ice: move prefetch enable to ice_setup_rx_ctx
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (10 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 11/13] ice: reduce size of queue context fields Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  2024-08-28 20:57 ` [PATCH iwl-next v2 13/13] ice: cleanup Rx queue context programming functions Jacob Keller
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The ice_write_rxq_ctx() function is responsible for programming the Rx
Queue context into hardware. It receives the configuration in unpacked form
via the ice_rlan_ctx structure.

This function unconditionally modifies the context to set the prefetch
enable bit. This was done by commit c31a5c25bb19 ("ice: Always set prefena
when configuring an Rx queue"). Setting this bit makes sense, since
prefetching descriptors is almost always the preferred behavior.

However, the ice_write_rxq_ctx() function is not the place that actually
defines the queue context. We initialize the Rx Queue context in
ice_setup_rx_ctx(). It is surprising to have the Rx queue context changed
by a function who's responsibility is to program the given context to
hardware.

Following the principle of least surprise, move the setting of the prefetch
enable bit out of ice_write_rxq_ctx() and into the ice_setup_rx_ctx().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c   | 3 +++
 drivers/net/ethernet/intel/ice/ice_common.c | 9 +++------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 1881ce8105ca..3fe87a30c29e 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -453,6 +453,9 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring)
 	/* Rx queue threshold in units of 64 */
 	rlan_ctx.lrxqthresh = 1;
 
+	/* Enable descriptor prefetch */
+	rlan_ctx.prefena = 1;
+
 	/* PF acts as uplink for switchdev; set flex descriptor with src_vsi
 	 * metadata and flags to allow redirecting to PR netdev
 	 */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 09a94c20e16d..67273e4af7ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1495,14 +1495,13 @@ const struct ice_ctx_ele ice_rlan_ctx_info[] = {
 };
 
 /**
- * ice_write_rxq_ctx
+ * ice_write_rxq_ctx - Write Rx Queue context to hardware
  * @hw: pointer to the hardware structure
  * @rlan_ctx: pointer to the rxq context
  * @rxq_index: the index of the Rx queue
  *
- * Converts rxq context from sparse to dense structure and then writes
- * it to HW register space and enables the hardware to prefetch descriptors
- * instead of only fetching them on demand
+ * Pack the sparse Rx Queue context into dense hardware format and write it
+ * into the HW register space.
  */
 int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 		      u32 rxq_index)
@@ -1512,8 +1511,6 @@ int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 	if (!rlan_ctx)
 		return -EINVAL;
 
-	rlan_ctx->prefena = 1;
-
 	ice_pack_rxq_ctx(rlan_ctx, ctx_buf);
 
 	return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);

-- 
2.46.0.124.g2dc1a81c8933


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH iwl-next v2 13/13] ice: cleanup Rx queue context programming functions
  2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
                   ` (11 preceding siblings ...)
  2024-08-28 20:57 ` [PATCH iwl-next v2 12/13] ice: move prefetch enable to ice_setup_rx_ctx Jacob Keller
@ 2024-08-28 20:57 ` Jacob Keller
  12 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-08-28 20:57 UTC (permalink / raw)
  To: Vladimir Oltean, netdev, Anthony Nguyen, Intel Wired LAN; +Cc: Przemek Kitszel

The ice_copy_rxq_ctx_to_hw() and ice_write_rxq_ctx() functions perform some
defensive checks which are typically frowned upon by kernel style
guidelines.

In particular, they perform NULL checks on buffers which are never expected
to be NULL. Both functions are only called once and always have valid
buffers pointing to stack memory. These checks only serve to hide potential
programming error, as we will not produce the normal crash dump on a NULL
access.

In addition, ice_copy_rxq_ctx_to_hw() cannot fail in another way, so could
be made void.

Future support for VF Live Migration will need to introduce an inverse
function for reading Rx queue context from HW registers to unpack it, as
well as functions to pack and unpack Tx queue context from HW.

Rather than copying these style issues into the new functions, lets first
cleanup the existing code.

For the ice_copy_rxq_ctx_to_hw() function:

 * Remove the NULL parameter check.
 * Move the Rx queue index check out of this function.
 * Convert the function to a void return.
 * Use a simple int variable instead of a u8 for the for loop index.
 * Use a local variable and array syntax to access the rxq_ctx.
 * Update the function description to better align with kernel doc style.

For the ice_write_rxq_ctx() function:

 * Use the more common '= {}' syntax for initializing the context buffer.
 * Move the Rx queue index check into this function.
 * Update the function description with a Returns: to align with kernel doc
   style.

These changes align the existing write functions to current kernel
style, and will align with the style of the new functions added when we
implement live migration in a future series.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c | 42 ++++++++++++-----------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 67273e4af7ff..bf4f4b6195f9 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1357,34 +1357,22 @@ int ice_reset(struct ice_hw *hw, enum ice_reset_req req)
 }

 /**
- * ice_copy_rxq_ctx_to_hw
+ * ice_copy_rxq_ctx_to_hw - Copy packed Rx queue context to HW registers
  * @hw: pointer to the hardware structure
- * @ice_rxq_ctx: pointer to the rxq context
+ * @rxq_ctx: pointer to the packed Rx queue context
  * @rxq_index: the index of the Rx queue
- *
- * Copies rxq context from dense structure to HW register space
  */
-static int
-ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index)
+static void ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *rxq_ctx,
+				   u32 rxq_index)
 {
-	u8 i;
-
-	if (!ice_rxq_ctx)
-		return -EINVAL;
-
-	if (rxq_index > QRX_CTRL_MAX_INDEX)
-		return -EINVAL;
-
 	/* Copy each dword separately to HW */
-	for (i = 0; i < ICE_RXQ_CTX_SIZE_DWORDS; i++) {
-		wr32(hw, QRX_CONTEXT(i, rxq_index),
-		     *((u32 *)(ice_rxq_ctx + (i * sizeof(u32)))));
+	for (int i = 0; i < ICE_RXQ_CTX_SIZE_DWORDS; i++) {
+		u32 ctx = ((u32 *)rxq_ctx)[i];

-		ice_debug(hw, ICE_DBG_QCTX, "qrxdata[%d]: %08X\n", i,
-			  *((u32 *)(ice_rxq_ctx + (i * sizeof(u32)))));
+		wr32(hw, QRX_CONTEXT(i, rxq_index), ctx);
+
+		ice_debug(hw, ICE_DBG_QCTX, "qrxdata[%d]: %08X\n", i, ctx);
 	}
-
-	return 0;
 }

 /**
@@ -1497,23 +1485,27 @@ const struct ice_ctx_ele ice_rlan_ctx_info[] = {
 /**
  * ice_write_rxq_ctx - Write Rx Queue context to hardware
  * @hw: pointer to the hardware structure
- * @rlan_ctx: pointer to the rxq context
+ * @rlan_ctx: pointer to the packed Rx queue context
  * @rxq_index: the index of the Rx queue
  *
  * Pack the sparse Rx Queue context into dense hardware format and write it
  * into the HW register space.
+ *
+ * Return: 0 on success, or -EINVAL if the Rx queue index is invalid.
  */
 int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 		      u32 rxq_index)
 {
-	u8 ctx_buf[ICE_RXQ_CTX_SZ] = { 0 };
+	u8 ctx_buf[ICE_RXQ_CTX_SZ] = {};

-	if (!rlan_ctx)
+	if (rxq_index > QRX_CTRL_MAX_INDEX)
 		return -EINVAL;

 	ice_pack_rxq_ctx(rlan_ctx, ctx_buf);

-	return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
+	ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
+
+	return 0;
 }

 /* LAN Tx Queue Context */

-- 
2.46.0.124.g2dc1a81c8933

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior
  2024-08-28 20:57 ` [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior Jacob Keller
@ 2024-09-02 23:25   ` Vladimir Oltean
  2024-09-03 21:03     ` Jacob Keller
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2024-09-02 23:25 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel

Hi Jacob,

It's very cool that you and Przemek (and possibly others) spent the time
to untangle this. Thanks! Just a microscopic nitpick below.

On Wed, Aug 28, 2024 at 01:57:24PM -0700, Jacob Keller wrote:
> The QUIRK_MSB_ON_THE_RIGHT quirk is intended to modify pack() and unpack()
> so that the most significant bit of each byte in the packed layout is on
> the right.
> 
> The way the quirk is currently implemented is broken whenever the packing
> code packs or unpacks any value that is not exactly a full byte.
> 
> The broken behavior can occur when packing any values smaller than one
> byte, when packing any value that is not exactly a whole number of bytes,
> or when the packing is not aligned to a byte boundary.
> 
> This quirk is documented in the following way:
> 
>   1. Normally (no quirks), we would do it like this:
> 
>   ::
> 
>     63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
>     7                       6                       5                        4
>     31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
>     3                       2                       1                        0
> 
>   <snip>
> 
>   2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this:
> 
>   ::
> 
>     56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
>     7                       6                        5                       4
>     24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
>     3                       2                        1                       0
> 
>   That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but
>   inverts bit offsets inside a byte.
> 
> Essentially, the mapping for physical bit offsets should be reserved for a
							      ~~~~~~~~
							      reversed

> given byte within the payload. This reversal should be fixed to the bytes
> in the packing layout.
> 
> The logic to implement this quirk is handled within the
> adjust_for_msb_right_quirk() function. This function does not work properly
> when dealing with the bytes that contain only a partial amount of data.
> 
> In particular, consider trying to pack or unpack the range 53-44. We should
> always be mapping the bits from the logical ordering to their physical
> ordering in the same way, regardless of what sequence of bits we are
> unpacking.
> 
> This, we should grab the following logical bits:
> 
>   Logical: 55 54 53 52 51 50 49 48 47 45 44 43 42 41 40 39
>                   ^  ^  ^  ^  ^  ^  ^  ^  ^

These 16 bits should have been 55-40. Bit 46 is missing, and bit 39 is
extraneous.

Also, I honestly think that another "Byte boundary:" line would help the
reader see the transformation proposed as an example better. Like this:

 Byte boundary: |                       |                       |
       Logical:   55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40
                         ^  ^  ^  ^  ^  ^  ^  ^  ^

> 
> And pack them into the physical bits:
> 
>    Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
>     Logical: 48 49 50 51 52 53                   44 45 46 47
>               ^  ^  ^  ^  ^  ^                    ^  ^  ^  ^
> 
> The current logic in adjust_for_msb_right_quirk is broken. I believe it is
> intending to map according to the following:
> 
>   Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47
>    Logical:       48 49 50 51 52 53 44 45 46 47
>                    ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
> 
> That is, it tries to keep the bits at the start and end of a packing
> together. This is wrong, as it makes the packing change what bit is being
> mapped to what based on which bits you're currently packing or unpacking.
> 
> Worse, the actual calculations within adjust_for_msb_right_quirk don't make
> sense.
> 
> Consider the case when packing the last byte of an unaligned packing. It
> might have a start bit of 7 and an end bit of 5. This would have a width of
> 3 bits. The new_start_bit will be calculated as the width - the box_end_bit
> - 1. This will underflow and produce a negative value, which will
> ultimate result in generating a new box_mask of all 0s.
> 
> For any other values, the result of the calculations of the
> new_box_end_bit, new_box_start_bit, and the new box_mask will result in the
> exact same values for the box_end_bit, box_start_bit, and box_mask. This
> makes the calculations completely irrelevant.
> 
> If box_end_bit is 0, and box_start_bit is 7, then the entire function of
> adjust_for_msb_right_quirk will boil down to just:
> 
>     *to_write = bitrev8(*to_write)
> 
> The other adjustments are attempting (incorrectly) to keep the bits in the
> same place but just reversed. This is not the right behavior even if
> implemented correctly, as it leaves the mapping dependent on the bit values
> being packed or unpacked.
> 
> Remove adjust_for_msb_right_quirk() and just use bitrev8 to reverse the
> byte order when interacting with the packed data.

Yes, just bitrev8() should be exactly what is needed for the "MSB on the
right within a packed byte" definition.

> 
> In particular, for packing, we need to reverse both the box_mask and the
> physical value being packed. This is done after shifting the value by
> box_end_bit so that the reversed mapping is always aligned to the physical
> buffer byte boundary. The box_mask is reversed as we're about to use it to
> clear any stale bits in the physical buffer at this block.
> 
> For unpacking, we need to reverse the contents of the physical buffer
> *before* masking with the box_mask. This is critical, as the box_mask is a
> logical mask of the bit layout before handling the QUIRK_MSB_ON_THE_RIGHT.
> 
> Add several new tests which cover this behavior. These tests will fail
> without the fix and pass afterwards. Note that no current drivers make use
> of QUIRK_MSB_ON_THE_RIGHT. I suspect this is why there have been no reports
> of this inconsistency before.
> 
> Fixes: 554aae35007e ("lib: Add support for generic packing operations")

When there is no user-observable issue in mainline, I believe there is
no reason for a Fixes: tag, even if the bug is very real. My understanding
is that the role of the tag is to help the backporting process to stable.
Using it here could possibly confuse the maintainers that it needs to be
backported, even though it is spelled out that it needs not be.

> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> ---

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data
  2024-08-28 20:57 ` [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
@ 2024-09-03  0:08   ` Vladimir Oltean
  2024-09-03 21:16     ` Jacob Keller
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2024-09-03  0:08 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel

On Wed, Aug 28, 2024 at 01:57:26PM -0700, Jacob Keller wrote:
> The major difference with <linux/packing.h> is that it expects the unpacked
> data will always be a u64. This is somewhat limiting, but can be worked
> around by using a local temporary u64.
> 
> As with the other implementations, we handle the error codes from pack()
> with a pr_err and a call to dump_stack. These are unexpected as they should
> only happen due to programmer error.
> 
> Note that I initially tried implementing this as functions which just
> repeatably called the ice_ctx_pack() function instead of using the
> ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
> a couple of downsides:
> 
>  1) it wastes a significant amount of bytes in the text section, vs the
>     savings from removing the RO data of the arrays.
> 
>  2) this cost is made worse after implementing an unpack function, as we
>     must duplicate the list of packings for the unpack function.

I agree with your concerns and with your decision of keeping the
ICE_CTX_STORE tables. Actually I have some more unposted lib/packing
changes which operate on struct packed_field arrays, very, very similar
to the ICE_CTX_STORE tables. Essentially two new calls: pack_fields()
and unpack_fields(), which perform the iteration inside the core library.
(the only real difference being that I went for startbit and endbit in
their definitions, rather than LSB+size).

I came to the realization that this API would be nice exactly because
otherwise, you need to duplicate the field definitions, once for the
pack() call and once for the unpack(). But if they're tables, you don't.

I'm looking at ways in which this new API could solve all the shortcomings
I don't like with lib/packing in general. Without being even aware of
ICE_CTX_STORE, I had actually implemented the type conversion to smaller
unpacked u8/u16/u32 types in exactly the same way.

I also wish to do something about the way in which lib/packing deals
with errors. I don't think it's exactly great for every driver to have
to dump stack and ignore them. And also, since they're programming
errors, it's odd (and bad for performance) to perform the sanity checks
for every function call, instead of just once, say at boot time. So I
had thought of a third new API call: packed_fields_validate(), which
runs at module_init() in the consumer driver (ice, sja1105, ...) and
operates on the field tables.

Basically it is a singular replacement for these existing verifications
in pack() and unpack():

	/* startbit is expected to be larger than endbit, and both are
	 * expected to be within the logically addressable range of the buffer.
	 */
	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
		/* Invalid function call */
		return -EINVAL;

	value_width = startbit - endbit + 1;
	if (unlikely(value_width > 64))
		return -ERANGE;

so you actually need to tell packed_fields_validate() what is the size
of the buffer you intend to run pack_fields() and unpack_fields() on.
I don't see it as a problem at all that the packed buffer size must be
fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
and ... hmmm... txq_ctx[22]? sized buffers.

But this packed_fields_validate() idea quickly creates another set of 2
problems which I didn't get to the bottom of:

1. Some sanity checks in pack() are dynamic and cannot be run at
   module_init() time. Like this:

	/* Check if "uval" fits in "value_width" bits.
	 * If value_width is 64, the check will fail, but any
	 * 64-bit uval will surely fit.
	 */
	if (value_width < 64 && uval >= (1ull << value_width))
		/* Cannot store "uval" inside "value_width" bits.
		 * Truncating "uval" is most certainly not desirable,
		 * so simply erroring out is appropriate.
		 */
		return -ERANGE;

   The worse part is that the check is actually useful. It led to the
   quick identification of the bug behind commit 24deec6b9e4a ("net:
   dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
   rather than seeing a silent failure. But... I would still really like
   the revised lib/packing API to return void for pack_fields() and
   unpack_fields(). Not really sure how to reconcile these.

2. How to make sure that the pbuflen passed to packed_fields_validate()
   will be the same one as the pbuflen passed to all subsequent pack_fields()
   calls validated against that very pbuflen?

Sorry for not posting code and just telling a story about it. There
isn't much to show other than unfinished ideas with conflicting
requirements. So I thought maybe I could use a little help with some
brainstorming. Of course, do let me know if you are not that interested
in making the ICE_CTX_STORE tables idea a part of the packing core.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior
  2024-09-02 23:25   ` Vladimir Oltean
@ 2024-09-03 21:03     ` Jacob Keller
  0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-09-03 21:03 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel



On 9/2/2024 4:25 PM, Vladimir Oltean wrote:
> Hi Jacob,
> 
> It's very cool that you and Przemek (and possibly others) spent the time
> to untangle this. Thanks! Just a microscopic nitpick below.
> 
> On Wed, Aug 28, 2024 at 01:57:24PM -0700, Jacob Keller wrote:
>>   That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but
>>   inverts bit offsets inside a byte.
>>
>> Essentially, the mapping for physical bit offsets should be reserved for a
> 							      ~~~~~~~~
> 							      reversed
> 

Yep.

>>   Logical: 55 54 53 52 51 50 49 48 47 45 44 43 42 41 40 39
>>                   ^  ^  ^  ^  ^  ^  ^  ^  ^
> 
> These 16 bits should have been 55-40. Bit 46 is missing, and bit 39 is
> extraneous.
> 
> Also, I honestly think that another "Byte boundary:" line would help the
> reader see the transformation proposed as an example better. Like this:
> 
>  Byte boundary: |                       |                       |
>        Logical:   55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40
>                          ^  ^  ^  ^  ^  ^  ^  ^  ^
> 

Good catch, and yea a byte boundary helps readability.

>> Fixes: 554aae35007e ("lib: Add support for generic packing operations")
> 
> When there is no user-observable issue in mainline, I believe there is
> no reason for a Fixes: tag, even if the bug is very real. My understanding
> is that the role of the tag is to help the backporting process to stable.
> Using it here could possibly confuse the maintainers that it needs to be
> backported, even though it is spelled out that it needs not be.
> 

Fair. I view the fixes tags as also helpful because it helps quickly
locate the code when I'm reviewing a past commit when trying to
understand why something was changed. I could move this to a mention in
the commit message text without an explicit fixes tag though.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data
  2024-09-03  0:08   ` Vladimir Oltean
@ 2024-09-03 21:16     ` Jacob Keller
  2024-09-03 22:13       ` Vladimir Oltean
  0 siblings, 1 reply; 20+ messages in thread
From: Jacob Keller @ 2024-09-03 21:16 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel



On 9/2/2024 5:08 PM, Vladimir Oltean wrote:
> On Wed, Aug 28, 2024 at 01:57:26PM -0700, Jacob Keller wrote:
>> The major difference with <linux/packing.h> is that it expects the unpacked
>> data will always be a u64. This is somewhat limiting, but can be worked
>> around by using a local temporary u64.
>>
>> As with the other implementations, we handle the error codes from pack()
>> with a pr_err and a call to dump_stack. These are unexpected as they should
>> only happen due to programmer error.
>>
>> Note that I initially tried implementing this as functions which just
>> repeatably called the ice_ctx_pack() function instead of using the
>> ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
>> a couple of downsides:
>>
>>  1) it wastes a significant amount of bytes in the text section, vs the
>>     savings from removing the RO data of the arrays.
>>
>>  2) this cost is made worse after implementing an unpack function, as we
>>     must duplicate the list of packings for the unpack function.
> 
> I agree with your concerns and with your decision of keeping the
> ICE_CTX_STORE tables. Actually I have some more unposted lib/packing
> changes which operate on struct packed_field arrays, very, very similar
> to the ICE_CTX_STORE tables. Essentially two new calls: pack_fields()
> and unpack_fields(), which perform the iteration inside the core library.
> (the only real difference being that I went for startbit and endbit in
> their definitions, rather than LSB+size).
> 

I only kept my interface in terms of lsb+size because I did not want to
attempt to re-write the table. I actually did re-write the table at
first, and discovered a documentation bug because our documentation for
the table has incorrect lsb/msb for some of the fields in some versions
of the doc!

I ultimately don't mind switching to the packing convention of start/end
(though my brain does have trouble sometimes thinking of the start as
the higher bit...)

> I came to the realization that this API would be nice exactly because
> otherwise, you need to duplicate the field definitions, once for the
> pack() call and once for the unpack(). But if they're tables, you don't.
> 
> I'm looking at ways in which this new API could solve all the shortcomings
> I don't like with lib/packing in general. Without being even aware of
> ICE_CTX_STORE, I had actually implemented the type conversion to smaller
> unpacked u8/u16/u32 types in exactly the same way.

I think having this in the core API with a standardized table, along
with support for unpacking the types would be great!

> 
> I also wish to do something about the way in which lib/packing deals
> with errors. I don't think it's exactly great for every driver to have
> to dump stack and ignore them. And also, since they're programming
> errors, it's odd (and bad for performance) to perform the sanity checks
> for every function call, instead of just once, say at boot time. So I
> had thought of a third new API call: packed_fields_validate(), which
> runs at module_init() in the consumer driver (ice, sja1105, ...) and
> operates on the field tables.
> 

It does seem reasonable to me that we can move the sanity checks here,
especially since the main programmer error is if this table is incorrect
in one of these ways.

> Basically it is a singular replacement for these existing verifications
> in pack() and unpack():
> 
> 	/* startbit is expected to be larger than endbit, and both are
> 	 * expected to be within the logically addressable range of the buffer.
> 	 */
> 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
> 		/* Invalid function call */
> 		return -EINVAL;
> 
> 	value_width = startbit - endbit + 1;
> 	if (unlikely(value_width > 64))
> 		return -ERANGE;
> 
> so you actually need to tell packed_fields_validate() what is the size
> of the buffer you intend to run pack_fields() and unpack_fields() on.
> I don't see it as a problem at all that the packed buffer size must be
> fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
> and ... hmmm... txq_ctx[22]? sized buffers.
> 

Yea, these are fixed sizes. Strictly I think we could have a macro
defining the size of the Tx queue context as well....

> But this packed_fields_validate() idea quickly creates another set of 2
> problems which I didn't get to the bottom of:
> 
> 1. Some sanity checks in pack() are dynamic and cannot be run at
>    module_init() time. Like this:
> 
> 	/* Check if "uval" fits in "value_width" bits.
> 	 * If value_width is 64, the check will fail, but any
> 	 * 64-bit uval will surely fit.
> 	 */
> 	if (value_width < 64 && uval >= (1ull << value_width))
> 		/* Cannot store "uval" inside "value_width" bits.
> 		 * Truncating "uval" is most certainly not desirable,
> 		 * so simply erroring out is appropriate.
> 		 */
> 		return -ERANGE;
> 

If we support u8/u16/u32/u64 sizes as well, you could check that the
size of the unpacked variable too. Could this data be in the table? Oh I
guess technically not because we are checking if the actual value passed
fits. I think keeping this but making it WARN would be sufficient?

>    The worse part is that the check is actually useful. It led to the
>    quick identification of the bug behind commit 24deec6b9e4a ("net:
>    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
>    rather than seeing a silent failure. But... I would still really like
>    the revised lib/packing API to return void for pack_fields() and
>    unpack_fields(). Not really sure how to reconcile these.
> 

Since this is generally programmer error (not something where uAPI can
affect it) what about making these WARN in the core?

> 2. How to make sure that the pbuflen passed to packed_fields_validate()
>    will be the same one as the pbuflen passed to all subsequent pack_fields()
>    calls validated against that very pbuflen?
> 

I guess you either duplicate the check and WARN, or you don't, and let
it panic on the invalid memory? But I guess that would only actually
panic with something like KASAN

> Sorry for not posting code and just telling a story about it. There
> isn't much to show other than unfinished ideas with conflicting
> requirements. So I thought maybe I could use a little help with some
> brainstorming. Of course, do let me know if you are not that interested
> in making the ICE_CTX_STORE tables idea a part of the packing core.
> 
> Thanks!

I think moving this into core would be fantastic. Since pretty much
every driver handles these sanity checks the same way, I also think that
moving those into the core and making them WARN or similar seems
reasonable, so we can make the pack/unpack as void.

It would be interesting to see a comparison of the resulting module size.

How much of this do you have code for now?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data
  2024-09-03 21:16     ` Jacob Keller
@ 2024-09-03 22:13       ` Vladimir Oltean
  2024-09-03 22:24         ` Jacob Keller
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2024-09-03 22:13 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel

On Tue, Sep 03, 2024 at 02:16:59PM -0700, Jacob Keller wrote:
> I only kept my interface in terms of lsb+size because I did not want to
> attempt to re-write the table. I actually did re-write the table at
> first, and discovered a documentation bug because our documentation for
> the table has incorrect lsb/msb for some of the fields in some versions
> of the doc!
> 
> I ultimately don't mind switching to the packing convention of start/end
> (though my brain does have trouble sometimes thinking of the start as
> the higher bit...)

Well, you can keep the ICE_CTX_STORE() macro formatted in whichever way
you want, including Width+LSB, but reimplement it to generate a struct
packed_field, formatted as startbit+endbit (endbit=LSB, startbit=LSB+Width-1).
I actually encourage any user of the packing library to write the code
in the closest way possible to the field definitions as they appear in
the documentation. What do you think?

> > I came to the realization that this API would be nice exactly because
> > otherwise, you need to duplicate the field definitions, once for the
> > pack() call and once for the unpack(). But if they're tables, you don't.
> > 
> > I'm looking at ways in which this new API could solve all the shortcomings
> > I don't like with lib/packing in general. Without being even aware of
> > ICE_CTX_STORE, I had actually implemented the type conversion to smaller
> > unpacked u8/u16/u32 types in exactly the same way.
> 
> I think having this in the core API with a standardized table, along
> with support for unpacking the types would be great!

Cool then!

> > I also wish to do something about the way in which lib/packing deals
> > with errors. I don't think it's exactly great for every driver to have
> > to dump stack and ignore them. And also, since they're programming
> > errors, it's odd (and bad for performance) to perform the sanity checks
> > for every function call, instead of just once, say at boot time. So I
> > had thought of a third new API call: packed_fields_validate(), which
> > runs at module_init() in the consumer driver (ice, sja1105, ...) and
> > operates on the field tables.
> > 
> 
> It does seem reasonable to me that we can move the sanity checks here,
> especially since the main programmer error is if this table is incorrect
> in one of these ways.

Actually I did manage to cook something up which I like even more than
packed_fields_validate(): a compile-time sanity check. I'm not completely
happy with it, just reasonably happy. You'll see.

> > Basically it is a singular replacement for these existing verifications
> > in pack() and unpack():
> > 
> > 	/* startbit is expected to be larger than endbit, and both are
> > 	 * expected to be within the logically addressable range of the buffer.
> > 	 */
> > 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
> > 		/* Invalid function call */
> > 		return -EINVAL;
> > 
> > 	value_width = startbit - endbit + 1;
> > 	if (unlikely(value_width > 64))
> > 		return -ERANGE;
> > 
> > so you actually need to tell packed_fields_validate() what is the size
> > of the buffer you intend to run pack_fields() and unpack_fields() on.
> > I don't see it as a problem at all that the packed buffer size must be
> > fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
> > and ... hmmm... txq_ctx[22]? sized buffers.
> > 
> 
> Yea, these are fixed sizes. Strictly I think we could have a macro
> defining the size of the Tx queue context as well....

Yeah, you'll need than when you'll see what I've prepared below :)

> > But this packed_fields_validate() idea quickly creates another set of 2
> > problems which I didn't get to the bottom of:
> > 
> > 1. Some sanity checks in pack() are dynamic and cannot be run at
> >    module_init() time. Like this:
> > 
> > 	/* Check if "uval" fits in "value_width" bits.
> > 	 * If value_width is 64, the check will fail, but any
> > 	 * 64-bit uval will surely fit.
> > 	 */
> > 	if (value_width < 64 && uval >= (1ull << value_width))
> > 		/* Cannot store "uval" inside "value_width" bits.
> > 		 * Truncating "uval" is most certainly not desirable,
> > 		 * so simply erroring out is appropriate.
> > 		 */
> > 		return -ERANGE;
> > 
> 
> If we support u8/u16/u32/u64 sizes as well, you could check that the
> size of the unpacked variable too. Could this data be in the table? Oh I
> guess technically not because we are checking if the actual value passed
> fits. I think keeping this but making it WARN would be sufficient?

The u8/u16/u32/u64 unpacked field size will absolutely be held in the
struct packed_field tables.

> >    The worse part is that the check is actually useful. It led to the
> >    quick identification of the bug behind commit 24deec6b9e4a ("net:
> >    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
> >    rather than seeing a silent failure. But... I would still really like
> >    the revised lib/packing API to return void for pack_fields() and
> >    unpack_fields(). Not really sure how to reconcile these.
> > 
> 
> Since this is generally programmer error (not something where uAPI can
> affect it) what about making these WARN in the core?

Yes, I went for demoting the -ERANGE error in the truncation case to a
WARN() that is printed for both the "int pack()" as well as the new
"void __pack()" call. Practically, it doesn't make a big difference
whether we bail out or do nothing, as long as we loudly complain in
either case. It's the complaint that leads to the bug getting easily
identified and fixed.

> > 2. How to make sure that the pbuflen passed to packed_fields_validate()
> >    will be the same one as the pbuflen passed to all subsequent pack_fields()
> >    calls validated against that very pbuflen?
> 
> I guess you either duplicate the check and WARN, or you don't, and let
> it panic on the invalid memory? But I guess that would only actually
> panic with something like KASAN

Yeah, I'm still not too sure here. The length of the packed buffer still
needs to be "obvious to the eye", since the same length must be manually
passed to the sanity check as well as to the actual pack_fields() call,
otherwise "bad things" will happen. I believe in the users' ability to
structure their code in such a way that this is not hard. Especially
since the sanity checks now rely on BUILD_BUG_ON(), and that can be
technically be placed anywhere in the code, I expect the best place to
put it is exactly near the pack_fields() call. That way, it's the most
obvious that the buffer length declared for the sanity check is
identical to the one during actual usage. Especially if the usage can be
restricted to just one function or two.

> > Sorry for not posting code and just telling a story about it. There
> > isn't much to show other than unfinished ideas with conflicting
> > requirements. So I thought maybe I could use a little help with some
> > brainstorming. Of course, do let me know if you are not that interested
> > in making the ICE_CTX_STORE tables idea a part of the packing core.
> > 
> > Thanks!
> 
> I think moving this into core would be fantastic. Since pretty much
> every driver handles these sanity checks the same way, I also think that
> moving those into the core and making them WARN or similar seems
> reasonable, so we can make the pack/unpack as void.
> 
> It would be interesting to see a comparison of the resulting module size.

Yeah, I'm also very much interested in comparisons: text size vs rodata
size vs dynamic memory usage. With the pack_fields() API in the sja1105
driver, I can shrink an enormous amount of u64 unpacked structure fields
to just u8. I also really like the idea of compile-time sanity checks,
and I'm curious how much that matters in a benchmark or something.
Anyway, I don't have a complete rework at the moment, so there's that.

> How much of this do you have code for now?

Well, I do have code, but it's not yet complete :)
I've updated by https://github.com/vladimiroltean/linux/tree/packing-selftests
branch on top of your patch set. Until HEAD~1, I've tested the sja1105
driver and it still works. The last patch - the bulk of the conversion
actually - is extremely tedious and is not yet ready (it doesn't even
compile). Yet, with a bit of imagination, it should be enough to provide
an example and hopefully move the discussion forward.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data
  2024-09-03 22:13       ` Vladimir Oltean
@ 2024-09-03 22:24         ` Jacob Keller
  0 siblings, 0 replies; 20+ messages in thread
From: Jacob Keller @ 2024-09-03 22:24 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev, Anthony Nguyen, Intel Wired LAN, Przemek Kitszel



On 9/3/2024 3:13 PM, Vladimir Oltean wrote:
> On Tue, Sep 03, 2024 at 02:16:59PM -0700, Jacob Keller wrote:
>> I only kept my interface in terms of lsb+size because I did not want to
>> attempt to re-write the table. I actually did re-write the table at
>> first, and discovered a documentation bug because our documentation for
>> the table has incorrect lsb/msb for some of the fields in some versions
>> of the doc!
>>
>> I ultimately don't mind switching to the packing convention of start/end
>> (though my brain does have trouble sometimes thinking of the start as
>> the higher bit...)
> 
> Well, you can keep the ICE_CTX_STORE() macro formatted in whichever way
> you want, including Width+LSB, but reimplement it to generate a struct
> packed_field, formatted as startbit+endbit (endbit=LSB, startbit=LSB+Width-1).
> I actually encourage any user of the packing library to write the code
> in the closest way possible to the field definitions as they appear in
> the documentation. What do you think?

Our documentation has both lsb, msb, and the width. I agree, and i think
it would benefit if the packing documentation suggests this as well.

> 
>>> I came to the realization that this API would be nice exactly because
>>> otherwise, you need to duplicate the field definitions, once for the
>>> pack() call and once for the unpack(). But if they're tables, you don't.
>>>
>>> I'm looking at ways in which this new API could solve all the shortcomings
>>> I don't like with lib/packing in general. Without being even aware of
>>> ICE_CTX_STORE, I had actually implemented the type conversion to smaller
>>> unpacked u8/u16/u32 types in exactly the same way.
>>
>> I think having this in the core API with a standardized table, along
>> with support for unpacking the types would be great!
> 
> Cool then!
> 
>>> I also wish to do something about the way in which lib/packing deals
>>> with errors. I don't think it's exactly great for every driver to have
>>> to dump stack and ignore them. And also, since they're programming
>>> errors, it's odd (and bad for performance) to perform the sanity checks
>>> for every function call, instead of just once, say at boot time. So I
>>> had thought of a third new API call: packed_fields_validate(), which
>>> runs at module_init() in the consumer driver (ice, sja1105, ...) and
>>> operates on the field tables.
>>>
>>
>> It does seem reasonable to me that we can move the sanity checks here,
>> especially since the main programmer error is if this table is incorrect
>> in one of these ways.
> 
> Actually I did manage to cook something up which I like even more than
> packed_fields_validate(): a compile-time sanity check. I'm not completely
> happy with it, just reasonably happy. You'll see.>>> Basically it is a singular replacement for these existing verifications
>>> in pack() and unpack():
>>>
>>> 	/* startbit is expected to be larger than endbit, and both are
>>> 	 * expected to be within the logically addressable range of the buffer.
>>> 	 */
>>> 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
>>> 		/* Invalid function call */
>>> 		return -EINVAL;
>>>
>>> 	value_width = startbit - endbit + 1;
>>> 	if (unlikely(value_width > 64))
>>> 		return -ERANGE;
>>>
>>> so you actually need to tell packed_fields_validate() what is the size
>>> of the buffer you intend to run pack_fields() and unpack_fields() on.
>>> I don't see it as a problem at all that the packed buffer size must be
>>> fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
>>> and ... hmmm... txq_ctx[22]? sized buffers.
>>>
>>
>> Yea, these are fixed sizes. Strictly I think we could have a macro
>> defining the size of the Tx queue context as well....
> 
> Yeah, you'll need than when you'll see what I've prepared below :)
> 
>>> But this packed_fields_validate() idea quickly creates another set of 2
>>> problems which I didn't get to the bottom of:
>>>
>>> 1. Some sanity checks in pack() are dynamic and cannot be run at
>>>    module_init() time. Like this:
>>>
>>> 	/* Check if "uval" fits in "value_width" bits.
>>> 	 * If value_width is 64, the check will fail, but any
>>> 	 * 64-bit uval will surely fit.
>>> 	 */
>>> 	if (value_width < 64 && uval >= (1ull << value_width))
>>> 		/* Cannot store "uval" inside "value_width" bits.
>>> 		 * Truncating "uval" is most certainly not desirable,
>>> 		 * so simply erroring out is appropriate.
>>> 		 */
>>> 		return -ERANGE;
>>>
>>
>> If we support u8/u16/u32/u64 sizes as well, you could check that the
>> size of the unpacked variable too. Could this data be in the table? Oh I
>> guess technically not because we are checking if the actual value passed
>> fits. I think keeping this but making it WARN would be sufficient?
> 
> The u8/u16/u32/u64 unpacked field size will absolutely be held in the
> struct packed_field tables.
> 
>>>    The worse part is that the check is actually useful. It led to the
>>>    quick identification of the bug behind commit 24deec6b9e4a ("net:
>>>    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
>>>    rather than seeing a silent failure. But... I would still really like
>>>    the revised lib/packing API to return void for pack_fields() and
>>>    unpack_fields(). Not really sure how to reconcile these.
>>>
>>
>> Since this is generally programmer error (not something where uAPI can
>> affect it) what about making these WARN in the core?
> 
> Yes, I went for demoting the -ERANGE error in the truncation case to a
> WARN() that is printed for both the "int pack()" as well as the new
> "void __pack()" call. Practically, it doesn't make a big difference
> whether we bail out or do nothing, as long as we loudly complain in
> either case. It's the complaint that leads to the bug getting easily
> identified and fixed.
>

Right.


>>> 2. How to make sure that the pbuflen passed to packed_fields_validate()
>>>    will be the same one as the pbuflen passed to all subsequent pack_fields()
>>>    calls validated against that very pbuflen?
>>
>> I guess you either duplicate the check and WARN, or you don't, and let
>> it panic on the invalid memory? But I guess that would only actually
>> panic with something like KASAN
> 
> Yeah, I'm still not too sure here. The length of the packed buffer still
> needs to be "obvious to the eye", since the same length must be manually
> passed to the sanity check as well as to the actual pack_fields() call,
> otherwise "bad things" will happen. I believe in the users' ability to
> structure their code in such a way that this is not hard. Especially
> since the sanity checks now rely on BUILD_BUG_ON(), and that can be
> technically be placed anywhere in the code, I expect the best place to
> put it is exactly near the pack_fields() call. That way, it's the most
> obvious that the buffer length declared for the sanity check is
> identical to the one during actual usage. Especially if the usage can be
> restricted to just one function or two.
> 

Makes sense. I may need to do a little work on the ice driver to make
that fit well :D I think currently we have a packing for the Tx queue
context in different places.

>>> Sorry for not posting code and just telling a story about it. There
>>> isn't much to show other than unfinished ideas with conflicting
>>> requirements. So I thought maybe I could use a little help with some
>>> brainstorming. Of course, do let me know if you are not that interested
>>> in making the ICE_CTX_STORE tables idea a part of the packing core.
>>>
>>> Thanks!
>>
>> I think moving this into core would be fantastic. Since pretty much
>> every driver handles these sanity checks the same way, I also think that
>> moving those into the core and making them WARN or similar seems
>> reasonable, so we can make the pack/unpack as void.
>>
>> It would be interesting to see a comparison of the resulting module size.
> 
> Yeah, I'm also very much interested in comparisons: text size vs rodata
> size vs dynamic memory usage. With the pack_fields() API in the sja1105
> driver, I can shrink an enormous amount of u64 unpacked structure fields
> to just u8. I also really like the idea of compile-time sanity checks,
> and I'm curious how much that matters in a benchmark or something.
> Anyway, I don't have a complete rework at the moment, so there's that.
> 

Yea. Actually my entire initial motivation for starting looking at
reworking ice was the weird sizes used in our unpacked context
structure... :D

>> How much of this do you have code for now?
> 
> Well, I do have code, but it's not yet complete :)
> I've updated by https://github.com/vladimiroltean/linux/tree/packing-selftests
> branch on top of your patch set. Until HEAD~1, I've tested the sja1105
> driver and it still works. The last patch - the bulk of the conversion
> actually - is extremely tedious and is not yet ready (it doesn't even
> compile). Yet, with a bit of imagination, it should be enough to provide
> an example and hopefully move the discussion forward.

Ok sounds good. I'll take a look at this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-09-03 22:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-28 20:57 [PATCH iwl-next v2 00/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 01/13] lib: packing: refuse operating on bit indices which exceed size of buffer Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 02/13] lib: packing: adjust definitions and implementation for arbitrary buffer lengths Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 03/13] lib: packing: remove kernel-doc from header file Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 04/13] lib: packing: add pack() and unpack() wrappers over packing() Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 05/13] lib: packing: duplicate pack() and unpack() implementations Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 06/13] lib: packing: add KUnit tests adapted from selftests Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 07/13] lib: packing: add additional KUnit tests Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 08/13] lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior Jacob Keller
2024-09-02 23:25   ` Vladimir Oltean
2024-09-03 21:03     ` Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 09/13] ice: remove int_q_state from ice_tlan_ctx Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 10/13] ice: use <linux/packing.h> for Tx and Rx queue context data Jacob Keller
2024-09-03  0:08   ` Vladimir Oltean
2024-09-03 21:16     ` Jacob Keller
2024-09-03 22:13       ` Vladimir Oltean
2024-09-03 22:24         ` Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 11/13] ice: reduce size of queue context fields Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 12/13] ice: move prefetch enable to ice_setup_rx_ctx Jacob Keller
2024-08-28 20:57 ` [PATCH iwl-next v2 13/13] ice: cleanup Rx queue context programming functions Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).