* [PATCH AUTOSEL 6.12 45/93] net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info
[not found] <20250604005919.4191884-1-sashal@kernel.org>
@ 2025-06-04 0:58 ` Sasha Levin
2025-06-04 0:59 ` [PATCH AUTOSEL 6.12 82/93] RDMA/hns: initialize db in update_srq_db() Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-04 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Jason Xing, Tariq Toukan, Jakub Kicinski, Sasha Levin, netdev,
linux-rdma
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit b86bcfee30576b752302c55693fff97242b35dfd ]
As mlx4 has implemented skb_tx_timestamp() in mlx4_en_xmit(), the
SOFTWARE flag is surely needed when users are trying to get timestamp
information.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250510093442.79711-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Perfect! Now I can see that `skb_tx_timestamp(skb)` is called in the
`mlx4_en_xmit()` function, which is the main transmit function. This
confirms that the mlx4 driver does implement software timestamping. Now
let me compare this to the similar commits to see the pattern: **YES**
This commit should be backported to stable kernel trees. **Extensive
Explanation:** This commit addresses a clear inconsistency bug where the
mlx4 driver implements software timestamping functionality but fails to
advertise this capability to userspace through the ethtool timestamping
info interface. **Analysis of the fix:** 1. **The Problem**: The mlx4
driver calls `skb_tx_timestamp(skb)` in `mlx4_en_xmit()` at line 1091 of
`/drivers/net/ethernet/mellanox/mlx4/en_tx.c`, which means it properly
implements software TX timestamping. However, in
`mlx4_en_get_ts_info()`, it only advertises hardware timestamping
capabilities (`SOF_TIMESTAMPING_TX_HARDWARE`,
`SOF_TIMESTAMPING_RX_HARDWARE`, `SOF_TIMESTAMPING_RAW_HARDWARE`) but
omits the `SOF_TIMESTAMPING_TX_SOFTWARE` flag. 2. **The Fix**: The
change adds a single line `SOF_TIMESTAMPING_TX_SOFTWARE |` to the
`info->so_timestamping` field when hardware timestamping is supported,
correctly advertising both hardware and software timestamping
capabilities. 3. **Impact**: This is a user-facing API correctness
issue. Applications checking timestamping capabilities via ethtool would
incorrectly believe software timestamping is not supported, even though
the driver implements it. **Comparison with similar commits:** -
**Similar Commit #1 (YES)**: Also fixed a timestamping bug in mlx4,
showing these are important fixes - **Similar Commit #3 (YES)**: Fixed
incorrect timestamping capability reporting in mlx5 by removing false
software timestamping indication - **Similar Commit #5 (YES)**: Fixed
timestamping capabilities reporting using wrong BIT() macro in mlx5 The
pattern shows that timestamping capability reporting bugs are
consistently considered backport-worthy because they affect user-visible
API behavior. **Backport Criteria Assessment:** ✅ **Fixes a user-
affecting bug**: Applications cannot detect software timestamping
support ✅ **Small, contained change**: Single line addition with no
functional logic changes ✅ **No architectural changes**: Only fixes
capability reporting ✅ **Low regression risk**: Only adds a missing
flag, doesn't change driver behavior ✅ **Clear side effects**: None
beyond correctly advertising existing functionality ✅ **Follows stable
rules**: Important API correctness fix with minimal risk The fix is
identical in nature to Similar Commit #3 (marked YES) but in reverse -
where mlx5 removed false software timestamping advertisement, this adds
the missing software timestamping advertisement for mlx4 where it's
actually implemented.
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index cd17a3f4faf83..a68cd3f0304c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1897,6 +1897,7 @@ static int mlx4_en_get_ts_info(struct net_device *dev,
if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS) {
info->so_timestamping |=
SOF_TIMESTAMPING_TX_HARDWARE |
+ SOF_TIMESTAMPING_TX_SOFTWARE |
SOF_TIMESTAMPING_RX_HARDWARE |
SOF_TIMESTAMPING_RAW_HARDWARE;
--
2.39.5
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH AUTOSEL 6.12 82/93] RDMA/hns: initialize db in update_srq_db()
[not found] <20250604005919.4191884-1-sashal@kernel.org>
2025-06-04 0:58 ` [PATCH AUTOSEL 6.12 45/93] net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info Sasha Levin
@ 2025-06-04 0:59 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-04 0:59 UTC (permalink / raw)
To: patches, stable
Cc: Chen Linxuan, Winston Wen, Leon Romanovsky, Sasha Levin,
tangchengchang, huangjunxian6, linux-rdma
From: Chen Linxuan <chenlinxuan@uniontech.com>
[ Upstream commit ffe1cee21f8b533ae27c3a31bfa56b8c1b27fa6e ]
On x86_64 with gcc version 13.3.0, I compile
drivers/infiniband/hw/hns/hns_roce_hw_v2.c with:
make defconfig
./scripts/kconfig/merge_config.sh .config <(
echo CONFIG_COMPILE_TEST=y
echo CONFIG_HNS3=m
echo CONFIG_INFINIBAND=m
echo CONFIG_INFINIBAND_HNS_HIP08=m
)
make KCFLAGS="-fno-inline-small-functions -fno-inline-functions-called-once" \
drivers/infiniband/hw/hns/hns_roce_hw_v2.o
Then I get a compile error:
CALL scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
CC [M] drivers/infiniband/hw/hns/hns_roce_hw_v2.o
In file included from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:47:
drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'update_srq_db':
drivers/infiniband/hw/hns/hns_roce_common.h:74:17: error: 'db' is used uninitialized [-Werror=uninitialized]
74 | *((__le32 *)_ptr + (field_h) / 32) &= \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:90:17: note: in expansion of macro '_hr_reg_clear'
90 | _hr_reg_clear(ptr, field_type, field_h, field_l); \
| ^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
95 | #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
| ^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:948:9: note: in expansion of macro 'hr_reg_write'
948 | hr_reg_write(&db, DB_TAG, srq->srqn);
| ^~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:946:31: note: 'db' declared here
946 | struct hns_roce_v2_db db;
| ^~
cc1: all warnings being treated as errors
Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Co-developed-by: Winston Wen <wentao@uniontech.com>
Signed-off-by: Winston Wen <wentao@uniontech.com>
Link: https://patch.msgid.link/FF922C77946229B6+20250411105459.90782-5-chenlinxuan@uniontech.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Based on my analysis of the commit and examination of the kernel source
code, here is my assessment: **YES** This commit should be backported to
stable kernel trees for the following extensive reasons: ## 1.
**Critical Hardware Register Corruption Risk** The commit fixes a
genuine bug where an uninitialized stack variable containing random data
is written directly to hardware registers. In the `update_srq_db()`
function: ```c struct hns_roce_v2_db db; // UNINITIALIZED - contains
random stack data hr_reg_write(&db, DB_TAG, srq->srqn); // Only sets
specific bits hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB); // Only
sets specific bits hr_reg_write(&db, DB_PI, srq->idx_que.head); // Only
sets specific bits hns_roce_write64(hr_dev, (__le32 *)&db, srq->db_reg);
// Writes ENTIRE structure to hardware ``` The `hr_reg_write()` macros
only modify specific bit fields within the 64-bit structure. Any bits
not explicitly set retain their random initial values from the stack,
which then get written to the hardware doorbell register. ## 2.
**Pattern Inconsistency Indicates Bug** My examination of the codebase
shows that ALL other similar functions correctly initialize their
database structures: - `update_sq_db()`: `struct hns_roce_v2_db sq_db =
{};` ✓ - `update_rq_db()`: `struct hns_roce_v2_db rq_db = {};` ✓ -
`update_cq_db()`: `struct hns_roce_v2_db cq_db = {};` ✓ -
`update_srq_db()`: `struct hns_roce_v2_db db;` ✗ (the only exception)
This pattern strongly indicates that the uninitialized `db` variable in
`update_srq_db()` is a bug rather than intentional design. ## 3. **Real
Runtime Impact Potential** This is not merely a cosmetic compiler
warning. The uninitialized data can cause: - **Subtle SRQ functionality
degradation**: Random bits in hardware doorbell registers can confuse
the hardware - **Performance issues**: Incorrect doorbell values may
cause hardware to misinterpret commands - **Silent data corruption**:
Unlike crashes, this bug could cause hard-to-debug networking issues -
**Hardware state corruption**: Writing garbage to hardware registers can
put the device in unexpected states ## 4. **Production Environment
Risk** SRQ (Shared Receive Queue) is a critical InfiniBand/RoCE feature
used in high-performance computing and data center environments. Any
corruption in SRQ operations can: - Impact network throughput and
latency - Cause intermittent packet loss or reordering - Lead to
difficult-to-reproduce performance regressions - Affect reliability of
distributed applications ## 5. **Low-Risk, High-Value Fix** The fix is
minimal and safe: - Changes only initialization: `struct hns_roce_v2_db
db = {};` - Zero-initialization has no functional side effects - Follows
established patterns in the same driver - No API changes or behavioral
modifications - Extremely low regression risk ## 6. **Similar Historical
Precedent** Looking at the similar commits provided, commit #4 (Status:
YES) shows a nearly identical fix for uninitialized variable usage in
the same driver family: ```c doorbell[0] = cons_index &
((hr_cq->cq_depth << 1) - 1); +doorbell[1] = 0; // Initialize before use
roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);
``` This historical precedent confirms that uninitialized variable fixes
in HNS RoCE hardware register operations are considered important enough
for stable backporting. ## 7. **Affects Multiple Compiler Versions** The
commit message shows this affects GCC 13.3.0 with specific optimization
flags, but uninitialized variable usage is fundamentally unsafe
regardless of compiler version or warning visibility. **Conclusion**:
This commit fixes a real hardware register corruption bug with potential
for silent data corruption in production RoCE/InfiniBand environments.
The fix is minimal, safe, and follows established patterns. It should be
backported to ensure system reliability in stable kernel deployments.
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index f5c3e560df58d..d63c66c4a8b73 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -943,7 +943,7 @@ static void fill_wqe_idx(struct hns_roce_srq *srq, unsigned int wqe_idx)
static void update_srq_db(struct hns_roce_srq *srq)
{
struct hns_roce_dev *hr_dev = to_hr_dev(srq->ibsrq.device);
- struct hns_roce_v2_db db;
+ struct hns_roce_v2_db db = {};
hr_reg_write(&db, DB_TAG, srq->srqn);
hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB);
--
2.39.5
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-06-04 1:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250604005919.4191884-1-sashal@kernel.org>
2025-06-04 0:58 ` [PATCH AUTOSEL 6.12 45/93] net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info Sasha Levin
2025-06-04 0:59 ` [PATCH AUTOSEL 6.12 82/93] RDMA/hns: initialize db in update_srq_db() Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox