From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Chen Linxuan <chenlinxuan@uniontech.com>,
Winston Wen <wentao@uniontech.com>,
Leon Romanovsky <leon@kernel.org>,
Sasha Levin <sashal@kernel.org>,
tangchengchang@huawei.com, huangjunxian6@hisilicon.com,
linux-rdma@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 094/108] RDMA/hns: initialize db in update_srq_db()
Date: Tue, 3 Jun 2025 20:55:17 -0400 [thread overview]
Message-ID: <20250604005531.4178547-94-sashal@kernel.org> (raw)
In-Reply-To: <20250604005531.4178547-1-sashal@kernel.org>
From: Chen Linxuan <chenlinxuan@uniontech.com>
[ Upstream commit ffe1cee21f8b533ae27c3a31bfa56b8c1b27fa6e ]
On x86_64 with gcc version 13.3.0, I compile
drivers/infiniband/hw/hns/hns_roce_hw_v2.c with:
make defconfig
./scripts/kconfig/merge_config.sh .config <(
echo CONFIG_COMPILE_TEST=y
echo CONFIG_HNS3=m
echo CONFIG_INFINIBAND=m
echo CONFIG_INFINIBAND_HNS_HIP08=m
)
make KCFLAGS="-fno-inline-small-functions -fno-inline-functions-called-once" \
drivers/infiniband/hw/hns/hns_roce_hw_v2.o
Then I get a compile error:
CALL scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
CC [M] drivers/infiniband/hw/hns/hns_roce_hw_v2.o
In file included from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:47:
drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'update_srq_db':
drivers/infiniband/hw/hns/hns_roce_common.h:74:17: error: 'db' is used uninitialized [-Werror=uninitialized]
74 | *((__le32 *)_ptr + (field_h) / 32) &= \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:90:17: note: in expansion of macro '_hr_reg_clear'
90 | _hr_reg_clear(ptr, field_type, field_h, field_l); \
| ^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
95 | #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
| ^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:948:9: note: in expansion of macro 'hr_reg_write'
948 | hr_reg_write(&db, DB_TAG, srq->srqn);
| ^~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:946:31: note: 'db' declared here
946 | struct hns_roce_v2_db db;
| ^~
cc1: all warnings being treated as errors
Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Co-developed-by: Winston Wen <wentao@uniontech.com>
Signed-off-by: Winston Wen <wentao@uniontech.com>
Link: https://patch.msgid.link/FF922C77946229B6+20250411105459.90782-5-chenlinxuan@uniontech.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Based on my analysis of the commit and examination of the kernel source
code, here is my assessment: **YES** This commit should be backported to
stable kernel trees for the following extensive reasons: ## 1.
**Critical Hardware Register Corruption Risk** The commit fixes a
genuine bug where an uninitialized stack variable containing random data
is written directly to hardware registers. In the `update_srq_db()`
function: ```c struct hns_roce_v2_db db; // UNINITIALIZED - contains
random stack data hr_reg_write(&db, DB_TAG, srq->srqn); // Only sets
specific bits hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB); // Only
sets specific bits hr_reg_write(&db, DB_PI, srq->idx_que.head); // Only
sets specific bits hns_roce_write64(hr_dev, (__le32 *)&db, srq->db_reg);
// Writes ENTIRE structure to hardware ``` The `hr_reg_write()` macros
only modify specific bit fields within the 64-bit structure. Any bits
not explicitly set retain their random initial values from the stack,
which then get written to the hardware doorbell register. ## 2.
**Pattern Inconsistency Indicates Bug** My examination of the codebase
shows that ALL other similar functions correctly initialize their
database structures: - `update_sq_db()`: `struct hns_roce_v2_db sq_db =
{};` ✓ - `update_rq_db()`: `struct hns_roce_v2_db rq_db = {};` ✓ -
`update_cq_db()`: `struct hns_roce_v2_db cq_db = {};` ✓ -
`update_srq_db()`: `struct hns_roce_v2_db db;` ✗ (the only exception)
This pattern strongly indicates that the uninitialized `db` variable in
`update_srq_db()` is a bug rather than intentional design. ## 3. **Real
Runtime Impact Potential** This is not merely a cosmetic compiler
warning. The uninitialized data can cause: - **Subtle SRQ functionality
degradation**: Random bits in hardware doorbell registers can confuse
the hardware - **Performance issues**: Incorrect doorbell values may
cause hardware to misinterpret commands - **Silent data corruption**:
Unlike crashes, this bug could cause hard-to-debug networking issues -
**Hardware state corruption**: Writing garbage to hardware registers can
put the device in unexpected states ## 4. **Production Environment
Risk** SRQ (Shared Receive Queue) is a critical InfiniBand/RoCE feature
used in high-performance computing and data center environments. Any
corruption in SRQ operations can: - Impact network throughput and
latency - Cause intermittent packet loss or reordering - Lead to
difficult-to-reproduce performance regressions - Affect reliability of
distributed applications ## 5. **Low-Risk, High-Value Fix** The fix is
minimal and safe: - Changes only initialization: `struct hns_roce_v2_db
db = {};` - Zero-initialization has no functional side effects - Follows
established patterns in the same driver - No API changes or behavioral
modifications - Extremely low regression risk ## 6. **Similar Historical
Precedent** Looking at the similar commits provided, commit #4 (Status:
YES) shows a nearly identical fix for uninitialized variable usage in
the same driver family: ```c doorbell[0] = cons_index &
((hr_cq->cq_depth << 1) - 1); +doorbell[1] = 0; // Initialize before use
roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);
``` This historical precedent confirms that uninitialized variable fixes
in HNS RoCE hardware register operations are considered important enough
for stable backporting. ## 7. **Affects Multiple Compiler Versions** The
commit message shows this affects GCC 13.3.0 with specific optimization
flags, but uninitialized variable usage is fundamentally unsafe
regardless of compiler version or warning visibility. **Conclusion**:
This commit fixes a real hardware register corruption bug with potential
for silent data corruption in production RoCE/InfiniBand environments.
The fix is minimal, safe, and follows established patterns. It should be
backported to ensure system reliability in stable kernel deployments.
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 160e8927d364e..afd2ea6da3ee2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -943,7 +943,7 @@ static void fill_wqe_idx(struct hns_roce_srq *srq, unsigned int wqe_idx)
static void update_srq_db(struct hns_roce_srq *srq)
{
struct hns_roce_dev *hr_dev = to_hr_dev(srq->ibsrq.device);
- struct hns_roce_v2_db db;
+ struct hns_roce_v2_db db = {};
hr_reg_write(&db, DB_TAG, srq->srqn);
hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB);
--
2.39.5
prev parent reply other threads:[~2025-06-04 0:58 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20250604005531.4178547-1-sashal@kernel.org>
2025-06-04 0:54 ` [PATCH AUTOSEL 6.14 050/108] net: mlx4: add SOF_TIMESTAMPING_TX_SOFTWARE flag when getting ts info Sasha Levin
2025-06-04 0:55 ` Sasha Levin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250604005531.4178547-94-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=chenlinxuan@uniontech.com \
--cc=huangjunxian6@hisilicon.com \
--cc=leon@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=tangchengchang@huawei.com \
--cc=wentao@uniontech.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox