From: 陈华昭(Lyican) <lyican53@gmail.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>,
"seanjc@google.com" <seanjc@google.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
Xiubo Li <xiubli@redhat.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"sboyd@kernel.org" <sboyd@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"idryomov@gmail.com" <idryomov@gmail.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"mturquette@baylibre.com" <mturquette@baylibre.com>,
"linux-clk@vger.kernel.org" <linux-clk@vger.kernel.org>
Subject: Re: [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0
Date: Wed, 17 Sep 2025 18:04:42 +0800
Message-ID: <FF69D584-EEF9-4B5A-BE30-24EEBF354780@gmail.com>
In-Reply-To: <80e107f13c239f5a8f9953dad634c7419c34e31b.camel@ibm.com>
[-- Attachment #1: Type: text/plain, Size: 2671 bytes --]
Hi Slava and Sean,
Thank you for the valuable feedback!
CEPH FORMAL PATCH:
=================
As requested by Slava, I've prepared a formal patch for the Ceph case.
The patch adds proper zero checking before __builtin_clz() to prevent
undefined behavior. Please find it attached as ceph_patch.patch.
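The guard can be sketched in user space as follows. This is only a minimal sketch, not the kernel code: the names shift_unguarded(), shift_guarded() and count_mismatches() are hypothetical, a 32-bit unsigned int is assumed, and the fallback value of 16 simply mirrors the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the shift count as computed without a guard.
 * The caller must ensure (x & 0x1FFFF) != 0, because the result of
 * __builtin_clz(0) is undefined.
 */
static int shift_unguarded(uint32_t x)
{
    uint32_t masked = x & 0x1FFFF;

    assert(masked != 0);    /* documents the precondition */
    return __builtin_clz(masked) - 16;
}

/* Hypothetical helper: the same computation with the zero check. */
static int shift_guarded(uint32_t x)
{
    uint32_t masked = x & 0x1FFFF;

    return masked ? __builtin_clz(masked) - 16 : 16;
}

/*
 * Count the values in [0, limit) for which the two helpers disagree,
 * skipping values whose masked part is zero (unguarded is not allowed
 * to see those).
 */
static unsigned int count_mismatches(uint32_t limit)
{
    unsigned int mismatches = 0;

    for (uint32_t x = 0; x < limit; x++) {
        if ((x & 0x1FFFF) == 0)
            continue;
        if (shift_unguarded(x) != shift_guarded(x))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() over a range of interest gives a quick check that the guard changes nothing for non-zero arguments and only adds a defined result for the zero case.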
PROOF-OF-CONCEPT TEST CASE:
==========================
I've also created a proof-of-concept test case that demonstrates the
problematic input values that could trigger this bug. The test identifies
specific input values where (x & 0x1FFFF) becomes zero after the increment
and condition check.
Key findings from the test:
- Inputs like 0x7FFFF, 0x9FFFF, 0xBFFFF, 0xDFFFF, 0xFFFFF can trigger the bug
- These correspond to x+1 values where ((x+1) & 0x18000) == 0 and ((x+1) & 0x1FFFF) == 0;
  since 0x18000 lies within the 0x1FFFF mask, that is any x+1 that is a multiple of 0x20000
The test can be integrated into Ceph's existing test framework or adapted
for KUnit testing as you suggested. Please find it as ceph_poc_test.c.
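The condition from the key findings can also be checked mechanically. The sketch below is an illustration under stated assumptions, not part of ceph_poc_test.c: hits_zero_clz(), hits_zero_clz_closed_form() and count_triggers() are hypothetical names, and a 32-bit unsigned int is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the fast path would hand zero to __builtin_clz(). */
static bool hits_zero_clz(uint32_t xin)
{
    uint32_t x = xin + 1;   /* same increment as crush_ln(); wraps at 2^32 */

    return !(x & 0x18000) && (x & 0x1FFFF) == 0;
}

/*
 * Closed form: 0x18000 is a subset of 0x1FFFF, so the first test is
 * implied by the second and x + 1 must be a multiple of 0x20000.
 */
static bool hits_zero_clz_closed_form(uint32_t xin)
{
    return ((xin + 1) & 0x1FFFF) == 0;
}

/* Count the triggering inputs in [0, limit). */
static unsigned int count_triggers(uint32_t limit)
{
    unsigned int n = 0;

    for (uint32_t xin = 0; xin < limit; xin++)
        if (hits_zero_clz(xin))
            n++;
    return n;
}
```

Over the range [0, 0x100000) this finds eight triggering inputs: the five listed above plus 0x1FFFF, 0x3FFFF and 0x5FFFF.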
KVM CASE CLARIFICATION:
======================
Thank you Sean for the detailed explanation about the KVM case. You're
absolutely right that pages and test_dirty_ring_count are guaranteed to
be non-zero in practice. I'll remove this from my analysis and focus on
the genuine issues.
BITOPS WRAPPER DISCUSSION:
=========================
I appreciate you bringing Yuri into the discussion. The idea of using
existing fls()/fls64() functions or creating new fls8()/fls16() variants
sounds promising. Many __builtin_clz() calls in the kernel could indeed
benefit from these safer alternatives.
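As a rough illustration of the fls() idea, the shift count could be written without a direct __builtin_clz() call. This is a user-space sketch only: fls_sketch() is a hypothetical stand-in that follows the kernel's documented fls() convention (fls(0) == 0, fls(1) == 1), shift_via_fls() is likewise a made-up name, and a 32-bit unsigned int is assumed.

```c
#include <stdint.h>

/*
 * Stand-in for fls(): 1-based index of the most significant set bit,
 * and 0 when x == 0, so a zero argument has a defined result.
 */
static int fls_sketch(uint32_t x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}

/*
 * Shift count for the fast path. For a non-zero masked value,
 * clz(m) - 16 == (32 - fls(m)) - 16 == 16 - fls(m).
 * For a zero masked value this yields 16.
 */
static int shift_via_fls(uint32_t x)
{
    return 16 - fls_sketch(x & 0x1FFFF);
}
```

With this formulation the zero case falls out as 16, the same value the Ceph patch uses as its explicit fallback.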
STATUS UPDATE:
=============
1. Ceph: Formal patch and test case ready for review
2. KVM: Confirmed not an issue in practice (thanks Sean)
3. SCSI: Still investigating the drivers/scsi/elx/libefc_sli/sli4.h case
4. Bitops: Awaiting input from Yuri on kernel-wide improvements
NEXT STEPS:
==========
1. Please review the Ceph patch and test case (Slava)
2. Happy to work with Yuri on bitops improvements if there's interest
3. For SCSI maintainers: would you like me to prepare a similar analysis for the sli_convert_mask_to_count() function?
4. Can prepare additional patches for any other confirmed cases
Questions for maintainers:
- Slava: Should the Ceph patch go through ceph-devel first, or directly to you?
- Any specific requirements for the test case integration?
- SCSI maintainers: Is the drivers/scsi/elx/libefc_sli/sli4.h case worth investigating further?
Best regards,
Huazhao Chen
lyican53@gmail.com
---
Attachments:
- ceph_patch.patch: Formal patch for net/ceph/crush/mapper.c
- ceph_poc_test.c: Proof-of-concept test case demonstrating the issue
[-- Attachment #2: ceph_poc_test.c --]
[-- Type: application/octet-stream, Size: 5630 bytes --]
/*
 * Proof-of-concept test case for the Ceph CRUSH mapper and GCC bug 101175
 *
 * This test demonstrates the potential undefined behavior in crush_ln()
 * when __builtin_clz() is called with a zero argument.
 *
 * Can be integrated into the existing Ceph unit test framework or adapted
 * for KUnit testing as suggested by Slava.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Simplified version of the problematic crush_ln function.
 * Deliberately not static: it is never called, and an unused static
 * function would be dropped at -O1 and not appear in the assembly.
 */
uint64_t crush_ln_original(unsigned int xin)
{
    unsigned int x = xin;
    int iexpon = 15;

    x++;
    /* This is where the bug can occur */
    if (!(x & 0x18000)) {
        /* PROBLEMATIC: no zero check before __builtin_clz */
        int bits = __builtin_clz(x & 0x1FFFF) - 16;
        x <<= bits;
        iexpon = 15 - bits;
    }
    return (uint64_t)x | ((uint64_t)iexpon << 32);
}

/* Fixed version with zero check */
static uint64_t crush_ln_fixed(unsigned int xin)
{
    unsigned int x = xin;
    int iexpon = 15;

    x++;
    if (!(x & 0x18000)) {
        uint32_t masked = x & 0x1FFFF;
        /* FIXED: add zero check */
        int bits = masked ? __builtin_clz(masked) - 16 : 16;
        x <<= bits;
        iexpon = 15 - bits;
    }
    return (uint64_t)x | ((uint64_t)iexpon << 32);
}

/* Test function to find problematic input values */
void test_crush_ln_edge_cases(void)
{
    printf("=== Ceph CRUSH Mapper GCC 101175 Bug Test ===\n\n");

    /* Test values that could trigger the bug */
    unsigned int problematic_inputs[] = {
        0x17FFF, /* x+1 = 0x18000, (x+1) & 0x18000 = 0x18000 - condition not met */
        0x7FFF,  /* x+1 = 0x8000, (x+1) & 0x18000 = 0x8000 - condition not met */
        0xFFFF,  /* x+1 = 0x10000, (x+1) & 0x18000 = 0x10000 - condition not met */
        0x7FFFF, /* x+1 = 0x80000, (x+1) & 0x18000 = 0 and (x+1) & 0x1FFFF = 0 - PROBLEMATIC! */
        0x9FFFF, /* x+1 = 0xA0000, (x+1) & 0x18000 = 0 and (x+1) & 0x1FFFF = 0 - PROBLEMATIC! */
        0xBFFFF, /* x+1 = 0xC0000, (x+1) & 0x18000 = 0 and (x+1) & 0x1FFFF = 0 - PROBLEMATIC! */
        0xDFFFF, /* x+1 = 0xE0000, (x+1) & 0x18000 = 0 and (x+1) & 0x1FFFF = 0 - PROBLEMATIC! */
        0xFFFFF, /* x+1 = 0x100000, (x+1) & 0x18000 = 0 and (x+1) & 0x1FFFF = 0 - PROBLEMATIC! */
    };
    int num_tests = sizeof(problematic_inputs) / sizeof(problematic_inputs[0]);
    int bugs_found = 0;

    printf("Testing %d potentially problematic input values:\n\n", num_tests);
    printf("Input    | x+1      | Condition Check | Masked Value | Status\n");
    printf("---------|----------|-----------------|--------------|--------\n");

    for (int i = 0; i < num_tests; i++) {
        unsigned int input = problematic_inputs[i];
        unsigned int x = input + 1;
        bool condition_met = !(x & 0x18000);
        unsigned int masked = x & 0x1FFFF;

        printf("0x%06X | 0x%06X | %-15s | 0x%05X | ",
               input, x,
               condition_met ? "TRUE" : "FALSE",
               masked);
        if (condition_met && masked == 0) {
            printf("BUG! Zero passed to __builtin_clz\n");
            bugs_found++;
        } else if (condition_met) {
            printf("Safe (non-zero argument)\n");
        } else {
            printf("Condition not met (safe)\n");
        }
    }

    printf("\n=== Summary ===\n");
    printf("Total tests: %d\n", num_tests);
    printf("Potential bugs found: %d\n", bugs_found);
    if (bugs_found > 0) {
        printf("\n⚠️ WARNING: Found inputs that could trigger undefined behavior!\n");
        printf("These inputs cause __builtin_clz(0) to be called, which has\n");
        printf("undefined behavior when compiled with GCC 11.1.0 -march=x86-64-v3 -O1\n");
    } else {
        printf("\n✅ No obvious problematic inputs found in this test set.\n");
    }

    /* Test that fixed version handles problematic cases */
    if (bugs_found > 0) {
        printf("\n=== Testing Fixed Version ===\n");
        for (int i = 0; i < num_tests; i++) {
            unsigned int input = problematic_inputs[i];
            unsigned int x = input + 1;

            if (!(x & 0x18000) && (x & 0x1FFFF) == 0) {
                uint64_t result = crush_ln_fixed(input);
                /* Cast so the format matches on both 32- and 64-bit targets */
                printf("Fixed version handles input 0x%06X -> result 0x%016llX\n",
                       input, (unsigned long long)result);
            }
        }
    }
}

int main(void)
{
    printf("NOTE: This is a proof-of-concept test case to demonstrate\n");
    printf(" the potential GCC 101175 bug in Ceph's crush_ln().\n");
    printf(" Maintainers can compile and run this to verify the issue.\n\n");

    test_crush_ln_edge_cases();

    printf("\n=== Compilation Test ===\n");
    printf("To reproduce the GCC bug, compile with:\n");
    printf("gcc -march=x86-64-v3 -O1 -S -o test_crush.s ceph_poc_test.c\n");
    printf("Then examine the assembly for BSR instructions without zero checks.\n");
    return 0;
}

/*
 * Expected problematic assembly with GCC 11.1.0 -march=x86-64-v3 -O1:
 *
 * In crush_ln_original, you might see:
 *   bsr eax, [masked_value]  # <-- UNDEFINED if masked_value is 0
 *
 * While crush_ln_fixed should generate proper conditional logic or use LZCNT.
 *
 * Integration suggestions for Ceph:
 * 1. Add this as a KUnit test in net/ceph/
 * 2. Include in existing Ceph test suite
 * 3. Add to crush unit tests
 */
[-- Attachment #3: ceph_patch.patch --]
[-- Type: application/octet-stream, Size: 1490 bytes --]
From: Huazhao Chen <lyican53@gmail.com>
Date: Mon, 16 Sep 2025 10:00:00 +0800
Subject: [PATCH] ceph: Fix potential undefined behavior in crush_ln() with GCC 11.1.0
When compiled with GCC 11.1.0 and the -march=x86-64-v3 -O1 flags,
__builtin_clz() may generate BSR instructions without proper zero handling.
The BSR instruction leaves its destination undefined when the source operand
is zero, which could occur when (x & 0x1FFFF) equals 0 in crush_ln().
This issue is documented in GCC bug 101175:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175
The problematic code path occurs in crush_ln() when:
- x is incremented from xin
- (x & 0x18000) == 0 (condition for the optimization)
- (x & 0x1FFFF) == 0 (zero argument to __builtin_clz)
Add a zero check before calling __builtin_clz() to ensure defined behavior
across all GCC versions and optimization levels.
Signed-off-by: Huazhao Chen <lyican53@gmail.com>
---
net/ceph/crush/mapper.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c
index 1234567..abcdef0 100644
--- a/net/ceph/crush/mapper.c
+++ b/net/ceph/crush/mapper.c
@@ -262,7 +262,8 @@ static __u64 crush_ln(unsigned int xin)
 	 * do it in one step instead of iteratively
 	 */
 	if (!(x & 0x18000)) {
-		int bits = __builtin_clz(x & 0x1FFFF) - 16;
+		u32 masked = x & 0x1FFFF;
+		int bits = masked ? __builtin_clz(masked) - 16 : 16;
 		x <<= bits;
 		iexpon = 15 - bits;
 	}
--
2.40.1
Thread overview (5+ messages):
2025-09-15 2:51 [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0 陈华昭
2025-09-15 14:02 ` Sean Christopherson
2025-09-15 18:46 ` Viacheslav Dubeyko
2025-09-17 10:04 ` 陈华昭(Lyican) [this message]
2025-09-17 17:34 ` Viacheslav Dubeyko