public inbox for kvm@vger.kernel.org
From: 陈华昭(Lyican) <lyican53@gmail.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>,
	"seanjc@google.com" <seanjc@google.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
	Xiubo Li <xiubli@redhat.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"sboyd@kernel.org" <sboyd@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"idryomov@gmail.com" <idryomov@gmail.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"mturquette@baylibre.com" <mturquette@baylibre.com>,
	"linux-clk@vger.kernel.org" <linux-clk@vger.kernel.org>
Subject: Re: [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0
Date: Wed, 17 Sep 2025 18:04:42 +0800
Message-ID: <FF69D584-EEF9-4B5A-BE30-24EEBF354780@gmail.com>
In-Reply-To: <80e107f13c239f5a8f9953dad634c7419c34e31b.camel@ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2671 bytes --]

Hi Slava and Sean,

Thank you for the valuable feedback!

CEPH FORMAL PATCH:
=================

As requested by Slava, I've prepared a formal patch for the Ceph case.
The patch adds proper zero checking before __builtin_clz() to prevent
undefined behavior. Please find it attached as ceph_patch.patch.

PROOF-OF-CONCEPT TEST CASE:
==========================

I've also created a proof-of-concept test case that lists the input values
that could trigger this bug: values of xin for which, after the increment
x = xin + 1, the branch condition !(x & 0x18000) holds and (x & 0x1FFFF)
is zero.

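Key findings from the test:
- Inputs such as 0x7FFFF, 0x9FFFF, 0xBFFFF, 0xDFFFF and 0xFFFFF pass 0 to
  __builtin_clz() (0x1FFFF, 0x3FFFF and 0x5FFFF, not in the test set, do as well)
- For these inputs, x = xin + 1 satisfies both ((x & 0x18000) == 0) and
  ((x & 0x1FFFF) == 0), i.e. x is a multiple of 0x20000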

The test can be integrated into Ceph's existing test framework or adapted
for KUnit testing as you suggested. Please find it as ceph_poc_test.c.

KVM CASE CLARIFICATION:
======================

Thank you Sean for the detailed explanation about the KVM case. You're
absolutely right that pages and test_dirty_ring_count are guaranteed to
be non-zero in practice. I'll remove this from my analysis and focus on
the genuine issues.

BITOPS WRAPPER DISCUSSION:
=========================

I appreciate you bringing Yuri into the discussion. Using the existing
fls()/fls64() functions, or adding new fls8()/fls16() variants, sounds
promising. Since fls(0) is defined to return 0, many __builtin_clz() calls
in the kernel could benefit from these alternatives without an explicit
zero check.

STATUS UPDATE:
=============

1. Ceph: Formal patch and test case ready for review
2. KVM: Confirmed not an issue in practice (thanks Sean)
3. SCSI: Still investigating the drivers/scsi/elx/libefc_sli/sli4.h case
4. Bitops: Awaiting input from Yuri on kernel-wide improvements

NEXT STEPS:
==========

1. Please review the Ceph patch and test case (Slava)
2. Happy to work with Yuri on bitops improvements if there's interest
3. For SCSI maintainers: would you like me to prepare a similar analysis for the sli_convert_mask_to_count() function?
4. Can prepare additional patches for any other confirmed cases

Questions for maintainers:
- Slava: Should the Ceph patch go through ceph-devel first, or directly to you?
- Any specific requirements for the test case integration?
- SCSI maintainers: Is the drivers/scsi/elx/libefc_sli/sli4.h case worth investigating further?

Best regards,
Huazhao Chen
lyican53@gmail.com

---

Attachments:
- ceph_patch.patch: Formal patch for net/ceph/crush/mapper.c
- ceph_poc_test.c: Proof-of-concept test case demonstrating the issue

[-- Attachment #2: ceph_poc_test.c --]
[-- Type: application/octet-stream, Size: 5630 bytes --]

/*
 * Proof-of-concept test case for the Ceph CRUSH mapper (GCC bug 101175)
 *
 * This test demonstrates the potential undefined behavior in crush_ln()
 * when __builtin_clz() is called with a zero argument.
 * 
 * Can be integrated into existing Ceph unit test framework or adapted
 * for KUnit testing as suggested by Slava.
 */

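#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>   /* PRIX64 for printing uint64_t */
#include <stdbool.h>    /* bool, used in test_crush_ln_edge_cases() */
#include <assert.h>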

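/*
 * Simplified version of the problematic crush_ln() normalization step.
 * Not called below: calling it with a triggering input would itself
 * execute __builtin_clz(0).
 */
static __attribute__((unused)) uint64_t crush_ln_original(unsigned int xin)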
{
    unsigned int x = xin;
    int iexpon = 15;
    
    x++;
    
    /* This is where the bug can occur */
    if (!(x & 0x18000)) {
        /* PROBLEMATIC: no zero check before __builtin_clz */
        int bits = __builtin_clz(x & 0x1FFFF) - 16;
        x <<= bits;
        iexpon = 15 - bits;
    }
    
    return (uint64_t)x | ((uint64_t)iexpon << 32);
}

/* Fixed version with zero check */
static uint64_t crush_ln_fixed(unsigned int xin)
{
    unsigned int x = xin;
    int iexpon = 15;
    
    x++;
    
    if (!(x & 0x18000)) {
        uint32_t masked = x & 0x1FFFF;
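        /*
         * FIXED: zero check. The fallback of 16 equals 32 - 16, i.e. it
         * treats clz(0) as 32, the result LZCNT defines for a zero input.
         */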
        int bits = masked ? __builtin_clz(masked) - 16 : 16;
        x <<= bits;
        iexpon = 15 - bits;
    }
    
    return (uint64_t)x | ((uint64_t)iexpon << 32);
}

/* Test function to find problematic input values */
void test_crush_ln_edge_cases(void)
{
    printf("=== Ceph CRUSH Mapper GCC 101175 Bug Test ===\n\n");
    
    /* Control inputs first, then inputs that reach __builtin_clz(0) */
    unsigned int problematic_inputs[] = {
        0x17FFF,    /* x+1 = 0x18000, (x+1 & 0x18000) = 0x18000 - not triggered */
        0x7FFF,     /* x+1 = 0x8000,  (x+1 & 0x18000) = 0x8000  - not triggered */
        0xFFFF,     /* x+1 = 0x10000, (x+1 & 0x18000) = 0x10000 - not triggered */
        0xFF,       /* x+1 = 0x100,   (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0x100 - safe */
        0x7FFFF,    /* x+1 = 0x80000, (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0 - PROBLEMATIC! */
        0x9FFFF,    /* x+1 = 0xA0000, (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0 - PROBLEMATIC! */
        0xBFFFF,    /* x+1 = 0xC0000, (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0 - PROBLEMATIC! */
        0xDFFFF,    /* x+1 = 0xE0000, (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0 - PROBLEMATIC! */
        0xFFFFF,    /* x+1 = 0x100000, (x+1 & 0x18000) = 0 and (x+1 & 0x1FFFF) = 0 - PROBLEMATIC! */
    };
    
    int num_tests = sizeof(problematic_inputs) / sizeof(problematic_inputs[0]);
    int bugs_found = 0;
    
    printf("Testing %d input values (controls and triggering cases):\n\n", num_tests);
    printf("Input    | x+1      | Condition Check | Masked Value | Status\n");
    printf("---------|----------|-----------------|--------------|--------\n");
    
    for (int i = 0; i < num_tests; i++) {
        unsigned int input = problematic_inputs[i];
        unsigned int x = input + 1;
        bool condition_met = !(x & 0x18000);
        unsigned int masked = x & 0x1FFFF;
        
        printf("0x%06X | 0x%06X | %-15s | 0x%05X    | ", 
               input, x, 
               condition_met ? "TRUE" : "FALSE", 
               masked);
        
        if (condition_met && masked == 0) {
            printf("BUG! Zero passed to __builtin_clz\n");
            bugs_found++;
        } else if (condition_met) {
            printf("Safe (non-zero argument)\n");
        } else {
            printf("Condition not met (safe)\n");
        }
    }
    
    printf("\n=== Summary ===\n");
    printf("Total tests: %d\n", num_tests);
    printf("Potential bugs found: %d\n", bugs_found);
    
    if (bugs_found > 0) {
        printf("\n⚠️  WARNING: Found inputs that could trigger undefined behavior!\n");
        printf("These inputs cause __builtin_clz(0) to be called. GCC documents\n");
        printf("its result as undefined for 0 under any flags; this thread concerns\n");
        printf("GCC 11.1.0 with -march=x86-64-v3 -O1.\n");
    } else {
        printf("\n✅ No obvious problematic inputs found in this test set.\n");
    }
    
    /* Test that fixed version handles problematic cases */
    if (bugs_found > 0) {
        printf("\n=== Testing Fixed Version ===\n");
        for (int i = 0; i < num_tests; i++) {
            unsigned int input = problematic_inputs[i];
            unsigned int x = input + 1;
            if (!(x & 0x18000) && (x & 0x1FFFF) == 0) {
                uint64_t result = crush_ln_fixed(input);
                printf("Fixed version handles input 0x%06X -> result 0x%016" PRIX64 "\n",
                       input, result);
            }
        }
    }
}

int main(void)
{
    printf("NOTE: This is a proof-of-concept test case to demonstrate\n");
    printf("      the potential GCC bug 101175 issue in Ceph's crush_ln().\n");
    printf("      Maintainers can compile and run this to verify the issue.\n\n");
    
    test_crush_ln_edge_cases();
    
    printf("\n=== Compilation Test ===\n");
    printf("To reproduce the GCC bug, compile with:\n");
    printf("gcc -march=x86-64-v3 -O1 -S -o test_crush.s ceph_poc_test.c\n");
    printf("Then examine the assembly for BSR instructions without zero checks.\n");
    
    return 0;
}

/*
 * Expected problematic assembly with GCC 11.1.0 -march=x86-64-v3 -O1:
 * 
 * In crush_ln_original, you might see:
 *     bsr eax, [masked_value]    # <-- UNDEFINED if masked_value is 0
 *     
 * While crush_ln_fixed should generate proper conditional logic or use LZCNT.
 * 
 * Integration suggestions for Ceph:
 * 1. Add this as a KUnit test in net/ceph/
 * 2. Include in existing Ceph test suite
 * 3. Add to crush unit tests
 */

[-- Attachment #3: ceph_patch.patch --]
[-- Type: application/octet-stream, Size: 1490 bytes --]

From: Huazhao Chen <lyican53@gmail.com>
Date: Tue, 16 Sep 2025 10:00:00 +0800
Subject: [PATCH] ceph: Fix potential undefined behavior in crush_ln() with GCC 11.1.0

The result of __builtin_clz() is undefined when its argument is zero.
When compiled with GCC 11.1.0 and the -march=x86-64-v3 -O1 flags,
__builtin_clz() may be emitted as a BSR instruction without zero handling,
and BSR leaves its destination undefined when the source operand is zero.
A zero argument could occur in crush_ln() when (x & 0x1FFFF) equals 0.

This issue is documented in GCC bug 101175:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175

The problematic code path occurs in crush_ln() when, with x = xin + 1:
- (x & 0x18000) == 0 (the normalization branch is taken)
- (x & 0x1FFFF) == 0 (zero argument to __builtin_clz)

Add a zero check before calling __builtin_clz() to ensure defined behavior
across all GCC versions and optimization levels.

Signed-off-by: Huazhao Chen <lyican53@gmail.com>
---
 net/ceph/crush/mapper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c
index 1234567..abcdef0 100644
--- a/net/ceph/crush/mapper.c
+++ b/net/ceph/crush/mapper.c
@@ -262,7 +262,8 @@ static __u64 crush_ln(unsigned int xin)
 	 * do it in one step instead of iteratively
 	 */
 	if (!(x & 0x18000)) {
-		int bits = __builtin_clz(x & 0x1FFFF) - 16;
+		u32 masked = x & 0x1FFFF;
+		int bits = masked ? __builtin_clz(masked) - 16 : 16;
 		x <<= bits;
 		iexpon = 15 - bits;
 	}
-- 
2.40.1


Thread overview: 5+ messages
2025-09-15  2:51 [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0 陈华昭
2025-09-15 14:02 ` Sean Christopherson
2025-09-15 18:46 ` Viacheslav Dubeyko
2025-09-17 10:04   ` 陈华昭(Lyican) [this message]
2025-09-17 17:34     ` Viacheslav Dubeyko
