Re: [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: 陈华昭 <lyican53@gmail.com>
Cc: linux-kernel@vger.kernel.org, idryomov@gmail.com,
	xiubli@redhat.com,  ceph-devel@vger.kernel.org,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	 linux-scsi@vger.kernel.org, pbonzini@redhat.com,
	kvm@vger.kernel.org,  mturquette@baylibre.com, sboyd@kernel.org,
	linux-clk@vger.kernel.org
Subject: Re: [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0
Date: Mon, 15 Sep 2025 07:02:04 -0700	[thread overview]
Message-ID: <aMgcXJ3lbrTwOzzO@google.com> (raw)
In-Reply-To: <CAN53R8HxFvf9fAiF1vacCAdsx+m+Zcv1_vxEiq4CwoHLu17hNg@mail.gmail.com>

+Yuri

On Mon, Sep 15, 2025, 陈华昭 wrote:
> Hi all,
> 
> I've identified several instances in the Linux kernel where __builtin_clz()
> is used without proper zero-value checking, which may trigger undefined
> behavior when compiled with GCC 11.1.0 using -march=x86-64-v3 -O1 optimization.
> 
> PROBLEM DESCRIPTION:
> ===================
> 
> GCC bug 101175 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175) causes
> __builtin_clz() to generate BSR instructions without proper zero handling when
> compiled with specific optimization flags. The BSR instruction has undefined
> behavior when the source operand is zero, potentially causing incorrect results.
> 
> The issue manifests when:
> - GCC version: 11.1.0 (potentially other versions)
> - Compilation flags: -march=x86-64-v3 -O1
> - Code pattern: __builtin_clz(value) where value might be 0
> 
> AFFECTED LOCATIONS:
> ==================
> 
> 1. HIGH RISK: net/ceph/crush/mapper.c:265
> Problem: __builtin_clz(x & 0x1FFFF) when (x & 0x1FFFF) could be 0
> Impact: CRUSH hash algorithm corruption in Ceph storage
> 
> 2. HIGH RISK: drivers/scsi/elx/libefc_sli/sli4.h:3796
> Problem: __builtin_clz(mask) in sli_convert_mask_to_count() with no zero check
> Impact: Incorrect count calculations in SCSI operations
> 
> 3. HIGH RISK: tools/testing/selftests/kvm/dirty_log_test.c:314
> Problem: Two __builtin_clz() calls without zero validation
> Impact: KVM selftest framework reliability

In practice, neither pages nor test_dirty_ring_count can be zero.   Pages is
guaranteed to be a large-ish value, as vm->page shift is guaranteed to be at
most 16, DIRTY_MEM_BITS is 30, and the guest/host adjustments just do minor
tweaks.  Not to mention the test would fail miserably if pages were ever zero.

	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
	pages = vm_adjust_num_guest_pages(vm->mode, pages);
	if (vm->page_size < getpagesize())
		pages = vm_num_host_pages(vm->mode, pages);

The user could deliberately set test_dirty_ring_count to zero, but the test would
crash due to a divide-by-zero before reaching this point.

> 4. MEDIUM RISK: drivers/clk/clk-versaclock7.c:322
> Problem: __builtin_clzll(den) but prior checks likely prevent den=0
> Impact: Clock driver calculations (lower risk due to existing checks)
> 
> COMPARISON WITH SAFE PATTERNS:
> =============================
> 
> The kernel already implements safe patterns in many places:
> 
> // Safe pattern from include/asm-generic/bitops/builtin-fls.h
> return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
> 
> // Safe pattern from arch/powerpc/lib/sstep.c
> op->val = (val ? __builtin_clz(val) : 32);
> 
> PROPOSED FIXES:
> ==============
> 
> 1. net/ceph/crush/mapper.c:
> - int bits = __builtin_clz(x & 0x1FFFF) - 16;
> + u32 masked = x & 0x1FFFF;
> + int bits = masked ? __builtin_clz(masked) - 16 : 16;
> 
> 2. drivers/scsi/elx/libefc_sli/sli4.h:
> if (method) {
> - count = 1 << (31 - __builtin_clz(mask));
> + count = mask ? 1 << (31 - __builtin_clz(mask)) : 0;
> count *= 16;
> 
> 3. tools/testing/selftests/kvm/dirty_log_test.c:
> - limit = 1 << (31 - __builtin_clz(pages));
> - test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
> + limit = pages ? 1 << (31 - __builtin_clz(pages)) : 1;
> + test_dirty_ring_count = test_dirty_ring_count ?
> + 1 << (31 - __builtin_clz(test_dirty_ring_count)) : 1;
> 
> REPRODUCTION:
> ============
> 
> Based on the GCC bug report and analysis of the kernel code patterns, this
> issue can be reproduced by:
> 
> 1. Compiling affected code with: gcc -march=x86-64-v3 -O1
> 2. Examining generated assembly for BSR instructions
> 3. Triggering code paths where the __builtin_clz argument could be zero
> 
> QUESTIONS:
> =========
> 
> 1. Should I prepare formal patches for each affected subsystem?

Don't bother for the KVM selftest, it's not a problem in practice and checking
for zero will only add confusion by incorrectly implying that pages and/or
test_dirty_ring_count can ever be zero.

> 2. Are there other instances I should investigate?
> 3. Would adding a kernel-wide safe wrapper for __builtin_clz be appropriate?

Maybe?  At a glance, several of the calls to __builtin_clz() could be replaced
with fls() or fls64(), or a fls8() or fls16() (which don't exist, yet).  This is
probably a question for the bitops maintainer, Yuri, now Cc'd.

> 4. Would the maintainers like me to create a proof-of-concept test case?
> 
> This analysis is based on static code review and comparison with the known
> GCC bug behavior. Further testing by the respective subsystem maintainers
> would be valuable to confirm the impact.
> 
> Best regards,
> Huazhao Chen
> lyican53@gmail.com
> 
> ---
> 
> This analysis affects multiple subsystems and should be addressed to ensure
> deterministic behavior across different GCC versions and optimization levels.
> I'm happy to assist with testing or patch development if the maintainers
> confirm this is indeed an issue worth addressing.

next prev parent reply	other threads:[~2025-09-15 14:02 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-15  2:51 [RFC] Fix potential undefined behavior in __builtin_clz usage with GCC 11.1.0 陈华昭
2025-09-15 14:02 ` Sean Christopherson [this message]
2025-09-15 18:46 ` Viacheslav Dubeyko
2025-09-17 10:04   ` 陈华昭（Lyican）
2025-09-17 17:34     ` Viacheslav Dubeyko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMgcXJ3lbrTwOzzO@google.com \
    --to=seanjc@google.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jejb@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=lyican53@gmail.com \
    --cc=martin.petersen@oracle.com \
    --cc=mturquette@baylibre.com \
    --cc=pbonzini@redhat.com \
    --cc=sboyd@kernel.org \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox