From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-170.mta0.migadu.com (out-170.mta0.migadu.com [91.218.175.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80E972046BA for ; Wed, 15 Oct 2025 20:54:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760561661; cv=none; b=X3y5Mh6o5p9j63ns/Sw9w6ZRfpR0+jkm3+DOx8N4AzUo22ViKcwMwlcOoSKwFcomdhvpseTsZXN97dVghmYuKZrAQX46MeyifJpPSKOpZWu+5dZ12qUQirFitmFQpd0JNSGnr1o5K3Xeink3U3BYQTpV4vt6EPm5l16XW0QiYVI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760561661; c=relaxed/simple; bh=FELPSjZTS/IFr7JkBw/MxQ/fyf6mwm2Q0Zr6knIQGcA=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=BU1Jafi3vHZpHecNMFaP74xV9eVFA0d2GWdnAVSaHFYQ4PyipdGlHkODQgrB1i7/8uwQnQQrbaImsQSuwZED+A4aZSOmJZSYPhu0wRjaH23S26fy75/L0S6U0xgniy/OO33xJfyiCHpinHqBDLZ+QMCWWLIaIw4P1jx6f8UNoVk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=IM0Jdjg9; arc=none smtp.client-ip=91.218.175.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="IM0Jdjg9" Message-ID: <77c9925b-8567-460c-8441-c42b3629e4a1@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1760561656; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PoMOSJAGDRmYXJdzQ4c3wB/PRH8o18bOVk4ODvevw9c=; b=IM0Jdjg9X122TFP/56wRybI7VpZSfwpenkFwlcEIt3EvzV+HqrI+HLoibpH2CfE+gBX+ZZ PQa1hSj9wdF8MjE/eDlnZE4eCSdvqnGo3ncbk1QHqdMdIicWHfUn+35SA1yo0WKvsh4RZD yvxTD8wgrdUP/o3kQPoqC9ejHAWh8Fs= Date: Wed, 15 Oct 2025 13:54:11 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next] selftests/bpf: Fix selftest verif_scale_strobemeta failure with llvm22 Content-Language: en-GB To: Andrii Nakryiko Cc: bpf@vger.kernel.org, Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau References: <20251014051639.1996331-1-yonghong.song@linux.dev> <93fc2f1c-c5d3-4260-9a83-1a7300ff73cc@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 10/15/25 1:31 PM, Andrii Nakryiko wrote: > On Wed, Oct 15, 2025 at 12:56 PM Yonghong Song wrote: >> >> >> On 10/15/25 9:45 AM, Andrii Nakryiko wrote: >>> On Mon, Oct 13, 2025 at 10:16 PM Yonghong Song wrote: >>>> With latest llvm22, I hit the verif_scale_strobemeta selftest failure >>>> below: >>>> $ ./test_progs -n 618 >>>> libbpf: prog 'on_event': BPF program load failed: -E2BIG >>>> libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- >>>> BPF program is too large. Processed 1000001 insn >>>> verification time 7019091 usec >>>> stack depth 488 >>>> processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 >>>> -- END PROG LOAD LOG -- >>>> libbpf: prog 'on_event': failed to load: -E2BIG >>>> libbpf: failed to load object 'strobemeta.bpf.o' >>>> scale_test:FAIL:expect_success unexpected error: -7 (errno 7) >>>> #618 verif_scale_strobemeta:FAIL >>>> >>>> But if I increase the verificaiton insn limit from 1M to 10M, the above >>>> test_progs run actually will succeed. The below is the result from veristat: >>>> $ ./veristat strobemeta.bpf.o >>>> Processing 'strobemeta.bpf.o'... >>>> File Program Verdict Duration (us) Insns States Program size Jited size >>>> ---------------- -------- ------- ------------- ------- ------ ------------ ---------- >>>> strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 >>>> ---------------- -------- ------- ------------- ------- ------ ------------ ---------- >>>> Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. >>>> >>>> Further debugging shows the llvm commit [1] is responsible for the verificaiton >>>> failure as it tries to convert certain switch statement to if-condition. Such >>>> change may cause different transformation compared to original switch statement. >>>> >>>> In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is >>>> define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, >>>> ptr noundef %3, ptr noundef %4) #2 !dbg !535 { >>>> %6 = alloca ptr, align 8 >>>> %7 = alloca i64, align 8 >>>> %8 = alloca ptr, align 8 >>>> %9 = alloca ptr, align 8 >>>> %10 = alloca ptr, align 8 >>>> %11 = alloca ptr, align 8 >>>> %12 = alloca i32, align 4 >>>> ... >>>> %20 = icmp ne ptr %19, null, !dbg !561 >>>> br i1 %20, label %22, label %21, !dbg !562 >>>> >>>> 21: ; preds = %5 >>>> store i32 1, ptr %12, align 4 >>>> br label %48, !dbg !563 >>>> >>>> 22: >>>> %23 = load ptr, ptr %9, align 8, !dbg !564 >>>> ... >>>> >>>> 47: ; preds = %38, %22 >>>> store i32 0, ptr %12, align 4, !dbg !588 >>>> br label %48, !dbg !588 >>>> >>>> 48: ; preds = %47, %21 >>>> call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 >>>> %49 = load i32, ptr %12, align 4 >>>> switch i32 %49, label %51 [ >>>> i32 0, label %50 >>>> i32 1, label %50 >>>> ] >>>> >>>> 50: ; preds = %48, %48 >>>> ret void, !dbg !589 >>>> >>>> 51: ; preds = %48 >>>> unreachable >>>> } >>>> >>>> Note that the above 'switch' statement is added by clang frontend. >>>> Without [1], the switch statement will survive until SelectionDag, >>>> so the switch statement acts like a 'barrier' and prevents some >>>> transformation involved with both 'before' and 'after' the switch statement. >>>> >>>> But with [1], the switch statement will be removed during middle end >>>> optimization and later middle end passes (esp. after inlining) have more >>>> freedom to reorder the code. >>>> >>>> The following is the related source code: >>>> >>>> static void *calc_location(struct strobe_value_loc *loc, void *tls_base): >>>> bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); >>>> /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ >>>> return tls_ptr && tls_ptr != (void *)-1 >>>> ? tls_ptr + tls_index.offset >>>> : NULL; >>>> >>>> In read_int_var() func, we have: >>>> void *location = calc_location(&cfg->int_locs[idx], tls_base); >>>> if (!location) >>>> return; >>>> >>>> bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); >>>> ... >>>> >>>> The static func calc_location() is called inside read_int_var(). The asm code >>>> without [1]: >>>> 77: .123....89 (85) call bpf_probe_read_user#112 >>>> 78: ........89 (79) r1 = *(u64 *)(r10 -368) >>>> 79: .1......89 (79) r2 = *(u64 *)(r10 -8) >>>> 80: .12.....89 (bf) r3 = r2 >>>> 81: .123....89 (0f) r3 += r1 >>>> 82: ..23....89 (07) r2 += 1 >>>> 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) >>>> 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 >>>> 85: ...34...89 (15) if r3 == 0x0 goto pc+12 >>>> 86: ...3....89 (bf) r1 = r10 >>>> 87: .1.3....89 (07) r1 += -400 >>>> 88: .1.3....89 (b4) w2 = 16 >>>> In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, >>>> so the verifier actually prefers to do verification first at 'r1 = r10' etc. >>>> >>>> The asm code with [1]: >>>> 119: .123....89 (85) call bpf_probe_read_user#112 >>>> 120: ........89 (79) r1 = *(u64 *)(r10 -368) >>>> 121: .1......89 (79) r2 = *(u64 *)(r10 -8) >>>> 122: .12.....89 (bf) r3 = r2 >>>> 123: .123....89 (0f) r3 += r1 >>>> 124: ..23....89 (07) r2 += -1 >>>> 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 >>>> 126: ........89 (05) goto pc+17 >>>> ... >>>> 144: ........89 (b4) w1 = 0 >>>> 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 >>>> In this case, if 'r2 < 0xfffffffe' is true, the control will go to >>>> non-null 'location' branch, so 'goto pc+17' will actually go to >>>> null 'location' branch. This seems causing tremendous amount of >>>> verificaiton state. >>>> >>>> To fix the issue, rewrite the following code >>>> return tls_ptr && tls_ptr != (void *)-1 >>>> ? tls_ptr + tls_index.offset >>>> : NULL; >>>> to if/then statement and hopefully these explicit if/then statements >>>> are sticky during middle-end optimizations. >>> this is so fragile and non-obvious... Just looking at the patch, it's >>> an equivalent transformation, so as a user I'd have no clue that doing >>> something like that can even matter... >> You are correct. The llvm generate different codes due to compiler internal >> changes, and in this case the change caused the verification failure. >> >>> Have you tried adding likely() around non-NULL case? Does it change >>> generated code, while leaving ternary as is? >> I tried the following: >> >> diff --git a/tools/testing/selftests/bpf/progs/strobemeta.h b/tools/testing/selftests/bpf/progs/strobemeta.h >> index a5c74d31a244..6c0ec8794d3e 100644 >> --- a/tools/testing/selftests/bpf/progs/strobemeta.h >> +++ b/tools/testing/selftests/bpf/progs/strobemeta.h >> @@ -346,13 +346,12 @@ static void read_int_var(struct strobemeta_cfg *cfg, >> struct strobemeta_payload *data) >> { >> void *location = calc_location(&cfg->int_locs[idx], tls_base); >> - if (!location) >> - return; >> - >> - bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); >> - data->int_vals[idx] = value->val; >> - if (value->header.len) >> - data->int_vals_set_mask |= (1 << idx); >> + if (likely(location)) { >> + bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); >> + data->int_vals[idx] = value->val; >> + if (value->header.len) >> + data->int_vals_set_mask |= (1 << idx); >> + } >> } >> >> >> and the verification still failed (exceeding 1000000 insns). > I was thinking to add likely like so: > > return likely(tls_ptr && tls_ptr != (void *)-1) ? tls_ptr + > tls_index.offset : NULL; > > > and then hope that Clang will prioritize leaving non-NULL code path as > linear as possible Sadly with the suggested change: diff --git a/tools/testing/selftests/bpf/progs/strobemeta.h b/tools/testing/selftests/bpf/progs/strobemeta.h index a5c74d31a244..d31d5fb1e96e 100644 --- a/tools/testing/selftests/bpf/progs/strobemeta.h +++ b/tools/testing/selftests/bpf/progs/strobemeta.h @@ -330,7 +330,7 @@ static void *calc_location(struct strobe_value_loc *loc, void *tls_base) } bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ - return tls_ptr && tls_ptr != (void *)-1 + return likely(tls_ptr && tls_ptr != (void *)-1) ? tls_ptr + tls_index.offset : NULL; } the verification still fails. Stubborn clang :-( > >> I think that we can leave patch for a while. I will do some investigation >> in llvm side to see whether I can come up with some heuristics to benefit >> verifier in terms of verified insns. >> >>>> Test with llvm20 and llvm21 as well and all strobemeta related selftests >>>> are passed. >>>> >>>> [1] https://github.com/llvm/llvm-project/pull/161000 >>>> >>>> Signed-off-by: Yonghong Song >>>> --- >>>> tools/testing/selftests/bpf/progs/strobemeta.h | 6 +++--- >>>> 1 file changed, 3 insertions(+), 3 deletions(-) >>>> >>>> NOTE: I will also check whether we can make changes in llvm to automatically >>>> adjust branch statements to minimize verification insns/states. >>>> >>>> diff --git a/tools/testing/selftests/bpf/progs/strobemeta.h b/tools/testing/selftests/bpf/progs/strobemeta.h >>>> index a5c74d31a244..6e1918deaf26 100644 >>>> --- a/tools/testing/selftests/bpf/progs/strobemeta.h >>>> +++ b/tools/testing/selftests/bpf/progs/strobemeta.h >>>> @@ -330,9 +330,9 @@ static void *calc_location(struct strobe_value_loc *loc, void *tls_base) >>>> } >>>> bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); >>>> /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ >>>> - return tls_ptr && tls_ptr != (void *)-1 >>>> - ? tls_ptr + tls_index.offset >>>> - : NULL; >>>> + if (!tls_ptr || tls_ptr == (void *)-1) >>>> + return NULL; >>>> + return tls_ptr + tls_index.offset; >>>> } >>>> >>>> #ifdef SUBPROGS >>>> -- >>>> 2.47.3 >>>>