From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
    Eduard Zingerman, Emil Tsalapatis, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v2 00/17] Redesign Verification Errors
Date: Fri, 19 Jun 2026 22:59:13 +0200
Message-ID: <20260619205934.1312876-1-memxor@gmail.com>

TL;DR: This set reworks verifier error messages to include source and
instruction annotations, together with more causal context, making
failures easier to understand and more actionable when debugging and
repairing BPF programs.

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20260605063412.974640-1-memxor@gmail.com

 * Reworked diagnostic history from per-verifier-state log to active path
   log with positions saved and reset when verifier search backtracks.
   (Eduard)
 * Moved reusable diagnostic formatting storage into struct bpf_diag under
   struct bpf_verifier_env, and removed large per-report scratch buffers
   from verifier stack frames.
   (Eduard)
 * Added stack-slot events so diagnostics follow ordinary stack spill/fill
   value flow and invalidations in register-scoped histories. (Eduard)
 * Reused existing source and BTF formatting helpers for diagnostics,
   including bpf_get_linfo_file_line() and btf_type_snprintf_show_name().
   (Eduard)
 * Fixed diagnostic edge cases around signed offset text, BPF_MAX_VAR_OFF
   reporting, negative-offset clamping, poisoned stack reads, and
   borrowed-reference invalidations. (Eduard)
 * Fixed various miscellaneous diagnostic bugs. (Sashiko)
 * Misc improvements and refinements.

--

Motivation
~~~~~~~~~~

The verifier log is the primary interface through which the verifier
communicates to the user its verdict on whether a program was accepted or
rejected. To aid the debugging of rejection decisions, the verifier also
reports the symbolic state of the program at each instruction, across
every explored path of the BPF program. Such detailed information is
critical to introspect the correctness of verification decisions, and to
provide insight into why a given program may have failed to load in the
kernel.

A constant pain point in the BPF ecosystem throughout the years has been
the difficulty of debugging verification errors. The human-readable error
messages produced in response to a failure in satisfying safety-related
constraints are often terse, context-dependent, or insufficient for
understanding why a given error may have happened. Users must fall back to
the verbose instruction-by-instruction breakdown of how the symbolic state
evolved to surface the root cause. For programs with a huge log volume due
to high verification complexity, such logs quickly become inscrutable.

All of this has made life difficult for users lacking an understanding of
how the verifier works, and of the various heuristics and idiosyncrasies
it uses.
In some cases, even seasoned BPF experts spend significant time reverse
engineering why a program may have failed, and have to reach into the
verifier's source code to form a complete picture of the verification
process.

Such a steep learning curve and cognitive burden also hurt the speed of
BPF development, as the verifier sits right in the middle of the user's
iteration loop while they make use of BPF to solve any given problem.
Expertise in debugging verifier errors does not scale across teams
deploying these programs in production on a diverse set of kernels.
Overall, this leads to a poorer developer experience, causes visible user
dissatisfaction, and remains a drag on wider BPF adoption.

With some of the more recent developments where users increasingly
leverage AI tooling [0] to author their code, this bottleneck becomes even
more critical to address, since it throttles the much faster iteration
loop of AI agents.

  [0]: https://lwn.net/Articles/1075067

Approach
~~~~~~~~

This series starts moving selected failures from terse terminal messages
toward diagnostics that carry the relevant context for a verification
failure. The existing verbose log remains the low-level trace. For
selected failures, the new report is emitted after this trace and answers
the immediate debugging questions:

 - what verifier rule failed,
 - why the current state does not satisfy it,
 - where the failing instruction maps to source,
 - which earlier branch or state event made this path fail,
 - what kind of source change would satisfy the verifier.

The series adds a text-only diagnostics framework under kernel/bpf and
uses it to augment selected verifier errors. Existing verbose(env, ...)
messages are kept, so current selftest expectations and existing log
consumers continue to see the legacy text.

The new report has a uniform outer shape:

  Verification failed: <category>: <error>
    Reason:
      exact reason for the verification failure, with details
    At:
      source and instruction annotation
    Causal path:
      compressed branch and verifier-state events relevant for debugging
    Suggestion:
      speculation on potential fixes to repair the program

The outer shape is shared, but report construction is category-specific.
The categories are intentionally broad and reviewable. This revision
covers representative cases in Register Type Safety, Memory Safety,
Resource Lifetime Safety, Call Type Safety, Execution Context Safety,
Program Structure, Policy, Verifier Limit, and Verifier Internal errors.
It does not attempt to convert every verbose(env, ...) site for now.
Additional verbose-only errors can be moved into the same framework
incrementally.

The following excerpts are copied from a run on this branch:

  ./test_progs -j1 \
    -a cpumask/test_populate_invalid_destination,\
       cpumask/test_alloc_no_release,\
       verifier_helper_value_access/via_variable_no_max_check_1,\
       verifier_sock/invalidate_pkt_pointers_from_global_func \
    -vv

They show the old terminal error and the exact new diagnostic report,
including the source/instruction annotation.

Call Type Safety, cpumask/test_populate_invalid_destination:

Legacy:

  R1 type=scalar expected=fp

Diagnostic:

  Verification failed: Call Type Safety: Invalid call argument
    Reason:
      The first argument (R1) to bpf_cpumask_populate does not satisfy the
      verifier contract: the kfunc expects 16 bytes of memory for
      (struct cpumask), but it is an integer scalar and not verifier-known
      memory.
    At:
      test_populate_invalid_destination @ cpumask_failure.c:234:8
          232 | ...                                                                          2 | (b7) r1 = 1193046
          233 | ...                                                                          3 | (b7) r3 = 8
      >>> 234 | ret = bpf_cpumask_populate((struct cpumask *)invalid, &bits, sizeof...  >>>  4 | (85) call bpf_cpumask_populate#62115
                                                                                               | ^-- error: invalid first argument (R1) for bpf_cpumask_populate
          235 | if (!ret)                                                                    5 | (56) if w0 != 0x0 goto pc+4
          236 | err = 2;                                                                     6 | (18) r1 = 0xffffc9000028e000
    Causal path:
      test_populate_invalid_destination @ cpumask_failure.c:234:8
          232 | ...                                                                          0 | (bf) r2 = r10
          233 | ...                                                                          1 | (07) r2 += -8
      >>> 234 | ret = bpf_cpumask_populate((struct cpumask *)invalid, &bits, sizeof...  >>>  2 | (b7) r1 = 1193046
                                                                                               | ^-- update: R1 changed from context pointer at offset 0 to integer scalar value
                                                                                               |     1193046
          235 | if (!ret)                                                                    3 | (b7) r3 = 8
          236 | err = 2;                                                                     4 | (85) call bpf_cpumask_populate#62115
    Suggestion:
      Pass stack, map, context, or other verifier-known memory of the
      expected type and size, not an integer cast to a pointer.

Register Type Safety, verifier_sock/invalidate_pkt_pointers_from_global_func:

Legacy:

  R7 invalid mem access 'scalar'

Diagnostic:

  Verification failed: Register Type Safety: Invalid dereference
    Reason:
      R7 is an integer scalar here, not a pointer to memory.
    At:
      invalidate_pkt_pointers_from_global_func @ verifier_sock.c:1067:5
          1065 | ...                                 8 | (85) call pc+4
          1066 | skb_pull_data1(sk, 0);              9 | (b4) w1 = 42
      >>> 1067 | *p = 42; /* this is unsafe */  >>> 10 | (63) *(u32 *)(r7 +0) = r1
                                                       | ^-- error: invalid dereference of R7 (scalar)
          1068 | ...                                11 | (bc) w0 = w6
          1069 | }                                  12 | (95) exit
    Causal path:
      invalidate_pkt_pointers_from_global_func @ verifier_sock.c:1066:2
          1064 | if ((void *)(p + 1) > (void *)(long)sk->data_end)       6 | (b4) w6 = 0
          1065 | ...                                                     7 | (b4) w2 = 0
      >>> 1066 | skb_pull_data1(sk, 0);                             >>>  8 | (85) call pc+4
                                                                           | ^-- invalidated: R7: packet data may have moved; previous value was pkt at
                                                                           |     offset 0
          1067 | *p = 42; /* this is unsafe */                           9 | (b4) w1 = 42
          1068 | ...                                                    10 | (63) *(u32 *)(r7 +0) = r1
    Suggestion:
      Preserve a pointer-valued register where needed, or reload and
      revalidate the pointer after scalar arithmetic, helper calls, or
      other operations that can invalidate it.
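
The Causal path sections in the two reports above are assembled from
events recorded along the active verifier path, which the changelog
describes as a log whose positions are saved and reset when the verifier
search backtracks. A minimal user-space sketch of that save/restore idea
follows. All names (struct diag_log, diag_log_push, diag_log_save,
diag_log_restore) and the fixed capacity are assumptions made for
illustration; they are not the data structures added by this series, which
describes a growable, environment-owned history.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event kinds, named after the labels seen in the reports. */
enum diag_event_kind {
	DIAG_EV_BRANCH,
	DIAG_EV_UPDATE,
	DIAG_EV_INVALIDATED,
	DIAG_EV_ACQUIRED,
};

struct diag_event {
	enum diag_event_kind kind;
	int insn_idx;		/* instruction that produced the event */
};

/* Fixed capacity keeps the sketch short; the series describes a growable log. */
#define DIAG_LOG_CAP 64

struct diag_log {
	struct diag_event ev[DIAG_LOG_CAP];
	int len;		/* number of events on the active path */
};

/* Append one event; returns 0 on success, -1 when the log is full. */
static int diag_log_push(struct diag_log *log, enum diag_event_kind kind,
			 int insn_idx)
{
	if (log->len >= DIAG_LOG_CAP)
		return -1;
	log->ev[log->len].kind = kind;
	log->ev[log->len].insn_idx = insn_idx;
	log->len++;
	return 0;
}

/* Remember the current position before exploring one side of a branch. */
static int diag_log_save(const struct diag_log *log)
{
	return log->len;
}

/* Drop every event recorded after a saved position when backtracking. */
static void diag_log_restore(struct diag_log *log, int pos)
{
	assert(pos >= 0 && pos <= log->len);
	log->len = pos;
}

/* Print the active path, oldest event first. */
static void diag_log_dump(const struct diag_log *log)
{
	static const char * const names[] = {
		"branch", "update", "invalidated", "acquired",
	};
	int i;

	for (i = 0; i < log->len; i++)
		printf("insn %d: %s\n", log->ev[i].insn_idx,
		       names[log->ev[i].kind]);
}
```

In this sketch a caller saves a position before following one side of a
branch and restores it before exploring the other side, so events from an
abandoned path never reach the report.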

Memory Safety, verifier_helper_value_access/via_variable_no_max_check_1:

Legacy:

  R1 unbounded memory access, make sure to bounds check any such access

Diagnostic:

  Verification failed: Memory Safety: Access outside bounds
    Reason:
      The verifier cannot prove offset + access_size <= object_size. Here,
      R1 has unsigned maximum 4294967295, which exceeds BPF_MAX_VAR_OFF
      536870912. R1 is map_value; offset is variable: known bits 0x0,
      unknown mask 0xffffffff; signed range [0, 4294967295], unsigned range
      [0, 4294967295]; access_size is 1; object_size is 48.
    At:
      via_variable_no_max_check_1 @ verifier_helper_value_access.c:627:2
          625 | ...                    11 | (b7) r2 = 1
          626 | ...                    12 | (b7) r3 = 0
      >>> 627 | asm volatile (" \  >>> 13 | (85) call bpf_probe_read_kernel#113
                                          | ^-- error: access may be outside object bounds
          628 | ...                    14 | (95) exit
          629 | ...
    Causal path:
      via_variable_no_max_check_1 @ verifier_helper_value_access.c:627:2
          625 | ...                     8 | (bf) r1 = r0
          626 | ...                     9 | (61) r3 = *(u32 *)(r0 +0)
      >>> 627 | asm volatile (" \  >>> 10 | (0f) r1 += r3
                                          | ^-- update: R1 changed from map value from map_hash_48b at offset 0 to map value
                                          |     from map_hash_48b with variable offset: known bits 0x0, unknown mask
                                          |     0xffffffff, signed range [0, 4294967295], unsigned range [0, 4294967295]
          628 | ...                    11 | (b7) r2 = 1
          629 | ...                    12 | (b7) r3 = 0
    Suggestion:
      Add or adjust a bounds check that proves offset + access_size stays
      within the object.

Resource Lifetime Safety, cpumask/test_alloc_no_release:

Legacy:

  Unreleased reference id=2 alloc_insn=0
  BPF_EXIT instruction in main prog would lead to reference leak

Diagnostic:

  Verification failed: Resource Lifetime Safety: Unreleased resource
    Reason:
      Owned resource (id=2) was acquired at instruction 0 and still needs
      to be released before this exit path.
    At:
      test_alloc_no_release @ cpumask_failure.c:36:5
          34 | ...                                                                                 19 | (7b) *(u64 *)(r10 -8) = r6
          35 | ...                                                                                 20 | (b4) w0 = 0
      >>> 36 | int BPF_PROG(test_alloc_no_release, struct task_struct *task, u64 clone_flags)  >>> 21 | (95) exit
                                                                                                      | ^-- error: owned resource (id=2) still needs release
          37 | ...
          38 | ...
    Causal path:
      test_alloc_no_release @ cpumask_common.h:78:12
          76 | ...
          77 | ...
      >>> 78 | cpumask = bpf_cpumask_create();  >>>  0 | (85) call bpf_cpumask_create#62106
                                                       | ^-- acquired: owned resource (id=2)
          79 | if (!cpumask) {                       1 | (bf) r6 = r0
          80 | err = 1;                              2 | (55) if r6 != 0x0 goto pc+5
      test_alloc_no_release @ cpumask_common.h:79:6
          77 | ...                                   0 | (85) call bpf_cpumask_create#62106
          78 | cpumask = bpf_cpumask_create();       1 | (bf) r6 = r0
      >>> 79 | if (!cpumask) {                  >>>  2 | (55) if r6 != 0x0 goto pc+5
                                                       | ^-- branch: explored as true, goto followed
          80 | err = 1;                              3 | (18) r1 = 0xffffc90000252000
          81 | ...
      test_alloc_no_release @ cpumask_common.h:84:6
          82 | ...                                            9 | (85) call bpf_cpumask_empty#62107
          83 | ...                                           10 | (54) w0 &= 1
      >>> 84 | if (!bpf_cpumask_empty(cast(cpumask))) {  >>> 11 | (56) if w0 != 0x0 goto pc+7
                                                                | ^-- branch: explored as true, goto followed
          85 | err = 2;                                      12 | (18) r1 = 0xffffc90000252000
          86 | bpf_cpumask_release(cpumask);
    Suggestion:
      Release or transfer ownership of the acquired resource on every path
      before the program exits.

Patch layout:

 - Patches 1-3 add the common infrastructure: report sections, diagnostic
   categories, source-line lookup, and side-by-side source/instruction
   annotations.
 - Patches 4-7 add growable environment-owned diagnostic history. The
   history follows the active verifier path and is pruned when
   backtracking; it records branch outcomes, material register changes,
   reference lifetime events, and execution-context events so reports can
   explain the path and causal state transitions that led to the failure.
 - Patches 8-16 add the first category-specific reports. These patches
   hook selected verifier failure sites and choose the evidence that is
   useful for that error class.
 - Patch 17 gates diagnostic collection and rendering on verifier log
   level, so stats-only loads do not collect the extra path history. With
   this change, the overhead is limited to verbose log mode.

Evaluation
~~~~~~~~~~

To quantitatively measure and set goals on metrics that help assess the
quality of diagnostics (apart from subjective human feedback), we use AI
models (called over APIs) and veristat metrics to compare results.

Models are used as a way to measure the repair utility of the extra
diagnostics over a fixed test set. Each prompt contains only a sanitized
source snippet and either the legacy verifier log or the new diagnostic
log. To avoid leaking the answer through the test itself, comments,
annotations, and other source hints that describe the intended failure
were removed. The model is not given internet access, repository access,
test execution, verifier access, or the expected fix. The expected causes
and intended repairs are kept outside the prompt. Under those constraints,
correctness, exact repair rate, output size, reasoning tokens, cost, and
wall time provide a proxy for whether the additional verifier context
makes the failure easier to understand and turn into a source-level fix.

Verifier cost is assessed by forcing the collection of diagnostic
information during normal verification. By default, this information is
only collected and processed when logs are enabled, but forcing it even
without a verbose log level helps us understand the CPU time and memory
cost incurred by this extra data.

Both evaluations are covered in the sections below.

Repair Quality
--------------

Repair quality is measured by asking API-only models to propose source
fixes from a sanitized source snippet and verifier log. The success
criterion is score >= 3 on a 0-4 local grading scale, where 3 means a
likely fix with incomplete detail and 4 means an actionable source-level
fix. Score 4 is reported separately as the exact repair rate.
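
The two rates defined above can be illustrated with a short tally over
graded answers. This is a minimal sketch, assuming grades are stored as
plain integers; struct repair_stats, repair_stats_tally(), and
repair_rate_pct() are hypothetical names used only for this example and
are not part of the series or its evaluation tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregate counts for one prompt variant (diagnostic or legacy). */
struct repair_stats {
	size_t answers;			/* graded answers */
	size_t success;			/* answers with score >= 3 */
	size_t exact;			/* answers with score == 4 */
	unsigned long score_sum;	/* sum of grades, for the mean score */
};

/* Tally grades on the 0-4 scale; out-of-range grades are skipped. */
static struct repair_stats repair_stats_tally(const int *scores, size_t n)
{
	struct repair_stats st = { 0, 0, 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (scores[i] < 0 || scores[i] > 4)
			continue;
		st.answers++;
		st.score_sum += (unsigned long)scores[i];
		if (scores[i] >= 3)
			st.success++;
		if (scores[i] == 4)
			st.exact++;
	}
	return st;
}

/* Convert a count into a percentage of graded answers. */
static double repair_rate_pct(size_t count, size_t answers)
{
	assert(count <= answers);
	return answers ? 100.0 * (double)count / (double)answers : 0.0;
}
```

For example, the grades {4, 4, 3, 2, 0} tally to 3 successes and 2 exact
repairs out of 5 answers, i.e. a 60% success rate and a 40% exact repair
rate.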

The reported model set contains 596 successful API responses: 298
diagnostic and 298 legacy.

Main results (details available in Appendix):

  Metric                             Diagnostic  Legacy      Delta
  ---------------------------------- ----------- ----------- --------
  Answers                            298         298
  Success rate                       97.0%       97.3%       -0.3 pp
  Exact repair rate                  82.2%       72.1%       +10.1 pp
  Mean score                         3.79        3.69        +0.10
  Solver cost                        $8.93       $10.37      -13.8%
  Mean output tokens per answer      1662        1975        -15.8%
  Mean reasoning tokens per answer   951         1080        -11.9%
  Mean wall time per answer          37.3s       44.1s       -15.4%

Diagnostic prompts carry more input context. The resulting answers are
still shorter and cheaper. In this run, diagnostics do not materially
change the coarse success rate, but they increase exact repairs by 10.1
percentage points while reducing cost, output tokens, reasoning tokens,
and wall time.

Verifier cost
-------------

Verifier cost is measured with veristat over the BPF selftest programs
selected by tools/testing/selftests/bpf/veristat.cfg, with five
repetitions per configuration. With diagnostics gated by log level, wall
time and verifier duration stay close to baseline (about 3% higher in the
table below). Forcing diagnostics on for every verifier run adds modest
overhead on this workload (about 9% wall time, 12% verifier duration, and
6 MiB of peak memory).

memory.peak is measured with cgroup v2 memory accounting for each program
load. The table reports the mean wall time, the mean summed verifier
duration, and the mean of the per-repetition maximum memory.peak values.

  Configuration                Wall time mean Verifier duration memory.peak
  ---------------------------- -------------- ----------------- -----------
  bpf-next baseline            25.78s         9.86s             142 MiB
  diagnostics, gated           26.64s         10.16s            144 MiB
  diagnostics, forced on       28.01s         11.00s            148 MiB

TODO
~~~~

Known follow-up work:

 - Convert more verbose-only verifier errors into category-specific
   reports.
 - Integrate loop-convergence failure summarization from Eduard.
 - Report candidate kfuncs/helpers for releasing owned resources.
 - Explore association of source variables with verifier registers where
   debug info permits it.
 - Refine suggestions per category and, where useful, link diagnostics to
   maintained documentation.
 - Bring verifier warnings into the same reporting framework.

Appendix: AI repair details
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 20 verifier-failing selftest cases are:

  Case     Diff   Category                   Selftest selector
  -------- ------ -------------------------- ---------------------------------------------
  case-001 easy   Call Type Safety           cpumask/test_populate_invalid_destination
  case-002 easy   Resource Lifetime Safety   cpumask/test_alloc_no_release
  case-003 easy   Register Type Safety       verifier_spill_fill/check_corrupted_spill_fill
  case-004 easy   Register Type Safety       test_global_funcs/global_func12
  case-005 easy   Execution Context Safety   preempt_lock/preempt_sleepable_helper
  case-006 easy   Policy                     verifier_helper_restricted/in_bpf_prog_type_kprobe_1
  case-007 medium Memory Safety              dynptr/dynptr_slice_var_len1
  case-008 medium Call Type Safety           dynptr/test_dynptr_skb_small_buff
  case-009 medium Call Type Safety           task_kfunc/task_kfunc_acquire_untrusted
  case-010 medium Register Type Safety       test_global_funcs/global_func6
  case-011 medium Resource Lifetime Safety   dynptr/ringbuf_missing_release2
  case-012 medium Execution Context Safety   irq/irq_sleepable_helper_global_subprog
  case-013 medium Verifier Limit             test_global_funcs/global_func1
  case-014 hard   Memory Safety              verifier_helper_value_access/via_variable_no_max_check_1
  case-015 hard   Register Type Safety       verifier_sock/invalidate_pkt_pointers_from_global_func
  case-016 hard   Resource Lifetime Safety   verifier_ref_tracking/check_free_in_one_subbranch
  case-017 hard   Resource Lifetime Safety   irq/irq_restore_ooo
  case-018 hard   Resource Lifetime Safety   res_spin_lock_failure/res_spin_lock_ooo_unlock
  case-019 hard   Program Structure          verifier_loops1/bounded_recursion
  case-020 hard   Verifier Limit             verifier_liveness_exp/liveness_exponential_complexity

The grading scale is:

 - 4: identifies the verifier cause and gives an actionable source-level
   fix.
 - 3: gives a likely fix, but with incomplete explanation or detail.
 - 2: identifies part of the issue, but not enough to fix confidently.
 - 1: gives only a broad verifier-area answer, or a wrong/insufficient
   fix.
 - 0: does not identify the intended verifier failure.

Detailed effort metrics for the model set:

  Metric                  Variant    Mean     Median   P99
  ----------------------- ---------- -------- -------- --------
  Cost per answer         diagnostic $0.030   $0.019   $0.203
  Cost per answer         legacy     $0.035   $0.018   $0.223
  Input tokens            diagnostic 1391     1220     4048
  Input tokens            legacy     1052     805      3655
  Output tokens           diagnostic 1662     954      8680
  Output tokens           legacy     1975     1034     9912
  Reasoning tokens        diagnostic 951      208      8108
  Reasoning tokens        legacy     1080     228      6322
  Wall time               diagnostic 37.3s    18.3s    222.7s
  Wall time               legacy     44.1s    19.8s    255.5s

In the per-model tables below, Succ and Exact are percentages. Cost, OutK,
and ReasK are per-model totals, with OutK and ReasK given in thousands of
output and reasoning tokens.

Per-model results for diagnostic prompts:

  Model profile                             Ans  Succ Exact Mean    Cost OutK ReasK  Wall
  ----------------------------------------- --- ----- ----- ---- ------- ---- ----- -----
  anthropic-haiku-4.5-default                20  90.0  80.0 3.70  $0.087 11.4   0.0  5.0s
  anthropic-opus-4.8-high                    20 100.0  90.0 3.90  $0.819 25.5   0.0 15.5s
  anthropic-opus-4.8-medium                  20  95.0  90.0 3.85  $0.870 27.5   0.0 12.7s
  anthropic-sonnet-4.6-high                  20  95.0  80.0 3.75  $0.824 48.9   0.0 21.6s
  anthropic-sonnet-4.6-medium                20 100.0  65.0 3.65  $0.278 12.4   0.0  6.6s
  openai-gpt-5.3-codex-high                  20 100.0  80.0 3.80  $0.601 39.8  33.9 25.0s
  openai-gpt-5.3-codex-medium                20  95.0  85.0 3.80  $0.287 17.5  11.4 13.5s
  openai-gpt-5.5-high                        20 100.0  90.0 3.90  $2.356 74.4  65.2 56.8s
  openai-gpt-5.5-low                         20 100.0  90.0 3.90  $0.686 18.7   8.5 21.3s
  openai-gpt-5.5-medium                      19 100.0  84.2 3.84  $1.353 41.1  31.8 37.4s
  openai-gpt-5.5-none                        20  95.0  90.0 3.85  $0.457 11.1   0.0 10.4s
  openrouter-deepseek-r1-0528                20 100.0  75.0 3.75  $0.145 61.5  53.8 98.3s
  openrouter-deepseek-v3.2                   19 100.0  78.9 3.79  $0.028 64.2  58.1 87.3s
  openrouter-glm-5.1-high                    20  95.0  80.0 3.75  $0.113 28.8  20.7 19.3s
  openrouter-qwen3-coder                     20  90.0  75.0 3.65  $0.028 12.4   0.0  7.1s

Per-model results for legacy prompts:

  Model profile                             Ans  Succ Exact Mean    Cost OutK ReasK  Wall
  ----------------------------------------- --- ----- ----- ---- ------- ---- ----- -----
  anthropic-haiku-4.5-default                20  90.0  45.0 3.35  $0.081 11.6   0.0  5.0s
  anthropic-opus-4.8-high                    20  90.0  70.0 3.60  $1.192 42.2   0.0 17.5s
  anthropic-opus-4.8-medium                  20  95.0  85.0 3.80  $1.001 34.5   0.0 13.4s
  anthropic-sonnet-4.6-high                  20 100.0  75.0 3.75  $1.181 74.1   0.0 24.4s
  anthropic-sonnet-4.6-medium                20  95.0  65.0 3.60  $0.420 23.4   0.0 12.3s
  openai-gpt-5.3-codex-high                  20 100.0  85.0 3.85  $0.562 37.8  31.6 27.1s
  openai-gpt-5.3-codex-medium                20 100.0  75.0 3.75  $0.318 20.3  13.7 13.6s
  openai-gpt-5.5-high                        19 100.0  78.9 3.79  $2.613 84.0  75.4 98.1s
  openai-gpt-5.5-low                         20 100.0  75.0 3.75  $0.664 19.0   9.7 21.7s
  openai-gpt-5.5-medium                      20 100.0  75.0 3.75  $1.602 50.2  41.0 56.1s
  openai-gpt-5.5-none                        20  95.0  85.0 3.80  $0.416 10.7   0.0 10.9s
  openrouter-deepseek-r1-0528                20  95.0  70.0 3.65  $0.149 64.6  57.5 92.5s
  openrouter-deepseek-v3.2                   20 100.0  60.0 3.60  $0.030 74.3  67.8 98.3s
  openrouter-glm-5.1-high                    19 100.0  63.2 3.63  $0.115 32.1  24.9 30.4s
  openrouter-qwen3-coder                     20 100.0  75.0 3.75  $0.022  9.5   0.0  5.4s

Kumar Kartikeya Dwivedi (17):
  bpf: Add verifier diagnostics report helpers
  bpf: Add source and instruction diagnostic context
  bpf: Add verifier diagnostic event log
  bpf: Prune verifier diagnostics on backtracking
  bpf: Track verifier register diagnostic events
  bpf: Track verifier reference diagnostic events
  bpf: Track verifier context diagnostic events
  bpf: Report Register Type Safety errors
  bpf: Report Memory Safety bounds errors
  bpf: Report Resource Lifetime reference leaks
  bpf: Report Call Type Safety argument errors
  bpf: Report Execution Context Safety errors
  bpf: Report Program Structure CFG errors
  bpf: Report Policy helper and kfunc errors
  bpf: Report Verifier Limit errors
  bpf: Report Verifier Internal errors
  bpf: Gate verifier diagnostics on log level

 include/linux/bpf.h          |    4 +-
 include/linux/bpf_verifier.h |    3 +
 include/linux/btf.h          |    2 +
 kernel/bpf/Makefile          |    2 +-
 kernel/bpf/btf.c             |   11 +
 kernel/bpf/cfg.c             |   39 +
 kernel/bpf/core.c            |   10 +-
 kernel/bpf/diagnostics.c     | 2502 ++++++++++++++++++++++++++++++++++
 kernel/bpf/diagnostics.h     |  272 ++++
 kernel/bpf/liveness.c        |    6 +
 kernel/bpf/verifier.c        | 1253 ++++++++++++++++-
 11 files changed, 4052 insertions(+), 52 deletions(-)
 create mode 100644 kernel/bpf/diagnostics.c
 create mode 100644 kernel/bpf/diagnostics.h


base-commit: e771677c937da5808f7b6c1f0e4a97ec1a84f8a8
--
2.53.0