From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [RFC v2 3/3] tools/ufq_iosched: add BPF example scheduler and build scaffolding
Reply-To: sashiko@lists.linux.dev
To: "Kaitao cheng"
Cc: bpf@vger.kernel.org
In-Reply-To: <20260503035623.28771-4-kaitao.cheng@linux.dev>
References: <20260503035623.28771-4-kaitao.cheng@linux.dev>
Content-Type: text/plain; charset=utf-8
Date: Sun, 03 May 2026 05:18:33 +0000
Message-Id: <20260503051833.E4773C2BCB4@smtp.kernel.org>
X-Mailing-List: bpf@vger.kernel.org

Thank you for your contribution!
Sashiko AI review found 9 potential issue(s) to consider:

- [High] Hallucinated BPF rbtree traversal functions
- [High] Wait-context violation and lock inversion via `bpf_request_bio_try_merge`
- [High] Verifier rejection due to unreleased acquired request references
- [High] Verifier rejection from accessing non-owning reference after tree mutation
- [High] Target request permanently orphaned after a merge operation
- [High] Red-black tree corruption due to mutable traversal keys
- [High] Use-After-Free when accessing `free->__data_len`
- [Medium] Incorrect queue insertion order for `BLK_MQ_INSERT_AT_HEAD`
- [Medium] Non-atomic updates to per-CPU array statistics

--
commit 837eafbebf3f48dff3cc402191f742e1f891291d
Author: Kaitao Cheng

    tools/ufq_iosched: add BPF example scheduler and build scaffolding

    Add ufq_iosched as a simple example for the UFQ block I/O scheduler.
    It implements the eBPF struct_ops hooks the kernel exposes to exercise
    the behavior and stability of the kernel UFQ scheduling framework.

> diff --git a/tools/ufq_iosched/ufq_simple.bpf.c b/tools/ufq_iosched/ufq_simple.bpf.c
> new file mode 100644
> index 000000000000..81954e73068a
> --- /dev/null
> +++ b/tools/ufq_iosched/ufq_simple.bpf.c

[ ... ]

> +static void stat_add(u32 idx, u32 val)
> +{
> +	u64 *cnt_p = bpf_map_lookup_elem(&stats, &idx);
> +
> +	if (cnt_p)
> +		(*cnt_p) += val;
> +}

Since operations like insert_req can run in process context while
finish_req might run in softirq on the same CPU, could these non-atomic
updates result in lost statistics? Should this use __sync_fetch_and_add()
instead?
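For reference, a minimal sketch of the atomic variant this question has in
mind (same stats map and call sites as in the patch; only the increment
changes):

static void stat_add(u32 idx, u32 val)
{
	u64 *cnt_p = bpf_map_lookup_elem(&stats, &idx);

	/* atomic RMW: a softirq-context update on the same CPU can no
	 * longer race with and overwrite a process-context update
	 */
	if (cnt_p)
		__sync_fetch_and_add(cnt_p, val);
}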
[ ... ]

> +int BPF_STRUCT_OPS(ufq_simple_insert_req, struct request_queue *q,
> +		   struct request *rq, blk_insert_t flags,
> +		   struct list_head *freeq)
> +{
[ ... ]
> +	if (flags & BLK_MQ_INSERT_AT_HEAD) {
[ ... ]
> +		/* Add queue_list_node to dispatch list */
> +		bpf_spin_lock(&ufq_sd->lock);
> +		ret = bpf_list_push_back(&ufq_sd->dispatch, &qnode->node);
> +		bpf_spin_unlock(&ufq_sd->lock);
> +	} else {

Does using bpf_list_push_back() here place the request at the tail of the
dispatch list rather than the head? Since dispatch_req pops from the front,
this seems to violate the requested BLK_MQ_INSERT_AT_HEAD semantics.

[ ... ]

> +struct request *BPF_STRUCT_OPS(ufq_simple_dispatch_req, struct request_queue *q)
> +{
[ ... ]
> +		snode = container_of(rb_node, struct sort_tree_node, rb_node);
> +
> +		/* Get request from sort_tree_node (this will be returned) */
> +		rq = bpf_kptr_xchg(&snode->req, NULL);
> +		bpf_spin_unlock(&ufq_sd->lock);
> +		bpf_obj_drop(snode);
> +	}
> +	if (!rq)
> +		bpf_printk("ufq_simple/dispatch_req: no request to dispatch");
> +
> +out:
> +	if (rq) {
> +		stat_add(UFQ_SIMP_DISPATCH_CNT, 1);
> +		stat_add(UFQ_SIMP_DISPATCH_SIZE, rq->__data_len);
> +	}
> +
> +	return rq;
> +}

Will the verifier reject this program because it returns an acquired
reference from bpf_kptr_xchg without calling bpf_request_release()? Also,
if the verifier did allow this, would the request reference permanently
leak since the block layer doesn't consume BPF kptr references?

> +bool BPF_STRUCT_OPS(ufq_simple_has_req, struct request_queue *q, int rqs_count)
> +{
[ ... ]
> +	bpf_spin_lock(&ufq_sd->lock);
> +	has = !bpf_list_empty(&ufq_sd->dispatch) ||
> +	      bpf_rbtree_root(&ufq_sd->sort_tree_read) ||
> +	      bpf_rbtree_root(&ufq_sd->sort_tree_write);

The functions bpf_rbtree_root(), bpf_rbtree_left(), and bpf_rbtree_right()
are used throughout this file, but they do not appear to exist in the BPF
API. The API only exposes bpf_rbtree_add(), bpf_rbtree_remove(), and
bpf_rbtree_first(). Will this code fail to compile and load?

[ ... ]

> +struct request *BPF_STRUCT_OPS(ufq_simple_merge_req, struct request_queue *q,
> +			       struct request *rq, int *type)
> +{
[ ... ]
> +		if (mt) {
> +			if (rq->mq_ctx != targ_mq_ctx || rq->mq_hctx != targ_mq_hctx) {
> +				mt = ELEVATOR_NO_MERGE;
> +				break;
> +			}
> +
> +			rb_node = bpf_rbtree_remove(tree, rb_node);
> +			if (rb_node) {
> +				snode = container_of(rb_node,
> +						     struct sort_tree_node, rb_node);
> +				targ = bpf_kptr_xchg(&snode->req, NULL);
> +				bpf_spin_unlock(&ufq_sd->lock);
> +				if (targ) {
> +					*type = mt;
> +					stat_add(UFQ_SIMP_RQMERGE_CNT, 1);
> +					stat_add(UFQ_SIMP_RQMERGE_SIZE, targ->__data_len);
> +					stat_sub(UFQ_SIMP_INSERT_CNT, 1);
> +					stat_sub(UFQ_SIMP_INSERT_SIZE, targ->__data_len);
> +				}
> +
> +				bpf_obj_drop(snode);
> +			} else {
> +				bpf_spin_unlock(&ufq_sd->lock);
> +				*type = ELEVATOR_NO_MERGE;
> +			}
> +			return targ;
> +		}
> +	}
> +	bpf_spin_unlock(&ufq_sd->lock);
> +
> +	return NULL;
> +}

If a candidate request is found for merging, it gets permanently removed
from the scheduler's rbtree and its node is dropped. Since dispatch_req
will now never see it, does this result in a permanent I/O hang for the
merged target request?

> +static struct request *merge_bio_left_unlock(struct ufq_simple_data *ufq_sd,
> +					     struct bpf_rb_root *tree,
> +					     struct sort_tree_node *snode,
> +					     struct request *cand)
> +{
[ ... ]
> +	if (left_end == cand_start)
> +		free = bpf_request_try_merge(left_rq, cand);
> +
> +	if (free == cand) {
> +		removed = bpf_rbtree_remove(tree, &snode->rb_node);
> +		left_rq = bpf_kptr_xchg(&left_node->req, left_rq);

According to BPF safety rules, removing a node from a collection
invalidates all non-owning references to that collection. Will the
verifier reject this program because it accesses left_node right after
calling bpf_rbtree_remove()?

> +		bpf_spin_unlock(&ufq_sd->lock);
> +		if (removed) {
> +			struct sort_tree_node *drop = container_of(removed,
> +						struct sort_tree_node, rb_node);
> +			bpf_obj_drop(drop);
> +		}
> +		stat_add(UFQ_SIMP_RQMERGE_CNT, 1);
> +		stat_add(UFQ_SIMP_RQMERGE_SIZE, free->__data_len);

If bpf_request_try_merge() is successful, it merges and frees the
candidate request. By passing free->__data_len to stat_add, does this
cause a use-after-free read on a request that was just returned to the
block layer?
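For reference, a rough sketch against the quoted lines; cand_len is a
local introduced here purely for illustration, snapshotting the length
before bpf_request_try_merge() can free the candidate:

	u32 cand_len = cand->__data_len;	/* snapshot before the merge may free cand */

	if (left_end == cand_start)
		free = bpf_request_try_merge(left_rq, cand);

	if (free == cand) {
		/* ... remove the node and drop the lock as before ... */
		stat_add(UFQ_SIMP_RQMERGE_CNT, 1);
		stat_add(UFQ_SIMP_RQMERGE_SIZE, cand_len);	/* no read of the freed request */
	}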
[ ... ]

> +struct request *BPF_STRUCT_OPS(ufq_simple_merge_bio,
> +			       struct request_queue *q, struct bio *bio,
> +			       unsigned int nr_segs, bool *merged)
> +{
[ ... ]
> +	bpf_spin_lock(&ufq_sd->lock);
[ ... ]
> +	while (rb_node && count < UFQ_LOOP_MAX) {
> +		count++;
> +		snode = container_of(rb_node, struct sort_tree_node, rb_node);
> +		cand = bpf_kptr_xchg(&snode->req, NULL);
> +		if (!cand)
> +			break;
> +
> +		cand_start = cand->__sector;
> +		cand_end = cand_start + (cand->__data_len >> SECTOR_SHIFT);
> +
> +		if (end < cand_start) {
> +			rb_node = bpf_rbtree_left(tree, rb_node);
> +		} else if (start > cand_end) {
> +			rb_node = bpf_rbtree_right(tree, rb_node);
> +		} else if (cand_start == end) {
> +			if (bpf_request_bio_try_merge(cand, bio, nr_segs)) {

The rbtree traversal here compares against cand->__sector. If a request
was previously front-merged, its __sector would be modified by the block
layer. Since the node isn't re-inserted to update its sorting key, could
subsequent traversals navigate down the wrong tree branches and corrupt
the red-black tree?

Additionally, the code holds a bpf_spin_lock while calling the kfunc
bpf_request_bio_try_merge(). Looking at bpf_request_bio_try_merge() in
block/ufq-kfunc.c, it internally acquires a spin_lock:

block/ufq-kfunc.c:bpf_request_bio_try_merge()
{
	...
	spin_lock(&ctx->lock);
	merged = blk_attempt_bio_merge(rq->q, rq, bio, nr_segs, true) == BIO_MERGE_OK;
	spin_unlock(&ctx->lock);
	...
}

Since bpf_spin_lock() maps to a raw_spinlock_t, and spin_lock() is
sleepable on PREEMPT_RT kernels, will this sleep in atomic context and
trigger a lockdep BUG and system panic? A rough sketch of how the kfunc
call could be moved out of the BPF-locked section is appended below the
sign-off.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503035623.28771-1-kaitao.cheng@linux.dev?part=3
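As noted above, a very rough sketch of the inverted lock ordering,
reusing the names from the quoted merge_bio loop and deliberately
glossing over how the tree traversal and re-insertion would then have to
be reworked:

	bpf_spin_lock(&ufq_sd->lock);
	/* take ownership of the candidate while the tree is protected */
	cand = bpf_kptr_xchg(&snode->req, NULL);
	bpf_spin_unlock(&ufq_sd->lock);

	if (cand) {
		/* the kfunc's internal spin_lock() now runs outside the raw
		 * bpf_spin_lock, so it cannot sleep in atomic context on
		 * PREEMPT_RT
		 */
		if (bpf_request_bio_try_merge(cand, bio, nr_segs))
			*merged = true;
	}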