From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-dy1-f175.google.com (mail-dy1-f175.google.com [74.125.82.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C09054D2EC1 for ; Thu, 11 Jun 2026 18:41:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.175 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781203318; cv=none; b=pj4UnvFochvt6gpVCESPwOVGfJ9Ig9jHGmUgpq2EMDEhFqOmA0rYaY2BTcvplHF5lBmvMcPCXHJWhwfPOYALN4Le0AzzjEnbxk3sAOEQ7SgC8ww6qTnOnVFuB6+6gWG9cz5ODJk+dNBDwxs/Grc82+YkZmLhU5qr9tfqjrsi09A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781203318; c=relaxed/simple; bh=IkEHZPSDtCkUJmB9jlcPSk6gUtKj+1tZjpR+VAgd3ss=; h=Mime-Version:Content-Type:Date:Message-Id:From:To:Cc:Subject: References:In-Reply-To; b=BfB06EcfUJLZQYp4Q3a81RRA+FWAV/y+tOcqgn6uzPRbtMB4Yx0wwOOdy1zlIYDZJ52YDTB8S+/wMaN3FqvqpZguOChIen7GDSj5doczVvZ56RpCAI1wvY2X/R/s/PzVTCY2BUDdQX0Vz1l1o8sunNK5j+whv8YVW8IxbpOjAxQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=etsalapatis.com; spf=pass smtp.mailfrom=etsalapatis.com; dkim=pass (2048-bit key) header.d=etsalapatis-com.20251104.gappssmtp.com header.i=@etsalapatis-com.20251104.gappssmtp.com header.b=p9yJaWXM; arc=none smtp.client-ip=74.125.82.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=etsalapatis.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=etsalapatis.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=etsalapatis-com.20251104.gappssmtp.com header.i=@etsalapatis-com.20251104.gappssmtp.com header.b="p9yJaWXM" Received: by mail-dy1-f175.google.com with SMTP id 5a478bee46e88-304c520fe9aso464256eec.0 for ; Thu, 11 Jun 2026 11:41:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=etsalapatis-com.20251104.gappssmtp.com; s=20251104; t=1781203316; x=1781808116; darn=vger.kernel.org; h=in-reply-to:references:subject:cc:to:from:message-id:date :content-transfer-encoding:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=F7QM0qmxmtR/YrFr0aJ+thth8ELg/lCSgHx9ugycN6k=; b=p9yJaWXMDd3WP1Cb4U5SZU9NgtNRsl/G/EZ0l8KOhMuETmjCdg0olBowjrDxmpNkXi fvxENa5uPfRyidx/Fm3oFDnsZWQqB7IBY1sneQ4na2v9H4592QND9C7z3ofI2W3zvTlZ /MzaC9QaGzs6LKjKblop0m7lFXI/l3hZvl5+aqL8pXzmSv2hD7bt9vxwQRnsCfs52+dB RAyw1ggrftHM/lmA4qLft4PVabqLXUjYB0PVHig7QIk36U6mAeYQP/KuDJAvLfpyP3tP KOL/SoLmd7jzeTkBZ9dAez0sxCvC/3qTfTL994vV3ypcsZOQdLKlerugSAWF9aOt/6EG aJvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781203316; x=1781808116; h=in-reply-to:references:subject:cc:to:from:message-id:date :content-transfer-encoding:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=F7QM0qmxmtR/YrFr0aJ+thth8ELg/lCSgHx9ugycN6k=; b=Zy0G4kFJddruIpHzRGpfPOjQ9oYQg/BofvpkAzgPT2L15wmDyUBjZuezM38iQ7JaJI 4V42vLHazgfN0/JYtOtbyss55k/KLmMvsJaV4+YHFOAPfwdhkQmKd0oS1s3kOo22BWSK JMho9xgQhSdLgTJJYeUYGAhMAVqmM09t97lADCd/0MVPiE7K+5sPNFpjBcWmIAWUMbM+ uaPvU6qhP2OGSXDaVYWEpoOrOBSiIjwVJRT5wnNY82hPIAEsZTdHZO1jKq9ZV6BPOXEr J0JKe6UJxAiwuEC5Aehvr0lMVAXCzxYbmjdwFL3wl7wSdRzU1KR1kUgczd4E8S5LXyUX dZnw== X-Forwarded-Encrypted: i=1; AFNElJ+yWSPqSeDJKNZMEgxbVVY2b2Hl+7kBYMVvqZCNHp6fTVazQjzX9YNvVhL0A+nA0rxtODrnmnM=@vger.kernel.org X-Gm-Message-State: AOJu0Yx7ZTVNWhLYwZbEn+hqJadhaHXTzVGPnjgzqeH2uv5eajF6joZ6 QcguzKgwXp4p7yHJ522Qo4iCFoyV4deJELAp/6KD9LRdBe/RUhE7dwrL6ExcHloHPqI= X-Gm-Gg: Acq92OGDCaK5gIvuHO3gfzg1ZnAFtAXz0c2jWEmEMSaq6zxhg6G0esZC9bKQtF2qPkc ciFmTSK4KcIPELjctJEIZuV8NbExp697d133gEPch7ay2dSEAvijxVKllplt/igyOFWtnku4vuH 33U05yz36pQzoYJ4dIZ9JjLycE5jFRiT65ZWwZ6TxHMJZm4t9N6Za2c3n+2j/Eqbta6UD8boYQy y+eqwU09z8HwLsDLeYuLAQdeQF9CnZuHUQPGtPO/dDp4UHi2ib/+yL1iHCjCUp/RKpzfsF1NVDN Z9nDxik9hrPz9NVLD5bdpHb7jGYc5q8Wy6QiCBG6ODhiMz5NxRsn/BQacxd4d0N2/iroehv6uun xzhiI2ob8wq3rWfRituNx5WqWCGT1WM+CvDQXZ54CxRyQ4AG8Zm30CcG6jr2AAwgvxDXVvReAE0 4RfOvM X-Received: by 2002:a05:7300:2146:b0:307:26a3:75e4 with SMTP id 5a478bee46e88-3080461f82cmr3319194eec.4.1781203315661; Thu, 11 Jun 2026 11:41:55 -0700 (PDT) Received: from localhost ([2620:10d:c090:600::9f35]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-30806c2f420sm2775011eec.6.2026.06.11.11.41.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 11 Jun 2026 11:41:55 -0700 (PDT) Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Thu, 11 Jun 2026 14:41:52 -0400 Message-Id: From: "Emil Tsalapatis" To: "Jiayuan Chen" , Cc: "Zhang Cen" , , "Han Guidong" <2045gemini@gmail.com>, "John Fastabend" , "Daniel Borkmann" , "Stanislav Fomichev" , "Martin KaFai Lau" , "Alexei Starovoitov" , "Andrii Nakryiko" , "Eduard Zingerman" , "Kumar Kartikeya Dwivedi" , "Song Liu" , "Yonghong Song" , "Jiri Olsa" , "Emil Tsalapatis" , "David S. Miller" , "Eric Dumazet" , "Jakub Kicinski" , "Paolo Abeni" , "Simon Horman" , "Jakub Sitnicki" , "Shuah Khan" , "Jesper Dangaard Brouer" , "Sechang Lim" , "Ihor Solodrai" , "Cong Wang" , , , Subject: Re: [PATCH bpf v2 4/7] bpf, sockmap: keep sk_msg copy state in sync X-Mailer: aerc 0.21.0-0-g5549850facc2 References: <20260611123538.156005-1-jiayuan.chen@linux.dev> <20260611123538.156005-5-jiayuan.chen@linux.dev> In-Reply-To: <20260611123538.156005-5-jiayuan.chen@linux.dev> On Thu Jun 11, 2026 at 8:34 AM EDT, Jiayuan Chen wrote: > From: Zhang Cen > > SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries > with this bit set are copied before data/data_end are exposed to SK_MSG > BPF programs for direct packet access. > > bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data() > rewrite the sk_msg scatterlist ring by collapsing, splitting, and > shifting entries. These operations move msg->sg.data[] entries, but the > parallel copy bitmap can be left behind on the old slot. A copied entry > can then return to msg->sg.start with its copy bit clear and be exposed > as directly writable packet data. > > This corruption path requires an attached SK_MSG BPF program that calls > the mutating helpers; ordinary sockmap/TLS traffic that never runs > push/pop/pull helper sequences is not affected. > > Keep msg->sg.copy synchronized with scatterlist entry moves, preserve > the copy bit when an entry is split, clear it when a helper replaces an > entry with a private page, and clear slots vacated by pull-data > compaction. > > Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") > Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") > Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages") > Cc: stable@vger.kernel.org > Co-developed-by: Han Guidong <2045gemini@gmail.com> > Cc: Jiayuan Chen > Reviewed-by: John Fastabend > Signed-off-by: Han Guidong <2045gemini@gmail.com> > Signed-off-by: Zhang Cen Reviewed-by: Emil Tsalapatis There's a bunch of nits here in terms of complexity but the fix is correct and considering how many reports are flying around for the same helpers it's iom better to just get them fixed asap. > --- > net/core/filter.c | 88 ++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 83 insertions(+), 5 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 6e345ca65ca14..e35e681a15dca 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2654,6 +2654,38 @@ static void sk_msg_reset_curr(struct sk_msg *msg) > } > } > =20 > +static bool sk_msg_elem_is_copy(const struct sk_msg *msg, u32 i) > +{ > + return test_bit(i, msg->sg.copy); > +} > + > +static void sk_msg_clear_elem_copy(struct sk_msg *msg, u32 i) > +{ > + __clear_bit(i, msg->sg.copy); > +} > + > +static void sk_msg_set_elem_copy(struct sk_msg *msg, u32 i) > +{ > + __set_bit(i, msg->sg.copy); > +} > + > +static void sk_msg_clear_copy_range(struct sk_msg *msg, u32 start, u32 e= nd) > +{ > + while (start !=3D end) { > + sk_msg_clear_elem_copy(msg, start); > + sk_msg_iter_var_next(start); > + } > +} > + > +static void sk_msg_sg_move(struct sk_msg *msg, u32 dst, u32 src) > +{ > + msg->sg.data[dst] =3D msg->sg.data[src]; > + if (sk_msg_elem_is_copy(msg, src)) > + sk_msg_set_elem_copy(msg, dst); > + else > + sk_msg_clear_elem_copy(msg, dst); > +} > + > static const struct bpf_func_proto bpf_msg_cork_bytes_proto =3D { > .func =3D bpf_msg_cork_bytes, > .gpl_only =3D false, > @@ -2692,7 +2724,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg,= u32, start, > * account for the headroom. > */ > bytes_sg_total =3D start - offset + bytes; > - if (!test_bit(i, msg->sg.copy) && bytes_sg_total <=3D len) > + if (!sk_msg_elem_is_copy(msg, i) && bytes_sg_total <=3D len) > goto out; > =20 > /* At this point we need to linearize multiple scatterlist > @@ -2738,6 +2770,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg,= u32, start, > } while (i !=3D last_sge); > =20 > sg_set_page(&msg->sg.data[first_sge], page, copy, 0); > + sk_msg_clear_elem_copy(msg, first_sge); > =20 > /* To repair sg ring we need to shift entries. If we only > * had a single entry though we can just replace it and > @@ -2747,8 +2780,14 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg= , u32, start, > shift =3D last_sge > first_sge ? > last_sge - first_sge - 1 : > NR_MSG_FRAG_IDS - first_sge + last_sge - 1; > - if (!shift) > + if (!shift) { > + sk_msg_clear_elem_copy(msg, msg->sg.end); > goto out; > + } > + > + i =3D first_sge; > + sk_msg_iter_var_next(i); > + sk_msg_clear_copy_range(msg, i, last_sge); > =20 > i =3D first_sge; > sk_msg_iter_var_next(i); > @@ -2762,16 +2801,18 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, ms= g, u32, start, > if (move_from =3D=3D msg->sg.end) > break; > =20 > - msg->sg.data[i] =3D msg->sg.data[move_from]; > + sk_msg_sg_move(msg, i, move_from); > msg->sg.data[move_from].length =3D 0; > msg->sg.data[move_from].page_link =3D 0; > msg->sg.data[move_from].offset =3D 0; > + sk_msg_clear_elem_copy(msg, move_from); > sk_msg_iter_var_next(i); > } while (1); > =20 > msg->sg.end =3D msg->sg.end - shift > msg->sg.end ? > msg->sg.end - shift + NR_MSG_FRAG_IDS : > msg->sg.end - shift; > + sk_msg_clear_elem_copy(msg, msg->sg.end); > out: > sk_msg_reset_curr(msg); > msg->data =3D sg_virt(&msg->sg.data[first_sge]) + start - offset; > @@ -2794,6 +2835,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg,= u32, start, > { > struct scatterlist sge, nsge, nnsge, rsge =3D {0}, *psge; > u32 new, i =3D 0, l =3D 0, space, copy =3D 0, offset =3D 0; > + bool sge_copy =3D false, nsge_copy =3D false, nnsge_copy =3D false; > + bool rsge_copy =3D false; > u8 *raw, *to, *from; > struct page *page; > =20 > @@ -2869,6 +2912,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg,= u32, start, > sk_msg_iter_var_prev(i); > psge =3D sk_msg_elem(msg, i); > rsge =3D sk_msg_elem_cpy(msg, i); > + rsge_copy =3D sk_msg_elem_is_copy(msg, i); > =20 > psge->length =3D start - offset; > rsge.length -=3D psge->length; > @@ -2894,23 +2938,34 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, ms= g, u32, start, > /* Shift one or two slots as needed */ > sge =3D sk_msg_elem_cpy(msg, new); > sg_unmark_end(&sge); > + sge_copy =3D sk_msg_elem_is_copy(msg, new); > =20 > nsge =3D sk_msg_elem_cpy(msg, i); > + nsge_copy =3D sk_msg_elem_is_copy(msg, i); > if (rsge.length) { > sk_msg_iter_var_next(i); > nnsge =3D sk_msg_elem_cpy(msg, i); > + nnsge_copy =3D sk_msg_elem_is_copy(msg, i); > sk_msg_iter_next(msg, end); > } > =20 > while (i !=3D msg->sg.end) { > msg->sg.data[i] =3D sge; > + if (sge_copy) > + sk_msg_set_elem_copy(msg, i); > + else > + sk_msg_clear_elem_copy(msg, i); > sge =3D nsge; > + sge_copy =3D nsge_copy; > sk_msg_iter_var_next(i); > if (rsge.length) { > nsge =3D nnsge; > + nsge_copy =3D nnsge_copy; > nnsge =3D sk_msg_elem_cpy(msg, i); > + nnsge_copy =3D sk_msg_elem_is_copy(msg, i); > } else { > nsge =3D sk_msg_elem_cpy(msg, i); > + nsge_copy =3D sk_msg_elem_is_copy(msg, i); > } > } > =20 > @@ -2918,13 +2973,18 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, ms= g, u32, start, > /* Place newly allocated data buffer */ > sk_mem_charge(msg->sk, len); > msg->sg.size +=3D len; > - __clear_bit(new, msg->sg.copy); > + sk_msg_clear_elem_copy(msg, new); > sg_set_page(&msg->sg.data[new], page, len + copy, 0); > if (rsge.length) { > get_page(sg_page(&rsge)); > sk_msg_iter_var_next(new); > msg->sg.data[new] =3D rsge; > + if (rsge_copy) > + sk_msg_set_elem_copy(msg, new); > + else > + sk_msg_clear_elem_copy(msg, new); > } > + sk_msg_clear_elem_copy(msg, msg->sg.end); > =20 > sk_msg_reset_curr(msg); > sk_msg_compute_data_pointers(msg); > @@ -2950,27 +3010,38 @@ static void sk_msg_shift_left(struct sk_msg *msg,= int i) > do { > prev =3D i; > sk_msg_iter_var_next(i); > - msg->sg.data[prev] =3D msg->sg.data[i]; > + sk_msg_sg_move(msg, prev, i); > } while (i !=3D msg->sg.end); > =20 > sk_msg_iter_prev(msg, end); > + sk_msg_clear_elem_copy(msg, msg->sg.end); > } > =20 > static void sk_msg_shift_right(struct sk_msg *msg, int i) > { > struct scatterlist tmp, sge; > + bool tmp_copy, sge_copy; > =20 > sk_msg_iter_next(msg, end); > sge =3D sk_msg_elem_cpy(msg, i); > + sge_copy =3D sk_msg_elem_is_copy(msg, i); > sk_msg_iter_var_next(i); > tmp =3D sk_msg_elem_cpy(msg, i); > + tmp_copy =3D sk_msg_elem_is_copy(msg, i); > =20 > while (i !=3D msg->sg.end) { > msg->sg.data[i] =3D sge; > + if (sge_copy) > + sk_msg_set_elem_copy(msg, i); > + else > + sk_msg_clear_elem_copy(msg, i); > sk_msg_iter_var_next(i); > sge =3D tmp; > + sge_copy =3D tmp_copy; > tmp =3D sk_msg_elem_cpy(msg, i); > + tmp_copy =3D sk_msg_elem_is_copy(msg, i); > } > + sk_msg_clear_elem_copy(msg, msg->sg.end); > } > =20 > BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start, > @@ -3027,8 +3098,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg,= u32, start, > */ > if (start !=3D offset) { > struct scatterlist *nsge, *sge =3D sk_msg_elem(msg, i); > + u32 sge_idx =3D i; > int a =3D start - offset; > int b =3D sge->length - pop - a; > + bool sge_copy =3D sk_msg_elem_is_copy(msg, sge_idx); > =20 > sk_msg_iter_var_next(i); > =20 > @@ -3041,6 +3114,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg,= u32, start, > sg_set_page(nsge, > sg_page(sge), > b, sge->offset + pop + a); > + if (sge_copy) > + sk_msg_set_elem_copy(msg, i); > + else > + sk_msg_clear_elem_copy(msg, i); > } else { > struct page *page, *orig; > u8 *to, *from; > @@ -3057,6 +3134,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, = u32, start, > memcpy(to, from, a); > memcpy(to + a, from + a + pop, b); > sg_set_page(sge, page, a + b, 0); > + sk_msg_clear_elem_copy(msg, sge_idx); > put_page(orig); > } > pop =3D 0;