From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3010C61DA4 for ; Wed, 15 Feb 2023 01:54:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229661AbjBOBy4 (ORCPT ); Tue, 14 Feb 2023 20:54:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231871AbjBOByf (ORCPT ); Tue, 14 Feb 2023 20:54:35 -0500 Received: from out-140.mta0.migadu.com (out-140.mta0.migadu.com [IPv6:2001:41d0:1004:224b::8c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0170B4207 for ; Tue, 14 Feb 2023 17:54:33 -0800 (PST) Message-ID: <6d48c284-42eb-9688-4259-79b7f096e294@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1676426072; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mhqbNPN7xfCGCSg3vVz7nFLwHSKzUxezJOcTZ/ywPcs=; b=oazU5CKA6kp0Y87uBdomtQ6CiatwUxSuz6iu7gi80L0osDrOdsjmBG9Qz5PG2NYNBERSHy jpI9BBKK4OaPrV9osJiKlEpbR+OfjqaffTtMfMXFodEZHpW/gXMVabVuAFGmHJGe5Yd8KK +l/0V8Y1VdWpRKg5QkxeYNI2vKxqMRA= Date: Tue, 14 Feb 2023 17:54:26 -0800 MIME-Version: 1.0 Subject: Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Content-Language: en-US To: Alexei Starovoitov , Hou Tao Cc: Kumar Kartikeya Dwivedi , Yonghong Song , bpf , Andrii Nakryiko , Song Liu , Hao Luo , Yonghong Song , Alexei Starovoitov , Daniel Borkmann , KP Singh , Stanislav Fomichev , Jiri Olsa , John Fastabend , "Paul E . McKenney" , rcu@vger.kernel.org, Hou Tao , Martin KaFai Lau References: <20221230041151.1231169-1-houtao@huaweicloud.com> <20230101012629.nmpofewtlgdutqpe@macbook-pro-6.dhcp.thefacebook.com> <1d97a5c0-d1fb-a625-8e8d-25ef799ee9e2@huaweicloud.com> <20230210163258.phekigglpquitq33@apollo> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Martin KaFai Lau In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On 2/11/23 8:34 AM, Alexei Starovoitov wrote: > On Sat, Feb 11, 2023 at 8:33 AM Alexei Starovoitov > wrote: >> >> On Fri, Feb 10, 2023 at 5:10 PM Hou Tao wrote: >>>>> Hou, are you plannning to resubmit this change? I also hit this while testing my >>>>> changes on bpf-next. >>>> Are you talking about the whole patch set or just GFP_ZERO in mem_alloc? >>>> The former will take a long time to settle. >>>> The latter is trivial. >>>> To unblock yourself just add GFP_ZERO in an extra patch? >>> Sorry for the long delay. Just find find out time to do some tests to compare >>> the performance of bzero and ctor. After it is done, will resubmit on next week. >> >> I still don't like ctor as a concept. In general the callbacks in the critical >> path are guaranteed to be slow due to retpoline overhead. >> Please send a patch to add GFP_ZERO. >> >> Also I realized that we can make the BPF_REUSE_AFTER_RCU_GP flag usable >> without risking OOM by only waiting for normal rcu GP and not rcu_tasks_trace. >> This approach will work for inner nodes of qptrie, since bpf progs >> never see pointers to them. It will work for local storage >> converted to bpf_mem_alloc too. It wouldn't need to use its own call_rcu. >> It's also safe without uaf caveat in sleepable progs and sleepable progs > > I meant 'safe with uaf caveat'. > Safe because we wait for rcu_task_trace later before returning to kernel memory. For local storage, when its owner (sk/task/inode/cgrp) is going away, the memory can be reused immediately. No rcu gp is needed. The local storage delete case (eg. bpf_sk_storage_delete) is the only one that needs to be freed by tasks_trace gp because another bpf prog (reader) may be under the rcu_read_lock_trace(). I think the idea (BPF_REUSE_AFTER_RCU_GP) on allowing reuse after vanilla rcu gp and free (if needed) after tasks_trace gp can be extended to the local storage delete case. I think we can extend the assumption that "sleepable progs (reader) can use explicit bpf_rcu_read_lock() when they want to avoid uaf" to bpf_{sk,task,inode,cgrp}_storage_get() also. I also need the GFP_ZERO in bpf_mem_alloc, so will work on the GFP_ZERO and the BPF_REUSE_AFTER_RCU_GP idea. Probably will get the GFP_ZERO out first. > >> can use explicit bpf_rcu_read_lock() when they want to avoid uaf. >> So please respin the set with rcu gp only and that new flag.