From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="bjfS7dlI" Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [IPv6:2001:41d0:1004:224b::bc]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9123990 for ; Fri, 8 Dec 2023 10:26:41 -0800 (PST) Message-ID: <80439854-29ed-41f1-855b-d0cf91c07b8d@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1702059998; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+mysBcGb/P1HG/nItSRanrsAjUUSC0dSKoI6qQolZcA=; b=bjfS7dlIdO1GXVz/tG2ysjT8YrBKNsNG0b9C8ThVyqba/QWIST9KoDui3ZEVp9y9WIz3oS /f7HcCn6HtkAKnrHqU6TViDSyrWlal1lh8Rcix9faLA1+opS7ZsBEGzbpq/IKNXXA43NCK nTpvH9+zU/OT3MKLpkhvzV2bjLXO86E= Date: Fri, 8 Dec 2023 10:26:34 -0800 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free() Content-Language: en-US To: Yonghong Song Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau , Hou Tao , bpf@vger.kernel.org References: <20231206210959.1035724-1-yonghong.song@linux.dev> <969852f3-34f8-45d9-bf2d-f6a4d5167e55@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Martin KaFai Lau In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 12/8/23 8:45 AM, Yonghong Song wrote: > > On 12/8/23 12:16 AM, Martin KaFai Lau wrote: >> On 12/7/23 7:59 PM, Yonghong Song wrote: >>>> >>>> I am trying to avoid making a special case for "bool has_btf_ref;" and "bool >>>> from_map_check". It seems to a bit too much to deal with the error path for >>>> btf_parse(). >>>> >>>> Would doing the refcount_set(&btf->refcnt, 1) earlier in btf_parse help? >>> >>> No, it does not. The core reason is what Hao is mentioned in >>> https://lore.kernel.org/bpf/47ee3265-23f7-2130-ff28-27bfaf3f7877@huaweicloud.com/ >>> We simply cannot take btf reference if called from btf_parse(). >>> Let us say we move refcount_set(&btf->refcnt, 1) earlier in btf_parse() >>> so we take ref for btf during btf_parse_fields(), then we have >>>       btf_put <=== expect refcount == 0 to start the destruction process >>>         ... >>>           btf_record_free <=== in which if graph_root, a btf reference will >>> be hold >>> so btf_put will never be able to actually free btf data. >> >> ah. There is a loop like btf->struct_meta_tab->...btf. >> >>> Yes, the kasan problem will be resolved but we leak memory. >>> >>>> >>>>> It is also unnecessary to take a reference since the value_rec is >>>>> referring to a record in struct_meta_tab. >>>> >>>> If we optimize for not taking a refcnt, how about not taking a refcnt for >>>> all cases and postpone the btf_put(), instead of taking refcnt in one case >>>> but not another. Like your fix in v1. The failed selftest can be changed or >>>> even removed if it does not make sense anymore. >>> >>> After a couple of iterations, I think taking necessary reference approach >>> sounds better >>> and this will be consistent with how kptr is handled. For kptr, btf_parse >>> will ignore it. >> >> Got it. It is why kptr.btf got away with the loop. >> >> On the other hand, am I reading it correctly that kptr.btf only needs to take >> the refcnt for btf that is btf_is_kernel()? > > No. besides vmlinux and module btf, it also takes reference for prog btf, see > > static int btf_parse_kptr(const struct btf *btf, struct btf_field *field, >                           struct btf_field_info *info) > { > ... >         if (id == -ENOENT) { >                 /* btf_parse_kptr should only be called w/ btf = program BTF */ >                 WARN_ON_ONCE(btf_is_kernel(btf)); >                 /* Type exists only in program BTF. Assume that it's a MEM_ALLOC >                  * kptr allocated via bpf_obj_new >                  */ >                 field->kptr.dtor = NULL; >                 id = info->kptr.type_id; >                 kptr_btf = (struct btf *)btf; >                 btf_get(kptr_btf); I meant only kernel/module btf needs to take the refcnt, so there is no need to take the refcnt here for the (it)self btf. Sorry that I was not clear in my earlier comment. The record is capturing something either in the self btf or something in the kernel btf. The field->kptr.kptr is the one that may either point to a kernel or self btf, so it should be the only case that needs to check the following in btf_record_free(): if (btf_is_kernel(rec->fields[i].kptr.btf)) btf_put(rec->fields[i].kptr.btf); All other cases the record has a self btf (including field->graph_root.btf). The owner (map here) needs to ensure the self btf is freed after the record is freed. I was thinking if it can avoid doing different things based on where btf_parse_fields() is called by separating what type of btf always needs refcnt or not. Agree the approach in this patch will fix the issue also and I have acked v5. Thanks for the fix. >                 goto found_dtor; >         } > ... > } > >> >>> Unfortunately, for graph_root (list_head, rb_root), btf_parse and map_check >>> will both >>> process it and that adds a little bit complexity. >>> Alexei also suggested the same taking reference approach: >>> https://lore.kernel.org/bpf/CAADnVQL+uc6VV65_Ezgzw3WH=ME9z1Fdy8Pd6xd0oOq8rgwh7g@mail.gmail.com/ >>