From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45168283C87 for ; Thu, 15 Jan 2026 03:26:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768447561; cv=none; b=bRe5rmpI+grtB/JuEiZwz1IPsTV8zs+1oEIrV/I9FONuuuLKZKuR6wRoITQsUjN3+wioLYuOfCWZ0BFmm0+nx/g3bcvwjkKw3uHyzzXH6r9mYt7Ny4hwxSlzJGJ5h8MqhsNUjMx4+iY5toO2rlbZZnZpE5+Vv/G9P5o6BehqxNY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768447561; c=relaxed/simple; bh=QkPXqnV3ZRfhtGfh1Vcau6M2ZsYkINU+HKPnyvuY0iU=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=Wsq6x7yjzkZoZB4cvUnH/Oak9zH4gV5OMaKn+UWF/ucMh2eTrtR7Yg1HZ2mO4I+8mUhPqiU0xRu/GZYx5O6yiZauJWGD4v2u8pGCgWS5B/9r0cFAYYt/orQemMzb9JOtwIRsw82R93B9LH0XxC75IDI28riB4w0x1SS4idZ5DMY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=rWWj7X3Z; arc=none smtp.client-ip=91.218.175.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="rWWj7X3Z" Message-ID: <3d4287e5-0564-4933-83ee-c2dcbfe993f4@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1768447557; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QwdtCxk+bhle2ZNHlx3yJsg+WAw9Zp1L7FaYnQsiDdw=; b=rWWj7X3Z2fEw+8PidI1ci6WixFd5ud03Mwh/YDwBd5ZkwOSyXuvYFcwQXTjjOGzMzNnWR3 pgwu+gNmbAEs7GKgMkE0vcY+ReR0X611mnt5pNo3q9BhyX/NRsAWnmu1/JjXVMR8zi3igH W5z8PYpkszoIK9EgQbdjNUcxwzeZBDE= Date: Thu, 15 Jan 2026 11:25:46 +0800 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v3 4/5] bpf: lru: Fix unintended eviction when updating lru hash maps Content-Language: en-US To: Martin KaFai Lau Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Shuah Khan , Saket Kumar Bhaskar , "David S . Miller" , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-patches-bot@fb.com, bpf@vger.kernel.org References: <20260107151456.72539-1-leon.hwang@linux.dev> <20260107151456.72539-5-leon.hwang@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Leon Hwang In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 15/1/26 03:39, Martin KaFai Lau wrote: > > > On 1/7/26 7:14 AM, Leon Hwang wrote: >> When updating an existing element in lru_[percpu_,]hash maps, the current >> implementation always calls prealloc_lru_pop() to get a new node before >> checking if the key already exists. If the map is full, this triggers >> LRU eviction and removes an existing element, even though the update >> operation only needs to modify the value of an existing key in-place. >> >> This is problematic because: >> 1. Users may unexpectedly lose entries when doing simple value updates >> 2. The eviction overhead is unnecessary for existing key updates > > This is not the common LRU map use case. The bpf prog usually does a > lookup first, finds the entry, and then directly updates the value in- > place in the bpf prog itself. If the lookup fails, it will insert a > _new_ element. > > When the map is full, eviction should actually be triggered regardless. > For an LRU map that is too small to fit the working set, it is asking > for trouble. > > From the syscall update, if the use case is always updating an existing > element, the regular hashmap should be used instead. > Thanks for the explanation. While the common use case is indeed to update values in place after a lookup, small-capacity LRU maps are not forbidden today, so the unexpected eviction behavior can still be observed in practice. I have been asked about data loss with a 110-entry LRU map before, and in that case my recommendation was also to use a regular hash map instead. >> Fix this by first checking if the key exists before allocating a new >> node. If the key is found, update the value using the extra lru node >> without triggering any eviction. > > This will instead add overhead for the common use case described above. > The patch is mostly for getting a selftest case to work in a small LRU > map. I don't think it is worth the added complexity either. > Given this, instead of pursuing this change, I will update the selftests in 'tools/testing/selftests/bpf/prog_tests/percpu_alloc.c' to make them more robust and avoid CI failures. > Patch 2 and 3 look ok, but they also only make marginal improvements on > the existing code. > > pw-bot: cr > >> +static int htab_lru_map_update_elem_in_place(struct bpf_htab *htab, >> void *key, void *value, >> +                         u64 map_flags, struct bucket *b, >> +                         struct hlist_nulls_head *head, u32 hash, >> +                         bool percpu, bool onallcpus) >> +{ [...] >> +err: >> +    htab_unlock_bucket(b, flags); >> + >> +err_lock_bucket: >> +    if (ret) { >> +        bpf_lru_push_free(&htab->lru, node); >> +    } else { >> +        if (l_old && !percpu) >> +            bpf_obj_free_fields(map->record, htab_elem_value(l_old, >> key_size)); > > Does htab_lru_map_update_elem() have an existing bug that is missing the > bpf_obj_free_fields() on l_old? > No. htab_lru_push_free() would free the special fields. Thanks, Leon [...]