Message-ID: <188d8000-356c-4e7e-ac4b-4ab78ef0a81d@gmail.com>
Date: Thu, 14 May 2026 11:59:07 +0100
Subject: Re: [PATCH bpf-next v4 03/11] bpf: Implement resizable hashmap basic functions
From: Mykyta Yatsenko
To: bot+bpf-ci@kernel.org, bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
 daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com, eddyz87@gmail.com,
 memxor@gmail.com, herbert@gondor.apana.org.au
Cc: yatsenko@meta.com, martin.lau@kernel.org, yonghong.song@linux.dev, clm@meta.com,
 ihor.solodrai@linux.dev
References: <20260513-rhash-v4-3-dd3d541ccb0b@meta.com>
X-Mailing-List: bpf@vger.kernel.org

On 5/14/26 12:21 AM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -9,6 +9,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>
> [ ...
]
>
>> @@ -2739,3 +2740,383 @@ static void htab_of_map_free(struct bpf_map *map)
>>  	BATCH_OPS(htab),
>>  	.map_btf_id = &htab_map_btf_ids[0],
>>  };
>> +
>> +struct rhtab_elem {
>> +	struct rhash_head node;
>> +	/* key bytes, then value bytes follow */
>> +	u8 data[] __aligned(8);
>> +};
>> +
>> +struct bpf_rhtab {
>> +	struct bpf_map map;
>> +	struct rhashtable ht;
>> +	struct bpf_mem_alloc ma;
>> +	u32 elem_size;
>> +};
>
> [ ... ]
>
>> +static int rhtab_delete_elem(struct bpf_rhtab *rhtab, struct rhtab_elem *elem, void *copy,
>> +			     u64 flags)
>> +{
>> +	int err;
>> +
>> +	/*
>> +	 * disable_instrumentation() mitigates the deadlock for programs running in NMI context.
>> +	 * rhashtable locks bucket with local_irq_save(). Only NMI programs may reenter
>> +	 * rhashtable code, bpf_disable_instrumentation() disables programs running in NMI, except
>> +	 * raw tracepoints, which we don't have in rhashtable.
>> +	 */
>> +	bpf_disable_instrumentation();
>> +	err = rhashtable_remove_fast(&rhtab->ht, &elem->node, rhtab_params);
>> +	bpf_enable_instrumentation();
>
> Alexei Starovoitov suggested in v1 that update/delete operations should
> check 'if (irqs_disabled()) return -EBUSY' to prevent deadlocks when
> called from IRQ/NMI contexts.
>
> The v4 implementation uses bpf_disable_instrumentation() instead, which
> only prevents NMI-based tracing programs but does not address the softirq
> deadlock scenario described by Sashiko bot in v3: if a syscall holds the
> bucket lock and is interrupted by a hardware IRQ that schedules a softirq
> (e.g., an XDP/TC program), the softirq would deadlock trying to acquire
> the same bucket lock.
>
> Does the bpf_disable_instrumentation() approach prevent this scenario?
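For reference, here is a userspace model of the early bail-out Alexei
suggested, as opposed to wrapping the rhashtable call in
bpf_disable_instrumentation(). All names are hypothetical and
irqs_disabled() is stubbed; this only illustrates the control flow, not
the kernel implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's irqs_disabled(); toggled here to simulate
 * entering the map operation with interrupts off (IRQ/NMI context). */
static bool model_irqs_disabled;

static bool irqs_disabled(void)
{
	return model_irqs_disabled;
}

/* Hypothetical delete path that refuses to run with IRQs disabled,
 * before ever touching the bucket lock, instead of relying on
 * bpf_disable_instrumentation() to keep reentrant programs away. */
static int rhtab_delete_elem_guarded(void)
{
	if (irqs_disabled())
		return -EBUSY;	/* caller retries from a safe context */
	/* ... rhashtable_remove_fast() under the bucket lock ... */
	return 0;
}
```

The trade-off is that -EBUSY pushes the problem to the caller, but it
covers any interrupts-off context rather than only instrumented programs.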
>
> Link: https://lore.kernel.org/all/CAADnVQKvp2tnQSWW4Kh3ofGLvaEYJ47kyZeNUscAH_DShw8XHw@mail.gmail.com/
>
>> +
>> +	if (err)
>> +		return err;
>> +
>> +	if (copy) {
>> +		rhtab_read_elem_value(&rhtab->map, copy, elem, flags);
>> +		check_and_init_map_value(&rhtab->map, copy);
>> +	}
>> +
>> +	bpf_mem_cache_free_rcu(&rhtab->ma, elem);
>> +	return 0;
>> +}
>
> [ ... ]
>
>> +static long rhtab_map_update_existing(struct bpf_map *map, struct rhtab_elem *elem, void *value,
>> +				      u64 map_flags)
>> +{
>> +	void *old_val = rhtab_elem_value(elem, map->key_size);
>> +
>> +	if (map_flags & BPF_NOEXIST)
>> +		return -EEXIST;
>> +
>> +	if (map_flags & BPF_F_LOCK)
>> +		copy_map_value_locked(map, old_val, value, false);
>> +	else
>> +		copy_map_value(map, old_val, value);
>> +	return 0;
>> +}
>> +
>> +static long rhtab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags)
>> +{
>> +	struct bpf_rhtab *rhtab = container_of(map, struct bpf_rhtab, map);
>> +	struct rhtab_elem *elem, *tmp;
>> +
>> +	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
>> +		return -EINVAL;
>> +
>> +	if ((map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>> +		return -EINVAL;
>
> The commit message claims 'Users requiring consistent reads should use
> BPF_F_LOCK', but can BPF_F_LOCK actually work with BPF_MAP_TYPE_RHASH?

Implemented in the following patch.

> Looking at map_check_btf() in kernel/bpf/syscall.c, BPF_MAP_TYPE_RHASH
> is not in the list of map types allowed to have BPF_SPIN_LOCK fields
> (only HASH, ARRAY, CGROUP_STORAGE, SK_STORAGE, INODE_STORAGE,
> TASK_STORAGE, CGRP_STORAGE are allowed). Map creation with a
> BPF_SPIN_LOCK field on a BPF_MAP_TYPE_RHASH map returns -EOPNOTSUPP.
>
> Therefore btf_record_has_field(map->record, BPF_SPIN_LOCK) is always
> false for rhtab maps, and any update with BPF_F_LOCK is rejected with
> -EINVAL.
> The BPF_F_LOCK code paths in rhtab_map_update_existing,
> rhtab_read_elem_value, and rhtab_map_lookup_and_delete_elem are all
> dead code.
>
> Is there currently any way to obtain consistent reads on rhtab maps?
>
>> +
>> +	guard(rcu)();
>> +	elem = rhtab_lookup_elem(map, key);
>> +	if (elem)
>> +		return rhtab_map_update_existing(map, elem, value, map_flags);
>> +
>> +	if (map_flags & BPF_EXIST)
>> +		return -ENOENT;
>> +
>> +	/* Check max_entries limit before inserting new element */
>> +	if (atomic_read(&rhtab->ht.nelems) >= map->max_entries)
>> +		return -E2BIG;
>> +
>> +	elem = bpf_mem_cache_alloc(&rhtab->ma);
>> +	if (!elem)
>> +		return -ENOMEM;
>> +
>> +	memcpy(elem->data, key, map->key_size);
>> +	copy_map_value(map, rhtab_elem_value(elem, map->key_size), value);
>> +
>> +	/* Prevent deadlock for NMI programs attempting to take bucket lock */
>> +	bpf_disable_instrumentation();
>> +	tmp = rhashtable_lookup_get_insert_fast(&rhtab->ht, &elem->node, rhtab_params);
>> +	bpf_enable_instrumentation();
>> +
>> +	if (tmp) {
>> +		bpf_mem_cache_free(&rhtab->ma, elem);
>> +		if (IS_ERR(tmp))
>> +			return PTR_ERR(tmp);
>> +
>> +		return rhtab_map_update_existing(map, tmp, value, map_flags);
>> +	}
>> +
>> +	return 0;
>> +}
>
> [ ... ]
>
>> +static int rhtab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>
> rhtab_map_get_next_key always returns -EOPNOTSUPP. As a result,
> userspace cannot iterate keys of a BPF_MAP_TYPE_RHASH map via the
> standard syscall API, and tools like bpftool dump cannot list rhtab
> map contents.

This is just a stub.

> The series previously added rhashtable_next_key() (commit 595ea21f6c1c)
> which provides best-effort iteration. Is there a plan to wire this up?
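On the BPF_F_LOCK point above, the rejection is easy to reproduce in
isolation. A small userspace model of the two flag checks in
rhtab_map_update_elem (flag values taken from the BPF UAPI:
BPF_NOEXIST=1, BPF_EXIST=2, BPF_F_LOCK=4; has_spin_lock stands in for
btf_record_has_field(map->record, BPF_SPIN_LOCK), which map_check_btf()
currently forces to false for RHASH maps):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define BPF_NOEXIST	1ULL
#define BPF_EXIST	2ULL
#define BPF_F_LOCK	4ULL

/* Model of the early flag validation in rhtab_map_update_elem().
 * With has_spin_lock pinned to false, any BPF_F_LOCK update hits
 * the second check and returns -EINVAL. */
static int rhtab_update_flags_check(unsigned long long map_flags, bool has_spin_lock)
{
	if ((map_flags & ~BPF_F_LOCK) > BPF_EXIST)
		return -EINVAL;
	if ((map_flags & BPF_F_LOCK) && !has_spin_lock)
		return -EINVAL;
	return 0;
}
```

So until RHASH is added to the map_check_btf() allow-list, the
BPF_F_LOCK branches downstream of this check cannot be reached.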
>
>> +
>> +static void rhtab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m)
>> +{
>> +}
>> +
>> +static long bpf_each_rhash_elem(struct bpf_map *map, bpf_callback_t callback_fn,
>> +				void *callback_ctx, u64 flags)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static u64 rhtab_map_mem_usage(const struct bpf_map *map)
>> +{
>> +	return 0;
>> +}
>
> rhtab_map_mem_usage unconditionally returns 0, regardless of how many
> elements the map holds or its memory footprint. This callback is consumed
> by bpf_map_show_fdinfo, BPF map iteration tools, and memory accounting
> reporting.
>
> For an rhtab map containing arbitrarily many elements (up to 2^31) plus
> the rhashtable bucket arrays plus the bpf_mem_alloc per-CPU caches,
> does reporting 0 mislead userspace introspection and accounting?
>
>> +
>> +static int rhtab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
>> +				  union bpf_attr __user *uattr)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static int rhtab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr,
>> +					     union bpf_attr __user *uattr)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>
> [ ... ]
>
>> +static void *bpf_rhash_map_seq_start(struct seq_file *seq, loff_t *pos)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static void *bpf_rhash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static int bpf_rhash_map_seq_show(struct seq_file *seq, void *v)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void bpf_rhash_map_seq_stop(struct seq_file *seq, void *v)
>> +{
>> +}
>
> [ ...
]
>
>> +const struct bpf_map_ops rhtab_map_ops = {
>> +	.map_meta_equal = bpf_map_meta_equal,
>> +	.map_alloc_check = rhtab_map_alloc_check,
>> +	.map_alloc = rhtab_map_alloc,
>> +	.map_free = rhtab_map_free,
>> +	.map_get_next_key = rhtab_map_get_next_key,
>> +	.map_release_uref = rhtab_map_free_internal_structs,
>> +	.map_lookup_elem = rhtab_map_lookup_elem,
>> +	.map_lookup_and_delete_elem = rhtab_map_lookup_and_delete_elem,
>> +	.map_update_elem = rhtab_map_update_elem,
>> +	.map_delete_elem = rhtab_map_delete_elem,
>> +	.map_gen_lookup = rhtab_map_gen_lookup,
>> +	.map_seq_show_elem = rhtab_map_seq_show_elem,
>> +	.map_set_for_each_callback_args = map_set_for_each_callback_args,
>> +	.map_for_each_callback = bpf_each_rhash_elem,
>> +	.map_mem_usage = rhtab_map_mem_usage,
>> +	BATCH_OPS(rhtab),
>> +	.map_btf_id = &rhtab_map_btf_ids[0],
>> +	.iter_seq_info = &rhash_iter_seq_info,
>> +};
>
> Multiple map_ops are registered as functional callbacks but are stubs.
>
> map_seq_show_elem is registered with rhtab_map_seq_show_elem (empty
> body) - bpffs map fdinfo will not show element contents.
>
> map_for_each_callback is registered with bpf_each_rhash_elem returning
> -EOPNOTSUPP - the BPF helper bpf_for_each_map_elem() will fail at runtime.
>
> iter_seq_info points to rhash_iter_seq_info whose start/next/show/stop
> are all stubs - BPF map iter programs targeting rhtab maps will iterate
> nothing.
>
> BATCH_OPS expands to rhtab_map_lookup_batch/lookup_and_delete_batch
> returning -EOPNOTSUPP - userspace BPF_MAP_LOOKUP_BATCH and
> BPF_MAP_LOOKUP_AND_DELETE_BATCH syscalls fail.
>
> While returning -EOPNOTSUPP is graceful (no crash), registering ops
> pointers that fail rather than relying on the dispatcher's NULL check
> defeats the verifier's compile-time validation: the verifier will accept
> BPF programs that call bpf_for_each_map_elem on an rhtab map, only to
> fail at runtime.
>
> Should these callbacks be NULL instead?
>
> ---
> AI reviewed your patch.
> Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25830929099
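On the stub-vs-NULL question, the difference is only in where the
failure surfaces. A minimal userspace model (names hypothetical): with a
NULL callback the dispatching core can reject the operation up front,
which is what lets the verifier refuse such programs at load time, while
a registered stub is indistinguishable from a real implementation until
it runs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*for_each_fn)(void);

/* Hypothetical slimmed-down ops table with one optional callback. */
struct model_map_ops {
	for_each_fn map_for_each_callback;
};

/* Core-side dispatch: a NULL pointer is rejected before anything runs. */
static long model_for_each(const struct model_map_ops *ops)
{
	if (!ops->map_for_each_callback)
		return -EOPNOTSUPP;
	return ops->map_for_each_callback();
}

/* Stub style: the table looks populated, so the op is accepted and
 * only fails once the callback is actually invoked. */
static long stub_for_each(void)
{
	return -EOPNOTSUPP;
}
```

Both paths end in -EOPNOTSUPP here, but only the NULL form exposes the
missing support to load-time checks.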