bpf.vger.kernel.org archive mirror
* [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
@ 2025-08-13  7:39 Maciej Żenczykowski
  2025-08-13 20:46 ` kernel test robot
  2025-08-18 20:58 ` Yonghong Song
  0 siblings, 2 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2025-08-13  7:39 UTC (permalink / raw)
  To: Maciej Żenczykowski, Alexei Starovoitov, Daniel Borkmann
  Cc: Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List,
	Maciej Żenczykowski, Stanislav Fomichev

BPF_MAP_LOOKUP_AND_DELETE_BATCH with keys & values == NULL
seems like a nice way to quickly clear a map.

BPF_MAP_LOOKUP keys/values == NULL might be useful if we just want
the values/keys and don't want to bother copying the keys/values...

BPF_MAP_LOOKUP keys & values == NULL might be useful to count
the number of populated entries.
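
[For illustration, a standalone userspace toy model of the NULL-guarded
copy-out this patch implements; fake_copy() and drain_bucket() are made-up
names standing in for copy_to_user() and the kernel loop, not real API:]

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Made-up stand-in for copy_to_user(); returns nonzero on failure. */
static int fake_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Mirrors the patched condition: each copy happens only when the
 * corresponding user pointer is non-NULL, so passing NULL now means
 * "skip this copy" instead of "fail with -EFAULT".
 */
static int drain_bucket(void *ukeys, void *uvalues,
			const void *keys, const void *values,
			size_t key_bytes, size_t val_bytes)
{
	if ((ukeys && fake_copy(ukeys, keys, key_bytes)) ||
	    (uvalues && fake_copy(uvalues, values, val_bytes)))
		return -EFAULT;
	return 0;
}
```

With both pointers NULL the delete half of lookup-and-delete still runs,
so the whole call degenerates into "clear the map".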

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 kernel/bpf/hashtab.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5001131598e5..8fbdd000d9e0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1873,9 +1873,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 
 	rcu_read_unlock();
 	bpf_enable_instrumentation();
-	if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
+	if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
 	    key_size * bucket_cnt) ||
-	    copy_to_user(uvalues + total * value_size, values,
+	    uvalues && copy_to_user(uvalues + total * value_size, values,
 	    value_size * bucket_cnt))) {
 		ret = -EFAULT;
 		goto after_loop;
-- 
2.51.0.rc1.163.g2494970778-goog



* Re: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
  2025-08-13  7:39 [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values Maciej Żenczykowski
@ 2025-08-13 20:46 ` kernel test robot
  2025-08-18 20:58 ` Yonghong Song
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2025-08-13 20:46 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski,
	Alexei Starovoitov, Daniel Borkmann
  Cc: oe-kbuild-all, Linux Network Development Mailing List,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, Stanislav Fomichev

Hi Maciej,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Maciej-enczykowski/bpf-hashtab-allow-BPF_MAP_LOOKUP-_AND_DELETE-_BATCH-with-NULL-keys-values/20250813-154227
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250813073955.1775315-1-maze%40google.com
patch subject: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
config: csky-randconfig-001-20250814 (https://download.01.org/0day-ci/archive/20250814/202508140435.GA2XXxaK-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508140435.GA2XXxaK-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508140435.GA2XXxaK-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/bpf/hashtab.c: In function '__htab_map_lookup_and_delete_batch':
>> kernel/bpf/hashtab.c:1875:34: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
    1875 |         if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
         |                            ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1876 |             key_size * bucket_cnt) ||
         |             ~~~~~~~~~~~~~~~~~~~~~~
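
[The warning is stylistic rather than a correctness problem: `&&` binds
tighter than `||`, so the patched condition already groups the way the
author intended, and extra parentheses would only silence -Wparentheses.
A quick standalone sanity check of that grouping, unrelated to the kernel
sources:]

```c
/* The shape gcc's -Wparentheses flags: bare && mixed with || ... */
static int bare(int a, int b, int c, int d)
{
	return a && b || c && d;  /* warns under -Wparentheses, same value */
}

/* ... which C operator precedence guarantees means (a && b) || (c && d). */
static int grouped(int a, int b, int c, int d)
{
	return (a && b) || (c && d);
}
```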


vim +1875 kernel/bpf/hashtab.c

  1675	
  1676	static int
  1677	__htab_map_lookup_and_delete_batch(struct bpf_map *map,
  1678					   const union bpf_attr *attr,
  1679					   union bpf_attr __user *uattr,
  1680					   bool do_delete, bool is_lru_map,
  1681					   bool is_percpu)
  1682	{
  1683		struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
  1684		void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val;
  1685		void __user *uvalues = u64_to_user_ptr(attr->batch.values);
  1686		void __user *ukeys = u64_to_user_ptr(attr->batch.keys);
  1687		void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
  1688		u32 batch, max_count, size, bucket_size, map_id;
  1689		u32 bucket_cnt, total, key_size, value_size;
  1690		struct htab_elem *node_to_free = NULL;
  1691		u64 elem_map_flags, map_flags;
  1692		struct hlist_nulls_head *head;
  1693		struct hlist_nulls_node *n;
  1694		unsigned long flags = 0;
  1695		bool locked = false;
  1696		struct htab_elem *l;
  1697		struct bucket *b;
  1698		int ret = 0;
  1699	
  1700		elem_map_flags = attr->batch.elem_flags;
  1701		if ((elem_map_flags & ~BPF_F_LOCK) ||
  1702		    ((elem_map_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)))
  1703			return -EINVAL;
  1704	
  1705		map_flags = attr->batch.flags;
  1706		if (map_flags)
  1707			return -EINVAL;
  1708	
  1709		max_count = attr->batch.count;
  1710		if (!max_count)
  1711			return 0;
  1712	
  1713		if (put_user(0, &uattr->batch.count))
  1714			return -EFAULT;
  1715	
  1716		batch = 0;
  1717		if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch)))
  1718			return -EFAULT;
  1719	
  1720		if (batch >= htab->n_buckets)
  1721			return -ENOENT;
  1722	
  1723		key_size = htab->map.key_size;
  1724		value_size = htab->map.value_size;
  1725		size = round_up(value_size, 8);
  1726		if (is_percpu)
  1727			value_size = size * num_possible_cpus();
  1728		total = 0;
  1729		/* while experimenting with hash tables with sizes ranging from 10 to
  1730		 * 1000, it was observed that a bucket can have up to 5 entries.
  1731		 */
  1732		bucket_size = 5;
  1733	
  1734	alloc:
  1735		/* We cannot do copy_from_user or copy_to_user inside
  1736		 * the rcu_read_lock. Allocate enough space here.
  1737		 */
  1738		keys = kvmalloc_array(key_size, bucket_size, GFP_USER | __GFP_NOWARN);
  1739		values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN);
  1740		if (!keys || !values) {
  1741			ret = -ENOMEM;
  1742			goto after_loop;
  1743		}
  1744	
  1745	again:
  1746		bpf_disable_instrumentation();
  1747		rcu_read_lock();
  1748	again_nocopy:
  1749		dst_key = keys;
  1750		dst_val = values;
  1751		b = &htab->buckets[batch];
  1752		head = &b->head;
  1753		/* do not grab the lock unless need it (bucket_cnt > 0). */
  1754		if (locked) {
  1755			ret = htab_lock_bucket(b, &flags);
  1756			if (ret) {
  1757				rcu_read_unlock();
  1758				bpf_enable_instrumentation();
  1759				goto after_loop;
  1760			}
  1761		}
  1762	
  1763		bucket_cnt = 0;
  1764		hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
  1765			bucket_cnt++;
  1766	
  1767		if (bucket_cnt && !locked) {
  1768			locked = true;
  1769			goto again_nocopy;
  1770		}
  1771	
  1772		if (bucket_cnt > (max_count - total)) {
  1773			if (total == 0)
  1774				ret = -ENOSPC;
  1775			/* Note that since bucket_cnt > 0 here, it is implicit
  1776			 * that the locked was grabbed, so release it.
  1777			 */
  1778			htab_unlock_bucket(b, flags);
  1779			rcu_read_unlock();
  1780			bpf_enable_instrumentation();
  1781			goto after_loop;
  1782		}
  1783	
  1784		if (bucket_cnt > bucket_size) {
  1785			bucket_size = bucket_cnt;
  1786			/* Note that since bucket_cnt > 0 here, it is implicit
  1787			 * that the locked was grabbed, so release it.
  1788			 */
  1789			htab_unlock_bucket(b, flags);
  1790			rcu_read_unlock();
  1791			bpf_enable_instrumentation();
  1792			kvfree(keys);
  1793			kvfree(values);
  1794			goto alloc;
  1795		}
  1796	
  1797		/* Next block is only safe to run if you have grabbed the lock */
  1798		if (!locked)
  1799			goto next_batch;
  1800	
  1801		hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
  1802			memcpy(dst_key, l->key, key_size);
  1803	
  1804			if (is_percpu) {
  1805				int off = 0, cpu;
  1806				void __percpu *pptr;
  1807	
  1808				pptr = htab_elem_get_ptr(l, map->key_size);
  1809				for_each_possible_cpu(cpu) {
  1810					copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu));
  1811					check_and_init_map_value(&htab->map, dst_val + off);
  1812					off += size;
  1813				}
  1814			} else {
  1815				value = htab_elem_value(l, key_size);
  1816				if (is_fd_htab(htab)) {
  1817					struct bpf_map **inner_map = value;
  1818	
  1819					 /* Actual value is the id of the inner map */
  1820					map_id = map->ops->map_fd_sys_lookup_elem(*inner_map);
  1821					value = &map_id;
  1822				}
  1823	
  1824				if (elem_map_flags & BPF_F_LOCK)
  1825					copy_map_value_locked(map, dst_val, value,
  1826							      true);
  1827				else
  1828					copy_map_value(map, dst_val, value);
  1829				/* Zeroing special fields in the temp buffer */
  1830				check_and_init_map_value(map, dst_val);
  1831			}
  1832			if (do_delete) {
  1833				hlist_nulls_del_rcu(&l->hash_node);
  1834	
  1835				/* bpf_lru_push_free() will acquire lru_lock, which
  1836				 * may cause deadlock. See comments in function
  1837				 * prealloc_lru_pop(). Let us do bpf_lru_push_free()
  1838				 * after releasing the bucket lock.
  1839				 *
  1840				 * For htab of maps, htab_put_fd_value() in
  1841				 * free_htab_elem() may acquire a spinlock with bucket
  1842				 * lock being held and it violates the lock rule, so
  1843				 * invoke free_htab_elem() after unlock as well.
  1844				 */
  1845				l->batch_flink = node_to_free;
  1846				node_to_free = l;
  1847			}
  1848			dst_key += key_size;
  1849			dst_val += value_size;
  1850		}
  1851	
  1852		htab_unlock_bucket(b, flags);
  1853		locked = false;
  1854	
  1855		while (node_to_free) {
  1856			l = node_to_free;
  1857			node_to_free = node_to_free->batch_flink;
  1858			if (is_lru_map)
  1859				htab_lru_push_free(htab, l);
  1860			else
  1861				free_htab_elem(htab, l);
  1862		}
  1863	
  1864	next_batch:
  1865		/* If we are not copying data, we can go to next bucket and avoid
  1866		 * unlocking the rcu.
  1867		 */
  1868		if (!bucket_cnt && (batch + 1 < htab->n_buckets)) {
  1869			batch++;
  1870			goto again_nocopy;
  1871		}
  1872	
  1873		rcu_read_unlock();
  1874		bpf_enable_instrumentation();
> 1875		if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
  1876		    key_size * bucket_cnt) ||
  1877		    uvalues && copy_to_user(uvalues + total * value_size, values,
  1878		    value_size * bucket_cnt))) {
  1879			ret = -EFAULT;
  1880			goto after_loop;
  1881		}
  1882	
  1883		total += bucket_cnt;
  1884		batch++;
  1885		if (batch >= htab->n_buckets) {
  1886			ret = -ENOENT;
  1887			goto after_loop;
  1888		}
  1889		goto again;
  1890	
  1891	after_loop:
  1892		if (ret == -EFAULT)
  1893			goto out;
  1894	
  1895		/* copy # of entries and next batch */
  1896		ubatch = u64_to_user_ptr(attr->batch.out_batch);
  1897		if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
  1898		    put_user(total, &uattr->batch.count))
  1899			ret = -EFAULT;
  1900	
  1901	out:
  1902		kvfree(keys);
  1903		kvfree(values);
  1904		return ret;
  1905	}
  1906	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
  2025-08-13  7:39 [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values Maciej Żenczykowski
  2025-08-13 20:46 ` kernel test robot
@ 2025-08-18 20:58 ` Yonghong Song
  2025-08-21  4:07   ` Maciej Żenczykowski
       [not found]   ` <CANP3RGcJ06uRUBF=RR6bjqNnxdaSdpBpynGzNTSms0jA-ZpW6w@mail.gmail.com>
  1 sibling, 2 replies; 6+ messages in thread
From: Yonghong Song @ 2025-08-18 20:58 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski,
	Alexei Starovoitov, Daniel Borkmann
  Cc: Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, Stanislav Fomichev



On 8/13/25 12:39 AM, Maciej Żenczykowski wrote:
> BPF_MAP_LOOKUP_AND_DELETE_BATCH keys & values == NULL
> seems like a nice way to simply quickly clear a map.

This will change the existing API, as users will expect
some error (e.g., -EFAULT) to be returned when keys or values is NULL.

We have a 'flags' field in uapi header in

         struct { /* struct used by BPF_MAP_*_BATCH commands */
                 __aligned_u64   in_batch;       /* start batch,
                                                  * NULL to start from beginning
                                                  */
                 __aligned_u64   out_batch;      /* output: next start batch */
                 __aligned_u64   keys;
                 __aligned_u64   values;
                 __u32           count;          /* input/output:
                                                  * input: # of key/value
                                                  * elements
                                                  * output: # of filled elements
                                                  */
                 __u32           map_fd;
                 __u64           elem_flags;
                 __u64           flags;
         } batch;

we can add a flag in 'flags' like BPF_F_CLEAR_MAP_IF_KV_NULL with a comment
that if keys or values is NULL, the batched elements will be cleared.
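
[As a rough sketch of the validation such a flag-gated variant might do;
the flag's name is taken from the suggestion above but its bit value and
the helper are hypothetical, not actual uapi or kernel code:]

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bit -- not present in the real uapi bpf.h. */
#define BPF_F_CLEAR_MAP_IF_KV_NULL (1ULL << 0)

/*
 * NULL keys/values keep failing with -EFAULT as today, unless
 * userspace explicitly opted in via the new flag.
 */
static int batch_check(uint64_t flags, uint64_t ukeys, uint64_t uvalues)
{
	if (flags & ~BPF_F_CLEAR_MAP_IF_KV_NULL)
		return -EINVAL;	/* unknown flag bits */
	if ((!ukeys || !uvalues) && !(flags & BPF_F_CLEAR_MAP_IF_KV_NULL))
		return -EFAULT;	/* NULL pointer without opt-in */
	return 0;
}
```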

>
> BPF_MAP_LOOKUP keys/values == NULL might be useful if we just want
> the values/keys and don't want to bother copying the keys/values...
>
> BPF_MAP_LOOKUP keys & values == NULL might be useful to count
> the number of populated entries.

bpf_map_lookup_elem() does not have a flags field, so we probably should not
change existing semantics.

>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>   kernel/bpf/hashtab.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5001131598e5..8fbdd000d9e0 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1873,9 +1873,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>   
>   	rcu_read_unlock();
>   	bpf_enable_instrumentation();
> -	if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
> +	if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
>   	    key_size * bucket_cnt) ||
> -	    copy_to_user(uvalues + total * value_size, values,
> +	    uvalues && copy_to_user(uvalues + total * value_size, values,
>   	    value_size * bucket_cnt))) {
>   		ret = -EFAULT;
>   		goto after_loop;



* Re: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
  2025-08-18 20:58 ` Yonghong Song
@ 2025-08-21  4:07   ` Maciej Żenczykowski
       [not found]   ` <CANP3RGcJ06uRUBF=RR6bjqNnxdaSdpBpynGzNTSms0jA-ZpW6w@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2025-08-21  4:07 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, Stanislav Fomichev

On Mon, Aug 18, 2025 at 1:58 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> On 8/13/25 12:39 AM, Maciej Żenczykowski wrote:
> > BPF_MAP_LOOKUP_AND_DELETE_BATCH keys & values == NULL
> > seems like a nice way to simply quickly clear a map.
>
> This will change existing API as users will expect
> some error (e.g., -EFAULT) return when keys or values is NULL.

No reasonable user will call the current API with NULLs.

This is a similar API change to adding a new system call
(where previously it returned -ENOSYS) - which *is* also a UAPI
change, but obviously allowed.

Or adding support for a new address family / protocol (where
previously it returned -EAFNOSUPPORT)
Or adding support for a new flag (where previously it returned -EINVAL)

Consider why userspace would ever pass in NULL; there are two possibilities:
(a) explicit NULL - you'd never do this, since it would (till now)
always return -EFAULT,
  so this would only possibly show up in a very thorough test suite
(b) you're using dynamically allocated memory and it failed allocation.
  that's already a program bug; you should catch that before you call bpf().

> We have a 'flags' field in uapi header in
>
>          struct { /* struct used by BPF_MAP_*_BATCH commands */
>                  __aligned_u64   in_batch;       /* start batch,
>                                                   * NULL to start from beginning
>                                                   */
>                  __aligned_u64   out_batch;      /* output: next start batch */
>                  __aligned_u64   keys;
>                  __aligned_u64   values;
>                  __u32           count;          /* input/output:
>                                                   * input: # of key/value
>                                                   * elements
>                                                   * output: # of filled elements
>                                                   */
>                  __u32           map_fd;
>                  __u64           elem_flags;
>                  __u64           flags;
>          } batch;
>
> we can add a flag in 'flags' like BPF_F_CLEAR_MAP_IF_KV_NULL with a comment
> that if keys or values is NULL, the batched elements will be cleared.

I just don't see what value this provides.

> > BPF_MAP_LOOKUP keys/values == NULL might be useful if we just want
> > the values/keys and don't want to bother copying the keys/values...
> >
> > BPF_MAP_LOOKUP keys & values == NULL might be useful to count
> > the number of populated entries.
>
> bpf_map_lookup_elem() does not have flags field, so we probably should not
> change existins semantics.

This is unrelated to this patch, since this only touches 'batch' operation.
(unless I'm missing something)

> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > ---
> >   kernel/bpf/hashtab.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 5001131598e5..8fbdd000d9e0 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1873,9 +1873,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >
> >       rcu_read_unlock();
> >       bpf_enable_instrumentation();
> > -     if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
> > +     if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
> >           key_size * bucket_cnt) ||
> > -         copy_to_user(uvalues + total * value_size, values,
> > +         uvalues && copy_to_user(uvalues + total * value_size, values,
> >           value_size * bucket_cnt))) {
> >               ret = -EFAULT;
> >               goto after_loop;


* Re: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
       [not found]   ` <CANP3RGcJ06uRUBF=RR6bjqNnxdaSdpBpynGzNTSms0jA-ZpW6w@mail.gmail.com>
@ 2025-08-21 21:48     ` Yonghong Song
  2025-08-22 19:00       ` Andrii Nakryiko
  0 siblings, 1 reply; 6+ messages in thread
From: Yonghong Song @ 2025-08-21 21:48 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, Stanislav Fomichev



On 8/20/25 7:23 PM, Maciej Żenczykowski wrote:
> On Mon, Aug 18, 2025 at 1:58 PM Yonghong Song 
> <yonghong.song@linux.dev> wrote:
> > On 8/13/25 12:39 AM, Maciej Żenczykowski wrote:
> > > BPF_MAP_LOOKUP_AND_DELETE_BATCH keys & values == NULL
> > > seems like a nice way to simply quickly clear a map.
> >
> > This will change existing API as users will expect
> > some error (e.g., -EFAULT) return when keys or values is NULL.
>
> No reasonable user will call the current api with NULLs.

I do agree it is really unlikely users will have NULL keys or values.

>
> This is a similar API change to adding a new system call
> (where previously it returned -ENOSYS) - which *is* also a UAPI 
> change, but obviously allowed.
>
> Or adding support for a new address family / protocol (where
> previously it returned -EAFNOSUPPORT)
> Or adding support for a new flag (where previously it returned -EINVAL)
>
> Consider why userspace would ever pass in NULL, two possibilities:
> (a) explicit NULL - you'd never do this since it would (till now)
> always return -EFAULT,
>   so this would only possibly show up in a very thorough test suite
> (b) you're using dynamically allocated memory and it failed allocation.
>   that's already a program bug, you should catch that before you call bpf().

Okay. What you describe makes sense.
Could you add a selftest for this?
Could you also add some comments to the uapi bpf.h header below describing
the new functionality?

>
> > We have a 'flags' field in uapi header in
> >
> >          struct { /* struct used by BPF_MAP_*_BATCH commands */
> >                  __aligned_u64   in_batch;       /* start batch,
> >                                                   * NULL to start from beginning
> >                                                   */
> >                  __aligned_u64   out_batch;      /* output: next start batch */
> >                  __aligned_u64   keys;
> >                  __aligned_u64   values;
> >                  __u32           count;          /* input/output:
> >                                                   * input: # of key/value
> >                                                   * elements
> >                                                   * output: # of filled elements
> >                                                   */
> >                  __u32           map_fd;
> >                  __u64           elem_flags;
> >                  __u64           flags;
> >          } batch;
> >
> > we can add a flag in 'flags' like BPF_F_CLEAR_MAP_IF_KV_NULL with a comment
> > that if keys or values is NULL, the batched elements will be cleared.
>
> I just don't see what value this provides.
>
> > > BPF_MAP_LOOKUP keys/values == NULL might be useful if we just want
> > > the values/keys and don't want to bother copying the keys/values...
> > >
> > > BPF_MAP_LOOKUP keys & values == NULL might be useful to count
> > > the number of populated entries.
> >
> > bpf_map_lookup_elem() does not have a flags field, so we probably should not
> > change existing semantics.
>
> This is unrelated to this patch, since this only touches the 'batch' operation.
> (unless I'm missing something)
>
> > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Stanislav Fomichev <sdf@fomichev.me>
> > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > ---
> > >   kernel/bpf/hashtab.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > > index 5001131598e5..8fbdd000d9e0 100644
> > > --- a/kernel/bpf/hashtab.c
> > > +++ b/kernel/bpf/hashtab.c
> > > @@ -1873,9 +1873,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > >
> > >       rcu_read_unlock();
> > >       bpf_enable_instrumentation();
> > > -     if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
> > > +     if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
> > >           key_size * bucket_cnt) ||
> > > -         copy_to_user(uvalues + total * value_size, values,
> > > +         uvalues && copy_to_user(uvalues + total * value_size, values,
> > >           value_size * bucket_cnt))) {
> > >               ret = -EFAULT;
> > >               goto after_loop;
> >
>
>
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google



* Re: [PATCH bpf-next] bpf: hashtab - allow BPF_MAP_LOOKUP{,_AND_DELETE}_BATCH with NULL keys/values.
  2025-08-21 21:48     ` Yonghong Song
@ 2025-08-22 19:00       ` Andrii Nakryiko
  0 siblings, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2025-08-22 19:00 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Maciej Żenczykowski, Alexei Starovoitov, Daniel Borkmann,
	Linux Network Development Mailing List, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linux Kernel Mailing List, BPF Mailing List, Stanislav Fomichev

On Thu, Aug 21, 2025 at 2:49 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/20/25 7:23 PM, Maciej Żenczykowski wrote:
> > On Mon, Aug 18, 2025 at 1:58 PM Yonghong Song
> > <yonghong.song@linux.dev> wrote:
> > > On 8/13/25 12:39 AM, Maciej Żenczykowski wrote:
> > > > BPF_MAP_LOOKUP_AND_DELETE_BATCH keys & values == NULL
> > > > seems like a nice way to simply quickly clear a map.
> > >
> > > This will change existing API as users will expect
> > > some error (e.g., -EFAULT) return when keys or values is NULL.
> >
> > No reasonable user will call the current api with NULLs.
>
> I do agree it is really unlikely users will have NULL keys or values.
>
> >
> > This is a similar API change to adding a new system call
> > (where previously it returned -ENOSYS) - which *is* also a UAPI
> > change, but obviously allowed.
> >
> > Or adding support for a new address family / protocol (where
> > previously it returned -EAFNOSUPPORT)
> > Or adding support for a new flag (where previously it returned -EINVAL)
> >
> > Consider why userspace would ever pass in NULL, two possibilities:
> > (a) explicit NULL - you'd never do this since it would (till now)
> > always return -EFAULT,
> >   so this would only possibly show up in a very thorough test suite
> > (b) you're using dynamically allocated memory and it failed allocation.
> >   that's already a program bug, you should catch that before you call bpf().
>
> Okay. What you describe makes sense.

+1, I think there is no backwards compat concern with this extension.

> Could you add a selftest for this?

yes, please

> Could you also add some comments to the uapi bpf.h header below describing the new functionality?

+1, and I'd also appreciate it if you could incorporate that into the
libbpf API doc comments in tools/lib/bpf/bpf.h; we already have a decent
description of this rather complicated API there, so it would be nice to
keep it up to date with the added semantics. Thanks!

Also, please shorten the subject, it's way too long.

pw-bot: cr

>
> >
> > > We have a 'flags' field in uapi header in
> > >
> > >          struct { /* struct used by BPF_MAP_*_BATCH commands */
> > >                  __aligned_u64   in_batch;       /* start batch,
> > >                                                   * NULL to start from beginning
> > >                                                   */
> > >                  __aligned_u64   out_batch;      /* output: next start batch */
> > >                  __aligned_u64   keys;
> > >                  __aligned_u64   values;
> > >                  __u32           count;          /* input/output:
> > >                                                   * input: # of key/value
> > >                                                   * elements
> > >                                                   * output: # of filled elements
> > >                                                   */
> > >                  __u32           map_fd;
> > >                  __u64           elem_flags;
> > >                  __u64           flags;
> > >          } batch;
> > >
> > > we can add a flag in 'flags' like BPF_F_CLEAR_MAP_IF_KV_NULL with a comment
> > > that if keys or values is NULL, the batched elements will be cleared.
> >
> > I just don't see what value this provides.
> >
> > > > BPF_MAP_LOOKUP keys/values == NULL might be useful if we just want
> > > > the values/keys and don't want to bother copying the keys/values...
> > > >
> > > > BPF_MAP_LOOKUP keys & values == NULL might be useful to count
> > > > the number of populated entries.
> > >
> > > bpf_map_lookup_elem() does not have a flags field, so we probably should not
> > > change existing semantics.
> >
> > This is unrelated to this patch, since this only touches the 'batch' operation.
> > (unless I'm missing something)
> >
> > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > Cc: Stanislav Fomichev <sdf@fomichev.me>
> > > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > > ---
> > > >   kernel/bpf/hashtab.c | 4 ++--
> > > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > > > index 5001131598e5..8fbdd000d9e0 100644
> > > > --- a/kernel/bpf/hashtab.c
> > > > +++ b/kernel/bpf/hashtab.c
> > > > @@ -1873,9 +1873,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > > >
> > > >       rcu_read_unlock();
> > > >       bpf_enable_instrumentation();
> > > > -     if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
> > > > +     if (bucket_cnt && (ukeys && copy_to_user(ukeys + total * key_size, keys,
> > > >           key_size * bucket_cnt) ||
> > > > -         copy_to_user(uvalues + total * value_size, values,
> > > > +         uvalues && copy_to_user(uvalues + total * value_size, values,
> > > >           value_size * bucket_cnt))) {
> > > >               ret = -EFAULT;
> > > >               goto after_loop;
> > >
> >
> >
> > --
> > Maciej Żenczykowski, Kernel Networking Developer @ Google
>
>

