[PATCH] bpf: fix possible endless loop in BPF map iteration

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] bpf: fix possible endless loop in BPF map iteration
@ 2025-02-24 22:01 Brandon Kammerdiener
  2025-02-25  7:13 ` Martin KaFai Lau
  0 siblings, 1 reply; 8+ messages in thread
From: Brandon Kammerdiener @ 2025-02-24 22:01 UTC (permalink / raw)
  To: bpf

This patch fixes an endless loop condition that can occur in
bpf_for_each_hash_elem, causing the core to softlock. My understanding is
that a combination of RCU list deletion and insertion introduces the new
element after the iteration cursor and that there is a chance that an RCU
reader may in fact use this new element in iteration. The patch uses a
_safe variant of the macro which gets the next element to iterate before
executing the loop body for the current element. The following simple BPF
program can be used to reproduce the issue:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    #define N (64)

    struct {
        __uint(type,        BPF_MAP_TYPE_HASH);
        __uint(max_entries, N);
        __type(key,         __u64);
        __type(value,       __u64);
    } map SEC(".maps");

    static int cb(struct bpf_map *map, __u64 *key, __u64 *value, void *arg) {
        bpf_map_delete_elem(map, key);
        bpf_map_update_elem(map, &key, &val, 0);
        return 0;
    }

    SEC("uprobe//proc/self/exe:test")
    int BPF_PROG(test) {
        __u64 i;

        bpf_for(i, 0, N) {
            bpf_map_update_elem(&map, &i, &i, 0);
        }

        bpf_for_each_map_elem(&map, cb, NULL, 0);

        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>

---
 kernel/bpf/hashtab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4a9eeb7aef85..43574b0495c3 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
 		b = &htab->buckets[i];
 		rcu_read_lock();
 		head = &b->head;
-		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
+		hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
 			key = elem->key;
 			if (is_percpu) {
 				/* current cpu value for percpu map */
--
2.48.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-24 22:01 [PATCH] bpf: fix possible endless loop in BPF map iteration Brandon Kammerdiener
@ 2025-02-25  7:13 ` Martin KaFai Lau
  2025-02-25 12:26   ` Hou Tao
  0 siblings, 1 reply; 8+ messages in thread
From: Martin KaFai Lau @ 2025-02-25  7:13 UTC (permalink / raw)
  To: Brandon Kammerdiener; +Cc: bpf

On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> This patch fixes an endless loop condition that can occur in
> bpf_for_each_hash_elem, causing the core to softlock. My understanding is
> that a combination of RCU list deletion and insertion introduces the new
> element after the iteration cursor and that there is a chance that an RCU

new element is added to the head of the bucket, so the first thought is it 
should not extend the list beyond the current iteration point...

> reader may in fact use this new element in iteration. The patch uses a
> _safe variant of the macro which gets the next element to iterate before
> executing the loop body for the current element. The following simple BPF
> program can be used to reproduce the issue:
> 
>      #include "vmlinux.h"
>      #include <bpf/bpf_helpers.h>
>      #include <bpf/bpf_tracing.h>
> 
>      #define N (64)
> 
>      struct {
>          __uint(type,        BPF_MAP_TYPE_HASH);
>          __uint(max_entries, N);
>          __type(key,         __u64);
>          __type(value,       __u64);
>      } map SEC(".maps");
> 
>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value, void *arg) {
>          bpf_map_delete_elem(map, key);
>          bpf_map_update_elem(map, &key, &val, 0);

I suspect what happened in this reproducer is,
there is a bucket with more than one elem(s) and the deleted elem gets 
immediately added back to the bucket->head.
Something like this, '[ ]' as the current elem.

1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
                                   delete_elem:  elem_2
                                   update_elem: [elem_1] ->  elem_2
2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
                                   delete_elem:  elem_1
                                   update_elem: [elem_2] -> elem_1
3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
				  loop.......

don't think "_safe" covers all cases though. "_safe" may solve this particular 
reproducer which is shooting itself in the foot by deleting and adding itself 
when iterating a bucket.

[ btw, I don't think the test code can work as is. At least the "&key" arg of 
the bpf_map_update_elem looks wrong. ]

>          return 0;
>      }
> 
>      SEC("uprobe//proc/self/exe:test")
>      int BPF_PROG(test) {
>          __u64 i;
> 
>          bpf_for(i, 0, N) {
>              bpf_map_update_elem(&map, &i, &i, 0);
>          }
> 
>          bpf_for_each_map_elem(&map, cb, NULL, 0);
> 
>          return 0;
>      }
> 
>      char LICENSE[] SEC("license") = "GPL";
> 
> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
> 
> ---
>   kernel/bpf/hashtab.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 4a9eeb7aef85..43574b0495c3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
>   		b = &htab->buckets[i];
>   		rcu_read_lock();
>   		head = &b->head;
> -		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
> +		hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
>   			key = elem->key;
>   			if (is_percpu) {
>   				/* current cpu value for percpu map */
> --
> 2.48.1
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-25  7:13 ` Martin KaFai Lau
@ 2025-02-25 12:26   ` Hou Tao
  2025-02-25 16:15     ` Brandon Kammerdiener
  0 siblings, 1 reply; 8+ messages in thread
From: Hou Tao @ 2025-02-25 12:26 UTC (permalink / raw)
  To: Martin KaFai Lau, Brandon Kammerdiener; +Cc: bpf

Hi,

On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>> This patch fixes an endless loop condition that can occur in
>> bpf_for_each_hash_elem, causing the core to softlock. My
>> understanding is
>> that a combination of RCU list deletion and insertion introduces the new
>> element after the iteration cursor and that there is a chance that an
>> RCU
>
> new element is added to the head of the bucket, so the first thought
> is it should not extend the list beyond the current iteration point...
>
>> reader may in fact use this new element in iteration. The patch uses a
>> _safe variant of the macro which gets the next element to iterate before
>> executing the loop body for the current element. The following simple
>> BPF
>> program can be used to reproduce the issue:
>>
>>      #include "vmlinux.h"
>>      #include <bpf/bpf_helpers.h>
>>      #include <bpf/bpf_tracing.h>
>>
>>      #define N (64)
>>
>>      struct {
>>          __uint(type,        BPF_MAP_TYPE_HASH);
>>          __uint(max_entries, N);
>>          __type(key,         __u64);
>>          __type(value,       __u64);
>>      } map SEC(".maps");
>>
>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>> void *arg) {
>>          bpf_map_delete_elem(map, key);
>>          bpf_map_update_elem(map, &key, &val, 0);
>
> I suspect what happened in this reproducer is,
> there is a bucket with more than one elem(s) and the deleted elem gets
> immediately added back to the bucket->head.
> Something like this, '[ ]' as the current elem.
>
> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
>                                   delete_elem:  elem_2
>                                   update_elem: [elem_1] ->  elem_2
> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
>                                   delete_elem:  elem_1
>                                   update_elem: [elem_2] -> elem_1
> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
>                   loop.......
>

Yes. The above procedure is exactly the test case tries to do (except
the &key and &val typos).

> don't think "_safe" covers all cases though. "_safe" may solve this
> particular reproducer which is shooting itself in the foot by deleting
> and adding itself when iterating a bucket.

Yes, if the bpf prog could delete and re-add the saved next entry, there
will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
from the same problem just as bpf_for_each_hash_elem(). The problem is
mainly due to the immediate reuse of deleted element. Maybe we need to
add a seqlock to the htab_elem and retry the traversal if the seqlock is
not stable.
>
> [ btw, I don't think the test code can work as is. At least the "&key"
> arg of the bpf_map_update_elem looks wrong. ]
>
>>          return 0;
>>      }
>>
>>      SEC("uprobe//proc/self/exe:test")
>>      int BPF_PROG(test) {
>>          __u64 i;
>>
>>          bpf_for(i, 0, N) {
>>              bpf_map_update_elem(&map, &i, &i, 0);
>>          }
>>
>>          bpf_for_each_map_elem(&map, cb, NULL, 0);
>>
>>          return 0;
>>      }
>>
>>      char LICENSE[] SEC("license") = "GPL";
>>
>> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
>>
>> ---
>>   kernel/bpf/hashtab.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 4a9eeb7aef85..43574b0495c3 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
>> bpf_map *map, bpf_callback_t callback_
>>           b = &htab->buckets[i];
>>           rcu_read_lock();
>>           head = &b->head;
>> -        hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
>> +        hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
>>               key = elem->key;
>>               if (is_percpu) {
>>                   /* current cpu value for percpu map */
>> -- 
>> 2.48.1
>>
>
>
> .


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-25 12:26   ` Hou Tao
@ 2025-02-25 16:15     ` Brandon Kammerdiener
  2025-02-27  1:32       ` Hou Tao
  0 siblings, 1 reply; 8+ messages in thread
From: Brandon Kammerdiener @ 2025-02-25 16:15 UTC (permalink / raw)
  To: Hou Tao; +Cc: Martin KaFai Lau, bpf

On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> > On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> >> This patch fixes an endless loop condition that can occur in
> >> bpf_for_each_hash_elem, causing the core to softlock. My
> >> understanding is
> >> that a combination of RCU list deletion and insertion introduces the new
> >> element after the iteration cursor and that there is a chance that an
> >> RCU
> >
> > new element is added to the head of the bucket, so the first thought
> > is it should not extend the list beyond the current iteration point...
> >
> >> reader may in fact use this new element in iteration. The patch uses a
> >> _safe variant of the macro which gets the next element to iterate before
> >> executing the loop body for the current element. The following simple
> >> BPF
> >> program can be used to reproduce the issue:
> >>
> >>      #include "vmlinux.h"
> >>      #include <bpf/bpf_helpers.h>
> >>      #include <bpf/bpf_tracing.h>
> >>
> >>      #define N (64)
> >>
> >>      struct {
> >>          __uint(type,        BPF_MAP_TYPE_HASH);
> >>          __uint(max_entries, N);
> >>          __type(key,         __u64);
> >>          __type(value,       __u64);
> >>      } map SEC(".maps");
> >>
> >>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
> >> void *arg) {
> >>          bpf_map_delete_elem(map, key);
> >>          bpf_map_update_elem(map, &key, &val, 0);
> >
> > I suspect what happened in this reproducer is,
> > there is a bucket with more than one elem(s) and the deleted elem gets
> > immediately added back to the bucket->head.
> > Something like this, '[ ]' as the current elem.
> >
> > 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
> >                                   delete_elem:  elem_2
> >                                   update_elem: [elem_1] ->  elem_2
> > 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
> >                                   delete_elem:  elem_1
> >                                   update_elem: [elem_2] -> elem_1
> > 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
> >                   loop.......
> >
>
> Yes. The above procedure is exactly the test case tries to do (except
> the &key and &val typos).

Yes, apologies for the typos I must have introduced when minifying the
example. Should just use key and val sans the &.

>
> > don't think "_safe" covers all cases though. "_safe" may solve this
> > particular reproducer which is shooting itself in the foot by deleting
> > and adding itself when iterating a bucket.
>
> Yes, if the bpf prog could delete and re-add the saved next entry, there
> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
> from the same problem just as bpf_for_each_hash_elem(). The problem is
> mainly due to the immediate reuse of deleted element. Maybe we need to
> add a seqlock to the htab_elem and retry the traversal if the seqlock is
> not stable.
> >
> > [ btw, I don't think the test code can work as is. At least the "&key"
> > arg of the bpf_map_update_elem looks wrong. ]
> >
> >>          return 0;
> >>      }
> >>
> >>      SEC("uprobe//proc/self/exe:test")
> >>      int BPF_PROG(test) {
> >>          __u64 i;
> >>
> >>          bpf_for(i, 0, N) {
> >>              bpf_map_update_elem(&map, &i, &i, 0);
> >>          }
> >>
> >>          bpf_for_each_map_elem(&map, cb, NULL, 0);
> >>
> >>          return 0;
> >>      }
> >>
> >>      char LICENSE[] SEC("license") = "GPL";
> >>
> >> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
> >>
> >> ---
> >>   kernel/bpf/hashtab.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 4a9eeb7aef85..43574b0495c3 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
> >> bpf_map *map, bpf_callback_t callback_
> >>           b = &htab->buckets[i];
> >>           rcu_read_lock();
> >>           head = &b->head;
> >> -        hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
> >> +        hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
> >>               key = elem->key;
> >>               if (is_percpu) {
> >>                   /* current cpu value for percpu map */
> >> --
> >> 2.48.1
> >>
> >
> >
> > .
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-25 16:15     ` Brandon Kammerdiener
@ 2025-02-27  1:32       ` Hou Tao
  2025-02-27  2:07         ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Hou Tao @ 2025-02-27  1:32 UTC (permalink / raw)
  To: Brandon Kammerdiener, Martin KaFai Lau; +Cc: bpf

Hi,

On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
> On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
>>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>>>> This patch fixes an endless loop condition that can occur in
>>>> bpf_for_each_hash_elem, causing the core to softlock. My
>>>> understanding is
>>>> that a combination of RCU list deletion and insertion introduces the new
>>>> element after the iteration cursor and that there is a chance that an
>>>> RCU
>>> new element is added to the head of the bucket, so the first thought
>>> is it should not extend the list beyond the current iteration point...
>>>
>>>> reader may in fact use this new element in iteration. The patch uses a
>>>> _safe variant of the macro which gets the next element to iterate before
>>>> executing the loop body for the current element. The following simple
>>>> BPF
>>>> program can be used to reproduce the issue:
>>>>
>>>>      #include "vmlinux.h"
>>>>      #include <bpf/bpf_helpers.h>
>>>>      #include <bpf/bpf_tracing.h>
>>>>
>>>>      #define N (64)
>>>>
>>>>      struct {
>>>>          __uint(type,        BPF_MAP_TYPE_HASH);
>>>>          __uint(max_entries, N);
>>>>          __type(key,         __u64);
>>>>          __type(value,       __u64);
>>>>      } map SEC(".maps");
>>>>
>>>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>>>> void *arg) {
>>>>          bpf_map_delete_elem(map, key);
>>>>          bpf_map_update_elem(map, &key, &val, 0);
>>> I suspect what happened in this reproducer is,
>>> there is a bucket with more than one elem(s) and the deleted elem gets
>>> immediately added back to the bucket->head.
>>> Something like this, '[ ]' as the current elem.
>>>
>>> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
>>>                                   delete_elem:  elem_2
>>>                                   update_elem: [elem_1] ->  elem_2
>>> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
>>>                                   delete_elem:  elem_1
>>>                                   update_elem: [elem_2] -> elem_1
>>> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
>>>                   loop.......
>>>
>> Yes. The above procedure is exactly the test case tries to do (except
>> the &key and &val typos).
> Yes, apologies for the typos I must have introduced when minifying the
> example. Should just use key and val sans the &.

OK
>
>>> don't think "_safe" covers all cases though. "_safe" may solve this
>>> particular reproducer which is shooting itself in the foot by deleting
>>> and adding itself when iterating a bucket.
>> Yes, if the bpf prog could delete and re-add the saved next entry, there
>> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
>> from the same problem just as bpf_for_each_hash_elem(). The problem is
>> mainly due to the immediate reuse of deleted element. Maybe we need to
>> add a seqlock to the htab_elem and retry the traversal if the seqlock is
>> not stable.

The seqlock + restart traversal way doesn't work, because the seq-count
for each element will add 2 after the re-insert and the loop will always
try to restart the traversal. I have another idea: how about add an
per-bucket incremental id for each element in the bucket and during the
traversal, the traversal will ignore the element with id greater than
the id of current element, so it will ignore the newly-added element.
For example, there are three elements in a bucket list: head -> A [id=3]
-> B[id=2] -> C[id=1]

(1) pass A to cb
current id = 3
cb deletes A and inserts A again
head -> A[4] -> B[2] -> C[1]

(2) pass B to cb
current id=2
cb deletes B and inserts B again
head -> B[5] -> A[4] -> C[1]

the id of A is greater than current id, so it is skipped.

>>> [ btw, I don't think the test code can work as is. At least the "&key"
>>> arg of the bpf_map_update_elem looks wrong. ]
>>>
>>>>          return 0;
>>>>      }
>>>>
>>>>      SEC("uprobe//proc/self/exe:test")
>>>>      int BPF_PROG(test) {
>>>>          __u64 i;
>>>>
>>>>          bpf_for(i, 0, N) {
>>>>              bpf_map_update_elem(&map, &i, &i, 0);
>>>>          }
>>>>
>>>>          bpf_for_each_map_elem(&map, cb, NULL, 0);
>>>>
>>>>          return 0;
>>>>      }
>>>>
>>>>      char LICENSE[] SEC("license") = "GPL";
>>>>
>>>> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
>>>>
>>>> ---
>>>>   kernel/bpf/hashtab.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>> index 4a9eeb7aef85..43574b0495c3 100644
>>>> --- a/kernel/bpf/hashtab.c
>>>> +++ b/kernel/bpf/hashtab.c
>>>> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
>>>> bpf_map *map, bpf_callback_t callback_
>>>>           b = &htab->buckets[i];
>>>>           rcu_read_lock();
>>>>           head = &b->head;
>>>> -        hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
>>>> +        hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
>>>>               key = elem->key;
>>>>               if (is_percpu) {
>>>>                   /* current cpu value for percpu map */
>>>> --
>>>> 2.48.1
>>>>
>>>
>>> .
> .


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-27  1:32       ` Hou Tao
@ 2025-02-27  2:07         ` Alexei Starovoitov
  2025-02-27  2:24           ` Hou Tao
  2025-04-08  2:36           ` Hou Tao
  0 siblings, 2 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2025-02-27  2:07 UTC (permalink / raw)
  To: Hou Tao; +Cc: Brandon Kammerdiener, Martin KaFai Lau, bpf

On Wed, Feb 26, 2025 at 5:45 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
> > On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> >>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> >>>> This patch fixes an endless loop condition that can occur in
> >>>> bpf_for_each_hash_elem, causing the core to softlock. My
> >>>> understanding is
> >>>> that a combination of RCU list deletion and insertion introduces the new
> >>>> element after the iteration cursor and that there is a chance that an
> >>>> RCU
> >>> new element is added to the head of the bucket, so the first thought
> >>> is it should not extend the list beyond the current iteration point...
> >>>
> >>>> reader may in fact use this new element in iteration. The patch uses a
> >>>> _safe variant of the macro which gets the next element to iterate before
> >>>> executing the loop body for the current element. The following simple
> >>>> BPF
> >>>> program can be used to reproduce the issue:
> >>>>
> >>>>      #include "vmlinux.h"
> >>>>      #include <bpf/bpf_helpers.h>
> >>>>      #include <bpf/bpf_tracing.h>
> >>>>
> >>>>      #define N (64)
> >>>>
> >>>>      struct {
> >>>>          __uint(type,        BPF_MAP_TYPE_HASH);
> >>>>          __uint(max_entries, N);
> >>>>          __type(key,         __u64);
> >>>>          __type(value,       __u64);
> >>>>      } map SEC(".maps");
> >>>>
> >>>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
> >>>> void *arg) {
> >>>>          bpf_map_delete_elem(map, key);
> >>>>          bpf_map_update_elem(map, &key, &val, 0);
> >>> I suspect what happened in this reproducer is,
> >>> there is a bucket with more than one elem(s) and the deleted elem gets
> >>> immediately added back to the bucket->head.
> >>> Something like this, '[ ]' as the current elem.
> >>>
> >>> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
> >>>                                   delete_elem:  elem_2
> >>>                                   update_elem: [elem_1] ->  elem_2
> >>> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
> >>>                                   delete_elem:  elem_1
> >>>                                   update_elem: [elem_2] -> elem_1
> >>> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
> >>>                   loop.......
> >>>
> >> Yes. The above procedure is exactly the test case tries to do (except
> >> the &key and &val typos).
> > Yes, apologies for the typos I must have introduced when minifying the
> > example. Should just use key and val sans the &.
>
> OK
> >
> >>> don't think "_safe" covers all cases though. "_safe" may solve this
> >>> particular reproducer which is shooting itself in the foot by deleting
> >>> and adding itself when iterating a bucket.
> >> Yes, if the bpf prog could delete and re-add the saved next entry, there
> >> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
> >> from the same problem just as bpf_for_each_hash_elem(). The problem is
> >> mainly due to the immediate reuse of deleted element. Maybe we need to
> >> add a seqlock to the htab_elem and retry the traversal if the seqlock is
> >> not stable.
>
> The seqlock + restart traversal way doesn't work, because the seq-count
> for each element will add 2 after the re-insert and the loop will always
> try to restart the traversal. I have another idea: how about add an
> per-bucket incremental id for each element in the bucket and during the
> traversal, the traversal will ignore the element with id greater than
> the id of current element, so it will ignore the newly-added element.
> For example, there are three elements in a bucket list: head -> A [id=3]
> -> B[id=2] -> C[id=1]
>
> (1) pass A to cb
> current id = 3
> cb deletes A and inserts A again
> head -> A[4] -> B[2] -> C[1]
>
> (2) pass B to cb
> current id=2
> cb deletes B and inserts B again
> head -> B[5] -> A[4] -> C[1]
>
> the id of A is greater than current id, so it is skipped.

This looks like unnecessary overhead that attempts
to reinvent nulls logic.

At present I'm not convinced that lookup_nulls_elem_raw() is broken.
The only issue with bpf_for_each_hash_elem() that loops forever
in this convoluted case and _safe() is the right fix for it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-27  2:07         ` Alexei Starovoitov
@ 2025-02-27  2:24           ` Hou Tao
  2025-04-08  2:36           ` Hou Tao
  1 sibling, 0 replies; 8+ messages in thread
From: Hou Tao @ 2025-02-27  2:24 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Brandon Kammerdiener, Martin KaFai Lau, bpf

Hi,

On 2/27/2025 10:07 AM, Alexei Starovoitov wrote:
> On Wed, Feb 26, 2025 at 5:45 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
>>> On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
>>>>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>>>>>> This patch fixes an endless loop condition that can occur in
>>>>>> bpf_for_each_hash_elem, causing the core to softlock. My
>>>>>> understanding is
>>>>>> that a combination of RCU list deletion and insertion introduces the new
>>>>>> element after the iteration cursor and that there is a chance that an
>>>>>> RCU
>>>>> new element is added to the head of the bucket, so the first thought
>>>>> is it should not extend the list beyond the current iteration point...
>>>>>
>>>>>> reader may in fact use this new element in iteration. The patch uses a
>>>>>> _safe variant of the macro which gets the next element to iterate before
>>>>>> executing the loop body for the current element. The following simple
>>>>>> BPF
>>>>>> program can be used to reproduce the issue:
>>>>>>
>>>>>>      #include "vmlinux.h"
>>>>>>      #include <bpf/bpf_helpers.h>
>>>>>>      #include <bpf/bpf_tracing.h>
>>>>>>
>>>>>>      #define N (64)
>>>>>>
>>>>>>      struct {
>>>>>>          __uint(type,        BPF_MAP_TYPE_HASH);
>>>>>>          __uint(max_entries, N);
>>>>>>          __type(key,         __u64);
>>>>>>          __type(value,       __u64);
>>>>>>      } map SEC(".maps");
>>>>>>
>>>>>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>>>>>> void *arg) {
>>>>>>          bpf_map_delete_elem(map, key);
>>>>>>          bpf_map_update_elem(map, &key, &val, 0);
>>>>> I suspect what happened in this reproducer is,
>>>>> there is a bucket with more than one elem(s) and the deleted elem gets
>>>>> immediately added back to the bucket->head.
>>>>> Something like this, '[ ]' as the current elem.
>>>>>
>>>>> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
>>>>>                                   delete_elem:  elem_2
>>>>>                                   update_elem: [elem_1] ->  elem_2
>>>>> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
>>>>>                                   delete_elem:  elem_1
>>>>>                                   update_elem: [elem_2] -> elem_1
>>>>> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
>>>>>                   loop.......
>>>>>
>>>> Yes. The above procedure is exactly the test case tries to do (except
>>>> the &key and &val typos).
>>> Yes, apologies for the typos I must have introduced when minifying the
>>> example. Should just use key and val sans the &.
>> OK
>>>>> don't think "_safe" covers all cases though. "_safe" may solve this
>>>>> particular reproducer which is shooting itself in the foot by deleting
>>>>> and adding itself when iterating a bucket.
>>>> Yes, if the bpf prog could delete and re-add the saved next entry, there
>>>> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
>>>> from the same problem just as bpf_for_each_hash_elem(). The problem is
>>>> mainly due to the immediate reuse of deleted element. Maybe we need to
>>>> add a seqlock to the htab_elem and retry the traversal if the seqlock is
>>>> not stable.
>> The seqlock + restart traversal way doesn't work, because the seq-count
>> for each element will add 2 after the re-insert and the loop will always
>> try to restart the traversal. I have another idea: how about add an
>> per-bucket incremental id for each element in the bucket and during the
>> traversal, the traversal will ignore the element with id greater than
>> the id of current element, so it will ignore the newly-added element.
>> For example, there are three elements in a bucket list: head -> A [id=3]
>> -> B[id=2] -> C[id=1]
>>
>> (1) pass A to cb
>> current id = 3
>> cb deletes A and inserts A again
>> head -> A[4] -> B[2] -> C[1]
>>
>> (2) pass B to cb
>> current id=2
>> cb deletes B and inserts B again
>> head -> B[5] -> A[4] -> C[1]
>>
>> the id of A is greater than current id, so it is skipped.
> This looks like unnecessary overhead that attempts
> to reinvent nulls logic.

OK.
>
> At present I'm not convinced that lookup_nulls_elem_raw() is broken.
> The only issue with bpf_for_each_hash_elem() that loops forever
> in this convoluted case and _safe() is the right fix for it.
For lookup_nulls_elem_raw(), I think the dead loop in lookup depends on
the exact synchronization between lookup procedure and update/deletion
procedure, and it will hard to reproduce it.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bpf: fix possible endless loop in BPF map iteration
  2025-02-27  2:07         ` Alexei Starovoitov
  2025-02-27  2:24           ` Hou Tao
@ 2025-04-08  2:36           ` Hou Tao
  1 sibling, 0 replies; 8+ messages in thread
From: Hou Tao @ 2025-04-08  2:36 UTC (permalink / raw)
  To: Brandon Kammerdiener; +Cc: Martin KaFai Lau, bpf, Alexei Starovoitov

Hi Brandon,

On 2/27/2025 10:07 AM, Alexei Starovoitov wrote:
> On Wed, Feb 26, 2025 at 5:45 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
>>> On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
>>>>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>>>>>> This patch fixes an endless loop condition that can occur in
>>>>>> bpf_for_each_hash_elem, causing the core to softlock. My
>>>>>> understanding is
>>>>>> that a combination of RCU list deletion and insertion introduces the new
>>>>>> element after the iteration cursor and that there is a chance that an
>>>>>> RCU
>>>>> new element is added to the head of the bucket, so the first thought
>>>>> is it should not extend the list beyond the current iteration point...
>>>>>
>>>>>> reader may in fact use this new element in iteration. The patch uses a
>>>>>> _safe variant of the macro which gets the next element to iterate before
>>>>>> executing the loop body for the current element. The following simple
>>>>>> BPF
>>>>>> program can be used to reproduce the issue:
>>>>>>
>>>>>>      #include "vmlinux.h"
>>>>>>      #include <bpf/bpf_helpers.h>
>>>>>>      #include <bpf/bpf_tracing.h>
>>>>>>
>>>>>>      #define N (64)
>>>>>>
>>>>>>      struct {
>>>>>>          __uint(type,        BPF_MAP_TYPE_HASH);
>>>>>>          __uint(max_entries, N);
>>>>>>          __type(key,         __u64);
>>>>>>          __type(value,       __u64);
>>>>>>      } map SEC(".maps");
>>>>>>
>>>>>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>>>>>> void *arg) {
>>>>>>          bpf_map_delete_elem(map, key);
>>>>>>          bpf_map_update_elem(map, &key, &val, 0);
>>>>> I suspect what happened in this reproducer is,
>>>>> there is a bucket with more than one elem(s) and the deleted elem gets
>>>>> immediately added back to the bucket->head.
>>>>> Something like this, '[ ]' as the current elem.
>>>>>
>>>>> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
>>>>>                                   delete_elem:  elem_2
>>>>>                                   update_elem: [elem_1] ->  elem_2
>>>>> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
>>>>>                                   delete_elem:  elem_1
>>>>>                                   update_elem: [elem_2] -> elem_1
>>>>> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
>>>>>                   loop.......
>>>>>
>>>> Yes. The above procedure is exactly the test case tries to do (except
>>>> the &key and &val typos).
>>> Yes, apologies for the typos I must have introduced when minifying the
>>> example. Should just use key and val sans the &.

Could you please resend the patch after fixing the typos in the patch ?
It will be better to add a new selftest for the fix.
>> OK
>>>>> don't think "_safe" covers all cases though. "_safe" may solve this
>>>>> particular reproducer which is shooting itself in the foot by deleting
>>>>> and adding itself when iterating a bucket.
>>>> Yes, if the bpf prog could delete and re-add the saved next entry, there
>>>> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
>>>> from the same problem just as bpf_for_each_hash_elem(). The problem is
>>>> mainly due to the immediate reuse of deleted element. Maybe we need to
>>>> add a seqlock to the htab_elem and retry the traversal if the seqlock is
>>>> not stable.
>> The seqlock + restart traversal way doesn't work, because the seq-count
>> for each element will add 2 after the re-insert and the loop will always
>> try to restart the traversal. I have another idea: how about add an
>> per-bucket incremental id for each element in the bucket and during the
>> traversal, the traversal will ignore the element with id greater than
>> the id of current element, so it will ignore the newly-added element.
>> For example, there are three elements in a bucket list: head -> A [id=3]
>> -> B[id=2] -> C[id=1]
>>
>> (1) pass A to cb
>> current id = 3
>> cb deletes A and inserts A again
>> head -> A[4] -> B[2] -> C[1]
>>
>> (2) pass B to cb
>> current id=2
>> cb deletes B and inserts B again
>> head -> B[5] -> A[4] -> C[1]
>>
>> the id of A is greater than current id, so it is skipped.
> This looks like unnecessary overhead that attempts
> to reinvent nulls logic.
>
> At present I'm not convinced that lookup_nulls_elem_raw() is broken.
> The only issue with bpf_for_each_hash_elem() that loops forever
> in this convoluted case and _safe() is the right fix for it.
>
> .


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-04-08  2:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-24 22:01 [PATCH] bpf: fix possible endless loop in BPF map iteration Brandon Kammerdiener
2025-02-25  7:13 ` Martin KaFai Lau
2025-02-25 12:26   ` Hou Tao
2025-02-25 16:15     ` Brandon Kammerdiener
2025-02-27  1:32       ` Hou Tao
2025-02-27  2:07         ` Alexei Starovoitov
2025-02-27  2:24           ` Hou Tao
2025-04-08  2:36           ` Hou Tao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.