bpf_map_update_elem returns -ENOMEM


* bpf_map_update_elem returns -ENOMEM
@ 2024-05-06 15:19 Chase Hiltz
  2024-05-08 11:19 ` Donald Hunter
  2024-05-16 11:29 ` Hou Tao
  0 siblings, 2 replies; 5+ messages in thread
From: Chase Hiltz @ 2024-05-06 15:19 UTC (permalink / raw)
  To: xdp-newbies

Hi,

I'm writing regarding a rather bizarre scenario that I'm hoping
someone could provide insight on. I have a map defined as follows:
```
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 1000000);
    __type(key, struct my_map_key);
    __type(value, struct my_map_val);
    __uint(map_flags, BPF_F_NO_COMMON_LRU);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} my_map SEC(".maps");
```
I have several fentry/fexit programs that need to perform updates in
this map. After a certain number of map entries has been reached,
calls to bpf_map_update_elem start returning `-ENOMEM`. As one
example, I'm observing a program deployment where we have 816032
entries on a 64 CPU machine, and a certain portion of updates are
failing. I'm puzzled as to why this is occurring given that:
- The 1M entries should be preallocated upon map creation (since I'm
not using `BPF_F_NO_PREALLOC`)
- The host machine has over 120G of unused memory available at any given time

I've previously reduced max_entries by 25% under the assumption that
this would prevent the problem from occurring, but this only caused
map updates to start failing at a lower threshold. I believe that this
is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
reasoning being that when map updates fail, it occurs consistently for
specific CPUs.
At this time, all machines experiencing the problem are running kernel
version 5.15, however I'm not currently able to try out any newer
kernels to confirm whether or not the same problem occurs there. Any
ideas on what could be responsible for this would be greatly
appreciated!

Thanks,
Chase Hiltz


* Re: bpf_map_update_elem returns -ENOMEM
  2024-05-06 15:19 bpf_map_update_elem returns -ENOMEM Chase Hiltz
@ 2024-05-08 11:19 ` Donald Hunter
  2024-05-16 11:29 ` Hou Tao
  1 sibling, 0 replies; 5+ messages in thread
From: Donald Hunter @ 2024-05-08 11:19 UTC (permalink / raw)
  To: Chase Hiltz; +Cc: xdp-newbies

Chase Hiltz <chase@path.net> writes:

> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, 1000000);
>     __type(key, struct my_map_key);
>     __type(value, struct my_map_val);
>     __uint(map_flags, BPF_F_NO_COMMON_LRU);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
> not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any
> given time

I hoped that I might be able to help here, given that I wrote the
documentation for BPF_MAP_TYPE_LRU_HASH. Unfortunately the details of
LRU eviction are complex, especially when using BPF_F_NO_COMMON_LRU for
per-cpu LRU lists.

The LRU documentation was updated by Joe Stringer, including a flowchart
which you might find helpful:

https://docs.kernel.org/bpf/map_hash.html

Joe also gave a talk about LRU maps LPC a couple of years ago which
might give some insight:

https://lpc.events/event/16/contributions/1368/

> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused
> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.
> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!

There have been several updates to the LRU map code since 5.15 so it is
definitely possible that it will behave differently on a 6.x kernel.

> Thanks,
> Chase Hiltz


* Re: bpf_map_update_elem returns -ENOMEM
  2024-05-06 15:19 bpf_map_update_elem returns -ENOMEM Chase Hiltz
  2024-05-08 11:19 ` Donald Hunter
@ 2024-05-16 11:29 ` Hou Tao
  2024-05-17 13:52   ` Chase Hiltz
  1 sibling, 1 reply; 5+ messages in thread
From: Hou Tao @ 2024-05-16 11:29 UTC (permalink / raw)
  To: Chase Hiltz, xdp-newbies; +Cc: bpf

Hi,

+cc bpf list

On 5/6/2024 11:19 PM, Chase Hiltz wrote:
> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, 1000000);
>     __type(key, struct my_map_key);
>     __type(value, struct my_map_val);
>     __uint(map_flags, BPF_F_NO_COMMON_LRU);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
> not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any given time
>
> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused

For an LRU map with BPF_F_NO_COMMON_LRU, the entries are distributed
evenly between all CPUs. In your case, each CPU will have 1M/64 = 15625
entries. To reduce the possibility of the ENOMEM error, the right
approach is to increase the value of max_entries instead of decreasing it.
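
As a rough illustration of the arithmetic above, the sketch below divides
max_entries evenly across CPUs with integer division. The helper name
per_cpu_lru_entries is hypothetical (it is not a kernel function), and the
kernel may round the map size before splitting it, so treat the result as
an estimate only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel function: even per-CPU split of an
 * LRU map's entries, using integer division as in "1M/64 = 15625". */
static inline uint32_t per_cpu_lru_entries(uint32_t max_entries,
                                           uint32_t nr_cpus)
{
    assert(nr_cpus > 0); /* a split across zero CPUs is meaningless */
    return max_entries / nr_cpus;
}
```

With the values from this thread, per_cpu_lru_entries(1000000, 64) gives
15625. A map shrunk to 750000 entries (an assumed figure: 25% less than
1000000) would leave only 11718 entries per CPU.
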
> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.

Does the specific CPU always fail afterwards, or does it fail
periodically? Is the machine running the bpf program an arm64 host or
an x86-64 host (see uname -a)? I suspect that the problem may be due
to htab_lock_bucket(), which may fail on an arm64 host in v5.15.

Could you please measure the ratio of calls in which
htab_lru_map_delete_node() returns 0? If the ratio is high, it probably
means that there are too many overwrites of entries between different
CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!
>
> Thanks,
> Chase Hiltz
>
> .



* Re: bpf_map_update_elem returns -ENOMEM
  2024-05-16 11:29 ` Hou Tao
@ 2024-05-17 13:52   ` Chase Hiltz
  2024-05-18  4:32     ` Hou Tao
  0 siblings, 1 reply; 5+ messages in thread
From: Chase Hiltz @ 2024-05-17 13:52 UTC (permalink / raw)
  To: Hou Tao; +Cc: xdp-newbies, bpf

Hi,

Thanks for the replies.

> Joe also gave a talk about LRU maps LPC a couple of years ago which
> might give some insight:
Thanks, this was very helpful in understanding how LRU eviction works!
I definitely think it's related to high levels of contention on
individual machines causing LRU eviction to fail, given that I'm only
seeing it occur for those which consistently process the most packets.

> There have been several updates to the LRU map code since 5.15 so it is
> definitely possible that it will behave differently on a 6.x kernel.
I've compared the implementation between 5.15 and 6.5 (what I would
consider as a potential upgrade) and observed no more than a few
refactoring changes, but of course it's possible that I missed
something.

> In order to reduce of possibility of ENOMEM error, the right
> way is to increase the value of max_entries instead of decreasing it.
Yes, I now see the error of my ways in thinking that reducing it would
help at all when it actually hurts. For the time being, I'm going to
do this as a temporary remediation.

> Does the specific CPU always fail afterwards, or does it fail
> periodically ? Is the machine running the bpf program an arm64 host or
> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
> to htab_lock_bucket() which may fail under arm64 host in v5.15
It always fails afterwards, I'm doing RSS and we notice this problem
occurring back-to-back for specific source-destination pairs (because
they always land on the same queue). This is a 64-bit system:
```
$ uname -a
5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64
x86_64 x86_64 GNU/Linux
```

> Could you please check and account the ratio of times when
> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
> means that there may be too many overwrites of entries between different
> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
I'm not aware of any way to get that information, if you have any
pointers I'd be happy to check this.




-- 
Chase Hiltz
XDP Developer, Path Network
A 6991 E Camelback Rd., Suite D-300, Scottsdale AZ, 85251
W www.path.net  M +1 819 816 4353

* Re: bpf_map_update_elem returns -ENOMEM
  2024-05-17 13:52   ` Chase Hiltz
@ 2024-05-18  4:32     ` Hou Tao
  0 siblings, 0 replies; 5+ messages in thread
From: Hou Tao @ 2024-05-18  4:32 UTC (permalink / raw)
  To: Chase Hiltz; +Cc: xdp-newbies, bpf

Hi,

On 5/17/2024 9:52 PM, Chase Hiltz wrote:
> Hi,
>
> Thanks for the replies.
>
>> Joe also gave a talk about LRU maps LPC a couple of years ago which
>> might give some insight:
> Thanks, this was very helpful in understanding how LRU eviction works!
> I definitely think it's related to high levels of contention on
> individual machines causing LRU eviction to fail, given that I'm only
> seeing it occur for those which consistently process the most packets.
>
>> There have been several updates to the LRU map code since 5.15 so it is
>> definitely possible that it will behave differently on a 6.x kernel.
> I've compared the implementation between 5.15 and 6.5 (what I would
> consider as a potential upgrade) and observed no more than a few
> refactoring changes, but of course it's possible that I missed
> something.
>
>> In order to reduce of possibility of ENOMEM error, the right
>> way is to increase the value of max_entries instead of decreasing it.
> Yes, I now see the error of my ways in thinking that reducing it would
> help at all when it actually hurts. For the time being, I'm going to
> do this as a temporary remediation.

Is there a special reason on why use
>> Does the specific CPU always fail afterwards, or does it fail
>> periodically ? Is the machine running the bpf program an arm64 host or
>> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
>> to htab_lock_bucket() which may fail under arm64 host in v5.15
> It always fails afterwards, I'm doing RSS and we notice this problem
> occurring back-to-back for specific source-destination pairs (because
> they always land on the same queue). This is a 64-bit system:
> ```
> $ uname -a
> 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64
> x86_64 x86_64 GNU/Linux
> ```

It is an x86-64 host, so my previous guess is wrong.
>
>> Could you please check and account the ratio of times when
>> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
>> means that there may be too many overwrites of entries between different
>> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
> I'm not aware of any way to get that information, if you have any
> pointers I'd be happy to check this.

Please install bpftrace on the host first, then run the following
one-line script on the host when bpf_map_update_elem() starts to return
-ENOMEM:

# sudo bpftrace -e 'kr:htab_lru_map_delete_node { if (retval == 0) {
@lock[cpu] = count(); } else { @del[retval & 0xff, cpu] = count(); } }
i:s:10 { exit(); }'

The script above counts the return values of
htab_lru_map_delete_node():
(1) if htab_lock_bucket() returns true, retval will be 0, so this case is
counted in the @lock map
(2) if the target node is found in the hash list, the lowest byte of
retval will be 1, otherwise it will be 0. These returns are counted in
the @del map.
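
The classification performed by the script can be sketched in plain C as
follows. This only mirrors the if/else in the bpftrace one-liner; the names
classify_retval, BUCKET_LOCK and BUCKET_DEL are made up for illustration
and do not exist in the kernel or in bpftrace.

```c
#include <stdint.h>

/* Hypothetical names: which bpftrace map a sample would be counted in. */
enum bucket { BUCKET_LOCK, BUCKET_DEL };

struct classification {
    enum bucket bucket; /* @lock or @del */
    uint8_t low_byte;   /* first key of @del; unused for @lock */
};

/* Mirrors: if (retval == 0) @lock[cpu]++ else @del[retval & 0xff, cpu]++ */
static inline struct classification classify_retval(uint64_t retval)
{
    struct classification c = { BUCKET_LOCK, 0 };

    if (retval != 0) {
        c.bucket = BUCKET_DEL;
        c.low_byte = (uint8_t)(retval & 0xff);
    }
    return c;
}
```

Note that a value such as 0x100, which is non-zero as a whole but has a
lowest byte of 0, is counted under @del[0, cpu] rather than under @lock.
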

The snippet 'i:s:10 { exit(); }' terminates the script after
10 seconds. You could shorten the time if there are too many events
to count. The following is the output from my local development
environment:

# bpftrace -e 'kr:htab_lru_map_delete_node { if (retval == 0) {
@lock[cpu] = count(); } else { @del[retval & 0xff, cpu] = count(); } }
i:s:10 { exit(); }'
Attaching 2 probes...

@del[0, 3]: 4822
@del[0, 6]: 5656
@del[0, 2]: 5995
@del[0, 4]: 8652
@del[0, 1]: 24722
@del[0, 5]: 25146
@del[0, 0]: 36137
@del[0, 7]: 38254
@del[1, 3]: 162054
@del[1, 4]: 208696
@del[1, 6]: 245960
@del[1, 2]: 267437
@del[1, 5]: 533654
@del[1, 1]: 548974
@del[1, 7]: 618810
@del[1, 0]: 619459
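
As a sketch of how such counts could be turned into a ratio, the C snippet
below sums the two groups of @del counters from the sample output above and
computes the share of calls whose lowest retval byte was 0. The helper
names are hypothetical, and the numbers are only the sample output, not
data from the affected hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts copied from the sample output: @del[0, cpu] and @del[1, cpu]. */
static const uint64_t del_not_found[] = {
    4822, 5656, 5995, 8652, 24722, 25146, 36137, 38254,
};
static const uint64_t del_found[] = {
    162054, 208696, 245960, 267437, 533654, 548974, 618810, 619459,
};

/* Hypothetical helper: add up one group of per-CPU counters. */
static inline uint64_t sum_counts(const uint64_t *counts, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += counts[i];
    return total;
}

/* Hypothetical helper: part / total as a fraction, 0.0 if total is 0. */
static inline double fraction(uint64_t part, uint64_t total)
{
    return total ? (double)part / (double)total : 0.0;
}
```

For this sample the two sums are 149384 and 3205044, so roughly 4.5% of the
counted calls fall in the @del[0, ...] group, and there are no @lock
entries at all.
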

