* bpf_map_update_elem returns -ENOMEM
From: Chase Hiltz @ 2024-05-06 15:19 UTC
To: xdp-newbies
Hi,

I'm writing regarding a rather bizarre scenario that I'm hoping
someone could provide insight on. I have a map defined as follows:
```
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 1000000);
    __type(key, struct my_map_key);
    __type(value, struct my_map_val);
    __uint(map_flags, BPF_F_NO_COMMON_LRU);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} my_map SEC(".maps");
```
I have several fentry/fexit programs that need to perform updates in
this map. After a certain number of map entries has been reached,
calls to bpf_map_update_elem start returning `-ENOMEM`. As one
example, I'm observing a program deployment where we have 816032
entries on a 64 CPU machine, and a certain portion of updates are
failing. I'm puzzled as to why this is occurring given that:
- The 1M entries should be preallocated upon map creation (since I'm
  not using `BPF_F_NO_PREALLOC`)
- The host machine has over 120G of unused memory available at any given time

I've previously reduced max_entries by 25% under the assumption that
this would prevent the problem from occurring, but this only caused
map updates to start failing at a lower threshold. I believe that this
is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
reasoning being that when map updates fail, it occurs consistently for
specific CPUs.

At this time, all machines experiencing the problem are running kernel
version 5.15, however I'm not currently able to try out any newer
kernels to confirm whether or not the same problem occurs there. Any
ideas on what could be responsible for this would be greatly
appreciated!

Thanks,
Chase Hiltz

* Re: bpf_map_update_elem returns -ENOMEM
From: Donald Hunter @ 2024-05-08 11:19 UTC
To: Chase Hiltz; +Cc: xdp-newbies

Chase Hiltz <chase@path.net> writes:

> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, 1000000);
>     __type(key, struct my_map_key);
>     __type(value, struct my_map_val);
>     __uint(map_flags, BPF_F_NO_COMMON_LRU);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
>   not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any
>   given time

I hoped that I might be able to help here, given that I wrote the
documentation for BPF_MAP_TYPE_LRU_HASH. Unfortunately the details of
LRU eviction are complex, especially when using BPF_F_NO_COMMON_LRU for
per-cpu LRU lists.

The LRU documentation was updated by Joe Stringer, including a flowchart
which you might find helpful:

https://docs.kernel.org/bpf/map_hash.html

Joe also gave a talk about LRU maps at LPC a couple of years ago which
might give some insight:

https://lpc.events/event/16/contributions/1368/

> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused
> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.
> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!

There have been several updates to the LRU map code since 5.15 so it is
definitely possible that it will behave differently on a 6.x kernel.

> Thanks,
> Chase Hiltz

* Re: bpf_map_update_elem returns -ENOMEM
From: Hou Tao @ 2024-05-16 11:29 UTC
To: Chase Hiltz, xdp-newbies; +Cc: bpf

Hi,

+cc bpf list

On 5/6/2024 11:19 PM, Chase Hiltz wrote:
> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, 1000000);
>     __type(key, struct my_map_key);
>     __type(value, struct my_map_val);
>     __uint(map_flags, BPF_F_NO_COMMON_LRU);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
>   not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any given time
>
> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused

For an LRU map with BPF_F_NO_COMMON_LRU, the entries are distributed
evenly between all CPUs. In your case, each CPU will have 1M/64 = 15625
entries. In order to reduce the possibility of ENOMEM errors, the right
way is to increase the value of max_entries instead of decreasing it.

> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.

Does the specific CPU always fail afterwards, or does it fail
periodically? Is the machine running the bpf program an arm64 host or
an x86-64 host (namely uname -a)? I suspect that the problem may be due
to htab_lock_bucket(), which may fail on an arm64 host in v5.15.

Could you please check and account the ratio of times when
htab_lru_map_delete_node() returns 0? If the ratio is high, it probably
means that there may be too many overwrites of entries between different
CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).

> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!
>
> Thanks,
> Chase Hiltz
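
The per-CPU split described in the reply above (1M/64 = 15625) can be
sketched in plain C. This is only a rough illustration: it assumes a
simple floor division and ignores any rounding of max_entries the kernel
may apply, and the name lru_entries_per_cpu is hypothetical, not a
kernel or libbpf API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): approximate how many entries
 * each per-CPU LRU list receives when BPF_F_NO_COMMON_LRU is set.
 * Assumption: entries are split evenly across CPUs by integer division.
 */
uint32_t lru_entries_per_cpu(uint32_t max_entries, uint32_t nr_cpus)
{
    assert(nr_cpus > 0); /* guard against division by zero */
    return max_entries / nr_cpus;
}
```

With the values from this thread, lru_entries_per_cpu(1000000, 64) gives
15625, while a map reduced by 25% (750000) gives 11718 under this
simplification, which is consistent with updates failing at a lower
threshold after the reduction.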

* Re: bpf_map_update_elem returns -ENOMEM
From: Chase Hiltz @ 2024-05-17 13:52 UTC
To: Hou Tao; +Cc: xdp-newbies, bpf

Hi,

Thanks for the replies.

> Joe also gave a talk about LRU maps LPC a couple of years ago which
> might give some insight:

Thanks, this was very helpful in understanding how LRU eviction works!
I definitely think it's related to high levels of contention on
individual machines causing LRU eviction to fail, given that I'm only
seeing it occur for those which consistently process the most packets.

> There have been several updates to the LRU map code since 5.15 so it is
> definitely possible that it will behave differently on a 6.x kernel.

I've compared the implementation between 5.15 and 6.5 (what I would
consider as a potential upgrade) and observed no more than a few
refactoring changes, but of course it's possible that I missed
something.

> In order to reduce of possibility of ENOMEM error, the right
> way is to increase the value of max_entries instead of decreasing it.

Yes, I now see the error of my ways in thinking that reducing it would
help at all when it actually hurts. For the time being, I'm going to
increase it as a temporary remediation.

> Does the specific CPU always fail afterwards, or does it fail
> periodically ? Is the machine running the bpf program an arm64 host or
> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
> to htab_lock_bucket() which may fail under arm64 host in v5.15

It always fails afterwards. I'm doing RSS, and we notice this problem
occurring back-to-back for specific source-destination pairs (because
they always land on the same queue). This is a 64-bit system:
```
$ uname -a
5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
```

> Could you please check and account the ratio of times when
> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
> means that there may be too many overwrites of entries between different
> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).

I'm not aware of any way to get that information; if you have any
pointers, I'd be happy to check this.

--
Chase Hiltz
XDP Developer, Path Network
A 6991 E Camelback Rd., Suite D-300, Scottsdale AZ, 85251
W www.path.net
M +1 819 816 4353

* Re: bpf_map_update_elem returns -ENOMEM
From: Hou Tao @ 2024-05-18 4:32 UTC
To: Chase Hiltz; +Cc: xdp-newbies, bpf

Hi,

On 5/17/2024 9:52 PM, Chase Hiltz wrote:
> Hi,
>
> Thanks for the replies.
>
>> Joe also gave a talk about LRU maps LPC a couple of years ago which
>> might give some insight:
> Thanks, this was very helpful in understanding how LRU eviction works!
> I definitely think it's related to high levels of contention on
> individual machines causing LRU eviction to fail, given that I'm only
> seeing it occur for those which consistently process the most packets.
>
>> There have been several updates to the LRU map code since 5.15 so it is
>> definitely possible that it will behave differently on a 6.x kernel.
> I've compared the implementation between 5.15 and 6.5 (what I would
> consider as a potential upgrade) and observed no more than a few
> refactoring changes, but of course it's possible that I missed
> something.
>
>> In order to reduce of possibility of ENOMEM error, the right
>> way is to increase the value of max_entries instead of decreasing it.
> Yes, I now see the error of my ways in thinking that reducing it would
> help at all when it actually hurts. For the time being, I'm going to
> do this as a temporary remediation.

Is there a special reason on why use

>> Does the specific CPU always fail afterwards, or does it fail
>> periodically ? Is the machine running the bpf program an arm64 host or
>> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
>> to htab_lock_bucket() which may fail under arm64 host in v5.15
> It always fails afterwards, I'm doing RSS and we notice this problem
> occurring back-to-back for specific source-destination pairs (because
> they always land on the same queue). This is a 64-bit system:
> ```
> $ uname -a
> 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64
> x86_64 x86_64 GNU/Linux
> ```

It is an x86-64 host, so my previous guess is wrong.

>> Could you please check and account the ratio of times when
>> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
>> means that there may be too many overwrites of entries between different
>> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
> I'm not aware of any way to get that information, if you have any
> pointers I'd be happy to check this.

Please install bpftrace on the host first, then run the following
one-line script on the host when bpf_map_update_elem() starts to return
-ENOMEM:

# sudo bpftrace -e 'kr:htab_lru_map_delete_node { if (retval == 0) { @lock[cpu] = count(); } else { @del[retval & 0xff, cpu] = count(); } } i:s:10 { exit(); }'

The script above accounts the return value of
htab_lru_map_delete_node():
(1) if htab_lock_bucket() fails (returns non-zero), retval will be 0,
    so that case is accounted in the @lock map
(2) if the target node is found in the hash list, the lowest byte of
    retval will be 1, otherwise it will be 0. These returns are
    accounted in the @del map.

The snippet 'i:s:10 { exit(); }' terminates the script after 10
seconds. You could reduce the time if there is too much accounting
output.

The following is the output from my local development environment:

# bpftrace -e 'kr:htab_lru_map_delete_node { if (retval == 0) { @lock[cpu] = count(); } else { @del[retval & 0xff, cpu] = count(); } } i:s:10 { exit(); }'
Attaching 2 probes...

@del[0, 3]: 4822
@del[0, 6]: 5656
@del[0, 2]: 5995
@del[0, 4]: 8652
@del[0, 1]: 24722
@del[0, 5]: 25146
@del[0, 0]: 36137
@del[0, 7]: 38254
@del[1, 3]: 162054
@del[1, 4]: 208696
@del[1, 6]: 245960
@del[1, 2]: 267437
@del[1, 5]: 533654
@del[1, 1]: 548974
@del[1, 7]: 618810
@del[1, 0]: 619459
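
To turn per-CPU counters like the ones above into the ratio asked about
earlier in the thread, the values can be summed and compared. The
following is a minimal sketch, assuming the counts are copied by hand
from the bpftrace output; sum_counts and not_deleted_ratio are
hypothetical helper names, not part of bpftrace or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Sum one group of per-CPU counters (e.g. every @del[1, cpu] value). */
unsigned long long sum_counts(const unsigned long long *counts, size_t n)
{
    unsigned long long total = 0;

    assert(counts != NULL || n == 0); /* an empty group may pass NULL */
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    return total;
}

/*
 * Fraction of calls in which the node was not removed, given the total
 * number of calls that removed it and the total that did not.
 * Returns 0.0 when there were no calls at all.
 */
double not_deleted_ratio(unsigned long long deleted,
                         unsigned long long not_deleted)
{
    unsigned long long total = deleted + not_deleted;

    return total == 0 ? 0.0 : (double)not_deleted / (double)total;
}
```

For the sample output above, the eight @del[0, cpu] values sum to 149384
and the eight @del[1, cpu] values sum to 3205044, so the not-deleted
fraction works out to about 4.5%. Any @lock counts, had they appeared,
would be added to the not-deleted side.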