* [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function
@ 2018-02-12 21:58 Yonghong Song
2018-02-12 22:15 ` Eric Dumazet
2018-02-14 1:11 ` Daniel Borkmann
0 siblings, 2 replies; 5+ messages in thread
From: Yonghong Song @ 2018-02-12 21:58 UTC (permalink / raw)
To: ast, daniel, malat, netdev; +Cc: kernel-team
There is a memory leak happening in lpm_trie map_free callback
function trie_free. The trie structure itself does not get freed.
Also, trie_free function did not do synchronize_rcu before freeing
various data structures. This is incorrect as some rcu_read_lock
region(s) for lookup, update, delete or get_next_key may not complete yet.
The fix is to add synchronize_rcu in the beginning of trie_free.
The useless spin_lock is removed from this function as well.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
kernel/bpf/lpm_trie.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 7b469d1..9b41ea4 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
struct lpm_trie_node __rcu **slot;
struct lpm_trie_node *node;
- raw_spin_lock(&trie->lock);
+ /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
+ * so the programs (can be more than one that used this map) were
+ * disconnected from events. Wait for outstanding programs to complete
+ * update/lookup/delete/get_next_key and free the trie.
+ */
+ synchronize_rcu();
/* Always start at the root and walk down to a node that has no
* children. Then free that node, nullify its reference in the parent
@@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map)
}
unlock:
- raw_spin_unlock(&trie->lock);
+ kfree(trie);
}
static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
--
2.9.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function
2018-02-12 21:58 [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function Yonghong Song
@ 2018-02-12 22:15 ` Eric Dumazet
2018-02-12 22:20 ` Eric Dumazet
2018-02-14 1:11 ` Daniel Borkmann
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2018-02-12 22:15 UTC (permalink / raw)
To: Yonghong Song, ast, daniel, malat, netdev; +Cc: kernel-team
On Mon, 2018-02-12 at 13:58 -0800, Yonghong Song wrote:
> There is a memory leak happening in lpm_trie map_free callback
> function trie_free. The trie structure itself does not get freed.
>
> Also, trie_free function did not do synchronize_rcu before freeing
> various data structures. This is incorrect as some rcu_read_lock
> region(s) for lookup, update, delete or get_next_key may not complete yet.
> The fix is to add synchronize_rcu in the beginning of trie_free.
> The useless spin_lock is removed from this function as well.
>
> Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Tested-by: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
> kernel/bpf/lpm_trie.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 7b469d1..9b41ea4 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
> struct lpm_trie_node __rcu **slot;
> struct lpm_trie_node *node;
>
> - raw_spin_lock(&trie->lock);
> + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
> + * so the programs (can be more than one that used this map) were
> + * disconnected from events. Wait for outstanding programs to complete
> + * update/lookup/delete/get_next_key and free the trie.
> + */
> + synchronize_rcu();
>
Please do not do that.
Use kfree_rcu() instead (adding one struct rcu_head in struct lpm_trie)
> /* Always start at the root and walk down to a node that has no
> * children. Then free that node, nullify its reference in the parent
> @@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map)
> }
>
> unlock:
> - raw_spin_unlock(&trie->lock);
> + kfree(trie);
> }
>
> static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function
2018-02-12 22:15 ` Eric Dumazet
@ 2018-02-12 22:20 ` Eric Dumazet
0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2018-02-12 22:20 UTC (permalink / raw)
To: Yonghong Song, ast, daniel, malat, netdev; +Cc: kernel-team
On Mon, 2018-02-12 at 14:15 -0800, Eric Dumazet wrote:
> On Mon, 2018-02-12 at 13:58 -0800, Yonghong Song wrote:
> > There is a memory leak happening in lpm_trie map_free callback
> > function trie_free. The trie structure itself does not get freed.
> >
> > Also, trie_free function did not do synchronize_rcu before freeing
> > various data structures. This is incorrect as some rcu_read_lock
> > region(s) for lookup, update, delete or get_next_key may not complete yet.
> > The fix is to add synchronize_rcu in the beginning of trie_free.
> > The useless spin_lock is removed from this function as well.
> >
> > Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
> > Reported-by: Mathieu Malaterre <malat@debian.org>
> > Reported-by: Alexei Starovoitov <ast@kernel.org>
> > Tested-by: Mathieu Malaterre <malat@debian.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> > kernel/bpf/lpm_trie.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> > index 7b469d1..9b41ea4 100644
> > --- a/kernel/bpf/lpm_trie.c
> > +++ b/kernel/bpf/lpm_trie.c
> > @@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
> > struct lpm_trie_node __rcu **slot;
> > struct lpm_trie_node *node;
> >
> > - raw_spin_lock(&trie->lock);
> > + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
> > + * so the programs (can be more than one that used this map) were
> > + * disconnected from events. Wait for outstanding programs to complete
> > + * update/lookup/delete/get_next_key and free the trie.
> > + */
> > + synchronize_rcu();
> >
>
> Please do not do that.
>
> Use kfree_rcu() instead (adding one struct rcu_head in struct lpm_trie)
Oh well, I take this back. It looks we heavily use synchronize_rcu()
all over the places for ->map_free() already.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function
2018-02-12 21:58 [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function Yonghong Song
2018-02-12 22:15 ` Eric Dumazet
@ 2018-02-14 1:11 ` Daniel Borkmann
2018-02-14 1:46 ` Yonghong Song
1 sibling, 1 reply; 5+ messages in thread
From: Daniel Borkmann @ 2018-02-14 1:11 UTC (permalink / raw)
To: Yonghong Song, ast, malat, netdev; +Cc: kernel-team
Hi Yonghong,
On 02/12/2018 10:58 PM, Yonghong Song wrote:
> There is a memory leak happening in lpm_trie map_free callback
> function trie_free. The trie structure itself does not get freed.
>
> Also, trie_free function did not do synchronize_rcu before freeing
> various data structures. This is incorrect as some rcu_read_lock
> region(s) for lookup, update, delete or get_next_key may not complete yet.
> The fix is to add synchronize_rcu in the beginning of trie_free.
> The useless spin_lock is removed from this function as well.
>
> Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Tested-by: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
> kernel/bpf/lpm_trie.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 7b469d1..9b41ea4 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
> struct lpm_trie_node __rcu **slot;
> struct lpm_trie_node *node;
>
> - raw_spin_lock(&trie->lock);
> + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
> + * so the programs (can be more than one that used this map) were
> + * disconnected from events. Wait for outstanding programs to complete
> + * update/lookup/delete/get_next_key and free the trie.
> + */
I think the first part of the comment is slightly misleading, e.g. map refcount
could drop to zero also for various other reasons, not strictly due to prog
refcnt dropping to 0, so I would just keep the second part with 'Wait for
outstanding [...]' which is okay since this is eventually what is relevant in
this context.
> + synchronize_rcu();
>
> /* Always start at the root and walk down to a node that has no
> * children. Then free that node, nullify its reference in the parent
> @@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map)
> }
>
> unlock:
> - raw_spin_unlock(&trie->lock);
Could you do a minor change here: since we get rid of the locking as there's
no user left anymore after grace period, it would be great if you could also
change the 'unlock' label name above.
Other than that, good to go, thanks!
> + kfree(trie);
> }
>
> static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
>
Thanks,
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function
2018-02-14 1:11 ` Daniel Borkmann
@ 2018-02-14 1:46 ` Yonghong Song
0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2018-02-14 1:46 UTC (permalink / raw)
To: Daniel Borkmann, ast, malat, netdev; +Cc: kernel-team
On 2/13/18 5:11 PM, Daniel Borkmann wrote:
> Hi Yonghong,
>
> On 02/12/2018 10:58 PM, Yonghong Song wrote:
>> There is a memory leak happening in lpm_trie map_free callback
>> function trie_free. The trie structure itself does not get freed.
>>
>> Also, trie_free function did not do synchronize_rcu before freeing
>> various data structures. This is incorrect as some rcu_read_lock
>> region(s) for lookup, update, delete or get_next_key may not complete yet.
>> The fix is to add synchronize_rcu in the beginning of trie_free.
>> The useless spin_lock is removed from this function as well.
>>
>> Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
>> Reported-by: Mathieu Malaterre <malat@debian.org>
>> Reported-by: Alexei Starovoitov <ast@kernel.org>
>> Tested-by: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> kernel/bpf/lpm_trie.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>> index 7b469d1..9b41ea4 100644
>> --- a/kernel/bpf/lpm_trie.c
>> +++ b/kernel/bpf/lpm_trie.c
>> @@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
>> struct lpm_trie_node __rcu **slot;
>> struct lpm_trie_node *node;
>>
>> - raw_spin_lock(&trie->lock);
>> + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
>> + * so the programs (can be more than one that used this map) were
>> + * disconnected from events. Wait for outstanding programs to complete
>> + * update/lookup/delete/get_next_key and free the trie.
>> + */
>
> I think the first part of the comment is slightly misleading, e.g. map refcount
> could drop to zero also for various other reasons, not strictly due to prog
> refcnt dropping to 0, so I would just keep the second part with 'Wait for
Oh, yes. Make sense. Copy-paste without thinking.
We have similar comments in virtually all other places of using
synchronize_rcu under kernel/bpf. This can be cleaned up later though.
> outstanding [...]' which is okay since this is eventually what is relevant in
> this context.
Will do.
>
>> + synchronize_rcu();
>>
>> /* Always start at the root and walk down to a node that has no
>> * children. Then free that node, nullify its reference in the parent
>> @@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map)
>> }
>>
>> unlock:
>> - raw_spin_unlock(&trie->lock);
>
> Could you do a minor change here: since we get rid of the locking as there's
> no user left anymore after grace period, it would be great if you could also
> change the 'unlock' label name above.
Will do.
>
> Other than that, good to go, thanks!
>
>> + kfree(trie);
>> }
>>
>> static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
>>
>
> Thanks,
> Daniel
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-02-14 1:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-12 21:58 [PATCH bpf] bpf: fix memory leak in lpm_trie map_free callback function Yonghong Song
2018-02-12 22:15 ` Eric Dumazet
2018-02-12 22:20 ` Eric Dumazet
2018-02-14 1:11 ` Daniel Borkmann
2018-02-14 1:46 ` Yonghong Song
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).