* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-03 3:14 [PATCH v2] netfilter: account ebt_table_info to kmemcg Shakeel Butt via Bridge
@ 2019-01-03 10:14 ` William Kucharski
2019-01-03 16:18 ` Shakeel Butt
2019-01-06 11:00 ` Kirill Tkhai
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: William Kucharski @ 2019-01-03 10:14 UTC (permalink / raw)
To: Shakeel Butt
Cc: Michal Hocko, Andrew Morton, Florian Westphal, Kirill Tkhai,
Linux-MM, LKML, syzbot+7713f3aa67be76b1552c, Pablo Neira Ayuso,
Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
netfilter-devel, coreteam, bridge
> On Jan 2, 2019, at 8:14 PM, Shakeel Butt <shakeelb@google.com> wrote:
>
> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo)
> return -ENOMEM;
>
> if (countersize)
> memset(newinfo->counters, 0, countersize);
>
> - newinfo->entries = vmalloc(tmp.entries_size);
> + newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo->entries) {
> ret = -ENOMEM;
> goto free_newinfo;
> --
Just out of curiosity, what are the actual sizes of these areas in typical use
given __vmalloc() will be allocating by the page?
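
A rough userspace sketch of the arithmetic behind this question, not kernel code. It assumes COUNTER_OFFSET() rounds the per-CPU counter array up to a cache-line multiple and that each counter holds two 64-bit fields. The page size, cache-line size, and all sketch_* names are illustrative assumptions, and sizeof(*newinfo) is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; a real kernel takes these from the arch. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_CACHE_BYTES 64u

/* Hypothetical stand-in for an ebtables counter: packet and byte counts. */
struct sketch_counter {
    uint64_t pcnt;
    uint64_t bcnt;
};

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t round_up_pow2(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* One cache-line-aligned counter array per CPU. */
static size_t sketch_countersize(size_t nentries, size_t nr_cpus)
{
    return round_up_pow2(nentries * sizeof(struct sketch_counter),
                         SKETCH_CACHE_BYTES) * nr_cpus;
}

/* Bytes backed by whole pages for a vmalloc-style request. */
static size_t sketch_page_backed(size_t request)
{
    return round_up_pow2(request, SKETCH_PAGE_SIZE);
}

/* Usage: 3 entries on 4 CPUs need 256 bytes but occupy one full page. */
static void sketch_sizes_demo(void)
{
    size_t c = sketch_countersize(3, 4);

    assert(c == 256);
    assert(sketch_page_backed(c) == 4096);
}
```

Under these assumptions a small table still costs at least one page per allocation, which is the overhead the question is pointing at.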
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-03 10:14 ` William Kucharski
@ 2019-01-03 16:18 ` Shakeel Butt
0 siblings, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2019-01-03 16:18 UTC (permalink / raw)
To: William Kucharski
Cc: Michal Hocko, Andrew Morton, Florian Westphal, Kirill Tkhai,
Linux-MM, LKML, syzbot+7713f3aa67be76b1552c, Pablo Neira Ayuso,
Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
netfilter-devel, coreteam, bridge
On Thu, Jan 3, 2019 at 2:15 AM William Kucharski
<william.kucharski@oracle.com> wrote:
>
>
>
> > On Jan 2, 2019, at 8:14 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >
> > countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> > - newinfo = vmalloc(sizeof(*newinfo) + countersize);
> > + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> > + PAGE_KERNEL);
> > if (!newinfo)
> > return -ENOMEM;
> >
> > if (countersize)
> > memset(newinfo->counters, 0, countersize);
> >
> > - newinfo->entries = vmalloc(tmp.entries_size);
> > + newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> > + PAGE_KERNEL);
> > if (!newinfo->entries) {
> > ret = -ENOMEM;
> > goto free_newinfo;
> > --
>
> Just out of curiosity, what are the actual sizes of these areas in typical use
> given __vmalloc() will be allocating by the page?
>
We don't really use this in production, so I don't have a good idea
of the size in the typical case; it depends on the workload. The
motivation behind this patch was a system-wide OOM triggered by syzbot
running in a restricted memcg.
Shakeel
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-03 3:14 [PATCH v2] netfilter: account ebt_table_info to kmemcg Shakeel Butt via Bridge
2019-01-03 10:14 ` William Kucharski
@ 2019-01-06 11:00 ` Kirill Tkhai
2019-01-10 9:22 ` Kirill Tkhai
2019-01-10 0:44 ` Pablo Neira Ayuso
2019-01-10 23:57 ` Pablo Neira Ayuso
3 siblings, 1 reply; 9+ messages in thread
From: Kirill Tkhai @ 2019-01-06 11:00 UTC (permalink / raw)
To: Shakeel Butt, Michal Hocko, Andrew Morton, Florian Westphal
Cc: Nikolay Aleksandrov, Roopa Prabhu,
bridge@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, coreteam@netfilter.org,
netfilter-devel@vger.kernel.org,
syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com,
Jozsef Kadlecsik, Pablo Neira Ayuso
On 03.01.2019 06:14, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.
>
> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linux MM <linux-mm@kvack.org>
> Cc: netfilter-devel@vger.kernel.org
> Cc: coreteam@netfilter.org
> Cc: bridge@lists.linux-foundation.org
> Cc: LKML <linux-kernel@vger.kernel.org>
> ---
> Changelog since v1:
> - More descriptive commit message.
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>
> net/bridge/netfilter/ebtables.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
> tmp.name[sizeof(tmp.name) - 1] = 0;
>
> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo)
> return -ENOMEM;
>
> if (countersize)
> memset(newinfo->counters, 0, countersize);
>
> - newinfo->entries = vmalloc(tmp.entries_size);
> + newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo->entries) {
> ret = -ENOMEM;
> goto free_newinfo;
>
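
A toy userspace model of the containment the commit message describes, offered only as a sketch. The struct, the function names, and the limit are hypothetical and do not correspond to any kernel API. It models the intended effect of passing an accounting flag (the request is charged against the caller's limit), and it deliberately does not model the caveat noted above about large vmalloc allocations bypassing the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: a limit and current usage. */
struct sketch_memcg {
    size_t limit;
    size_t usage;
};

/*
 * Model an allocation request. Unaccounted requests never touch the
 * cgroup counters; accounted ones are charged and may fail.
 */
static int sketch_alloc(struct sketch_memcg *cg, size_t bytes, bool accounted)
{
    if (!accounted)
        return 0;                       /* invisible to the limit */
    if (bytes > cg->limit - cg->usage)  /* usage never exceeds limit */
        return -ENOMEM;
    cg->usage += bytes;
    return 0;
}

/* Usage: the same oversized request succeeds unaccounted, fails accounted. */
static void sketch_memcg_demo(void)
{
    struct sketch_memcg cg = { .limit = 4096, .usage = 0 };

    assert(sketch_alloc(&cg, 8192, false) == 0);
    assert(cg.usage == 0);
    assert(sketch_alloc(&cg, 8192, true) == -ENOMEM);
    assert(sketch_alloc(&cg, 4096, true) == 0);
    assert(cg.usage == 4096);
}
```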
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-06 11:00 ` Kirill Tkhai
@ 2019-01-10 9:22 ` Kirill Tkhai
2019-01-10 9:41 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: Kirill Tkhai @ 2019-01-10 9:22 UTC (permalink / raw)
To: Shakeel Butt, Michal Hocko, Andrew Morton, Florian Westphal
Cc: linux-mm, linux-kernel, syzbot+7713f3aa67be76b1552c,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, netfilter-devel, coreteam, bridge
On 06.01.2019 14:00, Kirill Tkhai wrote:
> On 03.01.2019 06:14, Shakeel Butt wrote:
>> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
>> memory is already accounted to kmemcg. Do the same for ebtables. The
>> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
>> whole system from a restricted memcg, a potential DoS.
>>
>> By accounting the ebt_table_info, the memory used for ebt_table_info can
>> be contained within the memcg of the allocating process. However the
>> lifetime of ebt_table_info is independent of the allocating process and
>> is tied to the network namespace. So, the oom-killer will not be able to
>> relieve the memory pressure due to ebt_table_info memory. The memory for
>> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
>> handle the oom-killed allocating process correctly and one large
>> allocation can bypass memcg limit enforcement. So, with this patch,
>> at least the small allocations will be contained. For large allocations,
>> we need to fix vmalloc.
>>
>> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
>> Signed-off-by: Shakeel Butt <shakeelb@google.com>
>> Cc: Florian Westphal <fw@strlen.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
>> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
>> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Linux MM <linux-mm@kvack.org>
>> Cc: netfilter-devel@vger.kernel.org
>> Cc: coreteam@netfilter.org
>> Cc: bridge@lists.linux-foundation.org
>> Cc: LKML <linux-kernel@vger.kernel.org>
>> ---
>> Changelog since v1:
>> - More descriptive commit message.
>
> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>
>>
>> net/bridge/netfilter/ebtables.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
>> index 491828713e0b..5e55cef0cec3 100644
>> --- a/net/bridge/netfilter/ebtables.c
>> +++ b/net/bridge/netfilter/ebtables.c
>> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>> tmp.name[sizeof(tmp.name) - 1] = 0;
>>
>> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
>> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
>> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
>> + PAGE_KERNEL);
Do we need GFP_HIGHMEM here?
>> if (!newinfo)
>> return -ENOMEM;
>>
>> if (countersize)
>> memset(newinfo->counters, 0, countersize);
>>
>> - newinfo->entries = vmalloc(tmp.entries_size);
>> + newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
>> + PAGE_KERNEL);
>> if (!newinfo->entries) {
>> ret = -ENOMEM;
>> goto free_newinfo;
>>
>
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-10 9:22 ` Kirill Tkhai
@ 2019-01-10 9:41 ` Michal Hocko
2019-01-10 9:48 ` Kirill Tkhai
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-01-10 9:41 UTC (permalink / raw)
To: Kirill Tkhai
Cc: Shakeel Butt, Andrew Morton, Florian Westphal, linux-mm,
linux-kernel, syzbot+7713f3aa67be76b1552c, Pablo Neira Ayuso,
Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
netfilter-devel, coreteam, bridge
On Thu 10-01-19 12:22:09, Kirill Tkhai wrote:
[...]
> >> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> >> index 491828713e0b..5e55cef0cec3 100644
> >> --- a/net/bridge/netfilter/ebtables.c
> >> +++ b/net/bridge/netfilter/ebtables.c
> >> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
> >> tmp.name[sizeof(tmp.name) - 1] = 0;
> >>
> >> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> >> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
> >> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> >> + PAGE_KERNEL);
>
> Do we need GFP_HIGHMEM here?
No. vmalloc adds __GFP_HIGHMEM implicitly (see __vmalloc_area_node).
--
Michal Hocko
SUSE Labs
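
A userspace sketch of the behaviour described in this reply, under the assumption that the page-level allocation inside vmalloc ORs in the highmem flag unless the caller asked for a DMA zone. The bit values and sketch_* names are made up for illustration and are not the kernel's real encodings.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_gfp_t;

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_GFP_DMA     0x01u
#define SKETCH_GFP_HIGHMEM 0x02u
#define SKETCH_GFP_DMA32   0x04u
#define SKETCH_GFP_KERNEL  0x10u
#define SKETCH_GFP_ACCOUNT 0x20u

/*
 * Compute the mask used for the backing pages: highmem is added
 * implicitly unless the caller restricted the zone to DMA or DMA32.
 */
static sketch_gfp_t sketch_page_gfp(sketch_gfp_t caller_mask)
{
    sketch_gfp_t highmem =
        (caller_mask & (SKETCH_GFP_DMA | SKETCH_GFP_DMA32))
            ? 0
            : SKETCH_GFP_HIGHMEM;

    return caller_mask | highmem;
}

/* Usage: a plain accounted kernel request picks up highmem by itself. */
static void sketch_gfp_demo(void)
{
    sketch_gfp_t m = sketch_page_gfp(SKETCH_GFP_KERNEL | SKETCH_GFP_ACCOUNT);

    assert(m & SKETCH_GFP_HIGHMEM);
    assert(m & SKETCH_GFP_ACCOUNT);
}
```

In this model the caller never has to pass the highmem flag, which matches the answer given above.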
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-10 9:41 ` Michal Hocko
@ 2019-01-10 9:48 ` Kirill Tkhai
0 siblings, 0 replies; 9+ messages in thread
From: Kirill Tkhai @ 2019-01-10 9:48 UTC (permalink / raw)
To: Michal Hocko
Cc: Shakeel Butt, Andrew Morton, Florian Westphal, linux-mm,
linux-kernel, syzbot+7713f3aa67be76b1552c, Pablo Neira Ayuso,
Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
netfilter-devel, coreteam, bridge
On 10.01.2019 12:41, Michal Hocko wrote:
> On Thu 10-01-19 12:22:09, Kirill Tkhai wrote:
> [...]
>>>> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
>>>> index 491828713e0b..5e55cef0cec3 100644
>>>> --- a/net/bridge/netfilter/ebtables.c
>>>> +++ b/net/bridge/netfilter/ebtables.c
>>>> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>>>> tmp.name[sizeof(tmp.name) - 1] = 0;
>>>>
>>>> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
>>>> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
>>>> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
>>>> + PAGE_KERNEL);
>>
>> Do we need GFP_HIGHMEM here?
>
> No. vmalloc adds __GFP_HIGHMEM implicitly (see __vmalloc_area_node).
Then OK, thanks for the explanation.
Kirill
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-03 3:14 [PATCH v2] netfilter: account ebt_table_info to kmemcg Shakeel Butt via Bridge
2019-01-03 10:14 ` William Kucharski
2019-01-06 11:00 ` Kirill Tkhai
@ 2019-01-10 0:44 ` Pablo Neira Ayuso
2019-01-10 23:57 ` Pablo Neira Ayuso
3 siblings, 0 replies; 9+ messages in thread
From: Pablo Neira Ayuso @ 2019-01-10 0:44 UTC (permalink / raw)
To: Shakeel Butt
Cc: coreteam, Nikolay Aleksandrov, Roopa Prabhu, bridge,
Florian Westphal, linux-kernel, Michal Hocko, linux-mm,
Kirill Tkhai, netfilter-devel, syzbot+7713f3aa67be76b1552c,
Jozsef Kadlecsik, Andrew Morton
On Wed, Jan 02, 2019 at 07:14:31PM -0800, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.
Is -mm fine with this?
If no objections, I'll apply this to the netfilter tree. Thanks.
> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linux MM <linux-mm@kvack.org>
> Cc: netfilter-devel@vger.kernel.org
> Cc: coreteam@netfilter.org
> Cc: bridge@lists.linux-foundation.org
> Cc: LKML <linux-kernel@vger.kernel.org>
> ---
> Changelog since v1:
> - More descriptive commit message.
>
> net/bridge/netfilter/ebtables.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
> tmp.name[sizeof(tmp.name) - 1] = 0;
>
> countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> - newinfo = vmalloc(sizeof(*newinfo) + countersize);
> + newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo)
> return -ENOMEM;
>
> if (countersize)
> memset(newinfo->counters, 0, countersize);
>
> - newinfo->entries = vmalloc(tmp.entries_size);
> + newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> + PAGE_KERNEL);
> if (!newinfo->entries) {
> ret = -ENOMEM;
> goto free_newinfo;
> --
> 2.20.1.415.g653613c723-goog
>
* Re: [PATCH v2] netfilter: account ebt_table_info to kmemcg
2019-01-03 3:14 [PATCH v2] netfilter: account ebt_table_info to kmemcg Shakeel Butt via Bridge
` (2 preceding siblings ...)
2019-01-10 0:44 ` Pablo Neira Ayuso
@ 2019-01-10 23:57 ` Pablo Neira Ayuso
3 siblings, 0 replies; 9+ messages in thread
From: Pablo Neira Ayuso @ 2019-01-10 23:57 UTC (permalink / raw)
To: Shakeel Butt
Cc: Michal Hocko, Andrew Morton, Florian Westphal, Kirill Tkhai,
linux-mm, linux-kernel, syzbot+7713f3aa67be76b1552c,
Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
netfilter-devel, coreteam, bridge
On Wed, Jan 02, 2019 at 07:14:31PM -0800, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.
>
> By accounting the ebt_table_info, the memory used for ebt_table_info can
> be contained within the memcg of the allocating process. However the
> lifetime of ebt_table_info is independent of the allocating process and
> is tied to the network namespace. So, the oom-killer will not be able to
> relieve the memory pressure due to ebt_table_info memory. The memory for
> ebt_table_info is allocated through vmalloc. Currently vmalloc does not
> handle the oom-killed allocating process correctly and one large
> allocation can bypass memcg limit enforcement. So, with this patch,
> at least the small allocations will be contained. For large allocations,
> we need to fix vmalloc.
OK, patch is applied, thanks.