From: ebiederm@xmission.com (Eric W. Biederman)
To: Alexander Kuznetsov <wwfq@yandex-team.ru>
Cc: netdev@vger.kernel.org, zeil@yandex-team.ru, davem@davemloft.net,
dmtrmonakhov@yandex-team.ru
Subject: Re: [PATCH] ipv6: enable net.ipv6.route.max_size sysctl in network namespace
Date: Tue, 12 Oct 2021 11:38:48 -0500 [thread overview]
Message-ID: <87fst642qv.fsf@disp2133> (raw)
In-Reply-To: <20211012073914.27775-1-wwfq@yandex-team.ru> (Alexander Kuznetsov's message of "Tue, 12 Oct 2021 10:39:14 +0300")
Alexander Kuznetsov <wwfq@yandex-team.ru> writes:
> We want to increase route cache size in network namespace
> created with user namespace. Currently ipv6 route settings
> are disabled for non-initial network namespaces.
>
> We want to allow to write this sysctl only for users
> from host(initial) user ns.
This is subtle, and arguably broken. Having permission tests
inside write methods has been proven problematic over the years.
That is because it usually is pretty simple to fool some application
that has the appropriate permissions to write to a file descriptor
it did not open.
From what I can see you are doing this convoluted test to avoid
performing the analysis to see if it is safe to allow the route
cache size to be changed from inside the namespace.
Usually limits like this exist more as to catch it when things
go crazy rather than to limit resource consumption in general.
So I expect it is reasonable to relax this limit possibly after
ensuring you have memory cgroup annotation on the allocations.
I suspect the routing table can grow large enough memory cgroup
annotations need to be present already.
Eric
> Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
> Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
> Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
> ---
> net/ipv6/route.c | 32 ++++++++++++++++++++------------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index dbc2240..2d96c9f 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -6303,13 +6303,21 @@ static int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write,
> return 0;
> }
>
> +static int ipv6_sysctl_route_max_size(struct ctl_table *ctl, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + if (write && current_user_ns() != &init_user_ns)
> + return -EPERM;
> +
> + return proc_dointvec(ctl, write, buffer, lenp, ppos);
> +}
> static struct ctl_table ipv6_route_table_template[] = {
> {
> - .procname = "flush",
> - .data = &init_net.ipv6.sysctl.flush_delay,
> + .procname = "max_size",
> + .data = &init_net.ipv6.sysctl.ip6_rt_max_size,
> .maxlen = sizeof(int),
> - .mode = 0200,
> - .proc_handler = ipv6_sysctl_rtcache_flush
> + .mode = 0644,
> + .proc_handler = ipv6_sysctl_route_max_size,
> },
> {
> .procname = "gc_thresh",
> @@ -6319,11 +6327,11 @@ static struct ctl_table ipv6_route_table_template[] = {
> .proc_handler = proc_dointvec,
> },
> {
> - .procname = "max_size",
> - .data = &init_net.ipv6.sysctl.ip6_rt_max_size,
> + .procname = "flush",
> + .data = &init_net.ipv6.sysctl.flush_delay,
> .maxlen = sizeof(int),
> - .mode = 0644,
> - .proc_handler = proc_dointvec,
> + .mode = 0200,
> + .proc_handler = ipv6_sysctl_rtcache_flush
> },
> {
> .procname = "gc_min_interval",
> @@ -6395,10 +6403,10 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
> GFP_KERNEL);
>
> if (table) {
> - table[0].data = &net->ipv6.sysctl.flush_delay;
> - table[0].extra1 = net;
> + table[0].data = &net->ipv6.sysctl.ip6_rt_max_size;
> table[1].data = &net->ipv6.ip6_dst_ops.gc_thresh;
> - table[2].data = &net->ipv6.sysctl.ip6_rt_max_size;
> + table[2].data = &net->ipv6.sysctl.flush_delay;
> + table[2].extra1 = net;
> table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
> table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
> table[5].data = &net->ipv6.sysctl.ip6_rt_gc_interval;
> @@ -6410,7 +6418,7 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
>
> /* Don't export sysctls to unprivileged users */
> if (net->user_ns != &init_user_ns)
> - table[0].procname = NULL;
> + table[1].procname = NULL;
> }
>
> return table;
prev parent reply other threads:[~2021-10-12 16:39 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-12 7:39 [PATCH] ipv6: enable net.ipv6.route.max_size sysctl in network namespace Alexander Kuznetsov
2021-10-12 16:38 ` Eric W. Biederman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fst642qv.fsf@disp2133 \
--to=ebiederm@xmission.com \
--cc=davem@davemloft.net \
--cc=dmtrmonakhov@yandex-team.ru \
--cc=netdev@vger.kernel.org \
--cc=wwfq@yandex-team.ru \
--cc=zeil@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).