From: ebiederm@xmission.com (Eric W. Biederman)
To: Francesco Ruggeri <fruggeri@aristanetworks.com>
Cc: netdev@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>,
David Miller <davem@davemloft.ne>
Subject: Re: Race condition in ipv6 code
Date: Thu, 12 Jan 2012 17:17:32 -0800 [thread overview]
Message-ID: <m1ty40tn83.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <CA+HUmGhJ-YvFMRnKzb8RqH6COZ=erp1SajVcb1rq1vi2k9OBGw@mail.gmail.com> (Francesco Ruggeri's message of "Wed, 11 Jan 2012 18:13:55 -0800")
Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
> We have hit a race condition in ipv6 code when setting
> /proc/sys/net/ipv6/conf/*/forwarding. This happens when the syscall
> has to be restarted.
>
> I wonder if anyone else has run into the same issue.
>
> The current sequence in addrconf_sysctl_forward() and
> addrconf_fixup_forwarding() is as follows:
> - change the parameter in idev->cnf.forwarding (using proc_dointvec())
> - try to get the rtnl lock
> - if cannot get the lock then restore the original value in
> idev->cnf.forwarding and restart the syscall.
>
> While this is going on, the ipv6 code may access idev->cnf.forwarding
> and get an incorrect value.
> In our case we were in addrconf_ifdown (holding the rtnl lock) and
> calling __ipv6_ifa_notify(RTM_DELADDR, ifa) on the idev->addr_list
> entries.
> __ipv6_ifa_notify() only invokes addrconf_leave_anycast() if
> idev->cnf.forwarding is set. Because a process trying to set
> forwarding to 0 was stuck in the restart_syscall sequence above
> flipping the flag on and off, we erroneously read the flag as 0, with
> the result that addrconf_leave_anycast() was not invoked, some
> idev->ac_list entries were never released, idev was never freed and
> kept a reference to its net_device, and the net_device was never freed
> and caused the "unregister_netdevice: waiting for xxx to become free"
> message forever. In our case this was a vlan interfaces that was being
> deleted, so we ended up getting stuck in vlan_ioctl_handler() holding
> vlan_ioctl_mutex with further bad consequences.
> The following diffs (for 2.6.38, but the same logic seems to be used
> in 3.2) address the issue by modifying idev->cnf.forwarding only after
> the rtnl lock is acquired. There is a similar situation for
> disable_ipv6.
> Any comments are appreciated.
Interesting. So ultimately the problem is not the syscall restart
although that exacerbates it, the problem is that we expect
idev->cnf.forwarding to be protected by the rtnl_lock and it is not.
At first read through your patch looks good. I am a bit worried that
we have some versions of the value: aka
net->ipv6.devconf_dflt->forwarding not protected by the rtnl_lock
and other version of the value protected by the rtnl_lock.
That just seems confusing.
We can't hold the rtnl_lock around proc_dointvec because that can sleep
indefinitely in copy_from_user. So it looks like your change to create
a temporary ctl_table and call proc_dointvec seems very reasonable,
and necessary however we do this.
I don't know if there are other places that need the rtnl_lock that
but your patch below looks like it makes things better for all of
the right reasons. So on that score.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Unless someone wants to volunteer to sort out the impedance mismatch
between these tunables and the sysctl infrastructure. I suggest
you resend this patch to David with [PATCH] in the subject line.
I would also suggest a little clearer description why
idev->cnf.forwarding and idev->cnf.disable_ipv6 need rntl_lock
protection.
But overall this looks like a pretty obvious bug fix, to the
problem that we need the rtnl_lock to protect idev->cnf.forwarding,
and we currently allow updates to idev->cnf.forwarding without
holding the rtnl_lock.
Eric
> Francesco Ruggeri
>
>
> --- a/net/ipv6/addrconf.c 2011-03-14 18:20:32.000000000 -0700
> +++ b/net/ipv6/addrconf.c 2012-01-10 12:56:01.458880292 -0800
> @@ -507,29 +507,31 @@ static void addrconf_forward_change(stru
> rcu_read_unlock();
> }
>
> -static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
> +static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int newf)
> {
> struct net *net;
> + int old;
>
> net = (struct net *)table->extra2;
> - if (p == &net->ipv6.devconf_dflt->forwarding)
> + if (p == &net->ipv6.devconf_dflt->forwarding) {
> + *p = newf;
> return 0;
> + }
>
> - if (!rtnl_trylock()) {
> - /* Restore the original values before restarting */
> - *p = old;
> + if (!rtnl_trylock())
> return restart_syscall();
> - }
> +
> + old = *p;
> + *p = newf;
>
> if (p == &net->ipv6.devconf_all->forwarding) {
> - __s32 newf = net->ipv6.devconf_all->forwarding;
> net->ipv6.devconf_dflt->forwarding = newf;
> addrconf_forward_change(net, newf);
> - } else if ((!*p) ^ (!old))
> + } else if ((!newf) ^ (!old))
> dev_forward_change((struct inet6_dev *)table->extra1);
> rtnl_unlock();
>
> - if (*p)
> + if (newf)
> rt6_purge_dflt_routers(net);
> return 1;
> }
> @@ -4165,9 +4167,17 @@ int addrconf_sysctl_forward(ctl_table *c
> int *valp = ctl->data;
> int val = *valp;
> loff_t pos = *ppos;
> + ctl_table lctl;
> int ret;
>
> - ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
> + /*
> + * ctl->data points to idev->cnf.forwarding, we should
> + * not modify it until we get the rtnl lock.
> + */
> + lctl = *ctl;
> + lctl.data = &val;
> +
> + ret = proc_dointvec(&lctl, write, buffer, lenp, ppos);
>
> if (write)
> ret = addrconf_fixup_forwarding(ctl, valp, val);
> @@ -4205,26 +4215,28 @@ static void addrconf_disable_change(stru
> rcu_read_unlock();
> }
>
> -static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int old)
> +static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf)
> {
> struct net *net;
> + int old;
>
> net = (struct net *)table->extra2;
>
> - if (p == &net->ipv6.devconf_dflt->disable_ipv6)
> + if (p == &net->ipv6.devconf_dflt->disable_ipv6) {
> + *p = newf;
> return 0;
> + }
>
> - if (!rtnl_trylock()) {
> - /* Restore the original values before restarting */
> - *p = old;
> + if (!rtnl_trylock())
> return restart_syscall();
> - }
> +
> + old = *p;
> + *p = newf;
>
> if (p == &net->ipv6.devconf_all->disable_ipv6) {
> - __s32 newf = net->ipv6.devconf_all->disable_ipv6;
> net->ipv6.devconf_dflt->disable_ipv6 = newf;
> addrconf_disable_change(net, newf);
> - } else if ((!*p) ^ (!old))
> + } else if ((!newf) ^ (!old))
> dev_disable_change((struct inet6_dev *)table->extra1);
>
> rtnl_unlock();
> @@ -4238,9 +4250,17 @@ int addrconf_sysctl_disable(ctl_table *c
> int *valp = ctl->data;
> int val = *valp;
> loff_t pos = *ppos;
> + ctl_table lctl;
> int ret;
>
> - ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
> + /*
> + * ctl->data points to idev->cnf.disable_ipv6, we should
> + * not modify it until we get the rtnl lock.
> + */
> + lctl = *ctl;
> + lctl.data = &val;
> +
> + ret = proc_dointvec(&lctl, write, buffer, lenp, ppos);
>
> if (write)
> ret = addrconf_disable_ipv6(ctl, valp, val);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2012-01-13 1:15 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-12 2:13 Race condition in ipv6 code Francesco Ruggeri
2012-01-12 6:31 ` Eric Dumazet
2012-01-12 6:44 ` David Miller
2012-01-12 20:48 ` Francesco Ruggeri
2012-01-13 0:11 ` Eric W. Biederman
2012-01-13 6:02 ` Eric Dumazet
2012-01-13 7:40 ` Eric W. Biederman
2012-01-13 17:04 ` Ben Greear
2012-01-14 5:46 ` Eric W. Biederman
2012-01-14 18:31 ` Ben Greear
2012-01-20 2:54 ` Eric W. Biederman
2012-01-13 1:17 ` Eric W. Biederman [this message]
2012-01-13 1:57 ` Stephen Hemminger
2012-01-13 22:02 ` Francesco Ruggeri
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1ty40tn83.fsf@fess.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=davem@davemloft.ne \
--cc=eric.dumazet@gmail.com \
--cc=fruggeri@aristanetworks.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.