* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <49E47976.8020005@trash.net> @ 2009-04-15 8:36 ` Herbert Xu [not found] ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-15 8:36 UTC (permalink / raw) To: Patrick McHardy Cc: Eric W. Biederman, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev On Tue, Apr 14, 2009 at 01:54:30PM +0200, Patrick McHardy wrote: > > I see. How about this patch instead? It moves the sk_sleep assignment > to tun_attach, after the permission checks took place. Thanks for pointing out this gaping hole in my patch. I think it may be better to remove read_wait altogether and just use socket.wait in tun_struct. Let me whip up a patch. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 8:47 ` Herbert Xu 2009-04-15 9:07 ` [Lguest] " Christian Borntraeger ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-15 8:47 UTC (permalink / raw) To: Patrick McHardy Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote: > > Let me whip up a patch. tun: Fix sk_sleep races when attaching/detaching As the sk_sleep wait queue actually lives in tfile, which may be detached from the tun device, bad things will happen when we use sk_sleep after detaching. Since the tun device is the persistent data structure here (when requested by the user), it makes much more sense to have the wait queue live there. There is no reason to have it in tfile at all since the only time we can wait is if we have a tun attached. In fact we already have a wait queue in tun_struct, so we might as well use it. Reported-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org> Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Reported-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org> Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a1b0697..412b578 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -93,7 +93,6 @@ struct tun_file { atomic_t count; struct tun_struct *tun; struct net *net; - wait_queue_head_t read_wait; }; struct tun_sock; @@ -333,7 +332,7 @@ static void tun_net_uninit(struct net_device *dev) /* Inform the methods they need to stop using the dev. */ if (tfile) { - wake_up_all(&tfile->read_wait); + wake_up_all(&tun->socket.wait); if (atomic_dec_and_test(&tfile->count)) __tun_detach(tun); } @@ -393,7 +392,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev) /* Notify and wake up reader process */ if (tun->flags & TUN_FASYNC) kill_fasync(&tun->fasync, SIGIO, POLL_IN); - wake_up_interruptible(&tun->tfile->read_wait); + wake_up_interruptible(&tun->socket.wait); return 0; drop: @@ -490,7 +489,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait) DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name); - poll_wait(file, &tfile->read_wait, wait); + poll_wait(file, &tun->socket.wait, wait); if (!skb_queue_empty(&tun->readq)) mask |= POLLIN | POLLRDNORM; @@ -762,7 +761,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, goto out; } - add_wait_queue(&tfile->read_wait, &wait); + add_wait_queue(&tun->socket.wait, &wait); while (len) { current->state = TASK_INTERRUPTIBLE; @@ -793,7 +792,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, } current->state = TASK_RUNNING; - remove_wait_queue(&tfile->read_wait, &wait); + remove_wait_queue(&tun->socket.wait, &wait); out: tun_put(tun); @@ -861,7 +860,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) struct sock *sk; struct tun_struct *tun; struct net_device *dev; - struct tun_file *tfile = file->private_data; int err; dev = __dev_get_by_name(net, ifr->ifr_name); @@ -921,11 +919,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) /* This ref count is for tun->sk. */ dev_hold(dev); + init_waitqueue_head(&tun->socket.wait); sock_init_data(&tun->socket, sk); sk->sk_write_space = tun_sock_write_space; sk->sk_destruct = tun_sock_destruct; sk->sk_sndbuf = INT_MAX; - sk->sk_sleep = &tfile->read_wait; tun->sk = sk; container_of(sk, struct tun_sock, sk)->tun = tun; @@ -1265,7 +1263,6 @@ static int tun_chr_open(struct inode *inode, struct file * file) atomic_set(&tfile->count, 0); tfile->tun = NULL; tfile->net = get_net(current->nsproxy->net_ns); - init_waitqueue_head(&tfile->read_wait); file->private_data = tfile; return 0; } Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls 2009-04-15 8:47 ` Herbert Xu @ 2009-04-15 9:07 ` Christian Borntraeger 2009-04-15 11:07 ` Patrick McHardy 2009-04-15 13:23 ` Eric W. Biederman 2 siblings, 0 replies; 27+ messages in thread From: Christian Borntraeger @ 2009-04-15 9:07 UTC (permalink / raw) To: Herbert Xu Cc: Patrick McHardy, Eric W. Biederman, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev Am Wednesday 15 April 2009 10:47:17 schrieb Herbert Xu: > tun: Fix sk_sleep races when attaching/detaching > > As the sk_sleep wait queue actually lives in tfile, which may be > detached from the tun device, bad things will happen when we use > sk_sleep after detaching. > > Since the tun device is the persistent data structure here (when > requested by the user), it makes much more sense to have the wait > queue live there. There is no reason to have it in tfile at all > since the only time we can wait is if we have a tun attached. > In fact we already have a wait queue in tun_struct, so we might > as well use it. > > Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> > Reported-by: Eric W. Biederman <ebiederm@xmission.com> > Reported-by: Patrick McHardy <kaber@trash.net> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls 2009-04-15 8:47 ` Herbert Xu 2009-04-15 9:07 ` [Lguest] " Christian Borntraeger @ 2009-04-15 11:07 ` Patrick McHardy 2009-04-15 13:23 ` Eric W. Biederman 2 siblings, 0 replies; 27+ messages in thread From: Patrick McHardy @ 2009-04-15 11:07 UTC (permalink / raw) To: Herbert Xu Cc: Eric W. Biederman, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev, Christian Borntraeger Herbert Xu wrote: > On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote: >> Let me whip up a patch. > > tun: Fix sk_sleep races when attaching/detaching > > As the sk_sleep wait queue actually lives in tfile, which may be > detached from the tun device, bad things will happen when we use > sk_sleep after detaching. > > Since the tun device is the persistent data structure here (when > requested by the user), it makes much more sense to have the wait > queue live there. There is no reason to have it in tfile at all > since the only time we can wait is if we have a tun attached. > In fact we already have a wait queue in tun_struct, so we might > as well use it. Tested and works fine, thanks Herbert. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls 2009-04-15 8:47 ` Herbert Xu 2009-04-15 9:07 ` [Lguest] " Christian Borntraeger 2009-04-15 11:07 ` Patrick McHardy @ 2009-04-15 13:23 ` Eric W. Biederman [not found] ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 2 siblings, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 13:23 UTC (permalink / raw) To: Herbert Xu Cc: Patrick McHardy, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev, Christian Borntraeger Herbert Xu <herbert@gondor.apana.org.au> writes: > On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote: >> >> Let me whip up a patch. > > tun: Fix sk_sleep races when attaching/detaching > > As the sk_sleep wait queue actually lives in tfile, which may be > detached from the tun device, bad things will happen when we use > sk_sleep after detaching. > > Since the tun device is the persistent data structure here (when > requested by the user), it makes much more sense to have the wait > queue live there. There is no reason to have it in tfile at all > since the only time we can wait is if we have a tun attached. > In fact we already have a wait queue in tun_struct, so we might > as well use it. There is a GIGANTIC reason to have the wait queue on tfile. If you open a file, and do ip link del tapN you can still be blocked waiting in poll. The problem is specifically free_poll_entry, where we call remove_wait_queue and fput without calling any file methods. So all of this happens without struct tun_file's count being elevated. Which means tun_net_uninit can detach before we get off of the stupid poll wait queue. As documented in: commit b2430de37ef0bc0799ffba7b5219d38ca417eb76 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Tue Jan 20 11:03:21 2009 +0000 tun: Move read_wait into tun_file The poll interface requires that the waitqueue exist while the struct file is open. In the rare case when a tun device disappears before the tun file closes we fail to provide this property, so move read_wait. This is safe now that tun_net_xmit is atomic with tun_detach. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> I specifically moved the wait queue out of tun struct to avoid this race. I will see about getting the vfs to do something saner in my generic revoke work. But for 2.6.30 we have to live with the nasties that are there. Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 13:28 ` Herbert Xu [not found] ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-15 13:28 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 06:23:29AM -0700, Eric W. Biederman wrote: > > There is a GIGANTIC reason to have the wait queue on tfile. > > If you open a file, and do ip link del tapN you can still > be blocked waiting in poll. > > The problem is specifically free_poll_entry, where we call > remove_wait_queue and fput without calling any file methods. > So all of this happens without struct tun_file's count being > elevated. Which means tun_net_uninit can detach before we get > off of the stupid poll wait queue. What about taking a netdev refcount before calling poll_wait? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 13:35 ` Eric W. Biederman [not found] ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 13:35 UTC (permalink / raw) To: Herbert Xu Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes: > On Wed, Apr 15, 2009 at 06:23:29AM -0700, Eric W. Biederman wrote: >> >> There is a GIGANTIC reason to have the wait queue on tfile. >> >> If you open a file, and do ip link del tapN you can still >> be blocked waiting in poll. >> >> The problem is specifically free_poll_entry, where we call >> remove_wait_queue and fput without calling any file methods. >> So all of this happens without struct tun_file's count being >> elevated. Which means tun_net_uninit can detach before we get >> off of the stupid poll wait queue. > > What about taking a netdev refcount before calling poll_wait? Because as far as I can tell we would just leak that refcount. The poll code does not appear to call back into any of the file methods when it frees itself from the wait queue. Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 13:46 ` Herbert Xu [not found] ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 2009-04-15 14:06 ` [Lguest] " Eric W. Biederman 0 siblings, 2 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-15 13:46 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 06:35:58AM -0700, Eric W. Biederman wrote: > > Because as far as I can tell we would just leak that refcount. > > The poll code does not appear to call back into any of the file > methods when it frees itself from the wait queue. OK my suggestion was stupid. But I still don't see how this race is possible at all. So process A has a tun fd open and is spinning in poll(2). Now process B comes along and deletes that tun device. Process A's fd should have a netdev reference that keeps the device and associated structures alive. Oh I see what's going on. We're automatically detaching the device in uninit. This is just wrong. Just because process B deleted the netdev, process A should not be involutarily detached. Does anything actually rely on this behaviour? If not we should just change it to not do that. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 13:55 ` Herbert Xu [not found] ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-15 13:55 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 09:46:10PM +0800, Herbert Xu wrote: > > Does anything actually rely on this behaviour? I doubt it :) > If not we should just change it to not do that. It appears that this was introduced in commit c70f182940f988448f3c12a209d18b1edc276e33 Author: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Date: Tue Jan 20 11:07:17 2009 +0000 tun: Fix races between tun_net_close and free_netdev. Presumably in order to fix the problem of trying to unregister the same device twice. I what we should do is to mark the device as dead instead of detaching if a third party deletes it. That's all you need to know to stop close(2) from trying the unregister a device that's already been unregistered. What else am I missing? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 14:10 ` Eric W. Biederman [not found] ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 14:10 UTC (permalink / raw) To: Herbert Xu Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes: > On Wed, Apr 15, 2009 at 09:46:10PM +0800, Herbert Xu wrote: >> >> Does anything actually rely on this behaviour? > > I doubt it :) > >> If not we should just change it to not do that. > > It appears that this was introduced in > > commit c70f182940f988448f3c12a209d18b1edc276e33 > Author: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > Date: Tue Jan 20 11:07:17 2009 +0000 > > tun: Fix races between tun_net_close and free_netdev. > > Presumably in order to fix the problem of trying to unregister > the same device twice. > > I what we should do is to mark the device as dead instead of > detaching if a third party deletes it. That's all you need > to know to stop close(2) from trying the unregister a device > that's already been unregistered. > > What else am I missing? Hopefully my earlier explanation helps. I will get back to you as soon as I can. But I am off on vacation for the rest of the week. Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 14:12 ` Herbert Xu 0 siblings, 0 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-15 14:12 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 07:10:32AM -0700, Eric W. Biederman wrote: > > Hopefully my earlier explanation helps. I will get back to > you as soon as I can. But I am off on vacation for the > rest of the week. I'm sorry but I don't think we can wait for that. We might just have to fix it whatever way we can. You can unfix it when you come back :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls 2009-04-15 13:46 ` Herbert Xu [not found] ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 14:06 ` Eric W. Biederman [not found] ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 1 sibling, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 14:06 UTC (permalink / raw) To: Herbert Xu Cc: Patrick McHardy, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev, Christian Borntraeger Herbert Xu <herbert@gondor.apana.org.au> writes: > On Wed, Apr 15, 2009 at 06:35:58AM -0700, Eric W. Biederman wrote: >> >> Because as far as I can tell we would just leak that refcount. >> >> The poll code does not appear to call back into any of the file >> methods when it frees itself from the wait queue. > > OK my suggestion was stupid. > > But I still don't see how this race is possible at all. > > So process A has a tun fd open and is spinning in poll(2). Now > process B comes along and deletes that tun device. Process A's > fd should have a netdev reference that keeps the device and > associated structures alive. > > Oh I see what's going on. We're automatically detaching the > device in uninit. This is just wrong. Just because process B > deleted the netdev, process A should not be involutarily detached. > > Does anything actually rely on this behaviour? Yes. There is the boring rmmod case that has always existed. There is more interesting case of moving your tap device into another network namespace. In which case there is the possibility of the network namespace exiting and destroying all of the virtual network devices before we close the file handle. I implemented the rtnetlink methods so we can test the uninit behavior without going through all sorts of hoops. Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 14:08 ` Herbert Xu [not found] ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-15 14:08 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 07:06:10AM -0700, Eric W. Biederman wrote: > > There is the boring rmmod case that has always existed. > > There is more interesting case of moving your tap device > into another network namespace. > > In which case there is the possibility of the network namespace > exiting and destroying all of the virtual network devices before > we close the file handle. In that case what's the problem with holding a refcount to the unregistered device until the process owning the fd closes it? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 14:18 ` Eric W. Biederman [not found] ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 14:18 UTC (permalink / raw) To: Herbert Xu Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes: > On Wed, Apr 15, 2009 at 07:06:10AM -0700, Eric W. Biederman wrote: >> >> There is the boring rmmod case that has always existed. >> >> There is more interesting case of moving your tap device >> into another network namespace. >> >> In which case there is the possibility of the network namespace >> exiting and destroying all of the virtual network devices before >> we close the file handle. > > In that case what's the problem with holding a refcount to the > unregistered device until the process owning the fd closes it? Network devices do not hold a network namespace alive. Only sockets and processes do. So holding the reference only blocks us indefinitely in netdev_wait_allrefs, blocking the network namespace exit, and holding net_mutex indefinitely. My gut feel is that the socket needs to live in tun_file. Instead of in tun_struct. Making that change looked just tricky enough I couldn't sort through it when I glanced at the tun code, after I noticed you had added a socket. Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 14:23 ` Herbert Xu 2009-04-15 14:38 ` Herbert Xu 1 sibling, 0 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-15 14:23 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote: > > My gut feel is that the socket needs to live in tun_file. Instead > of in tun_struct. Making that change looked just tricky enough > I couldn't sort through it when I glanced at the tun code, after I noticed > you had added a socket. Referring to tun_file to get sk_sleep is just too error-prone. Unlike the transmit direction, the receive direction does not present itself to the easy xmit lock solution that's currently used to make tun_detach atomic. The receive callback that currently uses sk_sleep can happen anywhere. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 2009-04-15 14:23 ` Herbert Xu @ 2009-04-15 14:38 ` Herbert Xu [not found] ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 1 sibling, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-15 14:38 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote: > > So holding the reference only blocks us indefinitely in > netdev_wait_allrefs, blocking the network namespace exit, and holding > net_mutex indefinitely. OK that's a killer because process A in my previous scenario can kill the device itself and cause a dead-lock. So how about this? We replace the dev destructor with our own that doesn't immediately call free_netdev. We only call free_netdev once all tun fd's attached to the device have been closed. This can be achieved by simply adding a counter to tun_struct. We'd also change the async detach path to set a marker instead of detaching. That marker can then be checked in places like tun_get. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-15 14:56 ` Eric W. Biederman [not found] ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> 2009-04-16 11:08 ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu 1 sibling, 1 reply; 27+ messages in thread From: Eric W. Biederman @ 2009-04-15 14:56 UTC (permalink / raw) To: Herbert Xu Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes: > On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote: >> >> So holding the reference only blocks us indefinitely in >> netdev_wait_allrefs, blocking the network namespace exit, and holding >> net_mutex indefinitely. > > OK that's a killer because process A in my previous scenario can > kill the device itself and cause a dead-lock. > > So how about this? We replace the dev destructor with our own that > doesn't immediately call free_netdev. We only call free_netdev once > all tun fd's attached to the device have been closed. > > This can be achieved by simply adding a counter to tun_struct. > We'd also change the async detach path to set a marker instead > of detaching. That marker can then be checked in places like > tun_get. That sounds like it would work, and allow us to have big tun_struct. Which is sounds simpler overall. I still have the feeling that putting the socket in tun_file instead of in tun_struct might be conceptually cleaner, but one big blob that is allocated and destroyed together is certainly easier and a lot less racy to deal with. Eric ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>]
* Re: [PATCH 4/5] lguest: use KVM hypercalls [not found] ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org> @ 2009-04-15 22:27 ` Herbert Xu 0 siblings, 0 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-15 22:27 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 07:56:42AM -0700, Eric W. Biederman wrote: > > I still have the feeling that putting the socket in tun_file instead > of in tun_struct might be conceptually cleaner, but one big blob that > is allocated and destroyed together is certainly easier and a lot > less racy to deal with. As I said the difficulty with putting the socket in tun_file is how do you get it on the RX callback path without introducing new races. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* [1/2] tun: Only free a netdev when all tun descriptors are closed [not found] ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 2009-04-15 14:56 ` Eric W. Biederman @ 2009-04-16 11:08 ` Herbert Xu [not found] ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 2009-04-24 8:55 ` [1/2] tun: Only free a netdev when all tun descriptors are closed Christian Borntraeger 1 sibling, 2 replies; 27+ messages in thread From: Herbert Xu @ 2009-04-16 11:08 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Wed, Apr 15, 2009 at 10:38:34PM +0800, Herbert Xu wrote: > > So how about this? We replace the dev destructor with our own that > doesn't immediately call free_netdev. We only call free_netdev once > all tun fd's attached to the device have been closed. Here's the patch. I'd appreciate if everyone can review it and see if they can recreate the original race by 1) Starting a process using tun and polls on it; 2) Doing ip tun del tunXXX while the process is polling. tun: Only free a netdev when all tun descriptors are closed The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix races between tun_net_close and free_netdev") fixed a race where an asynchronous deletion of a tun device can hose a poll(2) on a tun fd attached to that device. However, this came at the cost of moving the tun wait queue into the tun file data structure. The problem with this is that it imposes restrictions on when and where the tun device can access the wait queue since the tun file may change at any time due to detaching and reattaching. In particular, now that we need to use the wait queue on the receive path it becomes difficult to properly synchronise this with the detachment of the tun device. This patch solves the original race in a different way. Since the race is only because the underlying memory gets freed, we can prevent it simply by ensuring that we don't do that until all tun descriptors ever attached to the device (even if they have since be detached because they may still be sitting in poll) have been closed. This is done by using reference counting the attached tun file descriptors. The refcount in tun->sk has been reappropriated for this purpose since it was already being used for that, albeit from the opposite angle. Note that we no longer zero tfile->tun since tun_get will return NULL anyway after the refcount on tfile hits zero. Instead it represents whether this device has ever been attached to a device. Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a1b0697..540f829 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -156,6 +156,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file) tfile->tun = tun; tun->tfile = tfile; dev_hold(tun->dev); + sock_hold(tun->sk); atomic_inc(&tfile->count); out: @@ -165,11 +166,8 @@ out: static void __tun_detach(struct tun_struct *tun) { - struct tun_file *tfile = tun->tfile; - /* Detach from net device */ netif_tx_lock_bh(tun->dev); - tfile->tun = NULL; tun->tfile = NULL; netif_tx_unlock_bh(tun->dev); @@ -339,6 +337,13 @@ static void tun_net_uninit(struct net_device *dev) } } +static void tun_free_netdev(struct net_device *dev) +{ + struct tun_struct *tun = netdev_priv(dev); + + sock_put(tun->sk); +} + /* Net device open. */ static int tun_net_open(struct net_device *dev) { @@ -810,7 +815,7 @@ static void tun_setup(struct net_device *dev) tun->group = -1; dev->ethtool_ops = &tun_ethtool_ops; - dev->destructor = free_netdev; + dev->destructor = tun_free_netdev; } /* Trivial set of netlink ops to allow deleting tun or tap @@ -847,7 +852,7 @@ static void tun_sock_write_space(struct sock *sk) static void tun_sock_destruct(struct sock *sk) { - dev_put(container_of(sk, struct tun_sock, sk)->tun->dev); + free_netdev(container_of(sk, struct tun_sock, sk)->tun->dev); } static struct proto tun_proto = { @@ -919,8 +924,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) if (!sk) goto err_free_dev; - /* This ref count is for tun->sk. */ - dev_hold(dev); sock_init_data(&tun->socket, sk); sk->sk_write_space = tun_sock_write_space; sk->sk_destruct = tun_sock_destruct; @@ -1275,20 +1278,18 @@ static int tun_chr_close(struct inode *inode, struct file *file) struct tun_file *tfile = file->private_data; struct tun_struct *tun = __tun_get(tfile); - if (tun) { - DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name); - - rtnl_lock(); - __tun_detach(tun); - /* If desireable, unregister the netdevice. */ - if (!(tun->flags & TUN_PERSIST)) { - sock_put(tun->sk); - unregister_netdevice(tun->dev); - } + if (!(tun->flags & TUN_PERSIST)) + unregister_netdev(tun->dev); + else + tun_put(tun); + } else + tun = tfile->tun; - rtnl_unlock(); + if (tun) { + DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name); + sock_put(tun->sk); } put_net(tfile->net); Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 27+ messages in thread
[parent not found: <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* [2/2] tun: Fix sk_sleep races when attaching/detaching [not found] ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-16 11:09 ` Herbert Xu [not found] ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-16 11:09 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Thu, Apr 16, 2009 at 07:08:18PM +0800, Herbert Xu wrote: > > tun: Only free a netdev when all tun descriptors are closed With that patch we can now safely move read_wait. tun: Fix sk_sleep races when attaching/detaching As the sk_sleep wait queue actually lives in tfile, which may be detached from the tun device, bad things will happen when we use sk_sleep after detaching. Since the tun device is the persistent data structure here (when requested by the user), it makes much more sense to have the wait queue live there. There is no reason to have it in tfile at all since the only time we can wait is if we have a tun attached. In fact we already have a wait queue in tun_struct, so we might as well use it. Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Tested-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org> Tested-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org> Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 540f829..7cfe3d1 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -93,7 +93,6 @@ struct tun_file { atomic_t count; struct tun_struct *tun; struct net *net; - wait_queue_head_t read_wait; }; struct tun_sock; @@ -331,7 +330,7 @@ static void tun_net_uninit(struct net_device *dev) /* Inform the methods they need to stop using the dev. */ if (tfile) { - wake_up_all(&tfile->read_wait); + wake_up_all(&tun->socket.wait); if (atomic_dec_and_test(&tfile->count)) __tun_detach(tun); } @@ -398,7 +397,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev) /* Notify and wake up reader process */ if (tun->flags & TUN_FASYNC) kill_fasync(&tun->fasync, SIGIO, POLL_IN); - wake_up_interruptible(&tun->tfile->read_wait); + wake_up_interruptible(&tun->socket.wait); return 0; drop: @@ -495,7 +494,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait) DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name); - poll_wait(file, &tfile->read_wait, wait); + poll_wait(file, &tun->socket.wait, wait); if (!skb_queue_empty(&tun->readq)) mask |= POLLIN | POLLRDNORM; @@ -767,7 +766,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, goto out; } - add_wait_queue(&tfile->read_wait, &wait); + add_wait_queue(&tun->socket.wait, &wait); while (len) { current->state = TASK_INTERRUPTIBLE; @@ -798,7 +797,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, } current->state = TASK_RUNNING; - remove_wait_queue(&tfile->read_wait, &wait); + remove_wait_queue(&tun->socket.wait, &wait); out: tun_put(tun); @@ -866,7 +865,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) struct sock *sk; struct tun_struct *tun; struct net_device *dev; - struct tun_file *tfile = file->private_data; int err; dev = __dev_get_by_name(net, ifr->ifr_name); @@ -924,11 +922,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) if (!sk) goto err_free_dev; + init_waitqueue_head(&tun->socket.wait); sock_init_data(&tun->socket, sk); sk->sk_write_space = tun_sock_write_space; sk->sk_destruct = tun_sock_destruct; sk->sk_sndbuf = INT_MAX; - sk->sk_sleep = &tfile->read_wait; tun->sk = sk; container_of(sk, struct tun_sock, sk)->tun = tun; @@ -1268,7 +1266,6 @@ static int tun_chr_open(struct inode *inode, struct file * file) atomic_set(&tfile->count, 0); tfile->tun = NULL; tfile->net = get_net(current->nsproxy->net_ns); - init_waitqueue_head(&tfile->read_wait); file->private_data = tfile; return 0; } Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 27+ messages in thread
[parent not found: <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching [not found] ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-20 8:35 ` Herbert Xu 2009-04-20 9:26 ` David Miller 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-20 8:35 UTC (permalink / raw) To: Eric W. Biederman Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote: > > tun: Fix sk_sleep races when attaching/detaching That patch doesn't apply anymore because of contextual changes caused by the first patch. Here's an update. tun: Fix sk_sleep races when attaching/detaching As the sk_sleep wait queue actually lives in tfile, which may be detached from the tun device, bad things will happen when we use sk_sleep after detaching. Since the tun device is the persistent data structure here (when requested by the user), it makes much more sense to have the wait queue live there. There is no reason to have it in tfile at all since the only time we can wait is if we have a tun attached. In fact we already have a wait queue in tun_struct, so we might as well use it. Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Tested-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org> Tested-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org> Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 95ae40a..735bf41 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -93,7 +93,6 @@ struct tun_file { atomic_t count; struct tun_struct *tun; struct net *net; - wait_queue_head_t read_wait; }; struct tun_sock; @@ -331,7 +330,7 @@ static void tun_net_uninit(struct net_device *dev) /* Inform the methods they need to stop using the dev. */ if (tfile) { - wake_up_all(&tfile->read_wait); + wake_up_all(&tun->socket.wait); if (atomic_dec_and_test(&tfile->count)) __tun_detach(tun); } @@ -398,7 +397,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev) /* Notify and wake up reader process */ if (tun->flags & TUN_FASYNC) kill_fasync(&tun->fasync, SIGIO, POLL_IN); - wake_up_interruptible(&tun->tfile->read_wait); + wake_up_interruptible(&tun->socket.wait); return 0; drop: @@ -495,7 +494,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait) DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name); - poll_wait(file, &tfile->read_wait, wait); + poll_wait(file, &tun->socket.wait, wait); if (!skb_queue_empty(&tun->readq)) mask |= POLLIN | POLLRDNORM; @@ -768,7 +767,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, goto out; } - add_wait_queue(&tfile->read_wait, &wait); + add_wait_queue(&tun->socket.wait, &wait); while (len) { current->state = TASK_INTERRUPTIBLE; @@ -799,7 +798,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, } current->state = TASK_RUNNING; - remove_wait_queue(&tfile->read_wait, &wait); + remove_wait_queue(&tun->socket.wait, &wait); out: tun_put(tun); @@ -867,7 +866,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) struct sock *sk; struct tun_struct *tun; struct net_device *dev; - struct tun_file *tfile = file->private_data; int err; dev = __dev_get_by_name(net, ifr->ifr_name); @@ -925,10 +923,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) if (!sk) goto err_free_dev; + init_waitqueue_head(&tun->socket.wait); sock_init_data(&tun->socket, sk); sk->sk_write_space = tun_sock_write_space; sk->sk_sndbuf = INT_MAX; - sk->sk_sleep = &tfile->read_wait; tun->sk = sk; container_of(sk, struct tun_sock, sk)->tun = tun; @@ -1270,7 +1268,6 @@ static int tun_chr_open(struct inode *inode, struct file * file) atomic_set(&tfile->count, 0); tfile->tun = NULL; tfile->net = get_net(current->nsproxy->net_ns); - init_waitqueue_head(&tfile->read_wait); file->private_data = tfile; return 0; } Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching 2009-04-20 8:35 ` Herbert Xu @ 2009-04-20 9:26 ` David Miller 2009-04-20 9:35 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: David Miller @ 2009-04-20 9:26 UTC (permalink / raw) To: herbert Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest, virtualization, netdev, borntraeger From: Herbert Xu <herbert@gondor.apana.org.au> Date: Mon, 20 Apr 2009 16:35:50 +0800 > On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote: >> >> tun: Fix sk_sleep races when attaching/detaching > > That patch doesn't apply anymore because of contextual changes > caused by the first patch. Here's an update. > > tun: Fix sk_sleep races when attaching/detaching Do you think these two patches are ready to go into net-2.6 now? Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching 2009-04-20 9:26 ` David Miller @ 2009-04-20 9:35 ` Herbert Xu 2009-04-20 10:02 ` David Miller 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-20 9:35 UTC (permalink / raw) To: David Miller Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest, virtualization, netdev, borntraeger On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote: > > Do you think these two patches are ready to go into net-2.6 > now? I think so. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching 2009-04-20 9:35 ` Herbert Xu @ 2009-04-20 10:02 ` David Miller 0 siblings, 0 replies; 27+ messages in thread From: David Miller @ 2009-04-20 10:02 UTC (permalink / raw) To: herbert Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest, virtualization, netdev, borntraeger From: Herbert Xu <herbert@gondor.apana.org.au> Date: Mon, 20 Apr 2009 17:35:49 +0800 > On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote: >> >> Do you think these two patches are ready to go into net-2.6 >> now? > > I think so. Great, applied, thanks Herbert. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed 2009-04-16 11:08 ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu [not found] ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-24 8:55 ` Christian Borntraeger [not found] ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 27+ messages in thread From: Christian Borntraeger @ 2009-04-24 8:55 UTC (permalink / raw) To: Herbert Xu Cc: Eric W. Biederman, Patrick McHardy, Matias Zabaljauregui, odie, Rusty Russell, lguest, virtualization, David S. Miller, netdev, Carsten Otte Am Thursday 16 April 2009 13:08:18 schrieb Herbert Xu: > On Wed, Apr 15, 2009 at 10:38:34PM +0800, Herbert Xu wrote: > > > > So how about this? We replace the dev destructor with our own that > > doesn't immediately call free_netdev. We only call free_netdev once > > all tun fd's attached to the device have been closed. > > Here's the patch. I'd appreciate if everyone can review it > and see if they can recreate the original race by > > 1) Starting a process using tun and polls on it; > 2) Doing ip tun del tunXXX while the process is polling. > > tun: Only free a netdev when all tun descriptors are closed > > The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix > races between tun_net_close and free_netdev") fixed a race where > an asynchronous deletion of a tun device can hose a poll(2) on > a tun fd attached to that device. Sorry for the late reply, but this patch creates another regression: Now TUNSETIFF returns EBUSY on a persistent tap device: open("/dev/net/tun", O_RDWR) = 11 ioctl(11, 0x400454ca, 0x3ffff8e2270) = -1 EBUSY (Device or resource busy) Some debugging (thanks to Carsten Otte) showed that tun_set_iff calls tun_attach (the first call inside the if(dev)). tun_attach now checks for tun->tfile but this is already set. Looks like we need another fix on top :-( Christian ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>]
* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed [not found] ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org> @ 2009-04-24 12:11 ` Herbert Xu [not found] ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2009-04-24 12:11 UTC (permalink / raw) To: Christian Borntraeger Cc: Carsten Otte, lguest-mnsaURCQ41sdnm+yROfE0A, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy On Fri, Apr 24, 2009 at 10:55:49AM +0200, Christian Borntraeger wrote: > Am Thursday 16 April 2009 13:08:18 schrieb Herbert Xu: > > > Here's the patch. I'd appreciate if everyone can review it > > and see if they can recreate the original race by > > > > 1) Starting a process using tun and polls on it; > > 2) Doing ip tun del tunXXX while the process is polling. > > > > tun: Only free a netdev when all tun descriptors are closed > > > > The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix > > races between tun_net_close and free_netdev") fixed a race where > > an asynchronous deletion of a tun device can hose a poll(2) on > > a tun fd attached to that device. > > Sorry for the late reply, but this patch creates another regression: > Now TUNSETIFF returns EBUSY on a persistent tap device: > > open("/dev/net/tun", O_RDWR) = 11 > ioctl(11, 0x400454ca, 0x3ffff8e2270) = -1 EBUSY (Device or resource busy) The patch you qouted has been superceded by many versions :) Can you please test the latest that's in davem's tree? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>]
* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed [not found] ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> @ 2009-04-24 12:40 ` Christian Borntraeger 0 siblings, 0 replies; 27+ messages in thread From: Christian Borntraeger @ 2009-04-24 12:40 UTC (permalink / raw) To: Herbert Xu Cc: Carsten Otte, lguest-mnsaURCQ41sdnm+yROfE0A, David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui, Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy Am Friday 24 April 2009 14:11:56 schrieb Herbert Xu: > The patch you qouted has been superceded by many versions :) Yes, I got lost in this mail thread... > Can you please test the latest that's in davem's tree? Done. With http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fdavem%2Fnet-2.6.git;a=commitdiff_plain;h=9c3fea6ab04a7bd9298e635bf29b4a5379f6c476 http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fdavem%2Fnet-2.6.git;a=commitdiff_plain;h=c40af84a6726f63e35740d26f841992e8f31f92c Everything works fine. Thank you and sorry for the noise. Christian ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2009-04-24 12:40 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200903271022.38244.rusty@rustcorp.com.au>
[not found] ` <1238709324.5823.8.camel@odie.local>
[not found] ` <1239043798.27826.93.camel@zetabook>
[not found] ` <200904081021.39877.rusty@rustcorp.com.au>
[not found] ` <1239224319.17844.16.camel@zetabook>
[not found] ` <49DDE91A.8060603@trash.net>
[not found] ` <49DDF614.1060909@trash.net>
[not found] ` <m1bpr6hqrm.fsf@fess.ebiederm.org>
[not found] ` <49E47976.8020005@trash.net>
2009-04-15 8:36 ` [Lguest] [PATCH 4/5] lguest: use KVM hypercalls Herbert Xu
[not found] ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 8:47 ` Herbert Xu
2009-04-15 9:07 ` [Lguest] " Christian Borntraeger
2009-04-15 11:07 ` Patrick McHardy
2009-04-15 13:23 ` Eric W. Biederman
[not found] ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 13:28 ` Herbert Xu
[not found] ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 13:35 ` Eric W. Biederman
[not found] ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 13:46 ` Herbert Xu
[not found] ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 13:55 ` Herbert Xu
[not found] ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:10 ` Eric W. Biederman
[not found] ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:12 ` Herbert Xu
2009-04-15 14:06 ` [Lguest] " Eric W. Biederman
[not found] ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:08 ` Herbert Xu
[not found] ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:18 ` Eric W. Biederman
[not found] ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:23 ` Herbert Xu
2009-04-15 14:38 ` Herbert Xu
[not found] ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:56 ` Eric W. Biederman
[not found] ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 22:27 ` Herbert Xu
2009-04-16 11:08 ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu
[not found] ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-16 11:09 ` [2/2] tun: Fix sk_sleep races when attaching/detaching Herbert Xu
[not found] ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-20 8:35 ` Herbert Xu
2009-04-20 9:26 ` David Miller
2009-04-20 9:35 ` Herbert Xu
2009-04-20 10:02 ` David Miller
2009-04-24 8:55 ` [1/2] tun: Only free a netdev when all tun descriptors are closed Christian Borntraeger
[not found] ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2009-04-24 12:11 ` Herbert Xu
[not found] ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-24 12:40 ` Christian Borntraeger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).