netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls
       [not found]               ` <49E47976.8020005@trash.net>
@ 2009-04-15  8:36                 ` Herbert Xu
       [not found]                   ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-15  8:36 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric W. Biederman, Matias Zabaljauregui, odie, Rusty Russell,
	lguest, virtualization, David S. Miller, netdev

On Tue, Apr 14, 2009 at 01:54:30PM +0200, Patrick McHardy wrote:
>
> I see. How about this patch instead? It moves the sk_sleep assignment
> to tun_attach, after the permission checks took place.

Thanks for pointing out this gaping hole in my patch.  I think
it may be better to remove read_wait altogether and just use
socket.wait in tun_struct.

Let me whip up a patch.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                   ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15  8:47                     ` Herbert Xu
  2009-04-15  9:07                       ` [Lguest] " Christian Borntraeger
                                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-15  8:47 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui,
	Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller

On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote:
> 
> Let me whip up a patch.

tun: Fix sk_sleep races when attaching/detaching

As the sk_sleep wait queue actually lives in tfile, which may be
detached from the tun device, bad things will happen when we use
sk_sleep after detaching.

Since the tun device is the persistent data structure here (when
requested by the user), it makes much more sense to have the wait
queue live there.  There is no reason to have it in tfile at all
since the only time we can wait is if we have a tun attached.
In fact we already have a wait queue in tun_struct, so we might
as well use it.

Reported-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Reported-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org>
Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a1b0697..412b578 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -93,7 +93,6 @@ struct tun_file {
 	atomic_t count;
 	struct tun_struct *tun;
 	struct net *net;
-	wait_queue_head_t	read_wait;
 };
 
 struct tun_sock;
@@ -333,7 +332,7 @@ static void tun_net_uninit(struct net_device *dev)
 	/* Inform the methods they need to stop using the dev.
 	 */
 	if (tfile) {
-		wake_up_all(&tfile->read_wait);
+		wake_up_all(&tun->socket.wait);
 		if (atomic_dec_and_test(&tfile->count))
 			__tun_detach(tun);
 	}
@@ -393,7 +392,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Notify and wake up reader process */
 	if (tun->flags & TUN_FASYNC)
 		kill_fasync(&tun->fasync, SIGIO, POLL_IN);
-	wake_up_interruptible(&tun->tfile->read_wait);
+	wake_up_interruptible(&tun->socket.wait);
 	return 0;
 
 drop:
@@ -490,7 +489,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 
 	DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
 
-	poll_wait(file, &tfile->read_wait, wait);
+	poll_wait(file, &tun->socket.wait, wait);
 
 	if (!skb_queue_empty(&tun->readq))
 		mask |= POLLIN | POLLRDNORM;
@@ -762,7 +761,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 		goto out;
 	}
 
-	add_wait_queue(&tfile->read_wait, &wait);
+	add_wait_queue(&tun->socket.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -793,7 +792,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tfile->read_wait, &wait);
+	remove_wait_queue(&tun->socket.wait, &wait);
 
 out:
 	tun_put(tun);
@@ -861,7 +860,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	struct sock *sk;
 	struct tun_struct *tun;
 	struct net_device *dev;
-	struct tun_file *tfile = file->private_data;
 	int err;
 
 	dev = __dev_get_by_name(net, ifr->ifr_name);
@@ -921,11 +919,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		/* This ref count is for tun->sk. */
 		dev_hold(dev);
+		init_waitqueue_head(&tun->socket.wait);
 		sock_init_data(&tun->socket, sk);
 		sk->sk_write_space = tun_sock_write_space;
 		sk->sk_destruct = tun_sock_destruct;
 		sk->sk_sndbuf = INT_MAX;
-		sk->sk_sleep = &tfile->read_wait;
 
 		tun->sk = sk;
 		container_of(sk, struct tun_sock, sk)->tun = tun;
@@ -1265,7 +1263,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	atomic_set(&tfile->count, 0);
 	tfile->tun = NULL;
 	tfile->net = get_net(current->nsproxy->net_ns);
-	init_waitqueue_head(&tfile->read_wait);
 	file->private_data = tfile;
 	return 0;
 }

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls
  2009-04-15  8:47                     ` Herbert Xu
@ 2009-04-15  9:07                       ` Christian Borntraeger
  2009-04-15 11:07                       ` Patrick McHardy
  2009-04-15 13:23                       ` Eric W. Biederman
  2 siblings, 0 replies; 27+ messages in thread
From: Christian Borntraeger @ 2009-04-15  9:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Patrick McHardy, Eric W. Biederman, Matias Zabaljauregui, odie,
	Rusty Russell, lguest, virtualization, David S. Miller, netdev

Am Wednesday 15 April 2009 10:47:17 schrieb Herbert Xu:
> tun: Fix sk_sleep races when attaching/detaching
>
> As the sk_sleep wait queue actually lives in tfile, which may be
> detached from the tun device, bad things will happen when we use
> sk_sleep after detaching.
>
> Since the tun device is the persistent data structure here (when
> requested by the user), it makes much more sense to have the wait
> queue live there.  There is no reason to have it in tfile at all
> since the only time we can wait is if we have a tun attached.
> In fact we already have a wait queue in tun_struct, so we might
> as well use it.
>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Reported-by: Eric W. Biederman <ebiederm@xmission.com>
> Reported-by: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls
  2009-04-15  8:47                     ` Herbert Xu
  2009-04-15  9:07                       ` [Lguest] " Christian Borntraeger
@ 2009-04-15 11:07                       ` Patrick McHardy
  2009-04-15 13:23                       ` Eric W. Biederman
  2 siblings, 0 replies; 27+ messages in thread
From: Patrick McHardy @ 2009-04-15 11:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric W. Biederman, Matias Zabaljauregui, odie, Rusty Russell,
	lguest, virtualization, David S. Miller, netdev,
	Christian Borntraeger

Herbert Xu wrote:
> On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote:
>> Let me whip up a patch.
> 
> tun: Fix sk_sleep races when attaching/detaching
> 
> As the sk_sleep wait queue actually lives in tfile, which may be
> detached from the tun device, bad things will happen when we use
> sk_sleep after detaching.
> 
> Since the tun device is the persistent data structure here (when
> requested by the user), it makes much more sense to have the wait
> queue live there.  There is no reason to have it in tfile at all
> since the only time we can wait is if we have a tun attached.
> In fact we already have a wait queue in tun_struct, so we might
> as well use it.

Tested and works fine, thanks Herbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls
  2009-04-15  8:47                     ` Herbert Xu
  2009-04-15  9:07                       ` [Lguest] " Christian Borntraeger
  2009-04-15 11:07                       ` Patrick McHardy
@ 2009-04-15 13:23                       ` Eric W. Biederman
       [not found]                         ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 13:23 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Patrick McHardy, Matias Zabaljauregui, odie, Rusty Russell,
	lguest, virtualization, David S. Miller, netdev,
	Christian Borntraeger

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Apr 15, 2009 at 04:36:10PM +0800, Herbert Xu wrote:
>> 
>> Let me whip up a patch.
>
> tun: Fix sk_sleep races when attaching/detaching
>
> As the sk_sleep wait queue actually lives in tfile, which may be
> detached from the tun device, bad things will happen when we use
> sk_sleep after detaching.
>
> Since the tun device is the persistent data structure here (when
> requested by the user), it makes much more sense to have the wait
> queue live there.  There is no reason to have it in tfile at all
> since the only time we can wait is if we have a tun attached.
> In fact we already have a wait queue in tun_struct, so we might
> as well use it.

There is a GIGANTIC reason to have the wait queue on tfile.

If you open a file, and do ip link del tapN you can still
be blocked waiting in poll.

The problem is specifically free_poll_entry, where we call
remove_wait_queue and fput without calling any file methods.
So all of this happens without struct tun_file's count being
elevated.  Which means tun_net_uninit can detach before we get
off of the stupid poll wait queue.

As documented in:

commit b2430de37ef0bc0799ffba7b5219d38ca417eb76
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Tue Jan 20 11:03:21 2009 +0000

    tun: Move read_wait into tun_file
    
    The poll interface requires that the waitqueue exist while the struct
    file is open.  In the rare case when a tun device disappears before
    the tun file closes we fail to provide this property, so move
    read_wait.
    
    This is safe now that tun_net_xmit is atomic with tun_detach.
    
    Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>



I specifically moved the wait queue out of tun struct to avoid this
race.

I will see about getting the vfs to do something saner in my generic
revoke work.  But for 2.6.30 we have to live with the nasties that
are there.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                         ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 13:28                           ` Herbert Xu
       [not found]                             ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 13:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 06:23:29AM -0700, Eric W. Biederman wrote:
> 
> There is a GIGANTIC reason to have the wait queue on tfile.
> 
> If you open a file, and do ip link del tapN you can still
> be blocked waiting in poll.
> 
> The problem is specifically free_poll_entry, where we call
> remove_wait_queue and fput without calling any file methods.
> So all of this happens without struct tun_file's count being
> elevated.  Which means tun_net_uninit can detach before we get
> off of the stupid poll wait queue.

What about taking a netdev refcount before calling poll_wait?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                             ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 13:35                               ` Eric W. Biederman
       [not found]                                 ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 13:35 UTC (permalink / raw)
  To: Herbert Xu
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes:

> On Wed, Apr 15, 2009 at 06:23:29AM -0700, Eric W. Biederman wrote:
>> 
>> There is a GIGANTIC reason to have the wait queue on tfile.
>> 
>> If you open a file, and do ip link del tapN you can still
>> be blocked waiting in poll.
>> 
>> The problem is specifically free_poll_entry, where we call
>> remove_wait_queue and fput without calling any file methods.
>> So all of this happens without struct tun_file's count being
>> elevated.  Which means tun_net_uninit can detach before we get
>> off of the stupid poll wait queue.
>
> What about taking a netdev refcount before calling poll_wait?

Because as far as I can tell we would just leak that refcount.

The poll code does not appear to call back into any of the file
methods when it frees itself from the wait queue.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                 ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 13:46                                   ` Herbert Xu
       [not found]                                     ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  2009-04-15 14:06                                     ` [Lguest] " Eric W. Biederman
  0 siblings, 2 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 13:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 06:35:58AM -0700, Eric W. Biederman wrote:
> 
> Because as far as I can tell we would just leak that refcount.
> 
> The poll code does not appear to call back into any of the file
> methods when it frees itself from the wait queue.

OK my suggestion was stupid.

But I still don't see how this race is possible at all.

So process A has a tun fd open and is spinning in poll(2).  Now
process B comes along and deletes that tun device.  Process A's
fd should have a netdev reference that keeps the device and
associated structures alive.

Oh I see what's going on.  We're automatically detaching the
device in uninit.  This is just wrong.  Just because process B
deleted the netdev, process A should not be involutarily detached.

Does anything actually rely on this behaviour?

If not we should just change it to not do that.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                     ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 13:55                                       ` Herbert Xu
       [not found]                                         ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 13:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 09:46:10PM +0800, Herbert Xu wrote:
>
> Does anything actually rely on this behaviour?

I doubt it :)

> If not we should just change it to not do that.

It appears that this was introduced in

commit c70f182940f988448f3c12a209d18b1edc276e33
Author: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Date:   Tue Jan 20 11:07:17 2009 +0000

    tun: Fix races between tun_net_close and free_netdev.

Presumably in order to fix the problem of trying to unregister
the same device twice.

I what we should do is to mark the device as dead instead of
detaching if a third party deletes it.  That's all you need
to know to stop close(2) from trying the unregister a device
that's already been unregistered.

What else am I missing?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Lguest] [PATCH 4/5] lguest: use KVM hypercalls
  2009-04-15 13:46                                   ` Herbert Xu
       [not found]                                     ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 14:06                                     ` Eric W. Biederman
       [not found]                                       ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  1 sibling, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 14:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Patrick McHardy, Matias Zabaljauregui, odie, Rusty Russell,
	lguest, virtualization, David S. Miller, netdev,
	Christian Borntraeger

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Apr 15, 2009 at 06:35:58AM -0700, Eric W. Biederman wrote:
>> 
>> Because as far as I can tell we would just leak that refcount.
>> 
>> The poll code does not appear to call back into any of the file
>> methods when it frees itself from the wait queue.
>
> OK my suggestion was stupid.
>
> But I still don't see how this race is possible at all.
>
> So process A has a tun fd open and is spinning in poll(2).  Now
> process B comes along and deletes that tun device.  Process A's
> fd should have a netdev reference that keeps the device and
> associated structures alive.
>
> Oh I see what's going on.  We're automatically detaching the
> device in uninit.  This is just wrong.  Just because process B
> deleted the netdev, process A should not be involutarily detached.
>
> Does anything actually rely on this behaviour?

Yes.

There is the boring rmmod case that has always existed.

There is more interesting case of moving your tap device
into another network namespace.

In which case there is the possibility of the network namespace
exiting and destroying all of the virtual network devices before
we close the file handle.

I implemented the rtnetlink methods so we can test the uninit
behavior without going through all sorts of hoops.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                       ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 14:08                                         ` Herbert Xu
       [not found]                                           ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 14:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 07:06:10AM -0700, Eric W. Biederman wrote:
>
> There is the boring rmmod case that has always existed.
> 
> There is more interesting case of moving your tap device
> into another network namespace.
> 
> In which case there is the possibility of the network namespace
> exiting and destroying all of the virtual network devices before
> we close the file handle.

In that case what's the problem with holding a refcount to the
unregistered device until the process owning the fd closes it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                         ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 14:10                                           ` Eric W. Biederman
       [not found]                                             ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 14:10 UTC (permalink / raw)
  To: Herbert Xu
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes:

> On Wed, Apr 15, 2009 at 09:46:10PM +0800, Herbert Xu wrote:
>>
>> Does anything actually rely on this behaviour?
>
> I doubt it :)
>
>> If not we should just change it to not do that.
>
> It appears that this was introduced in
>
> commit c70f182940f988448f3c12a209d18b1edc276e33
> Author: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Date:   Tue Jan 20 11:07:17 2009 +0000
>
>     tun: Fix races between tun_net_close and free_netdev.
>
> Presumably in order to fix the problem of trying to unregister
> the same device twice.
>
> I what we should do is to mark the device as dead instead of
> detaching if a third party deletes it.  That's all you need
> to know to stop close(2) from trying the unregister a device
> that's already been unregistered.
>
> What else am I missing?

Hopefully my earlier explanation helps.  I will get back to
you as soon as I can.  But I am off on vacation for the
rest of the week.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                             ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 14:12                                               ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 14:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 07:10:32AM -0700, Eric W. Biederman wrote:
>
> Hopefully my earlier explanation helps.  I will get back to
> you as soon as I can.  But I am off on vacation for the
> rest of the week.

I'm sorry but I don't think we can wait for that.  We might just
have to fix it whatever way we can.  You can unfix it when you
come back :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                           ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 14:18                                             ` Eric W. Biederman
       [not found]                                               ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 14:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes:

> On Wed, Apr 15, 2009 at 07:06:10AM -0700, Eric W. Biederman wrote:
>>
>> There is the boring rmmod case that has always existed.
>> 
>> There is more interesting case of moving your tap device
>> into another network namespace.
>> 
>> In which case there is the possibility of the network namespace
>> exiting and destroying all of the virtual network devices before
>> we close the file handle.
>
> In that case what's the problem with holding a refcount to the
> unregistered device until the process owning the fd closes it?

Network devices do not hold a network namespace alive.  Only sockets
and processes do.

So holding the reference only blocks us indefinitely in
netdev_wait_allrefs, blocking the network namespace exit, and holding
net_mutex indefinitely.

My gut feel is that the socket needs to live in tun_file.  Instead
of in tun_struct.  Making that change looked just tricky enough
I couldn't sort through it when I glanced at the tun code, after I noticed
you had added a socket.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                               ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 14:23                                                 ` Herbert Xu
  2009-04-15 14:38                                                 ` Herbert Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 14:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote:
> 
> My gut feel is that the socket needs to live in tun_file.  Instead
> of in tun_struct.  Making that change looked just tricky enough
> I couldn't sort through it when I glanced at the tun code, after I noticed
> you had added a socket.

Referring to tun_file to get sk_sleep is just too error-prone.
Unlike the transmit direction, the receive direction does not
present itself to the easy xmit lock solution that's currently
used to make tun_detach atomic.

The receive callback that currently uses sk_sleep can happen
anywhere.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                               ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-04-15 14:23                                                 ` Herbert Xu
@ 2009-04-15 14:38                                                 ` Herbert Xu
       [not found]                                                   ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 14:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote:
>
> So holding the reference only blocks us indefinitely in
> netdev_wait_allrefs, blocking the network namespace exit, and holding
> net_mutex indefinitely.

OK that's a killer because process A in my previous scenario can
kill the device itself and cause a dead-lock.

So how about this? We replace the dev destructor with our own that
doesn't immediately call free_netdev.  We only call free_netdev once
all tun fd's attached to the device have been closed.

This can be achieved by simply adding a counter to tun_struct.
We'd also change the async detach path to set a marker instead
of detaching.  That marker can then be checked in places like
tun_get.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                                   ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-15 14:56                                                     ` Eric W. Biederman
       [not found]                                                       ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-04-16 11:08                                                     ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2009-04-15 14:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org> writes:

> On Wed, Apr 15, 2009 at 07:18:44AM -0700, Eric W. Biederman wrote:
>>
>> So holding the reference only blocks us indefinitely in
>> netdev_wait_allrefs, blocking the network namespace exit, and holding
>> net_mutex indefinitely.
>
> OK that's a killer because process A in my previous scenario can
> kill the device itself and cause a dead-lock.
>
> So how about this? We replace the dev destructor with our own that
> doesn't immediately call free_netdev.  We only call free_netdev once
> all tun fd's attached to the device have been closed.
>
> This can be achieved by simply adding a counter to tun_struct.
> We'd also change the async detach path to set a marker instead
> of detaching.  That marker can then be checked in places like
> tun_get.

That sounds like it would work, and allow us to have big tun_struct.
Which is sounds simpler overall.

I still have the feeling that putting the socket in tun_file instead
of in tun_struct might be conceptually cleaner, but one big blob that
is allocated and destroyed together is certainly easier and a lot
less racy to deal with.

Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] lguest: use KVM hypercalls
       [not found]                                                       ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-04-15 22:27                                                         ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-15 22:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 07:56:42AM -0700, Eric W. Biederman wrote:
>
> I still have the feeling that putting the socket in tun_file instead
> of in tun_struct might be conceptually cleaner, but one big blob that
> is allocated and destroyed together is certainly easier and a lot
> less racy to deal with.

As I said the difficulty with putting the socket in tun_file
is how do you get it on the RX callback path without introducing
new races.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [1/2] tun: Only free a netdev when all tun descriptors are closed
       [not found]                                                   ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  2009-04-15 14:56                                                     ` Eric W. Biederman
@ 2009-04-16 11:08                                                     ` Herbert Xu
       [not found]                                                       ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  2009-04-24  8:55                                                       ` [1/2] tun: Only free a netdev when all tun descriptors are closed Christian Borntraeger
  1 sibling, 2 replies; 27+ messages in thread
From: Herbert Xu @ 2009-04-16 11:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Wed, Apr 15, 2009 at 10:38:34PM +0800, Herbert Xu wrote:
> 
> So how about this? We replace the dev destructor with our own that
> doesn't immediately call free_netdev.  We only call free_netdev once
> all tun fd's attached to the device have been closed.

Here's the patch.  I'd appreciate if everyone can review it
and see if they can recreate the original race by

1) Starting a process using tun and polls on it;
2) Doing ip tun del tunXXX while the process is polling.

tun: Only free a netdev when all tun descriptors are closed

The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix
races between tun_net_close and free_netdev") fixed a race where
an asynchronous deletion of a tun device can hose a poll(2) on
a tun fd attached to that device.

However, this came at the cost of moving the tun wait queue into
the tun file data structure.  The problem with this is that it
imposes restrictions on when and where the tun device can access
the wait queue since the tun file may change at any time due to
detaching and reattaching.

In particular, now that we need to use the wait queue on the
receive path it becomes difficult to properly synchronise this
with the detachment of the tun device.

This patch solves the original race in a different way.  Since
the race is only because the underlying memory gets freed, we
can prevent it simply by ensuring that we don't do that until
all tun descriptors ever attached to the device (even if they
have since be detached because they may still be sitting in poll)
have been closed.

This is done by using reference counting the attached tun file
descriptors.  The refcount in tun->sk has been reappropriated
for this purpose since it was already being used for that, albeit
from the opposite angle.

Note that we no longer zero tfile->tun since tun_get will return
NULL anyway after the refcount on tfile hits zero.  Instead it
represents whether this device has ever been attached to a device.

Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a1b0697..540f829 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -156,6 +156,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
 	tfile->tun = tun;
 	tun->tfile = tfile;
 	dev_hold(tun->dev);
+	sock_hold(tun->sk);
 	atomic_inc(&tfile->count);
 
 out:
@@ -165,11 +166,8 @@ out:
 
 static void __tun_detach(struct tun_struct *tun)
 {
-	struct tun_file *tfile = tun->tfile;
-
 	/* Detach from net device */
 	netif_tx_lock_bh(tun->dev);
-	tfile->tun = NULL;
 	tun->tfile = NULL;
 	netif_tx_unlock_bh(tun->dev);
 
@@ -339,6 +337,13 @@ static void tun_net_uninit(struct net_device *dev)
 	}
 }
 
+static void tun_free_netdev(struct net_device *dev)
+{
+	struct tun_struct *tun = netdev_priv(dev);
+
+	sock_put(tun->sk);
+}
+
 /* Net device open. */
 static int tun_net_open(struct net_device *dev)
 {
@@ -810,7 +815,7 @@ static void tun_setup(struct net_device *dev)
 	tun->group = -1;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = tun_free_netdev;
 }
 
 /* Trivial set of netlink ops to allow deleting tun or tap
@@ -847,7 +852,7 @@ static void tun_sock_write_space(struct sock *sk)
 
 static void tun_sock_destruct(struct sock *sk)
 {
-	dev_put(container_of(sk, struct tun_sock, sk)->tun->dev);
+	free_netdev(container_of(sk, struct tun_sock, sk)->tun->dev);
 }
 
 static struct proto tun_proto = {
@@ -919,8 +924,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (!sk)
 			goto err_free_dev;
 
-		/* This ref count is for tun->sk. */
-		dev_hold(dev);
 		sock_init_data(&tun->socket, sk);
 		sk->sk_write_space = tun_sock_write_space;
 		sk->sk_destruct = tun_sock_destruct;
@@ -1275,20 +1278,18 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	struct tun_file *tfile = file->private_data;
 	struct tun_struct *tun = __tun_get(tfile);
 
-
 	if (tun) {
-		DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
-
-		rtnl_lock();
-		__tun_detach(tun);
-
 		/* If desireable, unregister the netdevice. */
-		if (!(tun->flags & TUN_PERSIST)) {
-			sock_put(tun->sk);
-			unregister_netdevice(tun->dev);
-		}
+		if (!(tun->flags & TUN_PERSIST))
+			unregister_netdev(tun->dev);
+		else
+			tun_put(tun);
+	} else
+		tun = tfile->tun;
 
-		rtnl_unlock();
+	if (tun) {
+		DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
+		sock_put(tun->sk);
 	}
 
 	put_net(tfile->net);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [2/2] tun: Fix sk_sleep races when attaching/detaching
       [not found]                                                       ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-16 11:09                                                         ` Herbert Xu
       [not found]                                                           ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-16 11:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Thu, Apr 16, 2009 at 07:08:18PM +0800, Herbert Xu wrote:
>
> tun: Only free a netdev when all tun descriptors are closed

With that patch we can now safely move read_wait.

tun: Fix sk_sleep races when attaching/detaching

As the sk_sleep wait queue actually lives in tfile, which may be
detached from the tun device, bad things will happen when we use
sk_sleep after detaching.

Since the tun device is the persistent data structure here (when
requested by the user), it makes much more sense to have the wait
queue live there.  There is no reason to have it in tfile at all
since the only time we can wait is if we have a tun attached.
In fact we already have a wait queue in tun_struct, so we might
as well use it.

Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Tested-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
Tested-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org>
Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 540f829..7cfe3d1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -93,7 +93,6 @@ struct tun_file {
 	atomic_t count;
 	struct tun_struct *tun;
 	struct net *net;
-	wait_queue_head_t	read_wait;
 };
 
 struct tun_sock;
@@ -331,7 +330,7 @@ static void tun_net_uninit(struct net_device *dev)
 	/* Inform the methods they need to stop using the dev.
 	 */
 	if (tfile) {
-		wake_up_all(&tfile->read_wait);
+		wake_up_all(&tun->socket.wait);
 		if (atomic_dec_and_test(&tfile->count))
 			__tun_detach(tun);
 	}
@@ -398,7 +397,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Notify and wake up reader process */
 	if (tun->flags & TUN_FASYNC)
 		kill_fasync(&tun->fasync, SIGIO, POLL_IN);
-	wake_up_interruptible(&tun->tfile->read_wait);
+	wake_up_interruptible(&tun->socket.wait);
 	return 0;
 
 drop:
@@ -495,7 +494,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 
 	DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
 
-	poll_wait(file, &tfile->read_wait, wait);
+	poll_wait(file, &tun->socket.wait, wait);
 
 	if (!skb_queue_empty(&tun->readq))
 		mask |= POLLIN | POLLRDNORM;
@@ -767,7 +766,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 		goto out;
 	}
 
-	add_wait_queue(&tfile->read_wait, &wait);
+	add_wait_queue(&tun->socket.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -798,7 +797,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tfile->read_wait, &wait);
+	remove_wait_queue(&tun->socket.wait, &wait);
 
 out:
 	tun_put(tun);
@@ -866,7 +865,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	struct sock *sk;
 	struct tun_struct *tun;
 	struct net_device *dev;
-	struct tun_file *tfile = file->private_data;
 	int err;
 
 	dev = __dev_get_by_name(net, ifr->ifr_name);
@@ -924,11 +922,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (!sk)
 			goto err_free_dev;
 
+		init_waitqueue_head(&tun->socket.wait);
 		sock_init_data(&tun->socket, sk);
 		sk->sk_write_space = tun_sock_write_space;
 		sk->sk_destruct = tun_sock_destruct;
 		sk->sk_sndbuf = INT_MAX;
-		sk->sk_sleep = &tfile->read_wait;
 
 		tun->sk = sk;
 		container_of(sk, struct tun_sock, sk)->tun = tun;
@@ -1268,7 +1266,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	atomic_set(&tfile->count, 0);
 	tfile->tun = NULL;
 	tfile->net = get_net(current->nsproxy->net_ns);
-	init_waitqueue_head(&tfile->read_wait);
 	file->private_data = tfile;
 	return 0;
 }

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
       [not found]                                                           ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-20  8:35                                                             ` Herbert Xu
  2009-04-20  9:26                                                               ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-20  8:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: lguest-mnsaURCQ41sdnm+yROfE0A, Christian Borntraeger,
	David S. Miller, virtualization-qjLDD68F18O7TbgM5vRIOg,
	Matias Zabaljauregui, netdev-u79uwXL29TY76Z2rM5mHXA,
	Patrick McHardy

On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote:
> 
> tun: Fix sk_sleep races when attaching/detaching

That patch doesn't apply anymore because of contextual changes
caused by the first patch.  Here's an update.

tun: Fix sk_sleep races when attaching/detaching

As the sk_sleep wait queue actually lives in tfile, which may be
detached from the tun device, bad things will happen when we use
sk_sleep after detaching.

Since the tun device is the persistent data structure here (when
requested by the user), it makes much more sense to have the wait
queue live there.  There is no reason to have it in tfile at all
since the only time we can wait is if we have a tun attached.
In fact we already have a wait queue in tun_struct, so we might
as well use it.

Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Tested-by: Christian Borntraeger <borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
Tested-by: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org>
Signed-off-by: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 95ae40a..735bf41 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -93,7 +93,6 @@ struct tun_file {
 	atomic_t count;
 	struct tun_struct *tun;
 	struct net *net;
-	wait_queue_head_t	read_wait;
 };
 
 struct tun_sock;
@@ -331,7 +330,7 @@ static void tun_net_uninit(struct net_device *dev)
 	/* Inform the methods they need to stop using the dev.
 	 */
 	if (tfile) {
-		wake_up_all(&tfile->read_wait);
+		wake_up_all(&tun->socket.wait);
 		if (atomic_dec_and_test(&tfile->count))
 			__tun_detach(tun);
 	}
@@ -398,7 +397,7 @@ static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Notify and wake up reader process */
 	if (tun->flags & TUN_FASYNC)
 		kill_fasync(&tun->fasync, SIGIO, POLL_IN);
-	wake_up_interruptible(&tun->tfile->read_wait);
+	wake_up_interruptible(&tun->socket.wait);
 	return 0;
 
 drop:
@@ -495,7 +494,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 
 	DBG(KERN_INFO "%s: tun_chr_poll\n", tun->dev->name);
 
-	poll_wait(file, &tfile->read_wait, wait);
+	poll_wait(file, &tun->socket.wait, wait);
 
 	if (!skb_queue_empty(&tun->readq))
 		mask |= POLLIN | POLLRDNORM;
@@ -768,7 +767,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 		goto out;
 	}
 
-	add_wait_queue(&tfile->read_wait, &wait);
+	add_wait_queue(&tun->socket.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -799,7 +798,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tfile->read_wait, &wait);
+	remove_wait_queue(&tun->socket.wait, &wait);
 
 out:
 	tun_put(tun);
@@ -867,7 +866,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	struct sock *sk;
 	struct tun_struct *tun;
 	struct net_device *dev;
-	struct tun_file *tfile = file->private_data;
 	int err;
 
 	dev = __dev_get_by_name(net, ifr->ifr_name);
@@ -925,10 +923,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (!sk)
 			goto err_free_dev;
 
+		init_waitqueue_head(&tun->socket.wait);
 		sock_init_data(&tun->socket, sk);
 		sk->sk_write_space = tun_sock_write_space;
 		sk->sk_sndbuf = INT_MAX;
-		sk->sk_sleep = &tfile->read_wait;
 
 		tun->sk = sk;
 		container_of(sk, struct tun_sock, sk)->tun = tun;
@@ -1270,7 +1268,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	atomic_set(&tfile->count, 0);
 	tfile->tun = NULL;
 	tfile->net = get_net(current->nsproxy->net_ns);
-	init_waitqueue_head(&tfile->read_wait);
 	file->private_data = tfile;
 	return 0;
 }

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
  2009-04-20  8:35                                                             ` Herbert Xu
@ 2009-04-20  9:26                                                               ` David Miller
  2009-04-20  9:35                                                                 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2009-04-20  9:26 UTC (permalink / raw)
  To: herbert
  Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest,
	virtualization, netdev, borntraeger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 20 Apr 2009 16:35:50 +0800

> On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote:
>> 
>> tun: Fix sk_sleep races when attaching/detaching
> 
> That patch doesn't apply anymore because of contextual changes
> caused by the first patch.  Here's an update.
> 
> tun: Fix sk_sleep races when attaching/detaching

Do you think these two patches are ready to go into net-2.6
now?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
  2009-04-20  9:26                                                               ` David Miller
@ 2009-04-20  9:35                                                                 ` Herbert Xu
  2009-04-20 10:02                                                                   ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-20  9:35 UTC (permalink / raw)
  To: David Miller
  Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest,
	virtualization, netdev, borntraeger

On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote:
> 
> Do you think these two patches are ready to go into net-2.6
> now?

I think so.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
  2009-04-20  9:35                                                                 ` Herbert Xu
@ 2009-04-20 10:02                                                                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2009-04-20 10:02 UTC (permalink / raw)
  To: herbert
  Cc: ebiederm, kaber, zabaljauregui, odie, rusty, lguest,
	virtualization, netdev, borntraeger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 20 Apr 2009 17:35:49 +0800

> On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote:
>> 
>> Do you think these two patches are ready to go into net-2.6
>> now?
> 
> I think so.

Great, applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed
  2009-04-16 11:08                                                     ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu
       [not found]                                                       ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-24  8:55                                                       ` Christian Borntraeger
       [not found]                                                         ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Borntraeger @ 2009-04-24  8:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric W. Biederman, Patrick McHardy, Matias Zabaljauregui, odie,
	Rusty Russell, lguest, virtualization, David S. Miller, netdev,
	Carsten Otte

Am Thursday 16 April 2009 13:08:18 schrieb Herbert Xu:
> On Wed, Apr 15, 2009 at 10:38:34PM +0800, Herbert Xu wrote:
> > 
> > So how about this? We replace the dev destructor with our own that
> > doesn't immediately call free_netdev.  We only call free_netdev once
> > all tun fd's attached to the device have been closed.
> 
> Here's the patch.  I'd appreciate if everyone can review it
> and see if they can recreate the original race by
> 
> 1) Starting a process using tun and polls on it;
> 2) Doing ip tun del tunXXX while the process is polling.
> 
> tun: Only free a netdev when all tun descriptors are closed
> 
> The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix
> races between tun_net_close and free_netdev") fixed a race where
> an asynchronous deletion of a tun device can hose a poll(2) on
> a tun fd attached to that device.

Sorry for the late reply, but this patch creates another regression:
Now TUNSETIFF returns EBUSY on a persistent tap device:

open("/dev/net/tun", O_RDWR) = 11
ioctl(11, 0x400454ca, 0x3ffff8e2270) = -1 EBUSY (Device or resource busy)

Some debugging (thanks to Carsten Otte) showed that

tun_set_iff calls tun_attach (the first call inside the if(dev)). tun_attach
now checks for tun->tfile but this is already set.
Looks like we need another fix on top :-(

Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed
       [not found]                                                         ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2009-04-24 12:11                                                           ` Herbert Xu
       [not found]                                                             ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2009-04-24 12:11 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Carsten Otte, lguest-mnsaURCQ41sdnm+yROfE0A, David S. Miller,
	virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui,
	Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy

On Fri, Apr 24, 2009 at 10:55:49AM +0200, Christian Borntraeger wrote:
> Am Thursday 16 April 2009 13:08:18 schrieb Herbert Xu:
>
> > Here's the patch.  I'd appreciate if everyone can review it
> > and see if they can recreate the original race by
> > 
> > 1) Starting a process using tun and polls on it;
> > 2) Doing ip tun del tunXXX while the process is polling.
> > 
> > tun: Only free a netdev when all tun descriptors are closed
> > 
> > The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix
> > races between tun_net_close and free_netdev") fixed a race where
> > an asynchronous deletion of a tun device can hose a poll(2) on
> > a tun fd attached to that device.
> 
> Sorry for the late reply, but this patch creates another regression:
> Now TUNSETIFF returns EBUSY on a persistent tap device:
> 
> open("/dev/net/tun", O_RDWR) = 11
> ioctl(11, 0x400454ca, 0x3ffff8e2270) = -1 EBUSY (Device or resource busy)

The patch you qouted has been superceded by many versions :)

Can you please test the latest that's in davem's tree?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [1/2] tun: Only free a netdev when all tun descriptors are closed
       [not found]                                                             ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
@ 2009-04-24 12:40                                                               ` Christian Borntraeger
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Borntraeger @ 2009-04-24 12:40 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Carsten Otte, lguest-mnsaURCQ41sdnm+yROfE0A, David S. Miller,
	virtualization-qjLDD68F18O7TbgM5vRIOg, Matias Zabaljauregui,
	Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy

Am Friday 24 April 2009 14:11:56 schrieb Herbert Xu:
> The patch you qouted has been superceded by many versions :)

Yes, I got lost in this mail thread...

> Can you please test the latest that's in davem's tree?

Done. With 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fdavem%2Fnet-2.6.git;a=commitdiff_plain;h=9c3fea6ab04a7bd9298e635bf29b4a5379f6c476
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fdavem%2Fnet-2.6.git;a=commitdiff_plain;h=c40af84a6726f63e35740d26f841992e8f31f92c

Everything works fine. Thank you and sorry for the noise.

Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-04-24 12:40 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200903271022.38244.rusty@rustcorp.com.au>
     [not found] ` <1238709324.5823.8.camel@odie.local>
     [not found]   ` <1239043798.27826.93.camel@zetabook>
     [not found]     ` <200904081021.39877.rusty@rustcorp.com.au>
     [not found]       ` <1239224319.17844.16.camel@zetabook>
     [not found]         ` <49DDE91A.8060603@trash.net>
     [not found]           ` <49DDF614.1060909@trash.net>
     [not found]             ` <m1bpr6hqrm.fsf@fess.ebiederm.org>
     [not found]               ` <49E47976.8020005@trash.net>
2009-04-15  8:36                 ` [Lguest] [PATCH 4/5] lguest: use KVM hypercalls Herbert Xu
     [not found]                   ` <20090415083610.GA8579-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15  8:47                     ` Herbert Xu
2009-04-15  9:07                       ` [Lguest] " Christian Borntraeger
2009-04-15 11:07                       ` Patrick McHardy
2009-04-15 13:23                       ` Eric W. Biederman
     [not found]                         ` <m18wm2rqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 13:28                           ` Herbert Xu
     [not found]                             ` <20090415132802.GA11408-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 13:35                               ` Eric W. Biederman
     [not found]                                 ` <m1skkaox8h.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 13:46                                   ` Herbert Xu
     [not found]                                     ` <20090415134610.GA11683-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 13:55                                       ` Herbert Xu
     [not found]                                         ` <20090415135502.GA11827-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:10                                           ` Eric W. Biederman
     [not found]                                             ` <m1ocuynh2f.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:12                                               ` Herbert Xu
2009-04-15 14:06                                     ` [Lguest] " Eric W. Biederman
     [not found]                                       ` <m11vruovu5.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:08                                         ` Herbert Xu
     [not found]                                           ` <20090415140819.GA11991-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:18                                             ` Eric W. Biederman
     [not found]                                               ` <m1iql6m24b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 14:23                                                 ` Herbert Xu
2009-04-15 14:38                                                 ` Herbert Xu
     [not found]                                                   ` <20090415143834.GA12384-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-15 14:56                                                     ` Eric W. Biederman
     [not found]                                                       ` <m1zleiklsl.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-04-15 22:27                                                         ` Herbert Xu
2009-04-16 11:08                                                     ` [1/2] tun: Only free a netdev when all tun descriptors are closed Herbert Xu
     [not found]                                                       ` <20090416110818.GA20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-16 11:09                                                         ` [2/2] tun: Fix sk_sleep races when attaching/detaching Herbert Xu
     [not found]                                                           ` <20090416110952.GB20950-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-20  8:35                                                             ` Herbert Xu
2009-04-20  9:26                                                               ` David Miller
2009-04-20  9:35                                                                 ` Herbert Xu
2009-04-20 10:02                                                                   ` David Miller
2009-04-24  8:55                                                       ` [1/2] tun: Only free a netdev when all tun descriptors are closed Christian Borntraeger
     [not found]                                                         ` <200904241055.49794.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2009-04-24 12:11                                                           ` Herbert Xu
     [not found]                                                             ` <20090424121156.GA28039-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-04-24 12:40                                                               ` Christian Borntraeger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).