public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
@ 2014-04-29 18:39 Oleg Nesterov
  2014-04-29 18:39 ` Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-29 18:39 UTC (permalink / raw)
  To: Andrew Morton, Benjamin LaHaise, Kent Overstreet
  Cc: Al Viro, linux-aio, linux-kernel

Completely untested and I know nothing about aio, thus needs an ack
from maintainers or should be ignored.

But exit_aio() looks "obviously unnecessarily overcomplicated", I was
really puzzled when I looked at this code by accident.

Kent, could you also explain kioctx->dead is atomic_t? This looks
pointless? afaics atomic_ buys nothing in this case.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 18:39 [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
@ 2014-04-29 18:39 ` Oleg Nesterov
  2014-04-29 23:36   ` Kent Overstreet
  2014-04-29 18:40 ` [PATCH 1/1] " Oleg Nesterov
  2014-04-29 19:33 ` [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Benjamin LaHaise
  2 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-29 18:39 UTC (permalink / raw)
  To: Andrew Morton, Benjamin LaHaise, Kent Overstreet
  Cc: Al Viro, linux-aio, linux-kernel

1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with ->ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   ->ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current->mm == mm and this is not true in general, the caller is mmput().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   47 ++++++++++++++++++-----------------------------
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 12a3de0..5fd1fe7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
  */
 void exit_aio(struct mm_struct *mm)
 {
-	struct kioctx_table *table;
-	struct kioctx *ctx;
-	unsigned i = 0;
-
-	while (1) {
-		rcu_read_lock();
-		table = rcu_dereference(mm->ioctx_table);
-
-		do {
-			if (!table || i >= table->nr) {
-				rcu_read_unlock();
-				rcu_assign_pointer(mm->ioctx_table, NULL);
-				if (table)
-					kfree(table);
-				return;
-			}
-
-			ctx = table->table[i++];
-		} while (!ctx);
+	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+	int i;
 
-		rcu_read_unlock();
+	if (!table)
+		return;
 
+	for (i = 0; i < table->nr; ++i) {
+		struct kioctx *ctx = table->table[i];
 		/*
-		 * We don't need to bother with munmap() here -
-		 * exit_mmap(mm) is coming and it'll unmap everything.
-		 * Since aio_free_ring() uses non-zero ->mmap_size
-		 * as indicator that it needs to unmap the area,
-		 * just set it to 0; aio_free_ring() is the only
-		 * place that uses ->mmap_size, so it's safe.
+		 * We don't need to bother with munmap() here - exit_mmap(mm)
+		 * is coming and it'll unmap everything. And we simply can't,
+		 * this is not necessarily our ->mm.
+		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+		 * that it needs to unmap the area, just set it to 0.
 		 */
-		ctx->mmap_size = 0;
-
-		kill_ioctx(mm, ctx);
+		if (ctx) {
+			ctx->mmap_size = 0;
+			kill_ioctx(mm, ctx);
+		}
 	}
+
+	rcu_assign_pointer(mm->ioctx_table, NULL);
+	kfree(table);
 }
 
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 18:39 [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
  2014-04-29 18:39 ` Oleg Nesterov
@ 2014-04-29 18:40 ` Oleg Nesterov
  2014-04-29 20:42   ` Benjamin LaHaise
  2014-04-29 19:33 ` [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Benjamin LaHaise
  2 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-29 18:40 UTC (permalink / raw)
  To: Andrew Morton, Benjamin LaHaise, Kent Overstreet
  Cc: Al Viro, linux-aio, linux-kernel

1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with ->ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   ->ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current->mm == mm and this is not true in general, the caller is mmput().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   47 ++++++++++++++++++-----------------------------
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 12a3de0..5fd1fe7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
  */
 void exit_aio(struct mm_struct *mm)
 {
-	struct kioctx_table *table;
-	struct kioctx *ctx;
-	unsigned i = 0;
-
-	while (1) {
-		rcu_read_lock();
-		table = rcu_dereference(mm->ioctx_table);
-
-		do {
-			if (!table || i >= table->nr) {
-				rcu_read_unlock();
-				rcu_assign_pointer(mm->ioctx_table, NULL);
-				if (table)
-					kfree(table);
-				return;
-			}
-
-			ctx = table->table[i++];
-		} while (!ctx);
+	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+	int i;
 
-		rcu_read_unlock();
+	if (!table)
+		return;
 
+	for (i = 0; i < table->nr; ++i) {
+		struct kioctx *ctx = table->table[i];
 		/*
-		 * We don't need to bother with munmap() here -
-		 * exit_mmap(mm) is coming and it'll unmap everything.
-		 * Since aio_free_ring() uses non-zero ->mmap_size
-		 * as indicator that it needs to unmap the area,
-		 * just set it to 0; aio_free_ring() is the only
-		 * place that uses ->mmap_size, so it's safe.
+		 * We don't need to bother with munmap() here - exit_mmap(mm)
+		 * is coming and it'll unmap everything. And we simply can't,
+		 * this is not necessarily our ->mm.
+		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+		 * that it needs to unmap the area, just set it to 0.
 		 */
-		ctx->mmap_size = 0;
-
-		kill_ioctx(mm, ctx);
+		if (ctx) {
+			ctx->mmap_size = 0;
+			kill_ioctx(mm, ctx);
+		}
 	}
+
+	rcu_assign_pointer(mm->ioctx_table, NULL);
+	kfree(table);
 }
 
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 18:39 [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
  2014-04-29 18:39 ` Oleg Nesterov
  2014-04-29 18:40 ` [PATCH 1/1] " Oleg Nesterov
@ 2014-04-29 19:33 ` Benjamin LaHaise
  2014-04-29 19:49   ` Oleg Nesterov
  2 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-04-29 19:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On Tue, Apr 29, 2014 at 08:39:15PM +0200, Oleg Nesterov wrote:
> Completely untested and I know nothing about aio, thus needs an ack
> from maintainers or should be ignored.
> 
> But exit_aio() looks "obviously unnecessarily overcomplicated", I was
> really puzzled when I looked at this code by accident.
> 
> Kent, could you also explain kioctx->dead is atomic_t? This looks
> pointless? afaics atomic_ buys nothing in this case.

If it wasn't atomic, it would need to be volatile.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 19:33 ` [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Benjamin LaHaise
@ 2014-04-29 19:49   ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-29 19:49 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On 04/29, Benjamin LaHaise wrote:
>
> On Tue, Apr 29, 2014 at 08:39:15PM +0200, Oleg Nesterov wrote:
> >
> > Kent, could you also explain kioctx->dead is atomic_t? This looks
> > pointless? afaics atomic_ buys nothing in this case.
>
> If it wasn't atomic, it would need to be volatile.

Then ACCESS_ONCE() in aio_read_events() makes more sense. Although
I can't say I understand why it is needed.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 18:40 ` [PATCH 1/1] " Oleg Nesterov
@ 2014-04-29 20:42   ` Benjamin LaHaise
  2014-04-29 21:22     ` Benjamin LaHaise
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-04-29 20:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On Tue, Apr 29, 2014 at 08:40:04PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
>    or even rcu_dereference().
> 
>    This mm has no users, nobody else can play with ->ioctx_table. Otherwise
>    the code is buggy anyway, if we need rcu_read_lock() in a loop because
>    ->ioctx_table can be updated then kfree(table) is obviously wrong.
> 
> 2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
>    munmap(), but another reason is that we simply can't do vm_munmap() unless
>    current->mm == mm and this is not true in general, the caller is mmput().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Your patch does not apply because it is whitespace damaged.  Please resend 
and verify that it applies with 'git am'.

		-ben

> ---
>  fs/aio.c |   47 ++++++++++++++++++-----------------------------
>  1 files changed, 18 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 12a3de0..5fd1fe7 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
>   */
>  void exit_aio(struct mm_struct *mm)
>  {
> -	struct kioctx_table *table;
> -	struct kioctx *ctx;
> -	unsigned i = 0;
> -
> -	while (1) {
> -		rcu_read_lock();
> -		table = rcu_dereference(mm->ioctx_table);
> -
> -		do {
> -			if (!table || i >= table->nr) {
> -				rcu_read_unlock();
> -				rcu_assign_pointer(mm->ioctx_table, NULL);
> -				if (table)
> -					kfree(table);
> -				return;
> -			}
> -
> -			ctx = table->table[i++];
> -		} while (!ctx);
> +	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
> +	int i;
>  
> -		rcu_read_unlock();
> +	if (!table)
> +		return;
>  
> +	for (i = 0; i < table->nr; ++i) {
> +		struct kioctx *ctx = table->table[i];
>  		/*
> -		 * We don't need to bother with munmap() here -
> -		 * exit_mmap(mm) is coming and it'll unmap everything.
> -		 * Since aio_free_ring() uses non-zero ->mmap_size
> -		 * as indicator that it needs to unmap the area,
> -		 * just set it to 0; aio_free_ring() is the only
> -		 * place that uses ->mmap_size, so it's safe.
> +		 * We don't need to bother with munmap() here - exit_mmap(mm)
> +		 * is coming and it'll unmap everything. And we simply can't,
> +		 * this is not necessarily our ->mm.
> +		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
> +		 * that it needs to unmap the area, just set it to 0.
>  		 */
> -		ctx->mmap_size = 0;
> -
> -		kill_ioctx(mm, ctx);
> +		if (ctx) {
> +			ctx->mmap_size = 0;
> +			kill_ioctx(mm, ctx);
> +		}
>  	}
> +
> +	rcu_assign_pointer(mm->ioctx_table, NULL);
> +	kfree(table);
>  }
>  
>  static void put_reqs_available(struct kioctx *ctx, unsigned nr)
> -- 
> 1.5.5.1
> 

-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 20:42   ` Benjamin LaHaise
@ 2014-04-29 21:22     ` Benjamin LaHaise
  2014-04-29 23:04       ` Mateusz Guzik
  2014-04-30 14:15       ` [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups Oleg Nesterov
  0 siblings, 2 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2014-04-29 21:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> Your patch does not apply because it is whitespace damaged.  Please resend 
> and verify that it applies with 'git am'.

Whoops, it's not whitespace damange, but rather that it doesn't apply with 
the other changes that are queued up in the aio-next tree.  You can find a 
copy of that tree at git://git.kvack.org/~bcrl/aio-next.git .  The change 
that conflicts is an additional parameter to kill_ioctx().

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 21:22     ` Benjamin LaHaise
@ 2014-04-29 23:04       ` Mateusz Guzik
  2014-04-30 12:13         ` Oleg Nesterov
  2014-04-30 14:15       ` [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Mateusz Guzik @ 2014-04-29 23:04 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Oleg Nesterov, Andrew Morton, Kent Overstreet, Al Viro, linux-aio,
	linux-kernel

On Tue, Apr 29, 2014 at 05:22:22PM -0400, Benjamin LaHaise wrote:
> On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > 
> > Your patch does not apply because it is whitespace damaged.  Please resend 
> > and verify that it applies with 'git am'.
> 
> Whoops, it's not whitespace damange, but rather that it doesn't apply with 
> the other changes that are queued up in the aio-next tree.  You can find a 
> copy of that tree at git://git.kvack.org/~bcrl/aio-next.git .  The change 
> that conflicts is an additional parameter to kill_ioctx().

While here is there any reason for:
rcu_assign_pointer(mm->ioctx_table, NULL);

Nothing looks at this pointer afterwards and mm is about to be freed.

I thought it would be used to sanity check that everything was cleared
before freeing, but that is nod one and not every pointer is nullified
anyway.

-- 
Mateusz Guzik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 18:39 ` Oleg Nesterov
@ 2014-04-29 23:36   ` Kent Overstreet
  0 siblings, 0 replies; 15+ messages in thread
From: Kent Overstreet @ 2014-04-29 23:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Benjamin LaHaise, Al Viro, linux-aio, linux-kernel

On Tue, Apr 29, 2014 at 08:39:30PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
>    or even rcu_dereference().
> 
>    This mm has no users, nobody else can play with ->ioctx_table. Otherwise
>    the code is buggy anyway, if we need rcu_read_lock() in a loop because
>    ->ioctx_table can be updated then kfree(table) is obviously wrong.
> 
> 2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
>    munmap(), but another reason is that we simply can't do vm_munmap() unless
>    current->mm == mm and this is not true in general, the caller is mmput().

I'm pretty sure you're analysis is all correct. IIRC there's other things in the
shutdown path we still have to be carefull with synchronization wise, but unless
I've forgotten something you're right about ioctx_table.

If I wrote that code (I'd have to check git blame), I'd say I just wrote it that
way because I prefer to be consistent about how things like RCU protected
pointers are accessed, even if it's not strictly necessary. Feels less tricky to
me. But I don't have a strong preference one way or the other.

w.r.t. ioctx->dead, yes, no need for it to be atomic. a regular int and xchg()
is probably more correct anyways since xchg() implies memory barriers and
atomic_xchg() does not.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-29 23:04       ` Mateusz Guzik
@ 2014-04-30 12:13         ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-30 12:13 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Benjamin LaHaise, Andrew Morton, Kent Overstreet, Al Viro,
	linux-aio, linux-kernel

On 04/30, Mateusz Guzik wrote:
>
> On Tue, Apr 29, 2014 at 05:22:22PM -0400, Benjamin LaHaise wrote:
> > On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > >
> > > Your patch does not apply because it is whitespace damaged.  Please resend
> > > and verify that it applies with 'git am'.
> >
> > Whoops, it's not whitespace damange, but rather that it doesn't apply with
> > the other changes that are queued up in the aio-next tree.  You can find a
> > copy of that tree at git://git.kvack.org/~bcrl/aio-next.git .  The change
> > that conflicts is an additional parameter to kill_ioctx().
>
> While here is there any reason for:
> rcu_assign_pointer(mm->ioctx_table, NULL);
>
> Nothing looks at this pointer afterwards and mm is about to be freed.

Yees, and initially I was going to remove it, but this needs "3" in the
changelog and I am lazy ;)

> I thought it would be used to sanity check that everything was cleared
> before freeing,

Actually, ->mm_count can be > 1, we do not know if we are going to free
this mm_struct or not.

We do not really need to nullify this ptr anyway, but perhaps it makes
sense to keep this initialization anyway, I dunno. But I'll change it to
use RCU_INIT_POINTER(NULL) in v2.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups
  2014-04-29 21:22     ` Benjamin LaHaise
  2014-04-29 23:04       ` Mateusz Guzik
@ 2014-04-30 14:15       ` Oleg Nesterov
  2014-04-30 14:16         ` [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
  2014-04-30 14:16         ` [PATCH v2 2/2] aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx() Oleg Nesterov
  1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-30 14:15 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On 04/29, Benjamin LaHaise wrote:
>
> Whoops, it's not whitespace damange, but rather that it doesn't apply with
> the other changes that are queued up in the aio-next tree.  You can find a
> copy of that tree at git://git.kvack.org/~bcrl/aio-next.git .  The change
> that conflicts is an additional parameter to kill_ioctx().

Please see v2. Rebased, added another untested cleanup.

Kent, it seems that you agree at least with 1/2, I'll appreciate your ack ;)

And it seems that we can kill mm->ioctx_lock, with the minimal complications
we can move it into kioctx_table, but this is minor.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-30 14:15       ` [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups Oleg Nesterov
@ 2014-04-30 14:16         ` Oleg Nesterov
  2014-04-30 15:23           ` Benjamin LaHaise
  2014-04-30 14:16         ` [PATCH v2 2/2] aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx() Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-30 14:16 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with ->ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   ->ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current->mm == mm and this is not true in general, the caller is mmput().

3. We do not really need to nullify mm->ioctx_table before return, probably
   the current code does this to catch the potential problems. But in this
   case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   47 ++++++++++++++++++-----------------------------
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 79b7e69..3526c2b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -791,40 +791,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
  */
 void exit_aio(struct mm_struct *mm)
 {
-	struct kioctx_table *table;
-	struct kioctx *ctx;
-	unsigned i = 0;
-
-	while (1) {
-		rcu_read_lock();
-		table = rcu_dereference(mm->ioctx_table);
-
-		do {
-			if (!table || i >= table->nr) {
-				rcu_read_unlock();
-				rcu_assign_pointer(mm->ioctx_table, NULL);
-				if (table)
-					kfree(table);
-				return;
-			}
-
-			ctx = table->table[i++];
-		} while (!ctx);
+	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+	int i;
 
-		rcu_read_unlock();
+	if (!table)
+		return;
 
+	for (i = 0; i < table->nr; ++i) {
+		struct kioctx *ctx = table->table[i];
 		/*
-		 * We don't need to bother with munmap() here -
-		 * exit_mmap(mm) is coming and it'll unmap everything.
-		 * Since aio_free_ring() uses non-zero ->mmap_size
-		 * as indicator that it needs to unmap the area,
-		 * just set it to 0; aio_free_ring() is the only
-		 * place that uses ->mmap_size, so it's safe.
+		 * We don't need to bother with munmap() here - exit_mmap(mm)
+		 * is coming and it'll unmap everything. And we simply can't,
+		 * this is not necessarily our ->mm.
+		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+		 * that it needs to unmap the area, just set it to 0.
 		 */
-		ctx->mmap_size = 0;
-
-		kill_ioctx(mm, ctx, NULL);
+		if (ctx) {
+			ctx->mmap_size = 0;
+			kill_ioctx(mm, ctx, NULL);
+		}
 	}
+
+	RCU_INIT_POINTER(mm->ioctx_table, NULL);
+	kfree(table);
 }
 
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx()
  2014-04-30 14:15       ` [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups Oleg Nesterov
  2014-04-30 14:16         ` [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
@ 2014-04-30 14:16         ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-30 14:16 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

ioctx_add_table() is the writer, it does not need rcu_read_lock() to
protect ->ioctx_table. It relies on mm->ioctx_lock and rcu locks just
add the confusion.

And it doesn't need rcu_dereference() by the same reason, it must see
any updates previously done under the same ->ioctx_lock. We could use
rcu_dereference_protected() but the patch uses rcu_dereference_raw(),
the function is simple enough.

The same for kill_ioctx(), although it does not update the pointer.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   14 +++-----------
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3526c2b..a415870 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -554,8 +554,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 	struct aio_ring *ring;
 
 	spin_lock(&mm->ioctx_lock);
-	rcu_read_lock();
-	table = rcu_dereference(mm->ioctx_table);
+	table = rcu_dereference_raw(mm->ioctx_table);
 
 	while (1) {
 		if (table)
@@ -563,7 +562,6 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 				if (!table->table[i]) {
 					ctx->id = i;
 					table->table[i] = ctx;
-					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
 					/* While kioctx setup is in progress,
@@ -577,8 +575,6 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 				}
 
 		new_nr = (table ? table->nr : 1) * 4;
-
-		rcu_read_unlock();
 		spin_unlock(&mm->ioctx_lock);
 
 		table = kzalloc(sizeof(*table) + sizeof(struct kioctx *) *
@@ -589,8 +585,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 		table->nr = new_nr;
 
 		spin_lock(&mm->ioctx_lock);
-		rcu_read_lock();
-		old = rcu_dereference(mm->ioctx_table);
+		old = rcu_dereference_raw(mm->ioctx_table);
 
 		if (!old) {
 			rcu_assign_pointer(mm->ioctx_table, table);
@@ -737,12 +732,9 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
 
 
 	spin_lock(&mm->ioctx_lock);
-	rcu_read_lock();
-	table = rcu_dereference(mm->ioctx_table);
-
+	table = rcu_dereference_raw(mm->ioctx_table);
 	WARN_ON(ctx != table->table[ctx->id]);
 	table->table[ctx->id] = NULL;
-	rcu_read_unlock();
 	spin_unlock(&mm->ioctx_lock);
 
 	/* percpu_ref_kill() will do the necessary call_rcu() */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-30 14:16         ` [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
@ 2014-04-30 15:23           ` Benjamin LaHaise
  2014-04-30 17:02             ` [PATCH v3 " Oleg Nesterov
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-04-30 15:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

Hi Oleg,

On Wed, Apr 30, 2014 at 04:16:16PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
>    or even rcu_dereference().
> 
>    This mm has no users, nobody else can play with ->ioctx_table. Otherwise
>    the code is buggy anyway, if we need rcu_read_lock() in a loop because
>    ->ioctx_table can be updated then kfree(table) is obviously wrong.
> 
> 2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
>    munmap(), but another reason is that we simply can't do vm_munmap() unless
>    current->mm == mm and this is not true in general, the caller is mmput().
> 
> 3. We do not really need to nullify mm->ioctx_table before return, probably
>    the current code does this to catch the potential problems. But in this
>    case RCU_INIT_POINTER(NULL) looks better.

Looks pretty good.  One minor style comment below.

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/aio.c |   47 ++++++++++++++++++-----------------------------
>  1 files changed, 18 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 79b7e69..3526c2b 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -791,40 +791,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
>   */
>  void exit_aio(struct mm_struct *mm)
>  {
> -	struct kioctx_table *table;
> -	struct kioctx *ctx;
> -	unsigned i = 0;
> -
> -	while (1) {
> -		rcu_read_lock();
> -		table = rcu_dereference(mm->ioctx_table);
> -
> -		do {
> -			if (!table || i >= table->nr) {
> -				rcu_read_unlock();
> -				rcu_assign_pointer(mm->ioctx_table, NULL);
> -				if (table)
> -					kfree(table);
> -				return;
> -			}
> -
> -			ctx = table->table[i++];
> -		} while (!ctx);
> +	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
> +	int i;
>  
> -		rcu_read_unlock();
> +	if (!table)
> +		return;
>  
> +	for (i = 0; i < table->nr; ++i) {
> +		struct kioctx *ctx = table->table[i];
>  		/*
> -		 * We don't need to bother with munmap() here -
> -		 * exit_mmap(mm) is coming and it'll unmap everything.
> -		 * Since aio_free_ring() uses non-zero ->mmap_size
> -		 * as indicator that it needs to unmap the area,
> -		 * just set it to 0; aio_free_ring() is the only
> -		 * place that uses ->mmap_size, so it's safe.
> +		 * We don't need to bother with munmap() here - exit_mmap(mm)
> +		 * is coming and it'll unmap everything. And we simply can't,
> +		 * this is not necessarily our ->mm.
> +		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
> +		 * that it needs to unmap the area, just set it to 0.
>  		 */
> -		ctx->mmap_size = 0;
> -
> -		kill_ioctx(mm, ctx, NULL);
> +		if (ctx) {
> +			ctx->mmap_size = 0;
> +			kill_ioctx(mm, ctx, NULL);
> +		}

Rather than indenting and moving the two lines changing mmap_size and the 
kill_ioctx() call, why not just do "if (!ctx) ... continue;"?  That reduces 
the number of lines changed and avoid excessive indentation.

		-ben

>  	}
> +
> +	RCU_INIT_POINTER(mm->ioctx_table, NULL);
> +	kfree(table);
>  }
>  
>  static void put_reqs_available(struct kioctx *ctx, unsigned nr)
> -- 
> 1.5.5.1
> 

-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()
  2014-04-30 15:23           ` Benjamin LaHaise
@ 2014-04-30 17:02             ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-04-30 17:02 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Andrew Morton, Kent Overstreet, Al Viro, linux-aio, linux-kernel

On 04/30, Benjamin LaHaise wrote:
>
> > -		ctx->mmap_size = 0;
> > -
> > -		kill_ioctx(mm, ctx, NULL);
> > +		if (ctx) {
> > +			ctx->mmap_size = 0;
> > +			kill_ioctx(mm, ctx, NULL);
> > +		}
>
> Rather than indenting and moving the two lines changing mmap_size and the
> kill_ioctx() call, why not just do "if (!ctx) ... continue;"?  That reduces
> the number of lines changed and avoid excessive indentation.

OK. To me the code looks better/simpler with "if (ctx)", but this is subjective
of course, I won't argue.

The patch still removes the empty line between mmap_size = 0 and kill_ioctx(),
we reset mmap_size only for kill_ioctx(). But feel free to remove this change.

-------------------------------------------------------------------------------
Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and we do not read rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with ->ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   ->ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is the good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current->mm == mm and this is not true in general, the caller is mmput().

3. We do not really need to nullify mm->ioctx_table before return, probably
   the current code does this to catch the potential problems. But in this
   case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/aio.c |   42 ++++++++++++++++--------------------------
 1 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 79b7e69..b67f3e2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -791,40 +791,30 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
  */
 void exit_aio(struct mm_struct *mm)
 {
-	struct kioctx_table *table;
-	struct kioctx *ctx;
-	unsigned i = 0;
-
-	while (1) {
-		rcu_read_lock();
-		table = rcu_dereference(mm->ioctx_table);
-
-		do {
-			if (!table || i >= table->nr) {
-				rcu_read_unlock();
-				rcu_assign_pointer(mm->ioctx_table, NULL);
-				if (table)
-					kfree(table);
-				return;
-			}
+	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+	int i;
 
-			ctx = table->table[i++];
-		} while (!ctx);
+	if (!table)
+		return;
 
-		rcu_read_unlock();
+	for (i = 0; i < table->nr; ++i) {
+		struct kioctx *ctx = table->table[i];
 
+		if (!ctx)
+			continue;
 		/*
-		 * We don't need to bother with munmap() here -
-		 * exit_mmap(mm) is coming and it'll unmap everything.
-		 * Since aio_free_ring() uses non-zero ->mmap_size
-		 * as indicator that it needs to unmap the area,
-		 * just set it to 0; aio_free_ring() is the only
-		 * place that uses ->mmap_size, so it's safe.
+		 * We don't need to bother with munmap() here - exit_mmap(mm)
+		 * is coming and it'll unmap everything. And we simply can't,
+		 * this is not necessarily our ->mm.
+		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+		 * that it needs to unmap the area, just set it to 0.
 		 */
 		ctx->mmap_size = 0;
-
 		kill_ioctx(mm, ctx, NULL);
 	}
+
+	RCU_INIT_POINTER(mm->ioctx_table, NULL);
+	kfree(table);
 }
 
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-04-30 17:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-29 18:39 [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
2014-04-29 18:39 ` Oleg Nesterov
2014-04-29 23:36   ` Kent Overstreet
2014-04-29 18:40 ` [PATCH 1/1] " Oleg Nesterov
2014-04-29 20:42   ` Benjamin LaHaise
2014-04-29 21:22     ` Benjamin LaHaise
2014-04-29 23:04       ` Mateusz Guzik
2014-04-30 12:13         ` Oleg Nesterov
2014-04-30 14:15       ` [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups Oleg Nesterov
2014-04-30 14:16         ` [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Oleg Nesterov
2014-04-30 15:23           ` Benjamin LaHaise
2014-04-30 17:02             ` [PATCH v3 " Oleg Nesterov
2014-04-30 14:16         ` [PATCH v2 2/2] aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx() Oleg Nesterov
2014-04-29 19:33 ` [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock() Benjamin LaHaise
2014-04-29 19:49   ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox