[PATCH 2/2] aio: fix the confliction of read events and migrating ring page

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 2/2] aio: fix the confliction of read events and migrating ring page
@ 2014-03-20  5:46 Gu Zheng
  2014-03-20 14:32 ` Dave Jones
  0 siblings, 1 reply; 15+ messages in thread
From: Gu Zheng @ 2014-03-20  5:46 UTC (permalink / raw)
  To: Benjamin
  Cc: Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, tangchen, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Since we do not have additional protection on the page at the read events
side, so it is possible that the read of the page takes place after the
page has been freed and allocated to another part of the kernel. This
would result in the read returning invalid information.
So here we add a mutex pair before putting old page when migrating page
success to fix the confliction of reading events and migrating page.

Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/aio.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 88ad40c..e353085 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -319,6 +319,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	ctx->ring_pages[old->index] = new;
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
+	/* Ensure read event is completed before putting old page */
+	mutex_lock(&ctx->ring_lock);
+	mutex_unlock(&ctx->ring_lock);
 	put_page(old);
 
 	return rc;
-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] aio: fix the confliction of read events and migrating ring page
  2014-03-20  5:46 [PATCH 2/2] aio: fix the confliction of read events and migrating ring page Gu Zheng
@ 2014-03-20 14:32 ` Dave Jones
  2014-03-20 16:30   ` Benjamin LaHaise
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Jones @ 2014-03-20 14:32 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Benjamin, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, tangchen, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

On Thu, Mar 20, 2014 at 01:46:25PM +0800, Gu Zheng wrote:

 > diff --git a/fs/aio.c b/fs/aio.c
 > index 88ad40c..e353085 100644
 > --- a/fs/aio.c
 > +++ b/fs/aio.c
 > @@ -319,6 +319,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 >  	ctx->ring_pages[old->index] = new;
 >  	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 >  
 > +	/* Ensure read event is completed before putting old page */
 > +	mutex_lock(&ctx->ring_lock);
 > +	mutex_unlock(&ctx->ring_lock);
 >  	put_page(old);
 >  
 >  	return rc;

This looks a bit weird. Would using a completion work here ?

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] aio: fix the confliction of read events and migrating ring page
  2014-03-20 14:32 ` Dave Jones
@ 2014-03-20 16:30   ` Benjamin LaHaise
  2014-03-21  1:56     ` Gu Zheng
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-20 16:30 UTC (permalink / raw)
  To: Dave Jones, Gu Zheng, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, tangchen, miaox, linux-aio,
	fsdevel, linux-kernel, Andrew Morton

On Thu, Mar 20, 2014 at 10:32:07AM -0400, Dave Jones wrote:
> On Thu, Mar 20, 2014 at 01:46:25PM +0800, Gu Zheng wrote:
> 
>  > diff --git a/fs/aio.c b/fs/aio.c
>  > index 88ad40c..e353085 100644
>  > --- a/fs/aio.c
>  > +++ b/fs/aio.c
>  > @@ -319,6 +319,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
>  >  	ctx->ring_pages[old->index] = new;
>  >  	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>  >  
>  > +	/* Ensure read event is completed before putting old page */
>  > +	mutex_lock(&ctx->ring_lock);
>  > +	mutex_unlock(&ctx->ring_lock);
>  >  	put_page(old);
>  >  
>  >  	return rc;
> 
> This looks a bit weird. Would using a completion work here ?

Nope.  This is actually the most elegant fix I've seen for this approach, 
as everything else has relied on adding additional spin locks (which only 
end up being needed in the migration case) around access to the ring_pages 
on the reader side.  That said, this patch is not a complete solution to 
the problem, as the update of the ring's head pointer could still get lost 
with this patch.  I think the right thing is just taking the ring_lock 
mutex over the entire page migration operation.  That should be safe, as 
nowhere else is the ring_lock mutex nested with any other locks.

		-ben

> 	Dave

-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] aio: fix the confliction of read events and migrating ring page
  2014-03-20 16:30   ` Benjamin LaHaise
@ 2014-03-21  1:56     ` Gu Zheng
  2014-03-21 17:35       ` Benjamin LaHaise
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
  0 siblings, 2 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-21  1:56 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Dave Jones, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, tangchen, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Hi Ben,
On 03/21/2014 12:30 AM, Benjamin LaHaise wrote:

> On Thu, Mar 20, 2014 at 10:32:07AM -0400, Dave Jones wrote:
>> On Thu, Mar 20, 2014 at 01:46:25PM +0800, Gu Zheng wrote:
>>
>>  > diff --git a/fs/aio.c b/fs/aio.c
>>  > index 88ad40c..e353085 100644
>>  > --- a/fs/aio.c
>>  > +++ b/fs/aio.c
>>  > @@ -319,6 +319,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
>>  >  	ctx->ring_pages[old->index] = new;
>>  >  	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>  >  
>>  > +	/* Ensure read event is completed before putting old page */
>>  > +	mutex_lock(&ctx->ring_lock);
>>  > +	mutex_unlock(&ctx->ring_lock);
>>  >  	put_page(old);
>>  >  
>>  >  	return rc;
>>
>> This looks a bit weird. Would using a completion work here ?
> 
> Nope.  This is actually the most elegant fix I've seen for this approach, 
> as everything else has relied on adding additional spin locks (which only 
> end up being needed in the migration case) around access to the ring_pages 
> on the reader side.  That said, this patch is not a complete solution to 
> the problem, as the update of the ring's head pointer could still get lost 
> with this patch.  I think the right thing is just taking the ring_lock 
> mutex over the entire page migration operation.  That should be safe, as 
> nowhere else is the ring_lock mutex nested with any other locks.

This one is based on linux-next which has merged the following patch:
commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
Author: Tang Chen <tangchen@cn.fujitsu.com>
Date:   Mon Mar 10 16:15:33 2014 +0800
aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.

With this patch, the update of the ring's head pointer is safe because it is protected
by completion_lock, so we do not need to enlarge the ring_lock protection region.
And on the other side, if we take the ring_lock over the entire page migration
operation, reading events will be affected if the page migration is going.

Thanks,
Gu

> 
> 		-ben
> 
>> 	Dave
> 


--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] aio: fix the confliction of read events and migrating ring page
  2014-03-21  1:56     ` Gu Zheng
@ 2014-03-21 17:35       ` Benjamin LaHaise
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
  1 sibling, 0 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-21 17:35 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Dave Jones, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, tangchen, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Hi Gu,

On Fri, Mar 21, 2014 at 09:56:36AM +0800, Gu Zheng wrote:
> This one is based on linux-next which has merged the following patch:
> commit 692c9b8c5ee8d263bb8348171f0bebd3d84eb2c1
> Author: Tang Chen <tangchen@cn.fujitsu.com>
> Date:   Mon Mar 10 16:15:33 2014 +0800
> aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
> 
> With this patch, the update of the ring's head pointer is safe because it is protected
> by completion_lock, so we do not need to enlarge the ring_lock protection region.
> And on the other side, if we take the ring_lock over the entire page migration
> operation, reading events will be affected if the page migration is going.

I have dropped this particular change of Tang's from my aio-next tree.  
The spinlock addition in the read events code path is not needed if we 
take the ring_lock mutex in the page migration code path.  This will 
ensure that we aren't needlessly regressing performance.  I'll send out 
my replacement patch in a couple of minutes.

		-ben
-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-21  1:56     ` Gu Zheng
  2014-03-21 17:35       ` Benjamin LaHaise
@ 2014-03-21 18:35       ` Benjamin LaHaise
  2014-03-24 10:56         ` Gu Zheng
                           ` (3 more replies)
  1 sibling, 4 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-21 18:35 UTC (permalink / raw)
  To: Gu Zheng, Tang Chen
  Cc: Dave Jones, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, tangchen, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Hi all,

Based on the issues reported by Tang and Gu, I've come up with the an 
alternative fix that avoids adding additional locking in the event read 
code path.  The fix is to take the ring_lock mutex during page migration, 
which is already used to syncronize event readers and thus does not add 
any new locking requirements in aio_read_events_ring().  I've dropped 
the patches from Tang and Gu as a result.  This patch is now in my 
git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus 
once a few other people chime in with their reviews of this change.  
Please review Tang, Gu.  Thanks!

		-ben
-- 
"Thought is the essence of where you are now."

>From e52ddd946ab1def55c8282c8b3d0e80403abae12 Mon Sep 17 00:00:00 2001
From: Benjamin LaHaise <bcrl@kvack.org>
Date: Fri, 21 Mar 2014 14:26:43 -0400
Subject: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised

As reported by Tang Chen and Gu Zheng, the following issues exist in the
aio ring page migration support.

Tang Chen reported:

As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As above, the new ring page will not be updated.

Fix this issue, as well as prevent races in aio_ring_setup() by taking
the ring_lock mutex during page migration and where otherwise applicable.
This avoids the overhead of taking another spinlock in
aio_read_events_ring() as Tang's and Gu's original fix did, pushing the
overhead into the migration code.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/aio.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..c97cee8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -241,8 +241,10 @@ static void put_aio_ring_file(struct kioctx *ctx)
 
 static void aio_free_ring(struct kioctx *ctx)
 {
+	unsigned long flags;
 	int i;
 
+	spin_lock_irqsave(&ctx->completion_lock, flags);
 	for (i = 0; i < ctx->nr_pages; i++) {
 		struct page *page;
 		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
@@ -253,6 +255,7 @@ static void aio_free_ring(struct kioctx *ctx)
 		ctx->ring_pages[i] = NULL;
 		put_page(page);
 	}
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 	put_aio_ring_file(ctx);
 
@@ -287,9 +290,20 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 
 	rc = 0;
 
-	/* Make sure the old page hasn't already been changed */
+	/* Get a reference on the ioctx so we can take the ring_lock mutex. */
 	spin_lock(&mapping->private_lock);
 	ctx = mapping->private_data;
+	if (ctx)
+		percpu_ref_get(&ctx->users);
+	spin_unlock(&mapping->private_lock);
+
+	if (!ctx)
+		return -EINVAL;
+
+	mutex_lock(&ctx->ring_lock);
+
+	/* Make sure the old page hasn't already been changed */
+	spin_lock(&mapping->private_lock);
 	if (ctx) {
 		pgoff_t idx;
 		spin_lock_irqsave(&ctx->completion_lock, flags);
@@ -305,7 +319,7 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	spin_unlock(&mapping->private_lock);
 
 	if (rc != 0)
-		return rc;
+		goto out_unlock;
 
 	/* Writeback must be complete */
 	BUG_ON(PageWriteback(old));
@@ -314,7 +328,7 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		put_page(new);
-		return rc;
+		goto out_unlock;
 	}
 
 	/* We can potentially race against kioctx teardown here.  Use the
@@ -346,6 +360,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	else
 		put_page(new);
 
+out_unlock:
+	mutex_unlock(&ctx->ring_lock);
+	percpu_ref_put(&ctx->users);
 	return rc;
 }
 #endif
@@ -556,9 +573,17 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irq(&ctx->completion_lock);
 					ring = kmap_atomic(ctx->ring_pages[0]);
 					ring->id = ctx->id;
 					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
 					return 0;
 				}
 
@@ -657,7 +682,13 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	if (!ctx->cpu)
 		goto err;
 
-	if (aio_setup_ring(ctx) < 0)
+	/* Prevent races with page migration in aio_setup_ring() by holding
+	 * the ring_lock mutex.
+	 */
+	mutex_lock(&ctx->ring_lock);
+	err = aio_setup_ring(ctx);
+	mutex_unlock(&ctx->ring_lock);
+	if (err < 0)
 		goto err;
 
 	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
@@ -1024,6 +1055,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 
 	mutex_lock(&ctx->ring_lock);
 
+	/* Access to ->ring_pages here is protected by ctx->ring_lock. */
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	head = ring->head;
 	tail = ring->tail;
-- 
1.8.2.1

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
@ 2014-03-24 10:56         ` Gu Zheng
  2014-03-24 10:59         ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:56 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Hi Ben,
On 03/22/2014 02:35 AM, Benjamin LaHaise wrote:

> Hi all,
> 
> Based on the issues reported by Tang and Gu, I've come up with the an 
> alternative fix that avoids adding additional locking in the event read 
> code path.  The fix is to take the ring_lock mutex during page migration, 
> which is already used to syncronize event readers and thus does not add 
> any new locking requirements in aio_read_events_ring().  I've dropped 
> the patches from Tang and Gu as a result.  This patch is now in my 
> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus 
> once a few other people chime in with their reviews of this change.  
> Please review Tang, Gu.  Thanks!

As I mentioned before:
https://lkml.org/lkml/2014/3/20/34
We can put put_aio_ring_file() at the first of the
context teardown flow (aio_free_ring). Then, page migration and ctx freeing
will have mutual execution guarded by lock_page() v.s. truncate().
So that, the additional spinlock(address_space's private_lock) used to
protect use and updates of the mapping's private_data, and the sane check
of context is needless, so does the additional percpu_ref_get/put(&ctx->users).
But the enlarge ring_lock protection region to remove the additional spin_lock
is an elegant method if we ignore the effect to reading events while migrating
page is going.

Thanks,
Gu

> 
> 		-ben

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
  2014-03-24 10:56         ` Gu Zheng
@ 2014-03-24 10:59         ` Gu Zheng
  2014-03-24 13:20           ` Benjamin LaHaise
  2014-03-24 10:59         ` [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page Gu Zheng
  2014-03-24 18:22         ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
  3 siblings, 1 reply; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:59 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

As the page migration framework holds lock_page() to protect the pages
(both old and new) while migrating, so while the page migrating, both
of old page and new page are locked. And the aio context teardown
routine will call *truncate*(in put_aio_ring_file()) to truncate
pagecache which needs to acquire page_lock() for each page one by one.
So there is a native mutual exclusion between *migrate page* v.s. truncate().

If put_aio_ring_file() is called at first of the context teardown flow
(aio_free_ring). Then, page migration and ctx freeing will have mutual
execution guarded by lock_page() v.s. truncate(). Once a page is removed
from radix-tree, it will not be migrated. On the other hand, the context
can not be freed while the page migraiton are ongoing.

aio_free_ring
-put_aio_ring_file()		  |
 |-truncate_setsize()             |migrate_pages()
 | |-truncate_inode_pages_range() | |-__unmap_and_move()
 |  |-trylock_page(page)          |  |-lock_page(old)
 |-i_mapping->private_data = NULL |  ...
 |-ctx->aio_ring_file=NULL        |   |-move_to_new_page()
 |                                |    |-trylock_page(newpage)
			          |     |-aio_migratepage()
So that, the additional spinlock(address_space's private_lock) used to
protect use and updates of the mapping's private_data, and the sane check
of context is needless, so remove it here.

Besides, we also move filling ring_pages[] array and ctx->nr_pages into the
page_lock protection in aio_setup_ring to keep the semantic unanimous.

Moreover, after migrate_page_move_mapping() success, page migration should
never fail, so here we merge the flow in one routine.

Thanks KAMEZAWA Hiroyuki very much for giving directions on this.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
v2: Rebased on latest linus' tree.
---
 fs/aio.c |   79 ++++++++++++++++++--------------------------------------------
 1 files changed, 23 insertions(+), 56 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..6453c12 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -229,11 +229,8 @@ static void put_aio_ring_file(struct kioctx *ctx)
 	if (aio_ring_file) {
 		truncate_setsize(aio_ring_file->f_inode, 0);
 
-		/* Prevent further access to the kioctx from migratepages */
-		spin_lock(&aio_ring_file->f_inode->i_mapping->private_lock);
 		aio_ring_file->f_inode->i_mapping->private_data = NULL;
 		ctx->aio_ring_file = NULL;
-		spin_unlock(&aio_ring_file->f_inode->i_mapping->private_lock);
 
 		fput(aio_ring_file);
 	}
@@ -243,6 +240,8 @@ static void aio_free_ring(struct kioctx *ctx)
 {
 	int i;
 
+	put_aio_ring_file(ctx);
+
 	for (i = 0; i < ctx->nr_pages; i++) {
 		struct page *page;
 		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
@@ -254,8 +253,6 @@ static void aio_free_ring(struct kioctx *ctx)
 		put_page(page);
 	}
 
-	put_aio_ring_file(ctx);
-
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages) {
 		kfree(ctx->ring_pages);
 		ctx->ring_pages = NULL;
@@ -283,32 +280,22 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 {
 	struct kioctx *ctx;
 	unsigned long flags;
-	int rc;
-
-	rc = 0;
+	pgoff_t index;
+	int rc = MIGRATEPAGE_SUCCESS;
 
-	/* Make sure the old page hasn't already been changed */
-	spin_lock(&mapping->private_lock);
+	/*
+	 * Because old page is *locked*, if we see mapping, the page isn't
+	 * truncated, andmapping , mapping->private_data etc...are all valid.
+	 */
 	ctx = mapping->private_data;
-	if (ctx) {
-		pgoff_t idx;
-		spin_lock_irqsave(&ctx->completion_lock, flags);
-		idx = old->index;
-		if (idx < (pgoff_t)ctx->nr_pages) {
-			if (ctx->ring_pages[idx] != old)
-				rc = -EAGAIN;
-		} else
-			rc = -EINVAL;
-		spin_unlock_irqrestore(&ctx->completion_lock, flags);
-	} else
-		rc = -EINVAL;
-	spin_unlock(&mapping->private_lock);
 
-	if (rc != 0)
-		return rc;
+	index = old->index;
+	BUG_ON(index >= (pgoff_t)ctx->nr_pages);
+	VM_BUG_ON(ctx->ring_pages[index] != old);
 
 	/* Writeback must be complete */
 	BUG_ON(PageWriteback(old));
+	/* Extra ref cnt for rind_pages[] array */
 	get_page(new);
 
 	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
@@ -317,34 +304,13 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 		return rc;
 	}
 
-	/* We can potentially race against kioctx teardown here.  Use the
-	 * address_space's private data lock to protect the mapping's
-	 * private_data.
-	 */
-	spin_lock(&mapping->private_lock);
-	ctx = mapping->private_data;
-	if (ctx) {
-		pgoff_t idx;
-		spin_lock_irqsave(&ctx->completion_lock, flags);
-		migrate_page_copy(new, old);
-		idx = old->index;
-		if (idx < (pgoff_t)ctx->nr_pages) {
-			/* And only do the move if things haven't changed */
-			if (ctx->ring_pages[idx] == old)
-				ctx->ring_pages[idx] = new;
-			else
-				rc = -EAGAIN;
-		} else
-			rc = -EINVAL;
-		spin_unlock_irqrestore(&ctx->completion_lock, flags);
-	} else
-		rc = -EBUSY;
-	spin_unlock(&mapping->private_lock);
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+	/* Migration should not fail if migrate_page_move_mapping SUCCESS */
+	migrate_page_copy(new, old);
+	ctx->ring_pages[old->index] = new;
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
-	if (rc == MIGRATEPAGE_SUCCESS)
-		put_page(old);
-	else
-		put_page(new);
+	put_page(old);
 
 	return rc;
 }
@@ -397,6 +363,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 		}
 	}
 
+	ctx->nr_pages = 0;
+
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
 		page = find_or_create_page(file->f_inode->i_mapping,
@@ -407,13 +375,12 @@ static int aio_setup_ring(struct kioctx *ctx)
 			 current->pid, i, page_count(page));
 		SetPageUptodate(page);
 		SetPageDirty(page);
-		unlock_page(page);
-
 		ctx->ring_pages[i] = page;
+		ctx->nr_pages++;
+		unlock_page(page);
 	}
-	ctx->nr_pages = i;
 
-	if (unlikely(i != nr_pages)) {
+	if (unlikely(ctx->nr_pages != nr_pages)) {
 		aio_free_ring(ctx);
 		return -EAGAIN;
 	}
-- 
1.7.7


--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
  2014-03-24 10:59         ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
@ 2014-03-24 13:20           ` Benjamin LaHaise
  2014-03-25 10:11             ` Gu Zheng
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-24 13:20 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

On Mon, Mar 24, 2014 at 06:59:30PM +0800, Gu Zheng wrote:
> As the page migration framework holds lock_page() to protect the pages
> (both old and new) while migrating, so while the page migrating, both
> of old page and new page are locked. And the aio context teardown
> routine will call *truncate*(in put_aio_ring_file()) to truncate
> pagecache which needs to acquire page_lock() for each page one by one.
> So there is a native mutual exclusion between *migrate page* v.s. truncate().
> 
> If put_aio_ring_file() is called at first of the context teardown flow
> (aio_free_ring). Then, page migration and ctx freeing will have mutual
> execution guarded by lock_page() v.s. truncate(). Once a page is removed
> from radix-tree, it will not be migrated. On the other hand, the context
> can not be freed while the page migraiton are ongoing.

Sorry, but your change to remove the taking of ->private_lock in 
put_aio_ring_file() is not safe.  If a malicious user reinstantiates 
any pages in the ring buffer's mmaping, there is nothing protecting 
the system against incoherent accesses of ->ring_pages.  One possible 
way of making this occur would be to use mremap() to expand the size 
of the mapping or move it to a different location in the user process' 
address space.  Yes, it's a tiny race, but it's possible.  There is 
absolutely no reason to remove this locking -- ring teardown is 
hardly a performance sensitive code path.  I'm going to stick with my 
approach instead.

		-ben
-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
  2014-03-24 13:20           ` Benjamin LaHaise
@ 2014-03-25 10:11             ` Gu Zheng
  0 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-25 10:11 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Hi Ben,
On 03/24/2014 09:20 PM, Benjamin LaHaise wrote:

> On Mon, Mar 24, 2014 at 06:59:30PM +0800, Gu Zheng wrote:
>> As the page migration framework holds lock_page() to protect the pages
>> (both old and new) while migrating, so while the page migrating, both
>> of old page and new page are locked. And the aio context teardown
>> routine will call *truncate*(in put_aio_ring_file()) to truncate
>> pagecache which needs to acquire page_lock() for each page one by one.
>> So there is a native mutual exclusion between *migrate page* v.s. truncate().
>>
>> If put_aio_ring_file() is called at first of the context teardown flow
>> (aio_free_ring). Then, page migration and ctx freeing will have mutual
>> execution guarded by lock_page() v.s. truncate(). Once a page is removed
>> from radix-tree, it will not be migrated. On the other hand, the context
>> can not be freed while the page migraiton are ongoing.
> 
> Sorry, but your change to remove the taking of ->private_lock in 
> put_aio_ring_file() is not safe.  If a malicious user reinstantiates 
> any pages in the ring buffer's mmaping, there is nothing protecting 
> the system against incoherent accesses of ->ring_pages.  One possible 
> way of making this occur would be to use mremap() to expand the size 
> of the mapping or move it to a different location in the user process' 
> address space.  Yes, it's a tiny race, but it's possible.  There is 
> absolutely no reason to remove this locking -- ring teardown is 
> hardly a performance sensitive code path.  I'm going to stick with my 
> approach instead.

OK, you can go ahead via your approach, but I'll hold the reservation
about the issue you mentioned above before I find out it clearly.

BTW, please also send it to the 3.12.y and 3.13.y stable tree once it is
merged into Linus' tree.

Thanks,
Gu

> 
> 		-ben

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
  2014-03-24 10:56         ` Gu Zheng
  2014-03-24 10:59         ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
@ 2014-03-24 10:59         ` Gu Zheng
  2014-03-24 18:22         ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
  3 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:59 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton

Since we do not have additional protection on the page at the read events
side, so it is possible that the read of the page takes place after the
page has been freed and allocated to another part of the kernel. This
would result in the read returning invalid information.
As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As above, the new ring page will not be updated.

Fix this issue, as well as prevent races in aio_ring_setup() by taking
the ring_lock mutex and completion_lock during page migration and where
otherwise applicable.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
v2:
Merged Tang Chen's patch to use the spin_lock to protect the ring buffer update.
Use ring_lock rather than the additional spin_lock as Benjamin LaHaise suggested.
---
 fs/aio.c |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6453c12..ee74704 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -298,6 +298,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	/* Extra ref cnt for rind_pages[] array */
 	get_page(new);
 
+	/* Ensure no aio read events is going when migrating page */
+	mutex_lock(&ctx->ring_lock);
+
 	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		put_page(new);
@@ -312,6 +315,8 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 
 	put_page(old);
 
+	mutex_unlock(&ctx->ring_lock);
+
 	return rc;
 }
 #endif
@@ -523,9 +528,18 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
+					/*
+					* Accessing ring pages must be done
+					* holding ctx->completion_lock to
+					* prevent aio ring page migration
+					* procedure from migrating ring pages.
+					*/
+					spin_lock_irq(&ctx->completion_lock);
 					ring = kmap_atomic(ctx->ring_pages[0]);
 					ring->id = ctx->id;
 					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
+
 					return 0;
 				}
 
@@ -624,7 +638,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	if (!ctx->cpu)
 		goto err;
 
-	if (aio_setup_ring(ctx) < 0)
+	/*
+	 * Prevent races with page migration in aio_setup_ring() by holding
+	 * the ring_lock mutex.
+	 */
+	mutex_lock(&ctx->ring_lock);
+	err = aio_setup_ring(ctx);
+	mutex_unlock(&ctx->ring_lock);
+	if (err < 0)
 		goto err;
 
 	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
                           ` (2 preceding siblings ...)
  2014-03-24 10:59         ` [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page Gu Zheng
@ 2014-03-24 18:22         ` Sasha Levin
  2014-03-24 19:07           ` Benjamin LaHaise
  3 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2014-03-24 18:22 UTC (permalink / raw)
  To: Benjamin LaHaise, Gu Zheng, Tang Chen
  Cc: Dave Jones, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
	Yasuaki Ishimatsu, miaox, linux-aio, fsdevel, linux-kernel,
	Andrew Morton, linux-mm@kvack.org

On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
> Hi all,
>
> Based on the issues reported by Tang and Gu, I've come up with the an
> alternative fix that avoids adding additional locking in the event read
> code path.  The fix is to take the ring_lock mutex during page migration,
> which is already used to syncronize event readers and thus does not add
> any new locking requirements in aio_read_events_ring().  I've dropped
> the patches from Tang and Gu as a result.  This patch is now in my
> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
> once a few other people chime in with their reviews of this change.
> Please review Tang, Gu.  Thanks!

Hi Benjamin,

This patch seems to trigger:

[  433.476216] ======================================================
[  433.478468] [ INFO: possible circular locking dependency detected ]
[  433.480900] 3.14.0-rc7-next-20140324-sasha-00015-g1fb7de8 #267 Tainted: G        W
[  433.480900] -------------------------------------------------------
[  433.480900] trinity-c57/13776 is trying to acquire lock:
[  433.480900]  (&ctx->ring_lock){+.+.+.}, at: aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]
[  433.480900] but task is already holding lock:
[  433.480900]  (&mm->mmap_sem){++++++}, at: SYSC_move_pages (mm/migrate.c:1215 mm/migrate.c:1353 mm/migrate.c:1508)
[  433.480900]
[  433.480900] which lock already depends on the new lock.
[  433.480900]
[  433.480900]
[  433.480900] the existing dependency chain (in reverse order) is:
[  433.480900]
-> #1 (&mm->mmap_sem){++++++}:
[  433.480900]        lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  433.480900]        down_write (arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:50)
[  433.480900]        SyS_io_setup (fs/aio.c:442 fs/aio.c:689 fs/aio.c:1201 fs/aio.c:1184)
[  433.480900]        tracesys (arch/x86/kernel/entry_64.S:749)
[  433.480900]
-> #0 (&ctx->ring_lock){+.+.+.}:
[  433.480900]        __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[  433.480900]        lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  433.480900]        mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[  433.480900]        aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]        move_to_new_page (mm/migrate.c:777)
[  433.480900]        migrate_pages (mm/migrate.c:921 mm/migrate.c:960 mm/migrate.c:1126)
[  433.480900]        SYSC_move_pages (mm/migrate.c:1278 mm/migrate.c:1353 mm/migrate.c:1508)
[  433.480900]        SyS_move_pages (mm/migrate.c:1456)
[  433.480900]        tracesys (arch/x86/kernel/entry_64.S:749)
[  433.480900]
[  433.480900] other info that might help us debug this:
[  433.480900]
[  433.480900]  Possible unsafe locking scenario:
[  433.480900]
[  433.480900]        CPU0                    CPU1
[  433.480900]        ----                    ----
[  433.480900]   lock(&mm->mmap_sem);
[  433.480900]                                lock(&ctx->ring_lock);
[  433.480900]                                lock(&mm->mmap_sem);
[  433.480900]   lock(&ctx->ring_lock);
[  433.480900]
[  433.480900]  *** DEADLOCK ***
[  433.480900]
[  433.480900] 1 lock held by trinity-c57/13776:
[  433.480900]  #0:  (&mm->mmap_sem){++++++}, at: SYSC_move_pages (mm/migrate.c:1215 mm/migrate.c:1353 mm/migrate.c:1508)
[  433.480900]
[  433.480900] stack backtrace:
[  433.480900] CPU: 4 PID: 13776 Comm: trinity-c57 Tainted: G        W     3.14.0-rc7-next-20140324-sasha-00015-g1fb7de8 #267
[  433.480900]  ffffffff87a80790 ffff8806abbbb9a8 ffffffff844bae02 0000000000000000
[  433.480900]  ffffffff87a80790 ffff8806abbbb9f8 ffffffff844ad86d 0000000000000001
[  433.480900]  ffff8806abbbba88 ffff8806abbbb9f8 ffff8806ab8fbcf0 ffff8806ab8fbd28
[  433.480900] Call Trace:
[  433.480900]  dump_stack (lib/dump_stack.c:52)
[  433.480900]  print_circular_bug (kernel/locking/lockdep.c:1216)
[  433.480900]  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[  433.480900]  ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[  433.480900]  ? sched_clock_local (kernel/sched/clock.c:214)
[  433.480900]  ? sched_clock_cpu (kernel/sched/clock.c:311)
[  433.480900]  ? __lock_acquire (kernel/locking/lockdep.c:3189)
[  433.480900]  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  433.480900]  ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[  433.480900]  ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]  ? aio_migratepage (fs/aio.c:303)
[  433.480900]  ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]  ? aio_migratepage (include/linux/rcupdate.h:324 include/linux/rcupdate.h:909 include/linux/percpu-refcount.h:117 fs/aio.c:297)
[  433.480900]  ? preempt_count_sub (kernel/sched/core.c:2527)
[  433.480900]  aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[  433.480900]  ? aio_migratepage (include/linux/rcupdate.h:886 include/linux/percpu-refcount.h:108 fs/aio.c:297)
[  433.480900]  ? mutex_unlock (kernel/locking/mutex.c:220)
[  433.480900]  move_to_new_page (mm/migrate.c:777)
[  433.480900]  ? try_to_unmap (mm/rmap.c:1516)
[  433.480900]  ? try_to_unmap_nonlinear (mm/rmap.c:1113)
[  433.480900]  ? invalid_migration_vma (mm/rmap.c:1472)
[  433.480900]  ? page_remove_rmap (mm/rmap.c:1380)
[  433.480900]  ? anon_vma_fork (mm/rmap.c:446)
[  433.480900]  migrate_pages (mm/migrate.c:921 mm/migrate.c:960 mm/migrate.c:1126)
[  433.480900]  ? follow_page_mask (mm/memory.c:1544)
[  433.480900]  ? alloc_misplaced_dst_page (mm/migrate.c:1177)
[  433.480900]  SYSC_move_pages (mm/migrate.c:1278 mm/migrate.c:1353 mm/migrate.c:1508)
[  433.480900]  ? SYSC_move_pages (include/linux/rcupdate.h:800 mm/migrate.c:1472)
[  433.480900]  ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[  433.480900]  SyS_move_pages (mm/migrate.c:1456)
[  433.480900]  tracesys (arch/x86/kernel/entry_64.S:749)


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-24 18:22         ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
@ 2014-03-24 19:07           ` Benjamin LaHaise
  2014-03-25 17:47             ` Sasha Levin
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-24 19:07 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton, linux-mm@kvack.org

On Mon, Mar 24, 2014 at 02:22:06PM -0400, Sasha Levin wrote:
> On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
> >Hi all,
> >
> >Based on the issues reported by Tang and Gu, I've come up with the an
> >alternative fix that avoids adding additional locking in the event read
> >code path.  The fix is to take the ring_lock mutex during page migration,
> >which is already used to syncronize event readers and thus does not add
> >any new locking requirements in aio_read_events_ring().  I've dropped
> >the patches from Tang and Gu as a result.  This patch is now in my
> >git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
> >once a few other people chime in with their reviews of this change.
> >Please review Tang, Gu.  Thanks!
> 
> Hi Benjamin,
> 
> This patch seems to trigger:
> 
> [  433.476216] ======================================================
> [  433.478468] [ INFO: possible circular locking dependency detected ]
...

Yeah, that's a problem -- thanks for the report.  The ring_lock mutex can't 
be nested inside of mmap_sem, as aio_read_events_ring() can take a page 
fault while holding ring_mutex.  That makes the following change required.  
I'll fold this change into the patch that caused this issue.

		-ben
-- 
"Thought is the essence of where you are now."

diff --git a/fs/aio.c b/fs/aio.c
index c97cee8..f645e7e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -300,7 +300,10 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	if (!ctx)
 		return -EINVAL;
 
-	mutex_lock(&ctx->ring_lock);
+	if (!mutex_trylock(&ctx->ring_lock)) {
+		percpu_ref_put(&ctx->users);
+		return -EAGAIN;
+	}
 
 	/* Make sure the old page hasn't already been changed */
 	spin_lock(&mapping->private_lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-24 19:07           ` Benjamin LaHaise
@ 2014-03-25 17:47             ` Sasha Levin
  2014-03-25 18:57               ` Benjamin LaHaise
  0 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2014-03-25 17:47 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton, linux-mm@kvack.org

On 03/24/2014 03:07 PM, Benjamin LaHaise wrote:
> On Mon, Mar 24, 2014 at 02:22:06PM -0400, Sasha Levin wrote:
>> On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
>>> Hi all,
>>>
>>> Based on the issues reported by Tang and Gu, I've come up with the an
>>> alternative fix that avoids adding additional locking in the event read
>>> code path.  The fix is to take the ring_lock mutex during page migration,
>>> which is already used to syncronize event readers and thus does not add
>>> any new locking requirements in aio_read_events_ring().  I've dropped
>>> the patches from Tang and Gu as a result.  This patch is now in my
>>> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
>>> once a few other people chime in with their reviews of this change.
>>> Please review Tang, Gu.  Thanks!
>>
>> Hi Benjamin,
>>
>> This patch seems to trigger:
>>
>> [  433.476216] ======================================================
>> [  433.478468] [ INFO: possible circular locking dependency detected ]
> ...
>
> Yeah, that's a problem -- thanks for the report.  The ring_lock mutex can't
> be nested inside of mmap_sem, as aio_read_events_ring() can take a page
> fault while holding ring_mutex.  That makes the following change required.
> I'll fold this change into the patch that caused this issue.

Yup, that does the trick.

Could you please add something to document why this is a trylock instead of a lock? If
I were reading the code there's no way I'd understand what's the reason behind it
without knowing of this bug report.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
  2014-03-25 17:47             ` Sasha Levin
@ 2014-03-25 18:57               ` Benjamin LaHaise
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-25 18:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
	KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
	linux-kernel, Andrew Morton, linux-mm@kvack.org

Hi Sasha,

On Tue, Mar 25, 2014 at 01:47:40PM -0400, Sasha Levin wrote:
> On 03/24/2014 03:07 PM, Benjamin LaHaise wrote:
...
> >Yeah, that's a problem -- thanks for the report.  The ring_lock mutex can't
> >be nested inside of mmap_sem, as aio_read_events_ring() can take a page
> >fault while holding ring_mutex.  That makes the following change required.
> >I'll fold this change into the patch that caused this issue.
> 
> Yup, that does the trick.
> 
> Could you please add something to document why this is a trylock instead of 
> a lock? If
> I were reading the code there's no way I'd understand what's the reason 
> behind it
> without knowing of this bug report.

Done.  I've updated the patch in my aio-next.git tree, so it should be in 
tomorrow's linux-next, and will give it one last day for any further problem 
reports.  Thanks for testing!

		-ben

> Thanks,
> Sasha

-- 
"Thought is the essence of where you are now."

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-03-25 18:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-20  5:46 [PATCH 2/2] aio: fix the confliction of read events and migrating ring page Gu Zheng
2014-03-20 14:32 ` Dave Jones
2014-03-20 16:30   ` Benjamin LaHaise
2014-03-21  1:56     ` Gu Zheng
2014-03-21 17:35       ` Benjamin LaHaise
2014-03-21 18:35       ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
2014-03-24 10:56         ` Gu Zheng
2014-03-24 10:59         ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
2014-03-24 13:20           ` Benjamin LaHaise
2014-03-25 10:11             ` Gu Zheng
2014-03-24 10:59         ` [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page Gu Zheng
2014-03-24 18:22         ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
2014-03-24 19:07           ` Benjamin LaHaise
2014-03-25 17:47             ` Sasha Levin
2014-03-25 18:57               ` Benjamin LaHaise

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).