linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-02-26  8:38 Tang Chen
  2014-02-27  0:26 ` Tang Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Tang Chen @ 2014-02-26  8:38 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, guz.fnst
  Cc: linux-fsdevel, linux-aio, linux-kernel

AIO ring page migration has been implemented by the following patch:

        https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.
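
For reference, the migration side of that commit brackets both the page
copy and the ring_pages[] update with this lock. A simplified sketch,
reconstructed from the diff context later in this thread (refcounting
and mapping checks elided), not the verbatim source:

	static int aio_migratepage(struct address_space *mapping,
				   struct page *new, struct page *old,
				   enum migrate_mode mode)
	{
		struct kioctx *ctx = mapping->private_data;
		unsigned long flags;
		pgoff_t idx;

		/* Block all ring-page accesses while the copy is in flight. */
		spin_lock_irqsave(&ctx->completion_lock, flags);
		migrate_page_copy(new, old);		/* copy old -> new */
		idx = old->index;
		if (idx < (pgoff_t)ctx->nr_pages)
			ctx->ring_pages[idx] = new;	/* publish the new page */
		spin_unlock_irqrestore(&ctx->completion_lock, flags);

		return MIGRATEPAGE_SUCCESS;
	}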

But aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() write
to the ring pages without taking ctx->completion_lock.

As a result, the following race is possible:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As shown above, thread 2 writes to the old page after its contents have
already been copied, so the new ring page never receives the update.

The fix is to take ctx->completion_lock on thread 2's side as well, i.e.
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
whenever they write to the ring pages.
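
Concretely, each such write becomes bracketed like this (the pattern the
diff below applies in all three places):

	spin_lock_irqsave(&ctx->completion_lock, flags);

	ring = kmap_atomic(ctx->ring_pages[0]);
	ring->head = head;		/* or ring->id / ring->nr, etc. */
	kunmap_atomic(ring);
	flush_dcache_page(ctx->ring_pages[0]);

	spin_unlock_irqrestore(&ctx->completion_lock, flags);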


Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 fs/aio.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..50c089c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -366,6 +366,7 @@ static int aio_setup_ring(struct kioctx *ctx)
 	int nr_pages;
 	int i;
 	struct file *file;
+	unsigned long flags;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
@@ -437,6 +438,14 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->nr = nr_events;	/* user copy */
 	ring->id = ~0U;
@@ -448,6 +457,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
 	return 0;
 }
 
@@ -542,6 +553,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 	unsigned i, new_nr;
 	struct kioctx_table *table, *old;
 	struct aio_ring *ring;
+	unsigned long flags;
 
 	spin_lock(&mm->ioctx_lock);
 	rcu_read_lock();
@@ -556,9 +568,19 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 					rcu_read_unlock();
 					spin_unlock(&mm->ioctx_lock);
 
+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irqsave(&ctx->completion_lock,
+							  flags);
 					ring = kmap_atomic(ctx->ring_pages[0]);
 					ring->id = ctx->id;
 					kunmap_atomic(ring);
+					spin_unlock_irqrestore(
+						&ctx->completion_lock, flags);
 					return 0;
 				}
 
@@ -1021,6 +1043,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
 	unsigned head, tail, pos;
 	long ret = 0;
 	int copy_ret;
+	unsigned long flags;
 
 	mutex_lock(&ctx->ring_lock);
 
@@ -1066,11 +1089,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
 		head %= ctx->nr_events;
 	}
 
+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+
 	ring = kmap_atomic(ctx->ring_pages[0]);
 	ring->head = head;
 	kunmap_atomic(ring);
 	flush_dcache_page(ctx->ring_pages[0]);
 
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
 	pr_debug("%li  h%u t%u\n", ret, head, tail);
 
 	put_reqs_available(ctx, ret);
-- 
1.8.3.1


* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-02-26  8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
@ 2014-02-27  0:26 ` Tang Chen
  2014-02-27 10:03   ` Benjamin LaHaise
  2014-03-10  5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Tang Chen @ 2014-02-27  0:26 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, guz.fnst
  Cc: linux-fsdevel, linux-aio, linux-kernel


Hi all,

On 02/26/2014 04:38 PM, Tang Chen wrote:
> AIO ring page migration has been implemented by the following patch:
>
>          https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

Forgot to mention that the above patch was merged when Linux 3.12 was
released, so this problem exists in the 3.12 stable tree as well.

If the following fix is acceptable, it needs to be merged into the
3.12 stable tree, too.

Please reply ASAP.

Thanks.

> [... rest of the quoted patch trimmed ...]


* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-02-27  0:26 ` Tang Chen
@ 2014-02-27 10:03   ` Benjamin LaHaise
  2014-02-27 14:38     ` tom
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin LaHaise @ 2014-02-27 10:03 UTC (permalink / raw)
  To: Tang Chen
  Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
	guz.fnst, linux-fsdevel, linux-aio, linux-kernel

On Thu, Feb 27, 2014 at 08:26:16AM +0800, Tang Chen wrote:
> Forgot to mention that the above patch was merged when Linux 3.12 was
> released, so this problem exists in the 3.12 stable tree as well.
> 
> If the following fix is acceptable, it needs to be merged into the
> 3.12 stable tree, too.
> 
> Please reply ASAP.

I'm travelling right now and won't be testing this patch until I get back 
home in about a week, so, for now, I'll apply the patch to my aio-next tree 
so that it gets some exposure to the various trinity runs and other tools 
people run against the -next tree.  I'll then push it out to Linus once I've 
run my own sanity tests next week.  Regards,

		-ben

> [... rest of the quoted message and patch trimmed ...]

-- 
"Thought is the essence of where you are now."


* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
  2014-02-27 10:03   ` Benjamin LaHaise
@ 2014-02-27 14:38     ` tom
  0 siblings, 0 replies; 7+ messages in thread
From: tom @ 2014-02-27 14:38 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, Gu Zheng, linux-fsdevel, linux-aio, linux-kernel


Hi Ben,

On Feb 27, 2014, at 19:03, Benjamin LaHaise <bcrl@kvack.org> wrote:

> On Thu, Feb 27, 2014 at 08:26:16AM +0800, Tang Chen wrote:
>> Forgot to mention that the above patch was merged when Linux 3.12 was
>> released, so this problem exists in the 3.12 stable tree as well.
>> 
>> If the following fix is acceptable, it needs to be merged into the
>> 3.12 stable tree, too.
>> 
>> Please reply ASAP.
> 
> I'm travelling right now and won't be testing this patch until I get back 
> home in about a week, so, for now, I'll apply the patch to my aio-next tree 
> so that it gets some exposure to the various trinity runs and other tools 
> people run against the -next tree.  I'll then push it out to Linus once I've 
> run my own sanity tests next week.  Regards,

Thanks for the reply. 

And if you have time, please also have a look at the following patch-set.

https://lkml.org/lkml/2014/2/27/143

Patch 1 there is the same as this one, and I think patch 2 is also needed.

Thanks.

> 
> 		-ben
> 
>> [... rest of the quoted message and patch trimmed ...]
> 
> -- 
> "Thought is the essence of where you are now."




* [V2 PATCH 0/2] Bug fix in aio ring page migration
  2014-02-26  8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
  2014-02-27  0:26 ` Tang Chen
@ 2014-03-10  5:30 ` Tang Chen
  2014-03-10  5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
  2014-03-10  5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
  3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10  5:30 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, guz.fnst
  Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie

This patch-set fixes the following two problems:

1. ctx->completion_lock must be taken to protect ring pages from being
   written while they are being migrated.

2. A memory barrier is needed to ensure the page copy is finished before
   ctx->ring_pages[] is updated.

NOTE: AIO ring page migration has been in mainline since Linux 3.12,
      so these two patches need to be merged into the 3.12 stable tree too.

Tang Chen (2):
   aio, memory-hotplug: Fix confliction when migrating and accessing
     ring pages.
   aio, mem-hotplug: Add memory barrier to aio ring page migration.

  fs/aio.c |   42 ++++++++++++++++++++++++++++++++++++++++++
  1 files changed, 42 insertions(+), 0 deletions(-)

-- 
1.7.7


* [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages
  2014-02-26  8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
  2014-02-27  0:26 ` Tang Chen
  2014-03-10  5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
@ 2014-03-10  5:30 ` Tang Chen
  2014-03-10  5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
  3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10  5:30 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, guz.fnst
  Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie

AIO ring page migration has been implemented by the following patch:

        https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.

But aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() write
to the ring pages without taking ctx->completion_lock.

As a result, the following race is possible:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-> take ctx->completion_lock            |
 |-> migrate_page_copy(new, old)          |
 |   *NOW*, ctx->ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx->ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                          |     |-> ring->head = head;          *HERE, write to the old ring page*
                                          |     |-> kunmap_atomic(ring);
                                          |
 |-> ctx->ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-> release ctx->completion_lock         |

As shown above, thread 2 writes to the old page after its contents have
already been copied, so the new ring page never receives the update.

The fix is to take ctx->completion_lock on thread 2's side as well, i.e.
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
whenever they write to the ring pages.

v2:
   Use spin_lock_irq() rather than spin_lock_irqsave(), as Jeff suggested:
   these paths always run with interrupts enabled, so there is no need to
   save and restore flags.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
  fs/aio.c |   28 ++++++++++++++++++++++++++++
  1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..dc70246 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -437,6 +437,14 @@ static int aio_setup_ring(struct kioctx *ctx)
  	ctx->user_id = ctx->mmap_base;
  	ctx->nr_events = nr_events; /* trusted copy */

+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
  	ring = kmap_atomic(ctx->ring_pages[0]);
  	ring->nr = nr_events;	/* user copy */
  	ring->id = ~0U;
@@ -448,6 +456,8 @@ static int aio_setup_ring(struct kioctx *ctx)
  	kunmap_atomic(ring);
  	flush_dcache_page(ctx->ring_pages[0]);

+	spin_unlock_irq(&ctx->completion_lock);
+
  	return 0;
  }

@@ -556,9 +566,17 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
  					rcu_read_unlock();
  					spin_unlock(&mm->ioctx_lock);

+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irq(&ctx->completion_lock);
  					ring = kmap_atomic(ctx->ring_pages[0]);
  					ring->id = ctx->id;
  					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
  					return 0;
  				}

@@ -1066,11 +1084,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
  		head %= ctx->nr_events;
  	}

+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
  	ring = kmap_atomic(ctx->ring_pages[0]);
  	ring->head = head;
  	kunmap_atomic(ring);
  	flush_dcache_page(ctx->ring_pages[0]);

+	spin_unlock_irq(&ctx->completion_lock);
+
  	pr_debug("%li  h%u t%u\n", ret, head, tail);

  	put_reqs_available(ctx, ret);
-- 
1.7.7



* [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration.
  2014-02-26  8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
                   ` (2 preceding siblings ...)
  2014-03-10  5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
@ 2014-03-10  5:31 ` Tang Chen
  3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10  5:31 UTC (permalink / raw)
  To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
	isimatu.yasuaki, guz.fnst
  Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie

When doing aio ring page migration, we migrate the page and then update
ctx->ring_pages[], like the following:

aio_migratepage()
  |-> migrate_page_copy(new, old)
  |   ......				/* Need barrier here */
  |-> ctx->ring_pages[idx] = new

We need a memory barrier between these two operations. Otherwise, if the
store to ctx->ring_pages[] is reordered before the page copy completes
(by the compiler or the CPU), other processes may access a new ring page
that is not yet fully initialized.

So add a write barrier on the migration side and a matching read-side
barrier in aio_read_events_ring() to pair with it.
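
The pairing follows the usual pointer-publication pattern. A condensed
sketch of what the diff below does (not the verbatim patch):

	/* Publisher side, in aio_migratepage(): */
	migrate_page_copy(new, old);	/* fill the new page */
	smp_wmb();			/* copy must complete before ... */
	ctx->ring_pages[idx] = new;	/* ... the pointer is published */

	/* Consumer side, in aio_read_events_ring(): */
	page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
	smp_read_barrier_depends();	/* pointer must be read before ... */
	ev = kmap(page);		/* ... its target is dereferenced */

smp_read_barrier_depends() suffices on the read side (instead of a full
smp_rmb()) because reading the page contents is data-dependent on reading
the pointer; it expands to a real barrier only on architectures that do
not order dependent loads, such as Alpha.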

v2:
   change smp_rmb() to smp_read_barrier_depends(). Thanks Miao.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
  fs/aio.c |   14 ++++++++++++++
  1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index dc70246..4133ba9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -327,6 +327,14 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
  		pgoff_t idx;
  		spin_lock_irqsave(&ctx->completion_lock, flags);
  		migrate_page_copy(new, old);
+
+		/*
+		 * Ensure memory copy is finished before updating
+		 * ctx->ring_pages[]. Otherwise other processes may access to
+		 * new ring pages which are not fully initialized.
+		 */
+		smp_wmb();
+
  		idx = old->index;
  		if (idx < (pgoff_t)ctx->nr_pages) {
  			/* And only do the move if things haven't changed */
@@ -1069,6 +1077,12 @@ static long aio_read_events_ring(struct kioctx *ctx,
  		page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
  		pos %= AIO_EVENTS_PER_PAGE;

+		/*
+		 * Ensure that the page's data was copied from old one by
+		 * aio_migratepage().
+		 */
+		smp_read_barrier_depends();
+
  		ev = kmap(page);
  		copy_ret = copy_to_user(event + ret, ev + pos,
  					sizeof(*ev) * avail);
-- 
1.7.7



Thread overview: 7+ messages
2014-02-26  8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
2014-02-27  0:26 ` Tang Chen
2014-02-27 10:03   ` Benjamin LaHaise
2014-02-27 14:38     ` tom
2014-03-10  5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
2014-03-10  5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
2014-03-10  5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
