* [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
@ 2014-02-26 8:38 Tang Chen
2014-02-27 0:26 ` Tang Chen
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Tang Chen @ 2014-02-26 8:38 UTC (permalink / raw)
To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, guz.fnst
Cc: linux-fsdevel, linux-aio, linux-kernel
AIO ring page migration has been implemented by the following patch:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.
But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
when writing to the ring page, they didn't take ctx->completion_lock.
As a result, for example, we have the following problem:
thread 1 | thread 2
|
aio_migratepage() |
|-> take ctx->completion_lock |
|-> migrate_page_copy(new, old) |
| *NOW*, ctx->ring_pages[idx] == old |
|
| *NOW*, ctx->ring_pages[idx] == old
| aio_read_events_ring()
| |-> ring = kmap_atomic(ctx->ring_pages[0])
| |-> ring->head = head; *HERE, write to the old ring page*
| |-> kunmap_atomic(ring);
|
|-> ctx->ring_pages[idx] = new |
| *BUT NOW*, the content of |
| ring_pages[idx] is old. |
|-> release ctx->completion_lock |
As above, the new ring page will not be updated.
The solution is taking ctx->completion_lock in thread 2, which means,
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
writing to ring pages.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
fs/aio.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..50c089c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -366,6 +366,7 @@ static int aio_setup_ring(struct kioctx *ctx)
int nr_pages;
int i;
struct file *file;
+ unsigned long flags;
/* Compensate for the ring buffer's head/tail overlap entry */
nr_events += 2; /* 1 is required, 2 for good luck */
@@ -437,6 +438,14 @@ static int aio_setup_ring(struct kioctx *ctx)
ctx->user_id = ctx->mmap_base;
ctx->nr_events = nr_events; /* trusted copy */
+ /*
+ * The aio ring pages are user space pages, so they can be migrated.
+ * When writing to an aio ring page, we should ensure the page is not
+ * being migrated. Aio page migration procedure is protected by
+ * ctx->completion_lock, so we add this lock here.
+ */
+ spin_lock_irqsave(&ctx->completion_lock, flags);
+
ring = kmap_atomic(ctx->ring_pages[0]);
ring->nr = nr_events; /* user copy */
ring->id = ~0U;
@@ -448,6 +457,8 @@ static int aio_setup_ring(struct kioctx *ctx)
kunmap_atomic(ring);
flush_dcache_page(ctx->ring_pages[0]);
+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
return 0;
}
@@ -542,6 +553,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
unsigned i, new_nr;
struct kioctx_table *table, *old;
struct aio_ring *ring;
+ unsigned long flags;
spin_lock(&mm->ioctx_lock);
rcu_read_lock();
@@ -556,9 +568,19 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);
+ /*
+ * Accessing ring pages must be done
+ * holding ctx->completion_lock to
+ * prevent aio ring page migration
+ * procedure from migrating ring pages.
+ */
+ spin_lock_irqsave(&ctx->completion_lock,
+ flags);
ring = kmap_atomic(ctx->ring_pages[0]);
ring->id = ctx->id;
kunmap_atomic(ring);
+ spin_unlock_irqrestore(
+ &ctx->completion_lock, flags);
return 0;
}
@@ -1021,6 +1043,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
unsigned head, tail, pos;
long ret = 0;
int copy_ret;
+ unsigned long flags;
mutex_lock(&ctx->ring_lock);
@@ -1066,11 +1089,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
head %= ctx->nr_events;
}
+ /*
+ * The aio ring pages are user space pages, so they can be migrated.
+ * When writing to an aio ring page, we should ensure the page is not
+ * being migrated. Aio page migration procedure is protected by
+ * ctx->completion_lock, so we add this lock here.
+ */
+ spin_lock_irqsave(&ctx->completion_lock, flags);
+
ring = kmap_atomic(ctx->ring_pages[0]);
ring->head = head;
kunmap_atomic(ring);
flush_dcache_page(ctx->ring_pages[0]);
+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
pr_debug("%li h%u t%u\n", ret, head, tail);
put_reqs_available(ctx, ret);
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
2014-02-26 8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
@ 2014-02-27 0:26 ` Tang Chen
2014-02-27 10:03 ` Benjamin LaHaise
2014-03-10 5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Tang Chen @ 2014-02-27 0:26 UTC (permalink / raw)
To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, guz.fnst
Cc: linux-fsdevel, linux-aio, linux-kernel
Hi all,
On 02/26/2014 04:38 PM, Tang Chen wrote:
> AIO ring page migration has been implemented by the following patch:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
Forgot to mention that the above patch was merged when Linux 3.12 was
released.
So I think this problem exists in 3.12 stable tree.
If the following solution is acceptable, we need to merge it to 3.12
stable tree, too.
Please reply ASAP.
Thanks.
>
> In this patch, ctx->completion_lock is used to prevent other processes
> from accessing the ring page being migrated.
>
> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
> when writing to the ring page, they didn't take ctx->completion_lock.
>
> As a result, for example, we have the following problem:
>
> thread 1 | thread 2
> |
> aio_migratepage() |
> |-> take ctx->completion_lock |
> |-> migrate_page_copy(new, old) |
> | *NOW*, ctx->ring_pages[idx] == old |
> |
> | *NOW*, ctx->ring_pages[idx] == old
> | aio_read_events_ring()
> | |-> ring = kmap_atomic(ctx->ring_pages[0])
> | |-> ring->head = head; *HERE, write to the old ring page*
> | |-> kunmap_atomic(ring);
> |
> |-> ctx->ring_pages[idx] = new |
> | *BUT NOW*, the content of |
> | ring_pages[idx] is old. |
> |-> release ctx->completion_lock |
>
> As above, the new ring page will not be updated.
>
> The solution is taking ctx->completion_lock in thread 2, which means,
> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
> writing to ring pages.
>
>
> Reported-by: Yasuaki Ishimatsu<isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
> ---
> fs/aio.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 062a5f6..50c089c 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -366,6 +366,7 @@ static int aio_setup_ring(struct kioctx *ctx)
> int nr_pages;
> int i;
> struct file *file;
> + unsigned long flags;
>
> /* Compensate for the ring buffer's head/tail overlap entry */
> nr_events += 2; /* 1 is required, 2 for good luck */
> @@ -437,6 +438,14 @@ static int aio_setup_ring(struct kioctx *ctx)
> ctx->user_id = ctx->mmap_base;
> ctx->nr_events = nr_events; /* trusted copy */
>
> + /*
> + * The aio ring pages are user space pages, so they can be migrated.
> + * When writing to an aio ring page, we should ensure the page is not
> + * being migrated. Aio page migration procedure is protected by
> + * ctx->completion_lock, so we add this lock here.
> + */
> + spin_lock_irqsave(&ctx->completion_lock, flags);
> +
> ring = kmap_atomic(ctx->ring_pages[0]);
> ring->nr = nr_events; /* user copy */
> ring->id = ~0U;
> @@ -448,6 +457,8 @@ static int aio_setup_ring(struct kioctx *ctx)
> kunmap_atomic(ring);
> flush_dcache_page(ctx->ring_pages[0]);
>
> + spin_unlock_irqrestore(&ctx->completion_lock, flags);
> +
> return 0;
> }
>
> @@ -542,6 +553,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
> unsigned i, new_nr;
> struct kioctx_table *table, *old;
> struct aio_ring *ring;
> + unsigned long flags;
>
> spin_lock(&mm->ioctx_lock);
> rcu_read_lock();
> @@ -556,9 +568,19 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
> rcu_read_unlock();
> spin_unlock(&mm->ioctx_lock);
>
> + /*
> + * Accessing ring pages must be done
> + * holding ctx->completion_lock to
> + * prevent aio ring page migration
> + * procedure from migrating ring pages.
> + */
> + spin_lock_irqsave(&ctx->completion_lock,
> + flags);
> ring = kmap_atomic(ctx->ring_pages[0]);
> ring->id = ctx->id;
> kunmap_atomic(ring);
> + spin_unlock_irqrestore(
> + &ctx->completion_lock, flags);
> return 0;
> }
>
> @@ -1021,6 +1043,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
> unsigned head, tail, pos;
> long ret = 0;
> int copy_ret;
> + unsigned long flags;
>
> mutex_lock(&ctx->ring_lock);
>
> @@ -1066,11 +1089,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
> head %= ctx->nr_events;
> }
>
> + /*
> + * The aio ring pages are user space pages, so they can be migrated.
> + * When writing to an aio ring page, we should ensure the page is not
> + * being migrated. Aio page migration procedure is protected by
> + * ctx->completion_lock, so we add this lock here.
> + */
> + spin_lock_irqsave(&ctx->completion_lock, flags);
> +
> ring = kmap_atomic(ctx->ring_pages[0]);
> ring->head = head;
> kunmap_atomic(ring);
> flush_dcache_page(ctx->ring_pages[0]);
>
> + spin_unlock_irqrestore(&ctx->completion_lock, flags);
> +
> pr_debug("%li h%u t%u\n", ret, head, tail);
>
> put_reqs_available(ctx, ret);
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
2014-02-27 0:26 ` Tang Chen
@ 2014-02-27 10:03 ` Benjamin LaHaise
2014-02-27 14:38 ` tom
0 siblings, 1 reply; 7+ messages in thread
From: Benjamin LaHaise @ 2014-02-27 10:03 UTC (permalink / raw)
To: Tang Chen
Cc: viro, jmoyer, kosaki.motohiro, kosaki.motohiro, isimatu.yasuaki,
guz.fnst, linux-fsdevel, linux-aio, linux-kernel
On Thu, Feb 27, 2014 at 08:26:16AM +0800, Tang Chen wrote:
> Forgot to mention that the above patch was merged when Linux 3.12 was
> released.
> So I think this problem exists in 3.12 stable tree.
>
> If the following solution is acceptable, we need to merge it to 3.12
> stable tree, too.
>
> Please reply ASAP.
I'm travelling right now and won't be testing this patch until I get back
home in about a week, so, for now, I'll apply the patch to my aio-next tree
so that it gets some exposure to the various trinty runs and other tools
people run against the -next tree. I'll then push it out to Linus once I've
run my own sanity tests next week. Regards,
-ben
> Thanks.
>
> >
> >In this patch, ctx->completion_lock is used to prevent other processes
> >from accessing the ring page being migrated.
> >
> >But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
> >when writing to the ring page, they didn't take ctx->completion_lock.
> >
> >As a result, for example, we have the following problem:
> >
> > thread 1 | thread 2
> > |
> >aio_migratepage() |
> > |-> take ctx->completion_lock |
> > |-> migrate_page_copy(new, old) |
> > | *NOW*, ctx->ring_pages[idx] == old |
> > |
> > | *NOW*,
> > ctx->ring_pages[idx] == old
> > | aio_read_events_ring()
> > | |-> ring =
> > kmap_atomic(ctx->ring_pages[0])
> > | |-> ring->head = head;
> > *HERE, write to the old ring
> > page*
> > | |-> kunmap_atomic(ring);
> > |
> > |-> ctx->ring_pages[idx] = new |
> > | *BUT NOW*, the content of |
> > | ring_pages[idx] is old. |
> > |-> release ctx->completion_lock |
> >
> >As above, the new ring page will not be updated.
> >
> >The solution is taking ctx->completion_lock in thread 2, which means,
> >in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
> >writing to ring pages.
> >
> >
> >Reported-by: Yasuaki Ishimatsu<isimatu.yasuaki@jp.fujitsu.com>
> >Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
> >---
> > fs/aio.c | 33 +++++++++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> >diff --git a/fs/aio.c b/fs/aio.c
> >index 062a5f6..50c089c 100644
> >--- a/fs/aio.c
> >+++ b/fs/aio.c
> >@@ -366,6 +366,7 @@ static int aio_setup_ring(struct kioctx *ctx)
> > int nr_pages;
> > int i;
> > struct file *file;
> >+ unsigned long flags;
> >
> > /* Compensate for the ring buffer's head/tail overlap entry */
> > nr_events += 2; /* 1 is required, 2 for good luck */
> >@@ -437,6 +438,14 @@ static int aio_setup_ring(struct kioctx *ctx)
> > ctx->user_id = ctx->mmap_base;
> > ctx->nr_events = nr_events; /* trusted copy */
> >
> >+ /*
> >+ * The aio ring pages are user space pages, so they can be migrated.
> >+ * When writing to an aio ring page, we should ensure the page is not
> >+ * being migrated. Aio page migration procedure is protected by
> >+ * ctx->completion_lock, so we add this lock here.
> >+ */
> >+ spin_lock_irqsave(&ctx->completion_lock, flags);
> >+
> > ring = kmap_atomic(ctx->ring_pages[0]);
> > ring->nr = nr_events; /* user copy */
> > ring->id = ~0U;
> >@@ -448,6 +457,8 @@ static int aio_setup_ring(struct kioctx *ctx)
> > kunmap_atomic(ring);
> > flush_dcache_page(ctx->ring_pages[0]);
> >
> >+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
> >+
> > return 0;
> > }
> >
> >@@ -542,6 +553,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct
> >mm_struct *mm)
> > unsigned i, new_nr;
> > struct kioctx_table *table, *old;
> > struct aio_ring *ring;
> >+ unsigned long flags;
> >
> > spin_lock(&mm->ioctx_lock);
> > rcu_read_lock();
> >@@ -556,9 +568,19 @@ static int ioctx_add_table(struct kioctx *ctx, struct
> >mm_struct *mm)
> > rcu_read_unlock();
> > spin_unlock(&mm->ioctx_lock);
> >
> >+ /*
> >+ * Accessing ring pages must be done
> >+ * holding ctx->completion_lock to
> >+ * prevent aio ring page migration
> >+ * procedure from migrating ring
> >pages.
> >+ */
> >+ spin_lock_irqsave(&ctx->completion_lock,
> >+ flags);
> > ring =
> > kmap_atomic(ctx->ring_pages[0]);
> > ring->id = ctx->id;
> > kunmap_atomic(ring);
> >+ spin_unlock_irqrestore(
> >+ &ctx->completion_lock,
> >flags);
> > return 0;
> > }
> >
> >@@ -1021,6 +1043,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
> > unsigned head, tail, pos;
> > long ret = 0;
> > int copy_ret;
> >+ unsigned long flags;
> >
> > mutex_lock(&ctx->ring_lock);
> >
> >@@ -1066,11 +1089,21 @@ static long aio_read_events_ring(struct kioctx
> >*ctx,
> > head %= ctx->nr_events;
> > }
> >
> >+ /*
> >+ * The aio ring pages are user space pages, so they can be migrated.
> >+ * When writing to an aio ring page, we should ensure the page is not
> >+ * being migrated. Aio page migration procedure is protected by
> >+ * ctx->completion_lock, so we add this lock here.
> >+ */
> >+ spin_lock_irqsave(&ctx->completion_lock, flags);
> >+
> > ring = kmap_atomic(ctx->ring_pages[0]);
> > ring->head = head;
> > kunmap_atomic(ring);
> > flush_dcache_page(ctx->ring_pages[0]);
> >
> >+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
> >+
> > pr_debug("%li h%u t%u\n", ret, head, tail);
> >
> > put_reqs_available(ctx, ret);
--
"Thought is the essence of where you are now."
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.
2014-02-27 10:03 ` Benjamin LaHaise
@ 2014-02-27 14:38 ` tom
0 siblings, 0 replies; 7+ messages in thread
From: tom @ 2014-02-27 14:38 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Tang Chen, viro, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, Gu Zheng, linux-fsdevel, linux-aio, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 6783 bytes --]
Hi Ben,
在 2014年2月27日,下午7:03,Benjamin LaHaise <bcrl@kvack.org> 写道:
> On Thu, Feb 27, 2014 at 08:26:16AM +0800, Tang Chen wrote:
>> Forgot to mention that the above patch was merged when Linux 3.12 was
>> released.
>> So I think this problem exists in 3.12 stable tree.
>>
>> If the following solution is acceptable, we need to merge it to 3.12
>> stable tree, too.
>>
>> Please reply ASAP.
>
> I'm travelling right now and won't be testing this patch until I get back
> home in about a week, so, for now, I'll apply the patch to my aio-next tree
> so that it gets some exposure to the various trinty runs and other tools
> people run against the -next tree. I'll then push it out to Linus once I've
> run my own sanity tests next week. Regards,
Thanks for the reply.
And if you have time, please also have a look at the following patch-set.
https://lkml.org/lkml/2014/2/27/143
patch1 is the same. And I think patch2 is also needed.
Thanks.
>
> -ben
>
>> Thanks.
>>
>>>
>>> In this patch, ctx->completion_lock is used to prevent other processes
>>> from accessing the ring page being migrated.
>>>
>>> But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
>>> when writing to the ring page, they didn't take ctx->completion_lock.
>>>
>>> As a result, for example, we have the following problem:
>>>
>>> thread 1 | thread 2
>>> |
>>> aio_migratepage() |
>>> |-> take ctx->completion_lock |
>>> |-> migrate_page_copy(new, old) |
>>> | *NOW*, ctx->ring_pages[idx] == old |
>>> |
>>> | *NOW*,
>>> ctx->ring_pages[idx] == old
>>> | aio_read_events_ring()
>>> | |-> ring =
>>> kmap_atomic(ctx->ring_pages[0])
>>> | |-> ring->head = head;
>>> *HERE, write to the old ring
>>> page*
>>> | |-> kunmap_atomic(ring);
>>> |
>>> |-> ctx->ring_pages[idx] = new |
>>> | *BUT NOW*, the content of |
>>> | ring_pages[idx] is old. |
>>> |-> release ctx->completion_lock |
>>>
>>> As above, the new ring page will not be updated.
>>>
>>> The solution is taking ctx->completion_lock in thread 2, which means,
>>> in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
>>> writing to ring pages.
>>>
>>>
>>> Reported-by: Yasuaki Ishimatsu<isimatu.yasuaki@jp.fujitsu.com>
>>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>>> ---
>>> fs/aio.c | 33 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 33 insertions(+)
>>>
>>> diff --git a/fs/aio.c b/fs/aio.c
>>> index 062a5f6..50c089c 100644
>>> --- a/fs/aio.c
>>> +++ b/fs/aio.c
>>> @@ -366,6 +366,7 @@ static int aio_setup_ring(struct kioctx *ctx)
>>> int nr_pages;
>>> int i;
>>> struct file *file;
>>> + unsigned long flags;
>>>
>>> /* Compensate for the ring buffer's head/tail overlap entry */
>>> nr_events += 2; /* 1 is required, 2 for good luck */
>>> @@ -437,6 +438,14 @@ static int aio_setup_ring(struct kioctx *ctx)
>>> ctx->user_id = ctx->mmap_base;
>>> ctx->nr_events = nr_events; /* trusted copy */
>>>
>>> + /*
>>> + * The aio ring pages are user space pages, so they can be migrated.
>>> + * When writing to an aio ring page, we should ensure the page is not
>>> + * being migrated. Aio page migration procedure is protected by
>>> + * ctx->completion_lock, so we add this lock here.
>>> + */
>>> + spin_lock_irqsave(&ctx->completion_lock, flags);
>>> +
>>> ring = kmap_atomic(ctx->ring_pages[0]);
>>> ring->nr = nr_events; /* user copy */
>>> ring->id = ~0U;
>>> @@ -448,6 +457,8 @@ static int aio_setup_ring(struct kioctx *ctx)
>>> kunmap_atomic(ring);
>>> flush_dcache_page(ctx->ring_pages[0]);
>>>
>>> + spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>> +
>>> return 0;
>>> }
>>>
>>> @@ -542,6 +553,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct
>>> mm_struct *mm)
>>> unsigned i, new_nr;
>>> struct kioctx_table *table, *old;
>>> struct aio_ring *ring;
>>> + unsigned long flags;
>>>
>>> spin_lock(&mm->ioctx_lock);
>>> rcu_read_lock();
>>> @@ -556,9 +568,19 @@ static int ioctx_add_table(struct kioctx *ctx, struct
>>> mm_struct *mm)
>>> rcu_read_unlock();
>>> spin_unlock(&mm->ioctx_lock);
>>>
>>> + /*
>>> + * Accessing ring pages must be done
>>> + * holding ctx->completion_lock to
>>> + * prevent aio ring page migration
>>> + * procedure from migrating ring
>>> pages.
>>> + */
>>> + spin_lock_irqsave(&ctx->completion_lock,
>>> + flags);
>>> ring =
>>> kmap_atomic(ctx->ring_pages[0]);
>>> ring->id = ctx->id;
>>> kunmap_atomic(ring);
>>> + spin_unlock_irqrestore(
>>> + &ctx->completion_lock,
>>> flags);
>>> return 0;
>>> }
>>>
>>> @@ -1021,6 +1043,7 @@ static long aio_read_events_ring(struct kioctx *ctx,
>>> unsigned head, tail, pos;
>>> long ret = 0;
>>> int copy_ret;
>>> + unsigned long flags;
>>>
>>> mutex_lock(&ctx->ring_lock);
>>>
>>> @@ -1066,11 +1089,21 @@ static long aio_read_events_ring(struct kioctx
>>> *ctx,
>>> head %= ctx->nr_events;
>>> }
>>>
>>> + /*
>>> + * The aio ring pages are user space pages, so they can be migrated.
>>> + * When writing to an aio ring page, we should ensure the page is not
>>> + * being migrated. Aio page migration procedure is protected by
>>> + * ctx->completion_lock, so we add this lock here.
>>> + */
>>> + spin_lock_irqsave(&ctx->completion_lock, flags);
>>> +
>>> ring = kmap_atomic(ctx->ring_pages[0]);
>>> ring->head = head;
>>> kunmap_atomic(ring);
>>> flush_dcache_page(ctx->ring_pages[0]);
>>>
>>> + spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>> +
>>> pr_debug("%li h%u t%u\n", ret, head, tail);
>>>
>>> put_reqs_available(ctx, ret);
>
> --
> "Thought is the essence of where you are now."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
[-- Attachment #2: Type: text/html, Size: 22113 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* [V2 PATCH 0/2] Bug fix in aio ring page migration
2014-02-26 8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
2014-02-27 0:26 ` Tang Chen
@ 2014-03-10 5:30 ` Tang Chen
2014-03-10 5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
2014-03-10 5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10 5:30 UTC (permalink / raw)
To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, guz.fnst
Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie
This patch-set fixes the following two problems:
1. Need to use ctx->completion_lock to protect ring pages
from being mis-written while migration.
2. Need memory barrier to ensure memory copy is done before
ctx->ring_pages[] is updated.
NOTE: AIO ring page migration was implemented since Linux 3.12.
So we need to merge these two patches into 3.12 stable tree.
Tang Chen (2):
aio, memory-hotplug: Fix confliction when migrating and accessing
ring pages.
aio, mem-hotplug: Add memory barrier to aio ring page migration.
fs/aio.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)
--
1.7.7
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages
2014-02-26 8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
2014-02-27 0:26 ` Tang Chen
2014-03-10 5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
@ 2014-03-10 5:30 ` Tang Chen
2014-03-10 5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10 5:30 UTC (permalink / raw)
To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, guz.fnst
Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie
AIO ring page migration has been implemented by the following patch:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3
In this patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page being migrated.
But in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
when writing to the ring page, they didn't take ctx->completion_lock.
As a result, for example, we have the following problem:
thread 1 | thread 2
|
aio_migratepage() |
|-> take ctx->completion_lock |
|-> migrate_page_copy(new, old) |
| *NOW*, ctx->ring_pages[idx] == old |
|
| *NOW*,
ctx->ring_pages[idx] == old
| aio_read_events_ring()
| |-> ring =
kmap_atomic(ctx->ring_pages[0])
| |-> ring->head = head;
*HERE, write to the old ring page*
| |-> kunmap_atomic(ring);
|
|-> ctx->ring_pages[idx] = new |
| *BUT NOW*, the content of |
| ring_pages[idx] is old. |
|-> release ctx->completion_lock |
As above, the new ring page will not be updated.
The solution is taking ctx->completion_lock in thread 2, which means,
in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring() when
writing to ring pages.
v2:
Use spin_lock_irq rather than spin_lock_irqsave as Jeff suggested.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
fs/aio.c | 28 ++++++++++++++++++++++++++++
1 files changed, 28 insertions(+), 0 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..dc70246 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -437,6 +437,14 @@ static int aio_setup_ring(struct kioctx *ctx)
ctx->user_id = ctx->mmap_base;
ctx->nr_events = nr_events; /* trusted copy */
+ /*
+ * The aio ring pages are user space pages, so they can be migrated.
+ * When writing to an aio ring page, we should ensure the page is not
+ * being migrated. Aio page migration procedure is protected by
+ * ctx->completion_lock, so we add this lock here.
+ */
+ spin_lock_irq(&ctx->completion_lock);
+
ring = kmap_atomic(ctx->ring_pages[0]);
ring->nr = nr_events; /* user copy */
ring->id = ~0U;
@@ -448,6 +456,8 @@ static int aio_setup_ring(struct kioctx *ctx)
kunmap_atomic(ring);
flush_dcache_page(ctx->ring_pages[0]);
+ spin_unlock_irq(&ctx->completion_lock);
+
return 0;
}
@@ -556,9 +566,17 @@ static int ioctx_add_table(struct kioctx *ctx,
struct mm_struct *mm)
rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);
+ /*
+ * Accessing ring pages must be done
+ * holding ctx->completion_lock to
+ * prevent aio ring page migration
+ * procedure from migrating ring pages.
+ */
+ spin_lock_irq(&ctx->completion_lock);
ring = kmap_atomic(ctx->ring_pages[0]);
ring->id = ctx->id;
kunmap_atomic(ring);
+ spin_unlock_irq(&ctx->completion_lock);
return 0;
}
@@ -1066,11 +1084,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
head %= ctx->nr_events;
}
+ /*
+ * The aio ring pages are user space pages, so they can be migrated.
+ * When writing to an aio ring page, we should ensure the page is not
+ * being migrated. Aio page migration procedure is protected by
+ * ctx->completion_lock, so we add this lock here.
+ */
+ spin_lock_irq(&ctx->completion_lock);
+
ring = kmap_atomic(ctx->ring_pages[0]);
ring->head = head;
kunmap_atomic(ring);
flush_dcache_page(ctx->ring_pages[0]);
+ spin_unlock_irq(&ctx->completion_lock);
+
pr_debug("%li h%u t%u\n", ret, head, tail);
put_reqs_available(ctx, ret);
--
1.7.7
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration.
2014-02-26 8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
` (2 preceding siblings ...)
2014-03-10 5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
@ 2014-03-10 5:31 ` Tang Chen
3 siblings, 0 replies; 7+ messages in thread
From: Tang Chen @ 2014-03-10 5:31 UTC (permalink / raw)
To: viro, bcrl, jmoyer, kosaki.motohiro, kosaki.motohiro,
isimatu.yasuaki, guz.fnst
Cc: linux-fsdevel, linux-aio, linux-kernel, Miao Xie
When doing aio ring page migration, we migrated the page, and update
ctx->ring_pages[]. Like the following:
aio_migratepage()
|-> migrate_page_copy(new, old)
| ...... /* Need barrier here */
|-> ctx->ring_pages[idx] = new
Actually, we need a memory barrier between these two operations.
Otherwise, if ctx->ring_pages[] is updated before memory copy due to
the compiler optimization, other processes may have an opportunity
to access to the not fully initialized new ring page.
So add a wmb and rmb to synchronize them.
v2:
change smp_rmb() to smp_read_barrier_depends(). Thanks Miao.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
fs/aio.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index dc70246..4133ba9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -327,6 +327,14 @@ static int aio_migratepage(struct address_space
*mapping, struct page *new,
pgoff_t idx;
spin_lock_irqsave(&ctx->completion_lock, flags);
migrate_page_copy(new, old);
+
+ /*
+ * Ensure memory copy is finished before updating
+ * ctx->ring_pages[]. Otherwise other processes may access to
+ * new ring pages which are not fully initialized.
+ */
+ smp_wmb();
+
idx = old->index;
if (idx < (pgoff_t)ctx->nr_pages) {
/* And only do the move if things haven't changed */
@@ -1069,6 +1077,12 @@ static long aio_read_events_ring(struct kioctx *ctx,
page = ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE];
pos %= AIO_EVENTS_PER_PAGE;
+ /*
+ * Ensure that the page's data was copied from old one by
+ * aio_migratepage().
+ */
+ smp_read_barrier_depends();
+
ev = kmap(page);
copy_ret = copy_to_user(event + ret, ev + pos,
sizeof(*ev) * avail);
--
1.7.7
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-03-10 5:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-26 8:38 [PATCH 1/1] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages Tang Chen
2014-02-27 0:26 ` Tang Chen
2014-02-27 10:03 ` Benjamin LaHaise
2014-02-27 14:38 ` tom
2014-03-10 5:30 ` [V2 PATCH 0/2] Bug fix in aio ring page migration Tang Chen
2014-03-10 5:30 ` [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages Tang Chen
2014-03-10 5:31 ` [V2 PATCH 2/2] aio, mem-hotplug: Add memory barrier to aio ring page migration Tang Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).