* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
2014-03-21 18:35 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
@ 2014-03-24 10:56 ` Gu Zheng
2014-03-24 10:59 ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:56 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton
Hi Ben,
On 03/22/2014 02:35 AM, Benjamin LaHaise wrote:
> Hi all,
>
> Based on the issues reported by Tang and Gu, I've come up with the an
> alternative fix that avoids adding additional locking in the event read
> code path. The fix is to take the ring_lock mutex during page migration,
> which is already used to syncronize event readers and thus does not add
> any new locking requirements in aio_read_events_ring(). I've dropped
> the patches from Tang and Gu as a result. This patch is now in my
> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
> once a few other people chime in with their reviews of this change.
> Please review Tang, Gu. Thanks!
As I mentioned before:
https://lkml.org/lkml/2014/3/20/34
We can call put_aio_ring_file() at the beginning of the context teardown
flow (aio_free_ring). Then page migration and ctx freeing are mutually
excluded, guarded by lock_page() vs. truncate(). With that, the additional
spinlock (the address_space's private_lock) used to protect use and updates
of the mapping's private_data becomes needless, as do the sanity check of
the context and the additional percpu_ref_get/put(&ctx->users).
That said, enlarging the ring_lock protection region to remove the
additional spinlock is an elegant approach, if we ignore its effect on
reading events while a page migration is in progress.
Thanks,
Gu
>
> -ben
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: aart@kvack.org
* [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
2014-03-21 18:35 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
2014-03-24 10:56 ` Gu Zheng
@ 2014-03-24 10:59 ` Gu Zheng
2014-03-24 13:20 ` Benjamin LaHaise
2014-03-24 10:59 ` [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page Gu Zheng
2014-03-24 18:22 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
3 siblings, 1 reply; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:59 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton
The page migration framework holds lock_page() on both the old and the
new page while migrating, so both pages are locked for the duration of
the migration. The aio context teardown routine calls truncate (in
put_aio_ring_file()) to truncate the pagecache, which must acquire the
page lock for each page, one by one. So there is a natural mutual
exclusion between page migration and truncate().

If put_aio_ring_file() is called at the beginning of the context teardown
flow (aio_free_ring), then page migration and ctx freeing are mutually
excluded, guarded by lock_page() vs. truncate(). Once a page has been
removed from the radix tree, it will not be migrated. Conversely, the
context cannot be freed while a page migration is ongoing.
aio_free_ring()                         |  migrate_pages()
 |-put_aio_ring_file()                  |   |-__unmap_and_move()
 |  |-truncate_setsize()                |   |  |-lock_page(old)
 |  |  |-truncate_inode_pages_range()   |   |  ...
 |  |     |-trylock_page(page)          |   |  |-move_to_new_page()
 |  |-i_mapping->private_data = NULL    |   |     |-trylock_page(newpage)
 |  |-ctx->aio_ring_file = NULL         |   |     |-aio_migratepage()
With that, the additional spinlock (the address_space's private_lock)
used to protect use and updates of the mapping's private_data, and the
sanity check of the context, are needless, so remove them here.

Besides, move filling the ring_pages[] array and ctx->nr_pages inside the
page-lock protection in aio_setup_ring() to keep the semantics consistent.

Moreover, once migrate_page_move_mapping() has succeeded, page migration
should never fail, so merge the flow into one routine.

Many thanks to KAMEZAWA Hiroyuki for giving directions on this.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
v2: Rebased on latest linus' tree.
---
fs/aio.c | 79 ++++++++++++++++++--------------------------------------------
1 files changed, 23 insertions(+), 56 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..6453c12 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -229,11 +229,8 @@ static void put_aio_ring_file(struct kioctx *ctx)
if (aio_ring_file) {
truncate_setsize(aio_ring_file->f_inode, 0);
- /* Prevent further access to the kioctx from migratepages */
- spin_lock(&aio_ring_file->f_inode->i_mapping->private_lock);
aio_ring_file->f_inode->i_mapping->private_data = NULL;
ctx->aio_ring_file = NULL;
- spin_unlock(&aio_ring_file->f_inode->i_mapping->private_lock);
fput(aio_ring_file);
}
@@ -243,6 +240,8 @@ static void aio_free_ring(struct kioctx *ctx)
{
int i;
+ put_aio_ring_file(ctx);
+
for (i = 0; i < ctx->nr_pages; i++) {
struct page *page;
pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
@@ -254,8 +253,6 @@ static void aio_free_ring(struct kioctx *ctx)
put_page(page);
}
- put_aio_ring_file(ctx);
-
if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages) {
kfree(ctx->ring_pages);
ctx->ring_pages = NULL;
@@ -283,32 +280,22 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
{
struct kioctx *ctx;
unsigned long flags;
- int rc;
-
- rc = 0;
+ pgoff_t index;
+ int rc = MIGRATEPAGE_SUCCESS;
- /* Make sure the old page hasn't already been changed */
- spin_lock(&mapping->private_lock);
+ /*
+ * Because the old page is *locked*, if we see the mapping, the page isn't
+ * truncated, and mapping->private_data etc. are all valid.
+ */
ctx = mapping->private_data;
- if (ctx) {
- pgoff_t idx;
- spin_lock_irqsave(&ctx->completion_lock, flags);
- idx = old->index;
- if (idx < (pgoff_t)ctx->nr_pages) {
- if (ctx->ring_pages[idx] != old)
- rc = -EAGAIN;
- } else
- rc = -EINVAL;
- spin_unlock_irqrestore(&ctx->completion_lock, flags);
- } else
- rc = -EINVAL;
- spin_unlock(&mapping->private_lock);
- if (rc != 0)
- return rc;
+ index = old->index;
+ BUG_ON(index >= (pgoff_t)ctx->nr_pages);
+ VM_BUG_ON(ctx->ring_pages[index] != old);
/* Writeback must be complete */
BUG_ON(PageWriteback(old));
+ /* Extra ref cnt for ring_pages[] array */
get_page(new);
rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
@@ -317,34 +304,13 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
return rc;
}
- /* We can potentially race against kioctx teardown here. Use the
- * address_space's private data lock to protect the mapping's
- * private_data.
- */
- spin_lock(&mapping->private_lock);
- ctx = mapping->private_data;
- if (ctx) {
- pgoff_t idx;
- spin_lock_irqsave(&ctx->completion_lock, flags);
- migrate_page_copy(new, old);
- idx = old->index;
- if (idx < (pgoff_t)ctx->nr_pages) {
- /* And only do the move if things haven't changed */
- if (ctx->ring_pages[idx] == old)
- ctx->ring_pages[idx] = new;
- else
- rc = -EAGAIN;
- } else
- rc = -EINVAL;
- spin_unlock_irqrestore(&ctx->completion_lock, flags);
- } else
- rc = -EBUSY;
- spin_unlock(&mapping->private_lock);
+ spin_lock_irqsave(&ctx->completion_lock, flags);
+ /* Migration should not fail once migrate_page_move_mapping() succeeded */
+ migrate_page_copy(new, old);
+ ctx->ring_pages[old->index] = new;
+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
- if (rc == MIGRATEPAGE_SUCCESS)
- put_page(old);
- else
- put_page(new);
+ put_page(old);
return rc;
}
@@ -397,6 +363,8 @@ static int aio_setup_ring(struct kioctx *ctx)
}
}
+ ctx->nr_pages = 0;
+
for (i = 0; i < nr_pages; i++) {
struct page *page;
page = find_or_create_page(file->f_inode->i_mapping,
@@ -407,13 +375,12 @@ static int aio_setup_ring(struct kioctx *ctx)
current->pid, i, page_count(page));
SetPageUptodate(page);
SetPageDirty(page);
- unlock_page(page);
-
ctx->ring_pages[i] = page;
+ ctx->nr_pages++;
+ unlock_page(page);
}
- ctx->nr_pages = i;
- if (unlikely(i != nr_pages)) {
+ if (unlikely(ctx->nr_pages != nr_pages)) {
aio_free_ring(ctx);
return -EAGAIN;
}
--
1.7.7
* Re: [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
2014-03-24 10:59 ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
@ 2014-03-24 13:20 ` Benjamin LaHaise
2014-03-25 10:11 ` Gu Zheng
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-24 13:20 UTC (permalink / raw)
To: Gu Zheng
Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton
On Mon, Mar 24, 2014 at 06:59:30PM +0800, Gu Zheng wrote:
> As the page migration framework holds lock_page() to protect the pages
> (both old and new) while migrating, so while the page migrating, both
> of old page and new page are locked. And the aio context teardown
> routine will call *truncate*(in put_aio_ring_file()) to truncate
> pagecache which needs to acquire page_lock() for each page one by one.
> So there is a native mutual exclusion between *migrate page* v.s. truncate().
>
> If put_aio_ring_file() is called at first of the context teardown flow
> (aio_free_ring). Then, page migration and ctx freeing will have mutual
> execution guarded by lock_page() v.s. truncate(). Once a page is removed
> from radix-tree, it will not be migrated. On the other hand, the context
> can not be freed while the page migraiton are ongoing.
Sorry, but your change to remove the taking of ->private_lock in
put_aio_ring_file() is not safe. If a malicious user reinstantiates
any pages in the ring buffer's mapping, there is nothing protecting
the system against incoherent accesses of ->ring_pages. One possible
way of making this occur would be to use mremap() to expand the size
of the mapping or move it to a different location in the user process'
address space. Yes, it's a tiny race, but it's possible. There is
absolutely no reason to remove this locking -- ring teardown is
hardly a performance sensitive code path. I'm going to stick with my
approach instead.
-ben
--
"Thought is the essence of where you are now."
* Re: [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much
2014-03-24 13:20 ` Benjamin LaHaise
@ 2014-03-25 10:11 ` Gu Zheng
0 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-25 10:11 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton
Hi Ben,
On 03/24/2014 09:20 PM, Benjamin LaHaise wrote:
> On Mon, Mar 24, 2014 at 06:59:30PM +0800, Gu Zheng wrote:
>> As the page migration framework holds lock_page() to protect the pages
>> (both old and new) while migrating, so while the page migrating, both
>> of old page and new page are locked. And the aio context teardown
>> routine will call *truncate*(in put_aio_ring_file()) to truncate
>> pagecache which needs to acquire page_lock() for each page one by one.
>> So there is a native mutual exclusion between *migrate page* v.s. truncate().
>>
>> If put_aio_ring_file() is called at first of the context teardown flow
>> (aio_free_ring). Then, page migration and ctx freeing will have mutual
>> execution guarded by lock_page() v.s. truncate(). Once a page is removed
>> from radix-tree, it will not be migrated. On the other hand, the context
>> can not be freed while the page migraiton are ongoing.
>
> Sorry, but your change to remove the taking of ->private_lock in
> put_aio_ring_file() is not safe. If a malicious user reinstantiates
> any pages in the ring buffer's mmaping, there is nothing protecting
> the system against incoherent accesses of ->ring_pages. One possible
> way of making this occur would be to use mremap() to expand the size
> of the mapping or move it to a different location in the user process'
> address space. Yes, it's a tiny race, but it's possible. There is
> absolutely no reason to remove this locking -- ring teardown is
> hardly a performance sensitive code path. I'm going to stick with my
> approach instead.
OK, you can go ahead with your approach, but I'll reserve judgment on the
issue you mentioned above until I have looked into it more closely.
BTW, please also send it to the 3.12.y and 3.13.y stable trees once it is
merged into Linus' tree.
Thanks,
Gu
>
> -ben
* [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page
2014-03-21 18:35 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
2014-03-24 10:56 ` Gu Zheng
2014-03-24 10:59 ` [V2 PATCH 1/2] aio: clean up aio_migratepage() and related code much Gu Zheng
@ 2014-03-24 10:59 ` Gu Zheng
2014-03-24 18:22 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
3 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2014-03-24 10:59 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton
Since we have no additional protection on the page at the read-events
side, it is possible that the read of the page takes place after the page
has been freed and reallocated to another part of the kernel. This would
result in the read returning invalid information.

As a result, for example, we hit the following problem:
thread 1                                | thread 2
----------------------------------------+--------------------------------------------
aio_migratepage()                       |
 |-> take ctx->completion_lock          |
 |-> migrate_page_copy(new, old)        |
     *NOW*, ctx->ring_pages[idx] == old |
                                        | *NOW*, ctx->ring_pages[idx] == old
                                        | aio_read_events_ring()
                                        |  |-> ring = kmap_atomic(ctx->ring_pages[0])
                                        |  |-> ring->head = head;
                                        |      *HERE, write to the old ring page*
                                        |  |-> kunmap_atomic(ring)
 |-> ctx->ring_pages[idx] = new         |
     *BUT NOW*, the content of          |
     ring_pages[idx] is stale.          |
 |-> release ctx->completion_lock       |
As above, the new ring page will not be updated.
Fix this issue, and also prevent races in aio_setup_ring(), by taking the
ring_lock mutex and the completion_lock during page migration and where
otherwise applicable.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
v2:
Merged Tang Chen's patch to use the spin_lock to protect the ring buffer update.
Use ring_lock rather than the additional spin_lock as Benjamin LaHaise suggested.
---
fs/aio.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 6453c12..ee74704 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -298,6 +298,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
+ /* Extra ref cnt for ring_pages[] array */
get_page(new);
+ /* Ensure no aio read events are in flight while migrating the page */
+ mutex_lock(&ctx->ring_lock);
+
rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
if (rc != MIGRATEPAGE_SUCCESS) {
put_page(new);
@@ -312,6 +315,8 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
put_page(old);
+ mutex_unlock(&ctx->ring_lock);
+
return rc;
}
#endif
@@ -523,9 +528,18 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);
+ /*
+ * Accessing ring pages must be done
+ * holding ctx->completion_lock to
+ * prevent aio ring page migration
+ * procedure from migrating ring pages.
+ */
+ spin_lock_irq(&ctx->completion_lock);
ring = kmap_atomic(ctx->ring_pages[0]);
ring->id = ctx->id;
kunmap_atomic(ring);
+ spin_unlock_irq(&ctx->completion_lock);
+
return 0;
}
@@ -624,7 +638,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
if (!ctx->cpu)
goto err;
- if (aio_setup_ring(ctx) < 0)
+ /*
+ * Prevent races with page migration in aio_setup_ring() by holding
+ * the ring_lock mutex.
+ */
+ mutex_lock(&ctx->ring_lock);
+ err = aio_setup_ring(ctx);
+ mutex_unlock(&ctx->ring_lock);
+ if (err < 0)
goto err;
atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
--
1.7.7
* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
2014-03-21 18:35 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Benjamin LaHaise
` (2 preceding siblings ...)
2014-03-24 10:59 ` [V2 PATCH 2/2] aio: fix the confliction of aio read events and aio migrate page Gu Zheng
@ 2014-03-24 18:22 ` Sasha Levin
2014-03-24 19:07 ` Benjamin LaHaise
3 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2014-03-24 18:22 UTC (permalink / raw)
To: Benjamin LaHaise, Gu Zheng, Tang Chen
Cc: Dave Jones, Al Viro, jmoyer, kosaki.motohiro, KAMEZAWA Hiroyuki,
Yasuaki Ishimatsu, miaox, linux-aio, fsdevel, linux-kernel,
Andrew Morton, linux-mm@kvack.org
On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
> Hi all,
>
> Based on the issues reported by Tang and Gu, I've come up with the an
> alternative fix that avoids adding additional locking in the event read
> code path. The fix is to take the ring_lock mutex during page migration,
> which is already used to syncronize event readers and thus does not add
> any new locking requirements in aio_read_events_ring(). I've dropped
> the patches from Tang and Gu as a result. This patch is now in my
> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
> once a few other people chime in with their reviews of this change.
> Please review Tang, Gu. Thanks!
Hi Benjamin,
This patch seems to trigger:
[ 433.476216] ======================================================
[ 433.478468] [ INFO: possible circular locking dependency detected ]
[ 433.480900] 3.14.0-rc7-next-20140324-sasha-00015-g1fb7de8 #267 Tainted: G W
[ 433.480900] -------------------------------------------------------
[ 433.480900] trinity-c57/13776 is trying to acquire lock:
[ 433.480900] (&ctx->ring_lock){+.+.+.}, at: aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900]
[ 433.480900] but task is already holding lock:
[ 433.480900] (&mm->mmap_sem){++++++}, at: SYSC_move_pages (mm/migrate.c:1215 mm/migrate.c:1353 mm/migrate.c:1508)
[ 433.480900]
[ 433.480900] which lock already depends on the new lock.
[ 433.480900]
[ 433.480900]
[ 433.480900] the existing dependency chain (in reverse order) is:
[ 433.480900]
-> #1 (&mm->mmap_sem){++++++}:
[ 433.480900] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 433.480900] down_write (arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:50)
[ 433.480900] SyS_io_setup (fs/aio.c:442 fs/aio.c:689 fs/aio.c:1201 fs/aio.c:1184)
[ 433.480900] tracesys (arch/x86/kernel/entry_64.S:749)
[ 433.480900]
-> #0 (&ctx->ring_lock){+.+.+.}:
[ 433.480900] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 433.480900] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 433.480900] mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[ 433.480900] aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900] move_to_new_page (mm/migrate.c:777)
[ 433.480900] migrate_pages (mm/migrate.c:921 mm/migrate.c:960 mm/migrate.c:1126)
[ 433.480900] SYSC_move_pages (mm/migrate.c:1278 mm/migrate.c:1353 mm/migrate.c:1508)
[ 433.480900] SyS_move_pages (mm/migrate.c:1456)
[ 433.480900] tracesys (arch/x86/kernel/entry_64.S:749)
[ 433.480900]
[ 433.480900] other info that might help us debug this:
[ 433.480900]
[ 433.480900] Possible unsafe locking scenario:
[ 433.480900]
[ 433.480900] CPU0 CPU1
[ 433.480900] ---- ----
[ 433.480900] lock(&mm->mmap_sem);
[ 433.480900] lock(&ctx->ring_lock);
[ 433.480900] lock(&mm->mmap_sem);
[ 433.480900] lock(&ctx->ring_lock);
[ 433.480900]
[ 433.480900] *** DEADLOCK ***
[ 433.480900]
[ 433.480900] 1 lock held by trinity-c57/13776:
[ 433.480900] #0: (&mm->mmap_sem){++++++}, at: SYSC_move_pages (mm/migrate.c:1215 mm/migrate.c:1353 mm/migrate.c:1508)
[ 433.480900]
[ 433.480900] stack backtrace:
[ 433.480900] CPU: 4 PID: 13776 Comm: trinity-c57 Tainted: G W 3.14.0-rc7-next-20140324-sasha-00015-g1fb7de8 #267
[ 433.480900] ffffffff87a80790 ffff8806abbbb9a8 ffffffff844bae02 0000000000000000
[ 433.480900] ffffffff87a80790 ffff8806abbbb9f8 ffffffff844ad86d 0000000000000001
[ 433.480900] ffff8806abbbba88 ffff8806abbbb9f8 ffff8806ab8fbcf0 ffff8806ab8fbd28
[ 433.480900] Call Trace:
[ 433.480900] dump_stack (lib/dump_stack.c:52)
[ 433.480900] print_circular_bug (kernel/locking/lockdep.c:1216)
[ 433.480900] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 433.480900] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 433.480900] ? sched_clock_local (kernel/sched/clock.c:214)
[ 433.480900] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 433.480900] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 433.480900] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 433.480900] ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900] mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[ 433.480900] ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900] ? aio_migratepage (fs/aio.c:303)
[ 433.480900] ? aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900] ? aio_migratepage (include/linux/rcupdate.h:324 include/linux/rcupdate.h:909 include/linux/percpu-refcount.h:117 fs/aio.c:297)
[ 433.480900] ? preempt_count_sub (kernel/sched/core.c:2527)
[ 433.480900] aio_migratepage (include/linux/spinlock.h:303 fs/aio.c:306)
[ 433.480900] ? aio_migratepage (include/linux/rcupdate.h:886 include/linux/percpu-refcount.h:108 fs/aio.c:297)
[ 433.480900] ? mutex_unlock (kernel/locking/mutex.c:220)
[ 433.480900] move_to_new_page (mm/migrate.c:777)
[ 433.480900] ? try_to_unmap (mm/rmap.c:1516)
[ 433.480900] ? try_to_unmap_nonlinear (mm/rmap.c:1113)
[ 433.480900] ? invalid_migration_vma (mm/rmap.c:1472)
[ 433.480900] ? page_remove_rmap (mm/rmap.c:1380)
[ 433.480900] ? anon_vma_fork (mm/rmap.c:446)
[ 433.480900] migrate_pages (mm/migrate.c:921 mm/migrate.c:960 mm/migrate.c:1126)
[ 433.480900] ? follow_page_mask (mm/memory.c:1544)
[ 433.480900] ? alloc_misplaced_dst_page (mm/migrate.c:1177)
[ 433.480900] SYSC_move_pages (mm/migrate.c:1278 mm/migrate.c:1353 mm/migrate.c:1508)
[ 433.480900] ? SYSC_move_pages (include/linux/rcupdate.h:800 mm/migrate.c:1472)
[ 433.480900] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 433.480900] SyS_move_pages (mm/migrate.c:1456)
[ 433.480900] tracesys (arch/x86/kernel/entry_64.S:749)
Thanks,
Sasha
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
2014-03-24 18:22 ` [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised Sasha Levin
@ 2014-03-24 19:07 ` Benjamin LaHaise
2014-03-25 17:47 ` Sasha Levin
0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-24 19:07 UTC (permalink / raw)
To: Sasha Levin
Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton, linux-mm@kvack.org
On Mon, Mar 24, 2014 at 02:22:06PM -0400, Sasha Levin wrote:
> On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
> >Hi all,
> >
> >Based on the issues reported by Tang and Gu, I've come up with the an
> >alternative fix that avoids adding additional locking in the event read
> >code path. The fix is to take the ring_lock mutex during page migration,
> >which is already used to syncronize event readers and thus does not add
> >any new locking requirements in aio_read_events_ring(). I've dropped
> >the patches from Tang and Gu as a result. This patch is now in my
> >git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
> >once a few other people chime in with their reviews of this change.
> >Please review Tang, Gu. Thanks!
>
> Hi Benjamin,
>
> This patch seems to trigger:
>
> [ 433.476216] ======================================================
> [ 433.478468] [ INFO: possible circular locking dependency detected ]
...
Yeah, that's a problem -- thanks for the report. The ring_lock mutex can't
be nested inside of mmap_sem, as aio_read_events_ring() can take a page
fault while holding ring_lock. That makes the following change required.
I'll fold this change into the patch that caused this issue.
-ben
--
"Thought is the essence of where you are now."
diff --git a/fs/aio.c b/fs/aio.c
index c97cee8..f645e7e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -300,7 +300,10 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
if (!ctx)
return -EINVAL;
- mutex_lock(&ctx->ring_lock);
+ if (!mutex_trylock(&ctx->ring_lock)) {
+ percpu_ref_put(&ctx->users);
+ return -EAGAIN;
+ }
/* Make sure the old page hasn't already been changed */
spin_lock(&mapping->private_lock);
* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
2014-03-24 19:07 ` Benjamin LaHaise
@ 2014-03-25 17:47 ` Sasha Levin
2014-03-25 18:57 ` Benjamin LaHaise
0 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2014-03-25 17:47 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton, linux-mm@kvack.org
On 03/24/2014 03:07 PM, Benjamin LaHaise wrote:
> On Mon, Mar 24, 2014 at 02:22:06PM -0400, Sasha Levin wrote:
>> On 03/21/2014 02:35 PM, Benjamin LaHaise wrote:
>>> Hi all,
>>>
>>> Based on the issues reported by Tang and Gu, I've come up with the an
>>> alternative fix that avoids adding additional locking in the event read
>>> code path. The fix is to take the ring_lock mutex during page migration,
>>> which is already used to syncronize event readers and thus does not add
>>> any new locking requirements in aio_read_events_ring(). I've dropped
>>> the patches from Tang and Gu as a result. This patch is now in my
>>> git://git.kvack.org/~bcrl/aio-next.git tree and will be sent to Linus
>>> once a few other people chime in with their reviews of this change.
>>> Please review Tang, Gu. Thanks!
>>
>> Hi Benjamin,
>>
>> This patch seems to trigger:
>>
>> [ 433.476216] ======================================================
>> [ 433.478468] [ INFO: possible circular locking dependency detected ]
> ...
>
> Yeah, that's a problem -- thanks for the report. The ring_lock mutex can't
> be nested inside of mmap_sem, as aio_read_events_ring() can take a page
> fault while holding ring_mutex. That makes the following change required.
> I'll fold this change into the patch that caused this issue.
Yup, that does the trick.
Could you please add something to document why this is a trylock instead of
a lock? If I were just reading the code, there's no way I'd understand the
reason behind it without knowing about this bug report.
Thanks,
Sasha
* Re: [PATCH] aio: ensure access to ctx->ring_pages is correctly serialised
2014-03-25 17:47 ` Sasha Levin
@ 2014-03-25 18:57 ` Benjamin LaHaise
0 siblings, 0 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2014-03-25 18:57 UTC (permalink / raw)
To: Sasha Levin
Cc: Gu Zheng, Tang Chen, Dave Jones, Al Viro, jmoyer, kosaki.motohiro,
KAMEZAWA Hiroyuki, Yasuaki Ishimatsu, miaox, linux-aio, fsdevel,
linux-kernel, Andrew Morton, linux-mm@kvack.org
Hi Sasha,
On Tue, Mar 25, 2014 at 01:47:40PM -0400, Sasha Levin wrote:
> On 03/24/2014 03:07 PM, Benjamin LaHaise wrote:
...
> >Yeah, that's a problem -- thanks for the report. The ring_lock mutex can't
> >be nested inside of mmap_sem, as aio_read_events_ring() can take a page
> >fault while holding ring_mutex. That makes the following change required.
> >I'll fold this change into the patch that caused this issue.
>
> Yup, that does the trick.
>
> Could you please add something to document why this is a trylock instead of
> a lock? If
> I were reading the code there's no way I'd understand what's the reason
> behind it
> without knowing of this bug report.
Done. I've updated the patch in my aio-next.git tree, so it should be in
tomorrow's linux-next, and will give it one last day for any further problem
reports. Thanks for testing!
-ben
> Thanks,
> Sasha
--
"Thought is the essence of where you are now."