* [PATCH 1/2] aio: block io_destroy() until all context requests are completed
[not found] <cover.1398948681.git.bcrl@kvack.org>
@ 2014-05-01 13:06 ` Benjamin LaHaise
2014-05-01 13:07 ` [PATCH 2/2] aio: fix potential leak in aio_run_iocb() Benjamin LaHaise
1 sibling, 0 replies; 5+ messages in thread
From: Benjamin LaHaise @ 2014-05-01 13:06 UTC (permalink / raw)
To: torvalds; +Cc: linux-aio, linux-fsdevel, stable, Anatol Pomozov
io_destroy() deletes the aio context and all resources related to it. It makes
sense that no IO operations connected to the context should be running after
the context is destroyed. Once the io_context is removed, there is no way to
get the status of its requests or to call io_getevents().
The man page for io_destroy() says that this function may block until
all of the context's requests are completed. Before kernel 3.11 io_destroy()
did indeed block, but since the aio refactoring in 3.11 this is no longer true.
Here is pseudo-code for a test case that shows the race condition discovered
in 3.11:
  initialize io_context
  io_submit(read to buffer)
  io_destroy()
  // context is destroyed so we can free the resources
  free(buffers);
  // if the buffer is then allocated to some other user, they will be
  // surprised to learn that the buffer is still being filled by an
  // outstanding operation from the destroyed io_context
The fix is straightforward: add a completion struct and wait on it
in io_destroy(). complete() should be called when the number of in-flight
requests reaches zero.
If two or more io_destroy() calls are made for the same context simultaneously,
only the first one waits for IO completion; the behaviour of the other calls
is undefined.
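The completion scheme described above can be sketched in user space. This is
only an analogue under stated assumptions, not the kernel code: struct
completion is re-implemented here with a pthread mutex and condition variable,
a single counter stands in for the refcounts used in the patch, and demo_ctx,
ctx_put(), request_thread() and demo_destroy_waits() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical context; refs stands in for the refcounts in the patch. */
struct demo_ctx {
	pthread_mutex_t lock;
	int refs;                         /* 1 base ref + 1 per in-flight request */
	struct completion *requests_done; /* set by the destroyer, may be NULL */
	char buf[16];                     /* the "user buffer" being read into */
};

/* Drop one reference; the final put wakes the waiter, if there is one. */
static void ctx_put(struct demo_ctx *ctx)
{
	struct completion *done = NULL;

	pthread_mutex_lock(&ctx->lock);
	if (--ctx->refs == 0)
		done = ctx->requests_done;
	pthread_mutex_unlock(&ctx->lock);
	if (done)
		complete(done);
}

/* Simulated slow read: fill the buffer late, then drop the request's ref. */
static void *request_thread(void *arg)
{
	struct demo_ctx *ctx = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	nanosleep(&delay, NULL);
	memset(ctx->buf, 'x', sizeof(ctx->buf));
	ctx_put(ctx);
	return NULL;
}

/* Returns 1 if the buffer was completely filled when "destroy" returned. */
static int demo_destroy_waits(void)
{
	struct demo_ctx ctx = { .refs = 1 };
	struct completion requests_done;
	pthread_t tid;
	int rc, filled;

	pthread_mutex_init(&ctx.lock, NULL);
	init_completion(&requests_done);

	ctx.refs++;		/* "io_submit": one request in flight */
	rc = pthread_create(&tid, NULL, request_thread, &ctx);
	assert(rc == 0);
	(void)rc;

	/* "io_destroy": publish the completion, drop the base ref, block. */
	pthread_mutex_lock(&ctx.lock);
	ctx.requests_done = &requests_done;
	pthread_mutex_unlock(&ctx.lock);
	ctx_put(&ctx);
	wait_for_completion(&requests_done);

	/* Only now would it be safe to free or reuse the buffer. */
	filled = ctx.buf[0] == 'x' && ctx.buf[sizeof(ctx.buf) - 1] == 'x';
	pthread_join(tid, NULL);
	return filled;
}
```

Calling demo_destroy_waits() from a main() built with -pthread should return 1,
because the waiter cannot proceed until the last reference is dropped. Without
the wait, the buffer check would race with the request thread, which is the
user-visible symptom described in the pseudo-code.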
Tested: ran the http://pastebin.com/LrPsQ4RL testcase for several hours and
no longer see the race condition.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
---
fs/aio.c | 36 ++++++++++++++++++++++++++++++++----
1 file changed, 32 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 12a3de0e..2adbb03 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -112,6 +112,11 @@ struct kioctx {
struct work_struct free_work;
+ /*
+ * signals when all in-flight requests are done
+ */
+ struct completion *requests_done;
+
struct {
/*
* This counts the number of available slots in the ringbuffer,
@@ -508,6 +513,10 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
{
struct kioctx *ctx = container_of(ref, struct kioctx, reqs);
+ /* At this point we know that there are no in-flight requests */
+ if (ctx->requests_done)
+ complete(ctx->requests_done);
+
INIT_WORK(&ctx->free_work, free_ioctx);
schedule_work(&ctx->free_work);
}
@@ -718,7 +727,8 @@ err:
* when the processes owning a context have all exited to encourage
* the rapid destruction of the kioctx.
*/
-static void kill_ioctx(struct mm_struct *mm, struct kioctx *ctx)
+static void kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
+ struct completion *requests_done)
{
if (!atomic_xchg(&ctx->dead, 1)) {
struct kioctx_table *table;
@@ -747,7 +757,11 @@ static void kill_ioctx(struct mm_struct *mm, struct kioctx *ctx)
if (ctx->mmap_size)
vm_munmap(ctx->mmap_base, ctx->mmap_size);
+ ctx->requests_done = requests_done;
percpu_ref_kill(&ctx->users);
+ } else {
+ if (requests_done)
+ complete(requests_done);
}
}
@@ -809,7 +823,7 @@ void exit_aio(struct mm_struct *mm)
*/
ctx->mmap_size = 0;
- kill_ioctx(mm, ctx);
+ kill_ioctx(mm, ctx, NULL);
}
}
@@ -1185,7 +1199,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
if (!IS_ERR(ioctx)) {
ret = put_user(ioctx->user_id, ctxp);
if (ret)
- kill_ioctx(current->mm, ioctx);
+ kill_ioctx(current->mm, ioctx, NULL);
percpu_ref_put(&ioctx->users);
}
@@ -1203,8 +1217,22 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
{
struct kioctx *ioctx = lookup_ioctx(ctx);
if (likely(NULL != ioctx)) {
- kill_ioctx(current->mm, ioctx);
+ struct completion requests_done =
+ COMPLETION_INITIALIZER_ONSTACK(requests_done);
+
+ /* Pass requests_done to kill_ioctx() where it can be set
+ * in a thread-safe way. If we tried to set it here, we would have
+ * a race condition if two io_destroy() calls ran simultaneously.
+ */
+ kill_ioctx(current->mm, ioctx, &requests_done);
percpu_ref_put(&ioctx->users);
+
+ /* Wait until all IO for the context is done. Otherwise the kernel
+ * keeps using user-space buffers even if the user thinks the context
+ * is destroyed.
+ */
+ wait_for_completion(&requests_done);
+
return 0;
}
pr_debug("EINVAL: io_destroy: invalid context id\n");
--
1.8.2.1
--
"Thought is the essence of where you are now."
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
* [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
[not found] <cover.1398948681.git.bcrl@kvack.org>
2014-05-01 13:06 ` [PATCH 1/2] aio: block io_destroy() until all context requests are completed Benjamin LaHaise
@ 2014-05-01 13:07 ` Benjamin LaHaise
2014-05-02 8:56 ` Lukáš Czerner
1 sibling, 1 reply; 5+ messages in thread
From: Benjamin LaHaise @ 2014-05-01 13:07 UTC (permalink / raw)
To: torvalds; +Cc: linux-aio, linux-fsdevel, stable, Leon Yu
The iovec should be reclaimed whenever the caller of rw_copy_check_uvector()
returns, but that does not hold when a failure happens right after
aio_setup_vectored_rw().
Fix that in such a way as to avoid a hairy goto.
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
---
fs/aio.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 2adbb03..a0ed6c7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1327,10 +1327,8 @@ rw_common:
&iovec, compat)
: aio_setup_single_vector(req, rw, buf, &nr_segs,
iovec);
- if (ret)
- return ret;
-
- ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
+ if (!ret)
+ ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
if (ret < 0) {
if (iovec != &inline_vec)
kfree(iovec);
--
1.8.2.1
* Re: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
2014-05-01 13:07 ` [PATCH 2/2] aio: fix potential leak in aio_run_iocb() Benjamin LaHaise
@ 2014-05-02 8:56 ` Lukáš Czerner
2014-05-02 11:53 ` Mateusz Guzik
0 siblings, 1 reply; 5+ messages in thread
From: Lukáš Czerner @ 2014-05-02 8:56 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: torvalds, linux-aio, linux-fsdevel, stable, Leon Yu
On Thu, 1 May 2014, Benjamin LaHaise wrote:
> Date: Thu, 1 May 2014 09:07:09 -0400
> From: Benjamin LaHaise <bcrl@kvack.org>
> To: torvalds@linux-foundation.org
> Cc: linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
> stable@vger.kernel.org, Leon Yu <chianglungyu@gmail.com>
> Subject: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
>
> iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
> but it doesn't hold when failure happens right after aio_setup_vectored_rw().
>
> Fix that in a such way to avoid hairy goto.
As I already replied to Leon,
this does not seem right.
>
> Signed-off-by: Leon Yu <chianglungyu@gmail.com>
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
> Cc: stable@vger.kernel.org
> ---
> fs/aio.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 2adbb03..a0ed6c7 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1327,10 +1327,8 @@ rw_common:
> &iovec, compat)
> : aio_setup_single_vector(req, rw, buf, &nr_segs,
> iovec);
> - if (ret)
> - return ret;
> -
> - ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
Here ret could possibly be set to a positive number.
> + if (!ret)
> + ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
> if (ret < 0) {
but here we only check for a negative value before bailing out. However, this
changes how the code worked before this patch, and the iovec would still not
be reclaimed, as you mentioned in the commit description.
Thanks!
-Lukas
> if (iovec != &inline_vec)
> kfree(iovec);
>
* Re: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
2014-05-02 8:56 ` Lukáš Czerner
@ 2014-05-02 11:53 ` Mateusz Guzik
2014-05-02 12:16 ` Lukáš Czerner
0 siblings, 1 reply; 5+ messages in thread
From: Mateusz Guzik @ 2014-05-02 11:53 UTC (permalink / raw)
To: Lukáš Czerner
Cc: Benjamin LaHaise, torvalds, linux-aio, linux-fsdevel, stable,
Leon Yu
On Fri, May 02, 2014 at 10:56:32AM +0200, Lukáš Czerner wrote:
> On Thu, 1 May 2014, Benjamin LaHaise wrote:
>
> > Date: Thu, 1 May 2014 09:07:09 -0400
> > From: Benjamin LaHaise <bcrl@kvack.org>
> > To: torvalds@linux-foundation.org
> > Cc: linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
> > stable@vger.kernel.org, Leon Yu <chianglungyu@gmail.com>
> > Subject: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
> >
> > iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
> > but it doesn't hold when failure happens right after aio_setup_vectored_rw().
> >
> > Fix that in a such way to avoid hairy goto.
>
> As I already replied to Leon,
>
> this does not seem right.
>
> >
> > Signed-off-by: Leon Yu <chianglungyu@gmail.com>
> > Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
> > Cc: stable@vger.kernel.org
> > ---
> > fs/aio.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/aio.c b/fs/aio.c
> > index 2adbb03..a0ed6c7 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -1327,10 +1327,8 @@ rw_common:
> > &iovec, compat)
> > : aio_setup_single_vector(req, rw, buf, &nr_segs,
> > iovec);
> > - if (ret)
> > - return ret;
> > -
> > - ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
>
> here ret could be possibly set to a positive number.
>
How?
ret = (opcode == IOCB_CMD_PREADV ||
opcode == IOCB_CMD_PWRITEV)
? aio_setup_vectored_rw(req, rw, buf, &nr_segs,
&iovec, compat)
: aio_setup_single_vector(req, rw, buf, &nr_segs,
iovec);
Where aio_setup_vectored_rw:
if (ret < 0)
return ret;
[..]
return 0;
and aio_setup_single_vector:
if (unlikely(!access_ok(!rw, buf, kiocb->ki_nbytes)))
return -EFAULT;
[..]
return 0;
Both functions return ssize_t, and each returns either 0 on success or a
negative value on failure.
"if (ret)" replaced by "if (ret < 0)" should indeed set off alarm bells,
but it turns out to be fine here.
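The control flow described above can be checked with a small user-space mock.
This is a sketch under assumptions, not the kernel code: mock_setup() and
mock_verify() are hypothetical stand-ins for aio_setup_vectored_rw() and
rw_verify_area(), free() replaces kfree(), and error_frees is a counter added
only so the cleanup can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

static int error_frees;	/* counts error-path cleanups, for observation only */

/*
 * Hypothetical stand-in for aio_setup_vectored_rw(): returns 0 or a negative
 * errno. It may leave a heap allocation in *iovec even when it fails.
 */
static ssize_t mock_setup(struct iovec **iovec, int fail)
{
	struct iovec *vec = malloc(4 * sizeof(*vec));

	if (!vec)
		return -ENOMEM;
	*iovec = vec;
	return fail ? -EFAULT : 0;
}

/*
 * Hypothetical stand-in for rw_verify_area(). Assumption: success is a
 * non-negative byte count, which is why the caller tests ret < 0.
 */
static ssize_t mock_verify(int fail)
{
	return fail ? -EINVAL : 4096;
}

/* Same shape as the patched hunk: one cleanup block serves both failures. */
static ssize_t run_patched(int setup_fails, int verify_fails)
{
	struct iovec inline_vec, *iovec = &inline_vec;
	ssize_t ret;

	ret = mock_setup(&iovec, setup_fails);
	if (!ret)
		ret = mock_verify(verify_fails);
	if (ret < 0) {
		if (iovec != &inline_vec) {
			free(iovec);
			error_frees++;
		}
		return ret;
	}

	/* Success: the real code would issue the I/O before freeing. */
	if (iovec != &inline_vec)
		free(iovec);
	return ret;
}
```

With these mocks, run_patched(1, 0) and run_patched(0, 1) both go through the
shared cleanup block, while run_patched(0, 0) skips it. Because the setup
helpers never return a positive value, "if (!ret)" and the old "if (ret)" test
agree on which calls succeeded.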
> > + if (!ret)
> > + ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
> > if (ret < 0) {
>
So this check is fine and cleanup will be called.
However, there is a yet-to-be-merged patch that fixes the actual problem,
which is the odd rw_copy_check_uvector() semantics:
https://lkml.org/lkml/2014/4/25/778
That patch renders this one unnecessary.
--
Mateusz Guzik
* Re: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
2014-05-02 11:53 ` Mateusz Guzik
@ 2014-05-02 12:16 ` Lukáš Czerner
0 siblings, 0 replies; 5+ messages in thread
From: Lukáš Czerner @ 2014-05-02 12:16 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Benjamin LaHaise, torvalds, linux-aio, linux-fsdevel, stable,
Leon Yu
On Fri, 2 May 2014, Mateusz Guzik wrote:
> Date: Fri, 2 May 2014 13:53:11 +0200
> From: Mateusz Guzik <mguzik@redhat.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Benjamin LaHaise <bcrl@kvack.org>, torvalds@linux-foundation.org,
> linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
> stable@vger.kernel.org, Leon Yu <chianglungyu@gmail.com>
> Subject: Re: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
>
> On Fri, May 02, 2014 at 10:56:32AM +0200, Lukáš Czerner wrote:
> > On Thu, 1 May 2014, Benjamin LaHaise wrote:
> >
> > > Date: Thu, 1 May 2014 09:07:09 -0400
> > > From: Benjamin LaHaise <bcrl@kvack.org>
> > > To: torvalds@linux-foundation.org
> > > Cc: linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
> > > stable@vger.kernel.org, Leon Yu <chianglungyu@gmail.com>
> > > Subject: [PATCH 2/2] aio: fix potential leak in aio_run_iocb().
> > >
> > > iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
> > > but it doesn't hold when failure happens right after aio_setup_vectored_rw().
> > >
> > > Fix that in a such way to avoid hairy goto.
> >
> > As I already replied to Leon,
> >
> > this does not seem right.
> >
> > >
> > > Signed-off-by: Leon Yu <chianglungyu@gmail.com>
> > > Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
> > > Cc: stable@vger.kernel.org
> > > ---
> > > fs/aio.c | 6 ++----
> > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/aio.c b/fs/aio.c
> > > index 2adbb03..a0ed6c7 100644
> > > --- a/fs/aio.c
> > > +++ b/fs/aio.c
> > > @@ -1327,10 +1327,8 @@ rw_common:
> > > &iovec, compat)
> > > : aio_setup_single_vector(req, rw, buf, &nr_segs,
> > > iovec);
> > > - if (ret)
> > > - return ret;
> > > -
> > > - ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
> >
> > here ret could be possibly set to a positive number.
> >
>
> How?
>
> ret = (opcode == IOCB_CMD_PREADV ||
> opcode == IOCB_CMD_PWRITEV)
> ? aio_setup_vectored_rw(req, rw, buf, &nr_segs,
> &iovec, compat)
> : aio_setup_single_vector(req, rw, buf, &nr_segs,
> iovec);
>
> Where aio_setup_vectored_rw:
> if (ret < 0)
> return ret;
> [..]
> return 0;
ah right, it replaces the return value. Ignore me then.
-Lukas
>
>
> and aio_setup_single_vector:
> if (unlikely(!access_ok(!rw, buf, kiocb->ki_nbytes)))
> return -EFAULT;
> [..]
> return 0;
>
> Both functions are returning ssize_t, thus it's either 0 on success or
> negative on failure.
>
> "if (ret)" replaced by "if (ret < 0)" should indeed set off alarm bells,
> but turns it turns out to be fine here.
>
> > > + if (!ret)
> > > + ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
> > > if (ret < 0) {
> >
>
> So this check is fine and cleanup will be called.
>
> However, there is a yet to be merged patch which fixes actual problem
> which is weird rw_copy_check_uvector semantics:
> https://lkml.org/lkml/2014/4/25/778
>
> rendering this patch unnecessary
>
>