* [PATCH] aio: Make it possible to remap aio ring
@ 2014-09-18 15:56 Pavel Emelyanov
2014-09-25 12:18 ` Dmitry Monakhov
0 siblings, 1 reply; 4+ messages in thread
From: Pavel Emelyanov @ 2014-09-18 15:56 UTC (permalink / raw)
To: Benjamin LaHaise, Andrew Morton, Hugh Dickins, linux-aio,
Linux MM
Cc: Dmitriy Monakhov
Hi,
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.
So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).
To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.
So, to make restore work we need to make sure that
a) ring is mapped at desired virtual address
b) kioctx->user_id matches this value
Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.
Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.
What do you think?
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
fs/aio.c | 25 +++++++++++++++++++++++++
include/linux/fs.h | 1 +
mm/mremap.c | 3 ++-
3 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/fs/aio.c b/fs/aio.c
index 1c9c5f0..a0865e4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -273,12 +273,37 @@ static void aio_free_ring(struct kioctx *ctx)
static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
{
+ vma->vm_flags |= VM_DONTEXPAND;
vma->vm_ops = &generic_file_vm_ops;
return 0;
}
+static void aio_ring_remap(struct file *file, struct vm_area_struct *vma)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ struct kioctx_table *table;
+ int i;
+
+ spin_lock(&mm->ioctx_lock);
+ rcu_read_lock();
+ table = rcu_dereference(mm->ioctx_table);
+ for (i = 0; i < table->nr; i++) {
+ struct kioctx *ctx;
+
+ ctx = table->table[i];
+ if (ctx && ctx->aio_ring_file == file) {
+ ctx->user_id = ctx->mmap_base = vma->vm_start;
+ break;
+ }
+ }
+
+ rcu_read_unlock();
+ spin_unlock(&mm->ioctx_lock);
+}
+
static const struct file_operations aio_ring_fops = {
.mmap = aio_ring_mmap,
+ .mremap = aio_ring_remap,
};
static int aio_set_page_dirty(struct page *page)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e11d60c..379bd75 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1467,6 +1467,7 @@ struct file_operations {
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
+ void (*mremap)(struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
diff --git a/mm/mremap.c b/mm/mremap.c
index 05f1180..18200b9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -287,7 +287,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
old_len = new_len;
old_addr = new_addr;
new_addr = -ENOMEM;
- }
+ } else if (vma->vm_file && vma->vm_file->f_op->mremap)
+ vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vm_flags & VM_ACCOUNT) {
--
1.8.4.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: Make it possible to remap aio ring
2014-09-18 15:56 [PATCH] aio: Make it possible to remap aio ring Pavel Emelyanov
@ 2014-09-25 12:18 ` Dmitry Monakhov
2014-09-25 15:13 ` Benjamin LaHaise
0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Monakhov @ 2014-09-25 12:18 UTC (permalink / raw)
To: Pavel Emelyanov, Benjamin LaHaise, Andrew Morton, Hugh Dickins,
linux-aio, Linux MM
On Thu, 18 Sep 2014 19:56:17 +0400, Pavel Emelyanov <xemul@parallels.com> wrote:
> Hi,
>
> There are actually two issues this patch addresses. Let me start with
> the one I tried to solve in the beginning.
>
> So, in the checkpoint-restore project (criu) we try to dump tasks'
> state and restore one back exactly as it was. One of the tasks' state
> bits is rings set up with io_setup() call. There's (almost) no problems
> in dumping them, there's a problem restoring them -- if I dump a task
> with aio ring originally mapped at address A, I want to restore one
> back at exactly the same address A. Unfortunately, the io_setup() does
> not allow for that -- it mmaps the ring at whatever place mm finds
> appropriate (it calls do_mmap_pgoff() with zero address and without
> the MAP_FIXED flag).
>
> To make restore possible I'm going to mremap() the freshly created ring
> into the address A (under which it was seen before dump). The problem is
> that the ring's virtual address is passed back to the user-space as the
> context ID and this ID is then used as search key by all the other io_foo()
> calls. Reworking this ID to be just some integer doesn't seem to work, as
> this value is already used by libaio as a pointer using which this library
> accesses memory for aio meta-data.
>
> So, to make restore work we need to make sure that
>
> a) ring is mapped at desired virtual address
> b) kioctx->user_id matches this value
>
> Having said that, the patch makes mremap() on aio region update the
> kioctx's user_id and mmap_base values.
>
>
> Here appears the 2nd issue I mentioned in the beginning of this mail.
> If (regardless of the C/R dances I do) someone creates an io context
> with io_setup(), then mremap()-s the ring and then destroys the context,
> the kill_ioctx() routine will call munmap() on wrong (old) address.
> This will result in a) aio ring remaining in memory and b) some other
> vma get unexpectedly unmapped.
>
>
> What do you think?
Look reasonable.
Feel free to add Acked-by:Dmitry Monakhov <dmonakhov@openvz.org>
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
> ---
> fs/aio.c | 25 +++++++++++++++++++++++++
> include/linux/fs.h | 1 +
> mm/mremap.c | 3 ++-
> 3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 1c9c5f0..a0865e4 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -273,12 +273,37 @@ static void aio_free_ring(struct kioctx *ctx)
>
> static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
> {
> + vma->vm_flags |= VM_DONTEXPAND;
> vma->vm_ops = &generic_file_vm_ops;
> return 0;
> }
>
> +static void aio_ring_remap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + struct kioctx_table *table;
> + int i;
> +
> + spin_lock(&mm->ioctx_lock);
> + rcu_read_lock();
> + table = rcu_dereference(mm->ioctx_table);
> + for (i = 0; i < table->nr; i++) {
> + struct kioctx *ctx;
> +
> + ctx = table->table[i];
> + if (ctx && ctx->aio_ring_file == file) {
> + ctx->user_id = ctx->mmap_base = vma->vm_start;
> + break;
> + }
> + }
> +
> + rcu_read_unlock();
> + spin_unlock(&mm->ioctx_lock);
> +}
> +
> static const struct file_operations aio_ring_fops = {
> .mmap = aio_ring_mmap,
> + .mremap = aio_ring_remap,
> };
>
> static int aio_set_page_dirty(struct page *page)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e11d60c..379bd75 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1467,6 +1467,7 @@ struct file_operations {
> long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> int (*mmap) (struct file *, struct vm_area_struct *);
> + void (*mremap)(struct file *, struct vm_area_struct *);
> int (*open) (struct inode *, struct file *);
> int (*flush) (struct file *, fl_owner_t id);
> int (*release) (struct inode *, struct file *);
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 05f1180..18200b9 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -287,7 +287,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> old_len = new_len;
> old_addr = new_addr;
> new_addr = -ENOMEM;
> - }
> + } else if (vma->vm_file && vma->vm_file->f_op->mremap)
> + vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
>
> /* Conceal VM_ACCOUNT so old reservation is not undone */
> if (vm_flags & VM_ACCOUNT) {
> --
> 1.8.4.2
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: Make it possible to remap aio ring
2014-09-25 12:18 ` Dmitry Monakhov
@ 2014-09-25 15:13 ` Benjamin LaHaise
2014-09-25 18:12 ` Zach Brown
0 siblings, 1 reply; 4+ messages in thread
From: Benjamin LaHaise @ 2014-09-25 15:13 UTC (permalink / raw)
To: Dmitry Monakhov
Cc: Pavel Emelyanov, Andrew Morton, Hugh Dickins, linux-aio, Linux MM
On Thu, Sep 25, 2014 at 04:18:51PM +0400, Dmitry Monakhov wrote:
> On Thu, 18 Sep 2014 19:56:17 +0400, Pavel Emelyanov <xemul@parallels.com> wrote:
> > Hi,
> >
> > There are actually two issues this patch addresses. Let me start with
> > the one I tried to solve in the beginning.
> >
> > So, in the checkpoint-restore project (criu) we try to dump tasks'
> > state and restore one back exactly as it was. One of the tasks' state
> > bits is rings set up with io_setup() call. There's (almost) no problems
> > in dumping them, there's a problem restoring them -- if I dump a task
> > with aio ring originally mapped at address A, I want to restore one
> > back at exactly the same address A. Unfortunately, the io_setup() does
> > not allow for that -- it mmaps the ring at whatever place mm finds
> > appropriate (it calls do_mmap_pgoff() with zero address and without
> > the MAP_FIXED flag).
> >
> > To make restore possible I'm going to mremap() the freshly created ring
> > into the address A (under which it was seen before dump). The problem is
> > that the ring's virtual address is passed back to the user-space as the
> > context ID and this ID is then used as search key by all the other io_foo()
> > calls. Reworking this ID to be just some integer doesn't seem to work, as
> > this value is already used by libaio as a pointer using which this library
> > accesses memory for aio meta-data.
> >
> > So, to make restore work we need to make sure that
> >
> > a) ring is mapped at desired virtual address
> > b) kioctx->user_id matches this value
> >
> > Having said that, the patch makes mremap() on aio region update the
> > kioctx's user_id and mmap_base values.
> >
> >
> > Here appears the 2nd issue I mentioned in the beginning of this mail.
> > If (regardless of the C/R dances I do) someone creates an io context
> > with io_setup(), then mremap()-s the ring and then destroys the context,
> > the kill_ioctx() routine will call munmap() on wrong (old) address.
> > This will result in a) aio ring remaining in memory and b) some other
> > vma get unexpectedly unmapped.
> >
> >
> > What do you think?
> Look reasonable.
> Feel free to add Acked-by:Dmitry Monakhov <dmonakhov@openvz.org>
> >
> > Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
I've had a look over this patch, and it seems okay to me. The interaction
with page migration looks safe, as well as with io_destroy(). I've applied
this to my aio-next tree at git://git.kvack.org/~bcrl/aio-next.git . If
mm folks have any concerns, please let me know.
-ben
> > ---
> > fs/aio.c | 25 +++++++++++++++++++++++++
> > include/linux/fs.h | 1 +
> > mm/mremap.c | 3 ++-
> > 3 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/aio.c b/fs/aio.c
> > index 1c9c5f0..a0865e4 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -273,12 +273,37 @@ static void aio_free_ring(struct kioctx *ctx)
> >
> > static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> > + vma->vm_flags |= VM_DONTEXPAND;
> > vma->vm_ops = &generic_file_vm_ops;
> > return 0;
> > }
> >
> > +static void aio_ring_remap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + struct kioctx_table *table;
> > + int i;
> > +
> > + spin_lock(&mm->ioctx_lock);
> > + rcu_read_lock();
> > + table = rcu_dereference(mm->ioctx_table);
> > + for (i = 0; i < table->nr; i++) {
> > + struct kioctx *ctx;
> > +
> > + ctx = table->table[i];
> > + if (ctx && ctx->aio_ring_file == file) {
> > + ctx->user_id = ctx->mmap_base = vma->vm_start;
> > + break;
> > + }
> > + }
> > +
> > + rcu_read_unlock();
> > + spin_unlock(&mm->ioctx_lock);
> > +}
> > +
> > static const struct file_operations aio_ring_fops = {
> > .mmap = aio_ring_mmap,
> > + .mremap = aio_ring_remap,
> > };
> >
> > static int aio_set_page_dirty(struct page *page)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e11d60c..379bd75 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1467,6 +1467,7 @@ struct file_operations {
> > long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> > long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> > int (*mmap) (struct file *, struct vm_area_struct *);
> > + void (*mremap)(struct file *, struct vm_area_struct *);
> > int (*open) (struct inode *, struct file *);
> > int (*flush) (struct file *, fl_owner_t id);
> > int (*release) (struct inode *, struct file *);
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 05f1180..18200b9 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -287,7 +287,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> > old_len = new_len;
> > old_addr = new_addr;
> > new_addr = -ENOMEM;
> > - }
> > + } else if (vma->vm_file && vma->vm_file->f_op->mremap)
> > + vma->vm_file->f_op->mremap(vma->vm_file, new_vma);
> >
> > /* Conceal VM_ACCOUNT so old reservation is not undone */
> > if (vm_flags & VM_ACCOUNT) {
> > --
> > 1.8.4.2
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
"Thought is the essence of where you are now."
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: Make it possible to remap aio ring
2014-09-25 15:13 ` Benjamin LaHaise
@ 2014-09-25 18:12 ` Zach Brown
0 siblings, 0 replies; 4+ messages in thread
From: Zach Brown @ 2014-09-25 18:12 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Dmitry Monakhov, Pavel Emelyanov, Andrew Morton, Hugh Dickins,
linux-aio, Linux MM
> > > To make restore possible I'm going to mremap() the freshly created ring
> > > into the address A (under which it was seen before dump).
> > > What do you think?
> > Look reasonable.
> > Feel free to add Acked-by:Dmitry Monakhov <dmonakhov@openvz.org>
> > >
> > > Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> I've had a look over this patch, and it seems okay to me. The interaction
> with page migration looks safe, as well as with io_destroy(). I've applied
> this to my aio-next tree at git://git.kvack.org/~bcrl/aio-next.git . If
> mm folks have any concerns, please let me know.
I can chime in with generic support: the C/R folks have complained about
the implicit context mapping in the past. I'm all for letting them
explicitly re-establish it, though I didn't actually look at the patch.
- z
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-09-25 18:12 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-18 15:56 [PATCH] aio: Make it possible to remap aio ring Pavel Emelyanov
2014-09-25 12:18 ` Dmitry Monakhov
2014-09-25 15:13 ` Benjamin LaHaise
2014-09-25 18:12 ` Zach Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).