* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 22:37 ` [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
@ 2010-04-26 23:05 ` Minchan Kim
2010-04-27 0:30 ` KAMEZAWA Hiroyuki
2010-04-27 9:17 ` Mel Gorman
2010-04-26 23:15 ` Minchan Kim
` (2 subsequent siblings)
3 siblings, 2 replies; 21+ messages in thread
From: Minchan Kim @ 2010-04-26 23:05 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
Andrea Arcangeli, Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
>
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in an anon_vma_chain, but a second lock cannot be spun
> upon in case of deadlock. Instead, the rmap walker tries to take the locks
> of different anon_vmas. If the attempt fails, the operation is restarted.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> mm/ksm.c | 13 +++++++++++++
> mm/mmap.c | 6 ++++++
> mm/rmap.c | 22 +++++++++++++++++++---
> 3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 3666d43..baa5b4d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1674,9 +1674,22 @@ again:
> spin_lock(&anon_vma->lock);
> list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> vma = vmac->vma;
> +
> + /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> + if (anon_vma != vma->anon_vma) {
> + if (!spin_trylock(&vma->anon_vma->lock)) {
> + spin_unlock(&anon_vma->lock);
> + goto again;
> + }
> + }
> +
> if (rmap_item->address < vma->vm_start ||
> rmap_item->address >= vma->vm_end)
> continue;
> +
> + if (anon_vma != vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
> /*
> * Initially we examine only the vma which covers this
> * rmap_item; but later, if there is still work to do,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f90ea92..61d6f1d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
> }
> }
>
> + if (vma->anon_vma)
> + spin_lock(&vma->anon_vma->lock);
> +
> if (root) {
> flush_dcache_mmap_lock(mapping);
> vma_prio_tree_remove(vma, root);
> @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
> if (mapping)
> spin_unlock(&mapping->i_mmap_lock);
>
> + if (vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
> if (remove_next) {
> if (file) {
> fput(file);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> * are holding mmap_sem. Users without mmap_sem are required to
> * take a reference count to prevent the anon_vma disappearing
> */
> +retry:
> anon_vma = page_anon_vma(page);
> if (!anon_vma)
> return ret;
> spin_lock(&anon_vma->lock);
> list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> struct vm_area_struct *vma = avc->vma;
> - unsigned long address = vma_address(page, vma);
> - if (address == -EFAULT)
> - continue;
> + unsigned long address;
> +
> + /*
> + * Guard against deadlocks by not spinning against
> + * vma->anon_vma->lock. If contention is found, release our
> + * lock and try again until VMA list can be traversed without
> + * contention.
> + */
> + if (anon_vma != vma->anon_vma) {
> + if (!spin_trylock(&vma->anon_vma->lock)) {
> + spin_unlock(&anon_vma->lock);
> + goto retry;
> + }
> + }
> + address = vma_address(page, vma);
> + if (anon_vma != vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
if (address == -EFAULT)
continue;
> ret = rmap_one(page, vma, address, arg);
> if (ret != SWAP_AGAIN)
> break;
> --
> 1.6.5
>
>
--
Kind regards,
Minchan Kim
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 23:05 ` Minchan Kim
@ 2010-04-27 0:30 ` KAMEZAWA Hiroyuki
2010-04-27 9:17 ` Mel Gorman
1 sibling, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 0:30 UTC (permalink / raw)
To: Minchan Kim
Cc: Mel Gorman, Linux-MM, LKML, Christoph Lameter, Andrea Arcangeli,
Rik van Riel, Andrew Morton
On Tue, 27 Apr 2010 08:05:26 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> >
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > of different anon_vmas. If the attempt fails, the operation is restarted.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > mm/ksm.c | 13 +++++++++++++
> > mm/mmap.c | 6 ++++++
> > mm/rmap.c | 22 +++++++++++++++++++---
> > 3 files changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 3666d43..baa5b4d 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -1674,9 +1674,22 @@ again:
> > spin_lock(&anon_vma->lock);
> > list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> > vma = vmac->vma;
> > +
> > + /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> > + if (anon_vma != vma->anon_vma) {
> > + if (!spin_trylock(&vma->anon_vma->lock)) {
> > + spin_unlock(&anon_vma->lock);
> > + goto again;
> > + }
> > + }
> > +
> > if (rmap_item->address < vma->vm_start ||
> > rmap_item->address >= vma->vm_end)
> > continue;
> > +
> > + if (anon_vma != vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
> > /*
> > * Initially we examine only the vma which covers this
> > * rmap_item; but later, if there is still work to do,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f90ea92..61d6f1d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > }
> > }
> >
> > + if (vma->anon_vma)
> > + spin_lock(&vma->anon_vma->lock);
> > +
> > if (root) {
> > flush_dcache_mmap_lock(mapping);
> > vma_prio_tree_remove(vma, root);
> > @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > if (mapping)
> > spin_unlock(&mapping->i_mmap_lock);
> >
> > + if (vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
> > if (remove_next) {
> > if (file) {
> > fput(file);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> > * are holding mmap_sem. Users without mmap_sem are required to
> > * take a reference count to prevent the anon_vma disappearing
> > */
> > +retry:
> > anon_vma = page_anon_vma(page);
> > if (!anon_vma)
> > return ret;
> > spin_lock(&anon_vma->lock);
> > list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> > struct vm_area_struct *vma = avc->vma;
> > - unsigned long address = vma_address(page, vma);
> > - if (address == -EFAULT)
> > - continue;
> > + unsigned long address;
> > +
> > + /*
> > + * Guard against deadlocks by not spinning against
> > + * vma->anon_vma->lock. If contention is found, release our
> > + * lock and try again until VMA list can be traversed without
> > + * contention.
> > + */
> > + if (anon_vma != vma->anon_vma) {
> > + if (!spin_trylock(&vma->anon_vma->lock)) {
> > + spin_unlock(&anon_vma->lock);
> > + goto retry;
> > + }
> > + }
> > + address = vma_address(page, vma);
> > + if (anon_vma != vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
>
> if (address == -EFAULT)
> continue;
>
Yes, thank you for pointing that out.
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 23:05 ` Minchan Kim
2010-04-27 0:30 ` KAMEZAWA Hiroyuki
@ 2010-04-27 9:17 ` Mel Gorman
1 sibling, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2010-04-27 9:17 UTC (permalink / raw)
To: Minchan Kim
Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
Andrea Arcangeli, Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 08:05:26AM +0900, Minchan Kim wrote:
> On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> >
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > of different anon_vmas. If the attempt fails, the operation is restarted.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > mm/ksm.c | 13 +++++++++++++
> > mm/mmap.c | 6 ++++++
> > mm/rmap.c | 22 +++++++++++++++++++---
> > 3 files changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 3666d43..baa5b4d 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -1674,9 +1674,22 @@ again:
> > spin_lock(&anon_vma->lock);
> > list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> > vma = vmac->vma;
> > +
> > + /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> > + if (anon_vma != vma->anon_vma) {
> > + if (!spin_trylock(&vma->anon_vma->lock)) {
> > + spin_unlock(&anon_vma->lock);
> > + goto again;
> > + }
> > + }
> > +
> > if (rmap_item->address < vma->vm_start ||
> > rmap_item->address >= vma->vm_end)
> > continue;
> > +
> > + if (anon_vma != vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
> > /*
> > * Initially we examine only the vma which covers this
> > * rmap_item; but later, if there is still work to do,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f90ea92..61d6f1d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > }
> > }
> >
> > + if (vma->anon_vma)
> > + spin_lock(&vma->anon_vma->lock);
> > +
> > if (root) {
> > flush_dcache_mmap_lock(mapping);
> > vma_prio_tree_remove(vma, root);
> > @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > if (mapping)
> > spin_unlock(&mapping->i_mmap_lock);
> >
> > + if (vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
> > if (remove_next) {
> > if (file) {
> > fput(file);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> > * are holding mmap_sem. Users without mmap_sem are required to
> > * take a reference count to prevent the anon_vma disappearing
> > */
> > +retry:
> > anon_vma = page_anon_vma(page);
> > if (!anon_vma)
> > return ret;
> > spin_lock(&anon_vma->lock);
> > list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> > struct vm_area_struct *vma = avc->vma;
> > - unsigned long address = vma_address(page, vma);
> > - if (address == -EFAULT)
> > - continue;
> > + unsigned long address;
> > +
> > + /*
> > + * Guard against deadlocks by not spinning against
> > + * vma->anon_vma->lock. If contention is found, release our
> > + * lock and try again until VMA list can be traversed without
> > + * contention.
> > + */
> > + if (anon_vma != vma->anon_vma) {
> > + if (!spin_trylock(&vma->anon_vma->lock)) {
> > + spin_unlock(&anon_vma->lock);
> > + goto retry;
> > + }
> > + }
> > + address = vma_address(page, vma);
> > + if (anon_vma != vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
>
> if (address == -EFAULT)
> continue;
>
Correct. Thanks.
> > ret = rmap_one(page, vma, address, arg);
> > if (ret != SWAP_AGAIN)
> > break;
> > --
> > 1.6.5
> >
> >
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 22:37 ` [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
2010-04-26 23:05 ` Minchan Kim
@ 2010-04-26 23:15 ` Minchan Kim
2010-04-27 0:07 ` KAMEZAWA Hiroyuki
2010-04-27 0:30 ` Rik van Riel
3 siblings, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2010-04-26 23:15 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, KAMEZAWA Hiroyuki, Christoph Lameter,
Andrea Arcangeli, Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 7:37 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
>
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in an anon_vma_chain, but a second lock cannot be spun
> upon in case of deadlock. Instead, the rmap walker tries to take the locks
> of different anon_vmas. If the attempt fails, the operation is restarted.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Actually, I am worried about a rollback approach like this.
If serialising on multiple anon_vmas is rarely needed, this is good enough.
But I am not sure how often we will need anon_vma locking like this;
whenever we do, we will have to repeat this rollback approach in the future.
In my opinion, that is not good.
Rik, can't we make the anon_vma locking simpler?
Anyway, Mel's patch is the best option for now.
I hope the anon_vma locking can be improved without a rollback approach in
the near future.
Thanks, Mel.
--
Kind regards,
Minchan Kim
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 22:37 ` [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
2010-04-26 23:05 ` Minchan Kim
2010-04-26 23:15 ` Minchan Kim
@ 2010-04-27 0:07 ` KAMEZAWA Hiroyuki
2010-04-27 3:50 ` KAMEZAWA Hiroyuki
2010-04-27 0:30 ` Rik van Riel
3 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 0:07 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
Rik van Riel, Andrew Morton
On Mon, 26 Apr 2010 23:37:58 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> vma_adjust() is updating anon VMA information without any locks taken.
> In contrast, file-backed mappings use the i_mmap_lock and this lack of
> locking can result in races with page migration. During rmap_walk(),
> vma_address() can return -EFAULT for an address that will soon be valid.
> This leaves a dangling migration PTE behind which can later cause a BUG_ON
> to trigger when the page is faulted in.
>
> With the recent anon_vma changes, there can be more than one anon_vma->lock
> that can be taken in an anon_vma_chain, but a second lock cannot be spun
> upon in case of deadlock. Instead, the rmap walker tries to take the locks
> of different anon_vmas. If the attempt fails, the operation is restarted.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
(but slow.)
I'll test this, too.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/ksm.c | 13 +++++++++++++
> mm/mmap.c | 6 ++++++
> mm/rmap.c | 22 +++++++++++++++++++---
> 3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 3666d43..baa5b4d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1674,9 +1674,22 @@ again:
> spin_lock(&anon_vma->lock);
> list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
> vma = vmac->vma;
> +
> + /* See comment in mm/rmap.c#rmap_walk_anon on locking */
> + if (anon_vma != vma->anon_vma) {
> + if (!spin_trylock(&vma->anon_vma->lock)) {
> + spin_unlock(&anon_vma->lock);
> + goto again;
> + }
> + }
> +
> if (rmap_item->address < vma->vm_start ||
> rmap_item->address >= vma->vm_end)
> continue;
> +
> + if (anon_vma != vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
> /*
> * Initially we examine only the vma which covers this
> * rmap_item; but later, if there is still work to do,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f90ea92..61d6f1d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
> }
> }
>
> + if (vma->anon_vma)
> + spin_lock(&vma->anon_vma->lock);
> +
> if (root) {
> flush_dcache_mmap_lock(mapping);
> vma_prio_tree_remove(vma, root);
> @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
> if (mapping)
> spin_unlock(&mapping->i_mmap_lock);
>
> + if (vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
> if (remove_next) {
> if (file) {
> fput(file);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> * are holding mmap_sem. Users without mmap_sem are required to
> * take a reference count to prevent the anon_vma disappearing
> */
> +retry:
> anon_vma = page_anon_vma(page);
> if (!anon_vma)
> return ret;
> spin_lock(&anon_vma->lock);
> list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> struct vm_area_struct *vma = avc->vma;
> - unsigned long address = vma_address(page, vma);
> - if (address == -EFAULT)
> - continue;
> + unsigned long address;
> +
> + /*
> + * Guard against deadlocks by not spinning against
> + * vma->anon_vma->lock. If contention is found, release our
> + * lock and try again until VMA list can be traversed without
> + * contention.
> + */
> + if (anon_vma != vma->anon_vma) {
> + if (!spin_trylock(&vma->anon_vma->lock)) {
> + spin_unlock(&anon_vma->lock);
> + goto retry;
> + }
> + }
> + address = vma_address(page, vma);
> + if (anon_vma != vma->anon_vma)
> + spin_unlock(&vma->anon_vma->lock);
> +
> ret = rmap_one(page, vma, address, arg);
> if (ret != SWAP_AGAIN)
> break;
> --
> 1.6.5
>
>
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 0:07 ` KAMEZAWA Hiroyuki
@ 2010-04-27 3:50 ` KAMEZAWA Hiroyuki
2010-04-27 4:03 ` KAMEZAWA Hiroyuki
2010-04-27 8:59 ` Mel Gorman
0 siblings, 2 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 3:50 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Andrea Arcangeli, Rik van Riel, Andrew Morton
On Tue, 27 Apr 2010 09:07:06 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 26 Apr 2010 23:37:58 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > vma_adjust() is updating anon VMA information without any locks taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with page migration. During rmap_walk(),
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > to trigger when the page is faulted in.
> >
> > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > of different anon_vmas. If the attempt fails, the operation is restarted.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>
> Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> (but slow.)
>
> I'll test this, too.
>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
Sorry, it reproduced. It seems to be the same bug as before the patch.
mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
Thanks,
-Kame
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 3:50 ` KAMEZAWA Hiroyuki
@ 2010-04-27 4:03 ` KAMEZAWA Hiroyuki
2010-04-27 8:59 ` Mel Gorman
1 sibling, 0 replies; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 4:03 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Andrea Arcangeli, Rik van Riel, Andrew Morton
On Tue, 27 Apr 2010 12:50:40 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 27 Apr 2010 09:07:06 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > On Mon, 26 Apr 2010 23:37:58 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> > > vma_adjust() is updating anon VMA information without any locks taken.
> > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > locking can result in races with page migration. During rmap_walk(),
> > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > to trigger when the page is faulted in.
> > >
> > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > > of different anon_vmas. If the attempt fails, the operation is restarted.
> > >
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> >
> > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > (but slow.)
> >
> > I'll test this, too.
> >
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
>
> Sorry, it reproduced. It seems to be the same bug as before the patch.
> mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
>
Sorry again. It was _not_ SwapCache. Just SwapBacked.
Thanks,
-Kame
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 3:50 ` KAMEZAWA Hiroyuki
2010-04-27 4:03 ` KAMEZAWA Hiroyuki
@ 2010-04-27 8:59 ` Mel Gorman
2010-04-27 9:09 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-04-27 8:59 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 27 Apr 2010 09:07:06 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > On Mon, 26 Apr 2010 23:37:58 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> > > vma_adjust() is updating anon VMA information without any locks taken.
> > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > locking can result in races with page migration. During rmap_walk(),
> > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > to trigger when the page is faulted in.
> > >
> > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > > of different anon_vmas. If the attempt fails, the operation is restarted.
> > >
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> >
> > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > (but slow.)
> >
> > I'll test this, too.
> >
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
>
> Sorry, it reproduced. It seems to be the same bug as before the patch.
> mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
>
Same here, reproduced after 18 hours.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 8:59 ` Mel Gorman
@ 2010-04-27 9:09 ` KAMEZAWA Hiroyuki
2010-04-27 10:29 ` Mel Gorman
0 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 9:09 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
Rik van Riel, Andrew Morton
On Tue, 27 Apr 2010 09:59:51 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 27 Apr 2010 09:07:06 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Mon, 26 Apr 2010 23:37:58 +0100
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > >
> > > > vma_adjust() is updating anon VMA information without any locks taken.
> > > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > > locking can result in races with page migration. During rmap_walk(),
> > > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > > to trigger when the page is faulted in.
> > > >
> > > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > > > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > > > of different anon_vmas. If the attempt fails, the operation is restarted.
> > > >
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > >
> > > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > > (but slow.)
> > >
> > > I'll test this, too.
> > >
> > > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > >
> >
> > Sorry, it reproduced. It seems to be the same bug as before the patch.
> > mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> >
>
> Same here, reproduced after 18 hours.
>
Hmm. It seems rmap_one() is called, so the race is not in vma_address()
but in remove_migration_pte().
I added more hooks for debugging, but it has not reproduced yet.
(But I doubt my debug code, too ;)
It also seems strange to have a race in remove_migration_pte() at all, which
is another reason I doubt my debug code.
Thanks,
-Kame
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 9:09 ` KAMEZAWA Hiroyuki
@ 2010-04-27 10:29 ` Mel Gorman
2010-04-27 15:37 ` Andrea Arcangeli
0 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-04-27 10:29 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Linux-MM, LKML, Minchan Kim, Christoph Lameter, Andrea Arcangeli,
Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 06:09:49PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 27 Apr 2010 09:59:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > On Tue, Apr 27, 2010 at 12:50:40PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 27 Apr 2010 09:07:06 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > >
> > > > On Mon, 26 Apr 2010 23:37:58 +0100
> > > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > >
> > > > > vma_adjust() is updating anon VMA information without any locks taken.
> > > > > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > > > > locking can result in races with page migration. During rmap_walk(),
> > > > > vma_address() can return -EFAULT for an address that will soon be valid.
> > > > > This leaves a dangling migration PTE behind which can later cause a BUG_ON
> > > > > to trigger when the page is faulted in.
> > > > >
> > > > > With the recent anon_vma changes, there can be more than one anon_vma->lock
> > > > > that can be taken in an anon_vma_chain, but a second lock cannot be spun
> > > > > upon in case of deadlock. Instead, the rmap walker tries to take the locks
> > > > > of different anon_vmas. If the attempt fails, the operation is restarted.
> > > > >
> > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > >
> > > > Ok, acquiring vma->anon_vma->spin_lock always sounds very safe.
> > > > (but slow.)
> > > >
> > > > I'll test this, too.
> > > >
> > > > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > >
> > >
> > > Sorry. reproduced. It seems the same bug before patch.
> > > mapcount 1 -> unmap -> remap -> mapcount 0. And it was SwapCache.
> > >
> >
> > Same here, reproduced after 18 hours.
> >
> Hmm. It seems rmap_one() is called, and the race is not in vma_address()
> but in remove_migration_pte().
It could have been in both, but the VMA's lock should have been held across
rmap_one(). The bug still reproduces, but holding the lock is still the
right thing to do.
This is the current version of patch 2/2.
==== CUT HERE ====
[PATCH] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
vma_adjust() updates anon VMA information without taking any locks,
whereas file-backed mappings are protected by the i_mmap_lock. This lack
of locking can result in races with page migration: during rmap_walk(),
vma_address() can return -EFAULT for an address that will soon be valid.
That leaves a dangling migration PTE behind, which can later trigger a
BUG_ON when the page is faulted in.
With the recent anon_vma changes, more than one anon_vma->lock may be
involved in an anon_vma_chain, but a second lock cannot simply be spun
upon without risking deadlock. Instead, the rmap walker trylocks the
locks of the other anon_vmas; if an attempt fails, the operation is
restarted.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/ksm.c | 19 +++++++++++++++++--
mm/mmap.c | 6 ++++++
mm/rmap.c | 27 +++++++++++++++++++++++----
3 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 3666d43..87c7531 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1674,9 +1674,19 @@ again:
spin_lock(&anon_vma->lock);
list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
vma = vmac->vma;
+
+ /* See comment in mm/rmap.c#rmap_walk_anon on locking */
+ if (anon_vma != vma->anon_vma) {
+ if (!spin_trylock(&vma->anon_vma->lock)) {
+ spin_unlock(&anon_vma->lock);
+ goto again;
+ }
+ }
+
if (rmap_item->address < vma->vm_start ||
rmap_item->address >= vma->vm_end)
- continue;
+ goto next_vma;
+
/*
* Initially we examine only the vma which covers this
* rmap_item; but later, if there is still work to do,
@@ -1684,9 +1694,14 @@ again:
* were forked from the original since ksmd passed.
*/
if ((rmap_item->mm == vma->vm_mm) == search_new_forks)
- continue;
+ goto next_vma;
ret = rmap_one(page, vma, rmap_item->address, arg);
+
+next_vma:
+ if (anon_vma != vma->anon_vma)
+ spin_unlock(&vma->anon_vma->lock);
+
if (ret != SWAP_AGAIN) {
spin_unlock(&anon_vma->lock);
goto out;
diff --git a/mm/mmap.c b/mm/mmap.c
index f90ea92..61d6f1d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
}
}
+ if (vma->anon_vma)
+ spin_lock(&vma->anon_vma->lock);
+
if (root) {
flush_dcache_mmap_lock(mapping);
vma_prio_tree_remove(vma, root);
@@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
if (mapping)
spin_unlock(&mapping->i_mmap_lock);
+ if (vma->anon_vma)
+ spin_unlock(&vma->anon_vma->lock);
+
if (remove_next) {
if (file) {
fput(file);
diff --git a/mm/rmap.c b/mm/rmap.c
index 85f203e..7c2b7a9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1368,18 +1368,37 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
* are holding mmap_sem. Users without mmap_sem are required to
* take a reference count to prevent the anon_vma disappearing
*/
+retry:
anon_vma = page_anon_vma(page);
if (!anon_vma)
return ret;
spin_lock(&anon_vma->lock);
list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
struct vm_area_struct *vma = avc->vma;
- unsigned long address = vma_address(page, vma);
- if (address == -EFAULT)
- continue;
- ret = rmap_one(page, vma, address, arg);
+ unsigned long address;
+
+ /*
+ * Guard against deadlocks by not spinning against
+ * vma->anon_vma->lock. If contention is found, release our lock and
+ * try again until VMA list can be traversed without worrying about
+ * the details of the VMA changing underneath us.
+ */
+ if (anon_vma != vma->anon_vma) {
+ if (!spin_trylock(&vma->anon_vma->lock)) {
+ spin_unlock(&anon_vma->lock);
+ goto retry;
+ }
+ }
+ address = vma_address(page, vma);
+ if (address != -EFAULT)
+ ret = rmap_one(page, vma, address, arg);
+
+ if (anon_vma != vma->anon_vma)
+ spin_unlock(&vma->anon_vma->lock);
+
if (ret != SWAP_AGAIN)
break;
+
}
spin_unlock(&anon_vma->lock);
return ret;
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 10:29 ` Mel Gorman
@ 2010-04-27 15:37 ` Andrea Arcangeli
2010-04-27 16:35 ` Mel Gorman
0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2010-04-27 15:37 UTC (permalink / raw)
To: Mel Gorman
Cc: KAMEZAWA Hiroyuki, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 11:29:05AM +0100, Mel Gorman wrote:
> It could have been in both but the vma lock should have been held across
> the rmap_one. It still reproduces but it's still the right thing to do.
> This is the current version of patch 2/2.
Well, keep in mind that I reproduced the swapops bug with the 2.6.33
anon-vma code, so it's unlikely that focusing on patch 2 will fix the bug
in swapops.h. If this is a bug in the new anon-vma code, it needs fixing
of course! But I doubt this bug is related to the swapops issue in execve
on the bprm->p args.
I've yet to check patch 1 in detail, sorry; I'll let you know my opinion
about it as soon as I have.
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 15:37 ` Andrea Arcangeli
@ 2010-04-27 16:35 ` Mel Gorman
0 siblings, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2010-04-27 16:35 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: KAMEZAWA Hiroyuki, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Rik van Riel, Andrew Morton
On Tue, Apr 27, 2010 at 05:37:59PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 27, 2010 at 11:29:05AM +0100, Mel Gorman wrote:
> > It could have been in both but the vma lock should have been held across
> > the rmap_one. It still reproduces but it's still the right thing to do.
> > This is the current version of patch 2/2.
>
> Well, keep in mind I reproduced the swapops bug with 2.6.33 anon-vma
> code, it's unlikely that focusing on patch 2 you'll fix bug in
> swapops.h. If this is a bug in the new anon-vma code, it needs fixing
> of course! But I doubt this bug is related to swapops in execve on the
> bprm->p args.
>
Why do you think it's unrelated to execve? From what I've seen during the
day, there is a race in execve where the VMA gets moved (under the anon_vma
lock) before the page tables are copied by move_ptes (without the lock). In
that case, execve can encounter and copy migration PTEs before migration
removes them. This also applies to mainline, because mainline takes only
the RCU lock and not the anon_vma->lock.
I have a prototype that "handles" the situation with the new anon_vma
code by removing the migration PTEs it finds while moving page tables,
but it needs more work before it can be released.
An alternative would be to split vma_adjust() into locked and unlocked
versions. shift_arg_pages() could then take the anon_vma lock to lock
both the VMA move and the pagetable copy here.
/*
* cover the whole range: [new_start, old_end)
*/
if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;
/*
* move the page tables downwards, on failure we rely on
* process cleanup to remove whatever mess we made.
*/
if (length != move_page_tables(vma, old_start,
vma, new_start, length))
return -ENOMEM;
It'd be messy to split up the locking of vma_adjust() like this, though,
and exec() would hold the anon_vma locks for longer just to guard against
migration. It's not clear this is better than having move_ptes() handle
the migration PTEs it encounters.
> I've yet to check in detail patch 1 sorry, I'll let you know my
> opinion about it as soon as I checked it in detail.
>
No problem. I still need to revisit all of these patches once I am
confident the swapops bug cannot be triggered any more.
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-26 22:37 ` [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information Mel Gorman
` (2 preceding siblings ...)
2010-04-27 0:07 ` KAMEZAWA Hiroyuki
@ 2010-04-27 0:30 ` Rik van Riel
2010-04-27 0:31 ` KAMEZAWA Hiroyuki
3 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2010-04-27 0:30 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, Minchan Kim, KAMEZAWA Hiroyuki, Christoph Lameter,
Andrea Arcangeli, Andrew Morton
On 04/26/2010 06:37 PM, Mel Gorman wrote:
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 85f203e..bc313a6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> * are holding mmap_sem. Users without mmap_sem are required to
> * take a reference count to prevent the anon_vma disappearing
> */
> +retry:
> anon_vma = page_anon_vma(page);
> if (!anon_vma)
> return ret;
> spin_lock(&anon_vma->lock);
> list_for_each_entry(avc,&anon_vma->head, same_anon_vma) {
> struct vm_area_struct *vma = avc->vma;
> - unsigned long address = vma_address(page, vma);
> - if (address == -EFAULT)
> - continue;
> + unsigned long address;
> +
> + /*
> + * Guard against deadlocks by not spinning against
> + * vma->anon_vma->lock. If contention is found, release our
> + * lock and try again until VMA list can be traversed without
> + * contention.
> + */
> + if (anon_vma != vma->anon_vma) {
> + if (!spin_trylock(&vma->anon_vma->lock)) {
> + spin_unlock(&anon_vma->lock);
> + goto retry;
> + }
> + }
If you're part way down the list, surely you'll need to
unlock multiple anon_vmas here before going to retry?
Otherwise you could forget to unlock one that you already
locked, and live-lock here.
--
All rights reversed
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 0:30 ` Rik van Riel
@ 2010-04-27 0:31 ` KAMEZAWA Hiroyuki
2010-04-27 2:13 ` Rik van Riel
0 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-27 0:31 UTC (permalink / raw)
To: Rik van Riel
Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Andrea Arcangeli, Andrew Morton
On Mon, 26 Apr 2010 20:30:41 -0400
Rik van Riel <riel@redhat.com> wrote:
> On 04/26/2010 06:37 PM, Mel Gorman wrote:
>
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 85f203e..bc313a6 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1368,15 +1368,31 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
> > * are holding mmap_sem. Users without mmap_sem are required to
> > * take a reference count to prevent the anon_vma disappearing
> > */
> > +retry:
> > anon_vma = page_anon_vma(page);
> > if (!anon_vma)
> > return ret;
> > spin_lock(&anon_vma->lock);
> > list_for_each_entry(avc,&anon_vma->head, same_anon_vma) {
> > struct vm_area_struct *vma = avc->vma;
> > - unsigned long address = vma_address(page, vma);
> > - if (address == -EFAULT)
> > - continue;
> > + unsigned long address;
> > +
> > + /*
> > + * Guard against deadlocks by not spinning against
> > + * vma->anon_vma->lock. If contention is found, release our
> > + * lock and try again until VMA list can be traversed without
> > + * contention.
> > + */
> > + if (anon_vma != vma->anon_vma) {
> > + if (!spin_trylock(&vma->anon_vma->lock)) {
> > + spin_unlock(&anon_vma->lock);
> > + goto retry;
> > + }
> > + }
>
> If you're part way down the list, surely you'll need to
> unlock multiple anon_vmas here before going to retry?
>
vma->anon_vma->lock is released after vma_address().
-Kame
* Re: [PATCH 2/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeing the wrong VMA information
2010-04-27 0:31 ` KAMEZAWA Hiroyuki
@ 2010-04-27 2:13 ` Rik van Riel
0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2010-04-27 2:13 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Mel Gorman, Linux-MM, LKML, Minchan Kim, Christoph Lameter,
Andrea Arcangeli, Andrew Morton
On 04/26/2010 08:31 PM, KAMEZAWA Hiroyuki wrote:
>> If you're part way down the list, surely you'll need to
>> unlock multiple anon_vmas here before going to retry?
>>
> vma->anon_vma->lock is released after vma_address().
Doh, never mind. Too much code in my mind at once...
Acked-by: Rik van Riel <riel@redhat.com>
--
All rights reversed