From: Alistair Popple <apopple@nvidia.com>
To: huang ying <huang.ying.caritas@gmail.com>
Cc: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Lyude Paul <lyude@redhat.com>, Karol Herbst <kherbst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, Peter Xu <peterx@redhat.com>,
	Ben Skeggs <bskeggs@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Huang Ying <ying.huang@intel.com>,
	stable@vger.kernel.org, akpm@linux-foundation.org,
	Logan Gunthorpe <logang@deltatee.com>
Subject: Re: [PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page
Date: Wed, 17 Aug 2022 11:38:37 +1000
Message-ID: <875yirve32.fsf@nvdebian.thelocal>
In-Reply-To: <CAC=cRTPGiXWjk=CYnCrhJnLx3mdkGDXZpvApo6yTbeW7+ZGajA@mail.gmail.com>


huang ying <huang.ying.caritas@gmail.com> writes:

> On Tue, Aug 16, 2022 at 3:39 PM Alistair Popple <apopple@nvidia.com> wrote:
>>
>> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
>> installs migration entries directly if it can lock the migrating page.
>> When removing a dirty pte the dirty bit is supposed to be carried over
>> to the underlying page to prevent it being lost.
>>
>> Currently migrate_vma_*() can only be used for private anonymous
>> mappings. That means loss of the dirty bit usually doesn't result in
>> data loss because these pages are typically not file-backed. However,
>> pages may be backed by swap storage, which can result in data loss if an
>> attempt is made to migrate a dirty page that doesn't yet have the
>> PageDirty flag set.
>>
>> In this case migration will fail due to unexpected references, but the
>> dirty pte bit will be lost. If the page is subsequently reclaimed, data
>> won't be written back to swap storage as it is considered uptodate,
>> resulting in data loss if the page is subsequently accessed.
>>
>> Prevent this by copying the dirty bit to the page when removing the pte
>> to match what try_to_migrate_one() does.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Acked-by: Peter Xu <peterx@redhat.com>
>> Reported-by: Huang Ying <ying.huang@intel.com>
>> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
>> Cc: stable@vger.kernel.org
>>
>> ---
>>
>> Changes for v2:
>>
>>  - Fixed up Reported-by tag.
>>  - Added Peter's Acked-by.
>>  - Atomically read and clear the pte to prevent the dirty bit getting
>>    set after reading it.
>>  - Added fixes tag
>> ---
>>  mm/migrate_device.c | 21 ++++++++-------------
>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d..e2d09e5 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/export.h>
>>  #include <linux/memremap.h>
>>  #include <linux/migrate.h>
>> +#include <linux/mm.h>
>>  #include <linux/mm_inline.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/oom.h>
>> @@ -61,7 +62,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>         struct migrate_vma *migrate = walk->private;
>>         struct vm_area_struct *vma = walk->vma;
>>         struct mm_struct *mm = vma->vm_mm;
>> -       unsigned long addr = start, unmapped = 0;
>> +       unsigned long addr = start;
>>         spinlock_t *ptl;
>>         pte_t *ptep;
>>
>> @@ -193,11 +194,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>                         bool anon_exclusive;
>>                         pte_t swp_pte;
>>
>> +                       flush_cache_page(vma, addr, pte_pfn(*ptep));
>> +                       pte = ptep_clear_flush(vma, addr, ptep);
>
> I think it's possible to batch the TLB flushing just before
> unlocking the PTL, but the current code looks correct.

I think you might be right, but I'd rather deal with batched TLB flushing
as a separate change that also implements it for normal migration,
given we don't seem to do it there either.
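
To make the trade-off concrete, here is a rough userspace sketch that only
counts flush operations under the two strategies. It is not kernel code:
the toy_* names and the array standing in for a page table are invented for
illustration, and a counter increment stands in for a real TLB
invalidation. The model also says nothing about ordering, i.e. whether
another CPU can still write through a stale TLB entry before a deferred
flush happens, which is the part a real batching change would have to get
right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts flush operations; a stand-in for real TLB invalidations. */
struct toy_tlb_stats {
	unsigned long flushes;
};

/* Per-entry strategy: clear each present entry and flush right away. */
static size_t toy_unmap_flush_each(uint64_t *ptes, size_t n,
				   struct toy_tlb_stats *stats)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i] == 0)
			continue;	/* nothing mapped here */
		ptes[i] = 0;
		stats->flushes++;	/* one flush per cleared entry */
		unmapped++;
	}
	return unmapped;
}

/* Batched strategy: clear everything, then flush once if anything changed. */
static size_t toy_unmap_flush_batched(uint64_t *ptes, size_t n,
				      struct toy_tlb_stats *stats)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i] == 0)
			continue;
		ptes[i] = 0;
		unmapped++;
	}
	if (unmapped)
		stats->flushes++;	/* single flush for the whole range */
	return unmapped;
}

/* Small self-check comparing the two strategies on the same input. */
static void toy_demo(void)
{
	uint64_t a[4] = { 1, 0, 1, 1 };
	uint64_t b[4] = { 1, 0, 1, 1 };
	struct toy_tlb_stats each = { 0 }, batched = { 0 };

	assert(toy_unmap_flush_each(a, 4, &each) == 3);
	assert(toy_unmap_flush_batched(b, 4, &batched) == 3);
	assert(each.flushes == 3);
	assert(batched.flushes == 1);
}
```

Calling toy_demo() from a main() exercises both paths. The batched variant
mirrors the shape of the "if (unmapped) flush_tlb_range(...)" code that
this patch removes.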

> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Thanks.

> Best Regards,
> Huang, Ying
>
>>                         anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>                         if (anon_exclusive) {
>> -                               flush_cache_page(vma, addr, pte_pfn(*ptep));
>> -                               ptep_clear_flush(vma, addr, ptep);
>> -
>>                                 if (page_try_share_anon_rmap(page)) {
>>                                         set_pte_at(mm, addr, ptep, pte);
>>                                         unlock_page(page);
>> @@ -205,12 +205,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>                                         mpfn = 0;
>>                                         goto next;
>>                                 }
>> -                       } else {
>> -                               ptep_get_and_clear(mm, addr, ptep);
>>                         }
>>
>>                         migrate->cpages++;
>>
>> +                       /* Set the dirty flag on the folio now the pte is gone. */
>> +                       if (pte_dirty(pte))
>> +                               folio_mark_dirty(page_folio(page));
>> +
>>                         /* Setup special migration page table entry */
>>                         if (mpfn & MIGRATE_PFN_WRITE)
>>                                 entry = make_writable_migration_entry(
>> @@ -242,9 +244,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>                          */
>>                         page_remove_rmap(page, vma, false);
>>                         put_page(page);
>> -
>> -                       if (pte_present(pte))
>> -                               unmapped++;
>>                 } else {
>>                         put_page(page);
>>                         mpfn = 0;
>> @@ -257,10 +256,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>         arch_leave_lazy_mmu_mode();
>>         pte_unmap_unlock(ptep - 1, ptl);
>>
>> -       /* Only flush the TLB if we actually modified any entries */
>> -       if (unmapped)
>> -               flush_tlb_range(walk->vma, start, end);
>> -
>>         return 0;
>>  }
>>
>>
>> base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
>> --
>> git-series 0.9.1
>>
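
The v2 change note about atomically reading and clearing the pte can be
illustrated with a small userspace model. This is a sketch under stated
assumptions, not kernel code: the toy_* names, the two-bit pte layout and
struct toy_page are invented here, C11 atomic_exchange() merely stands in
for the read-and-clear that ptep_clear_flush() performs in the patch, and
no TLB or cache flushing is modelled. The "between" callback simulates
another CPU dirtying the pte at the worst possible moment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy pte layout, invented for this sketch; real layouts are per-arch. */
#define TOY_PTE_PRESENT	(UINT64_C(1) << 0)
#define TOY_PTE_DIRTY	(UINT64_C(1) << 1)

struct toy_page {
	bool dirty;	/* stands in for the PageDirty flag */
};

/* Simulates hardware on another CPU setting the dirty bit. */
static void toy_hw_set_dirty(_Atomic uint64_t *ptep)
{
	atomic_fetch_or(ptep, TOY_PTE_DIRTY);
}

/* Racy variant: snapshot first, clear later. A write in between is lost. */
static uint64_t toy_read_then_clear(_Atomic uint64_t *ptep,
				    void (*between)(_Atomic uint64_t *))
{
	uint64_t snap = atomic_load(ptep);

	if (between)
		between(ptep);
	atomic_store(ptep, 0);
	return snap;
}

/* Atomic variant: one exchange returns the final value and empties the slot. */
static uint64_t toy_get_and_clear(_Atomic uint64_t *ptep,
				  void (*before)(_Atomic uint64_t *))
{
	if (before)
		before(ptep);
	return atomic_exchange(ptep, 0);
}

/* Carry the dirty bit of the removed pte over to the page. */
static void toy_transfer_dirty(uint64_t old_pte, struct toy_page *page)
{
	if (old_pte & TOY_PTE_DIRTY)
		page->dirty = true;
}

/* Small self-check contrasting the two variants. */
static void toy_demo(void)
{
	_Atomic uint64_t pte = TOY_PTE_PRESENT;
	struct toy_page page = { .dirty = false };

	/* Racy path: the late dirty bit never reaches the page. */
	toy_transfer_dirty(toy_read_then_clear(&pte, toy_hw_set_dirty), &page);
	assert(!page.dirty);

	/* Atomic path: whatever was in the pte at removal time is seen. */
	atomic_store(&pte, TOY_PTE_PRESENT);
	toy_transfer_dirty(toy_get_and_clear(&pte, toy_hw_set_dirty), &page);
	assert(page.dirty);
	assert(atomic_load(&pte) == 0);
}
```

Calling toy_demo() from a main() runs both cases. In the racy case the
page stays clean even though the pte was dirtied, which is the kind of
loss the commit message describes.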

