* [PATCH 0/2] mm: fix race condition in MADV_FREE
@ 2017-09-21 20:27 Shaohua Li
2017-09-21 20:27 ` [PATCH 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
2017-09-21 20:27 ` [PATCH 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
0 siblings, 2 replies; 6+ messages in thread
From: Shaohua Li @ 2017-09-21 20:27 UTC (permalink / raw)
To: linux-mm; +Cc: Artem Savkov, Kernel-team, Shaohua Li
From: Shaohua Li <shli@fb.com>
Artem Savkov reported a race condition[1] in MADV_FREE. MADV_FREE clears
the pte dirty bit and then marks the page lazyfree (clears SwapBacked).
There is no lock to prevent page reclaim from adding the page to the
swap cache between these two steps (a rough interleaving is sketched
below). This causes two problems:
- A page in the swap cache is marked lazyfree (SwapBacked cleared). This
  confuses some code paths, such as page fault handling.
- The page is added to the swap cache and freed, but it is never swapped
  out because its pte isn't dirty. This causes data corruption.
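The interleaving looks roughly like this (a sketch only; the function
names follow the 4.13-era MM code and exact call sites may differ):

    /*
     * CPU 0: madvise(MADV_FREE)            CPU 1: page reclaim
     * --------------------------           -------------------
     * madvise_free_pte_range()
     *   clears the pte dirty bit
     *                                      add_to_swap(page)
     *                                        page is now in the swap
     *                                        cache, but looks clean
     * mark_page_lazyfree(page)
     *   clears SwapBacked on a page
     *   that sits in the swap cache
     */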
The patches below fix both issues.
Thanks,
Shaohua
[1] https://marc.info/?l=linux-mm&m=150589811300667&w=2
Shaohua Li (2):
mm: avoid marking swap cached page as lazyfree
mm: fix data corruption caused by lazyfree page
mm/swap.c | 4 ++--
mm/vmscan.c | 12 ++++++++++++
2 files changed, 14 insertions(+), 2 deletions(-)
--
2.9.5
* [PATCH 1/2] mm: avoid marking swap cached page as lazyfree
2017-09-21 20:27 [PATCH 0/2] mm: fix race condition in MADV_FREE Shaohua Li
@ 2017-09-21 20:27 ` Shaohua Li
2017-09-22 1:34 ` Rik van Riel
2017-09-21 20:27 ` [PATCH 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
1 sibling, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2017-09-21 20:27 UTC (permalink / raw)
To: linux-mm
Cc: Artem Savkov, Kernel-team, Shaohua Li, stable, Johannes Weiner,
Michal Hocko, Hillf Danton, Minchan Kim, Hugh Dickins,
Rik van Riel, Mel Gorman, Andrew Morton
From: Shaohua Li <shli@fb.com>
MADV_FREE clears the pte dirty bit and then marks the page lazyfree
(clears SwapBacked). There is no lock to prevent page reclaim from
adding the page to the swap cache between these two steps. If the page
is added to the swap cache, marking it lazyfree will confuse page fault
handling should the page be reclaimed and then refaulted.
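Put differently, after this patch a page must satisfy roughly the
following predicate before either lazyfree path will touch it. This is a
hypothetical helper written only for illustration; it mirrors the
condition in the diff below and is not a real kernel function:

    /* Hypothetical helper, for illustration only; mirrors the
     * condition used in lru_lazyfree_fn() and mark_page_lazyfree()
     * after this patch. */
    static bool can_mark_lazyfree(struct page *page)
    {
            return PageLRU(page) && PageAnon(page) &&
                   PageSwapBacked(page) && !PageSwapCache(page) &&
                   !PageUnevictable(page);
    }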
Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
mm/swap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/swap.c b/mm/swap.c
index 9295ae9..a77d68f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
- !PageUnevictable(page)) {
+ !PageSwapCache(page) && !PageUnevictable(page)) {
bool active = PageActive(page);
del_page_from_lru_list(page, lruvec,
@@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
void mark_page_lazyfree(struct page *page)
{
if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
- !PageUnevictable(page)) {
+ !PageSwapCache(page) && !PageUnevictable(page)) {
struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
get_page(page);
--
2.9.5
* [PATCH 2/2] mm: fix data corruption caused by lazyfree page
2017-09-21 20:27 [PATCH 0/2] mm: fix race condition in MADV_FREE Shaohua Li
2017-09-21 20:27 ` [PATCH 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
@ 2017-09-21 20:27 ` Shaohua Li
2017-09-22 6:01 ` Minchan Kim
1 sibling, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2017-09-21 20:27 UTC (permalink / raw)
To: linux-mm
Cc: Artem Savkov, Kernel-team, Shaohua Li, Johannes Weiner,
Michal Hocko, Hillf Danton, Minchan Kim, Hugh Dickins,
Rik van Riel, Mel Gorman, Andrew Morton
From: Shaohua Li <shli@fb.com>
MADV_FREE clears the pte dirty bit and then marks the page lazyfree
(clears SwapBacked). There is no lock to prevent page reclaim from
adding the page to the swap cache between these two steps. If page
reclaim finds such a page, it will simply add it to the swap cache
without paging it out to swap, because the page is marked clean. On the
next page fault, the data is read back from a swap slot that never
received the original data, so we get data corruption. To fix the
issue, we mark the page dirty and page it out.
However, we shouldn't dirty every page that is clean and in the swap
cache; a freshly swapped-in page is in the swap cache and clean too. So
we only dirty a page that was just added to the swap cache by page
reclaim, which cannot be a swapped-in page. Normal anonymous pages
should be dirty already.
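The corrupting sequence, roughly (a sketch; the step names follow the
4.13-era shrink_page_list() flow):

    /*
     * 1. add_to_swap(page) allocates a swap slot and adds the page
     *    to the swap cache; the page still looks clean because
     *    MADV_FREE cleared the pte dirty bit.
     * 2. try_to_unmap() replaces the ptes with the swap entry.
     * 3. !PageDirty(page), so reclaim skips pageout() and frees the
     *    page via __remove_mapping(); the swap slot never receives
     *    the data.
     * 4. A later fault on that swap entry reads whatever stale data
     *    the slot holds: corruption.
     */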
Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d811c81..820ee8d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -980,6 +980,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
int may_enter_fs;
enum page_references references = PAGEREF_RECLAIM_CLEAN;
bool dirty, writeback;
+ bool new_swap_page = false;
cond_resched();
@@ -1165,6 +1166,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
/* Adding to swap updated mapping */
mapping = page_mapping(page);
+ new_swap_page = true;
}
} else if (unlikely(PageTransHuge(page))) {
/* Split file THP */
@@ -1185,6 +1187,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
nr_unmap_fail++;
goto activate_locked;
}
+
+ /*
+ * MADV_FREE cleared the pte dirty bit, but has not yet
+ * cleared SwapBacked for the page. We can't free the
+ * page directly because a swap entry is already set in
+ * the pte. The check guarantees this is such a page and
+ * not a clean swapped-in page.
+ */
+ if (!PageDirty(page) && new_swap_page)
+ set_page_dirty(page);
}
if (PageDirty(page)) {
--
2.9.5
* Re: [PATCH 1/2] mm: avoid marking swap cached page as lazyfree
2017-09-21 20:27 ` [PATCH 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
@ 2017-09-22 1:34 ` Rik van Riel
0 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2017-09-22 1:34 UTC (permalink / raw)
To: Shaohua Li, linux-mm
Cc: Artem Savkov, Kernel-team, Shaohua Li, stable, Johannes Weiner,
Michal Hocko, Hillf Danton, Minchan Kim, Hugh Dickins, Mel Gorman,
Andrew Morton
On Thu, 2017-09-21 at 13:27 -0700, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
>
> MADV_FREE clears the pte dirty bit and then marks the page lazyfree
> (clears SwapBacked). There is no lock to prevent page reclaim from
> adding the page to the swap cache between these two steps. If the page
> is added to the swap cache, marking it lazyfree will confuse page
> fault handling should the page be reclaimed and then refaulted.
>
> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
>
Reviewed-by: Rik van Riel <riel@redhat.com>
* Re: [PATCH 2/2] mm: fix data corruption caused by lazyfree page
2017-09-21 20:27 ` [PATCH 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
@ 2017-09-22 6:01 ` Minchan Kim
2017-09-22 18:45 ` Shaohua Li
0 siblings, 1 reply; 6+ messages in thread
From: Minchan Kim @ 2017-09-22 6:01 UTC (permalink / raw)
To: Shaohua Li
Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, Johannes Weiner,
Michal Hocko, Hillf Danton, Hugh Dickins, Rik van Riel,
Mel Gorman, Andrew Morton
On Thu, Sep 21, 2017 at 01:27:11PM -0700, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
>
> MADV_FREE clears the pte dirty bit and then marks the page lazyfree
> (clears SwapBacked). There is no lock to prevent page reclaim from
> adding the page to the swap cache between these two steps. If page
> reclaim finds such a page, it will simply add it to the swap cache
> without paging it out to swap, because the page is marked clean. On
> the next page fault, the data is read back from a swap slot that never
> received the original data, so we get data corruption. To fix the
> issue, we mark the page dirty and page it out.
>
> However, we shouldn't dirty every page that is clean and in the swap
> cache; a freshly swapped-in page is in the swap cache and clean too.
> So we only dirty a page that was just added to the swap cache by page
> reclaim, which cannot be a swapped-in page. Normal anonymous pages
> should be dirty already.
>
> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
> mm/vmscan.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d811c81..820ee8d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -980,6 +980,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> int may_enter_fs;
> enum page_references references = PAGEREF_RECLAIM_CLEAN;
> bool dirty, writeback;
> + bool new_swap_page = false;
>
> cond_resched();
>
> @@ -1165,6 +1166,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>
> /* Adding to swap updated mapping */
> mapping = page_mapping(page);
> + new_swap_page = true;
> }
> } else if (unlikely(PageTransHuge(page))) {
> /* Split file THP */
> @@ -1185,6 +1187,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> nr_unmap_fail++;
> goto activate_locked;
> }
> +
> + /*
> + * MADV_FREE cleared the pte dirty bit, but has not yet
> + * cleared SwapBacked for the page. We can't free the
> + * page directly because a swap entry is already set in
> + * the pte. The check guarantees this is such a page and
> + * not a clean swapped-in page.
> + */
> + if (!PageDirty(page) && new_swap_page)
> + set_page_dirty(page);
> }
>
> if (PageDirty(page)) {
> --
> 2.9.5
>
Couldn't we simply roll back to the logic from before MADV_FREE's birth?
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 71ce2d1ccbf7..548c19b5f78e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -231,7 +231,7 @@ int add_to_swap(struct page *page)
* deadlock in the swap out path.
*/
/*
- * Add it to the swap cache.
+ * Add it to the swap cache and mark it dirty
*/
err = add_to_swap_cache(page, entry,
__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
@@ -243,6 +243,7 @@ int add_to_swap(struct page *page)
*/
goto fail;
+ SetPageDirty(page);
return 1;
fail:
To me, it would be simpler and more readable than introducing a new
branch in the already complicated shrink_page_list.
And I don't see why we cannot merge [1/2] and [2/2].
* Re: [PATCH 2/2] mm: fix data corruption caused by lazyfree page
2017-09-22 6:01 ` Minchan Kim
@ 2017-09-22 18:45 ` Shaohua Li
0 siblings, 0 replies; 6+ messages in thread
From: Shaohua Li @ 2017-09-22 18:45 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, Johannes Weiner,
Michal Hocko, Hillf Danton, Hugh Dickins, Rik van Riel,
Mel Gorman, Andrew Morton
On Fri, Sep 22, 2017 at 03:01:41PM +0900, Minchan Kim wrote:
> On Thu, Sep 21, 2017 at 01:27:11PM -0700, Shaohua Li wrote:
> > From: Shaohua Li <shli@fb.com>
> >
> > MADV_FREE clears the pte dirty bit and then marks the page lazyfree
> > (clears SwapBacked). There is no lock to prevent page reclaim from
> > adding the page to the swap cache between these two steps. If page
> > reclaim finds such a page, it will simply add it to the swap cache
> > without paging it out to swap, because the page is marked clean. On
> > the next page fault, the data is read back from a swap slot that
> > never received the original data, so we get data corruption. To fix
> > the issue, we mark the page dirty and page it out.
> >
> > However, we shouldn't dirty every page that is clean and in the swap
> > cache; a freshly swapped-in page is in the swap cache and clean too.
> > So we only dirty a page that was just added to the swap cache by
> > page reclaim, which cannot be a swapped-in page. Normal anonymous
> > pages should be dirty already.
> >
> > Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
> > Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > mm/vmscan.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index d811c81..820ee8d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -980,6 +980,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > int may_enter_fs;
> > enum page_references references = PAGEREF_RECLAIM_CLEAN;
> > bool dirty, writeback;
> > + bool new_swap_page = false;
> >
> > cond_resched();
> >
> > @@ -1165,6 +1166,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >
> > /* Adding to swap updated mapping */
> > mapping = page_mapping(page);
> > + new_swap_page = true;
> > }
> > } else if (unlikely(PageTransHuge(page))) {
> > /* Split file THP */
> > @@ -1185,6 +1187,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > nr_unmap_fail++;
> > goto activate_locked;
> > }
> > +
> > + /*
> > + * MADV_FREE cleared the pte dirty bit, but has not yet
> > + * cleared SwapBacked for the page. We can't free the
> > + * page directly because a swap entry is already set in
> > + * the pte. The check guarantees this is such a page and
> > + * not a clean swapped-in page.
> > + */
> > + if (!PageDirty(page) && new_swap_page)
> > + set_page_dirty(page);
> > }
> >
> > if (PageDirty(page)) {
> > --
> > 2.9.5
> >
>
> Couldn't we simply roll back to the logic from before MADV_FREE's birth?
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 71ce2d1ccbf7..548c19b5f78e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -231,7 +231,7 @@ int add_to_swap(struct page *page)
> * deadlock in the swap out path.
> */
> /*
> - * Add it to the swap cache.
> + * Add it to the swap cache and mark it dirty
> */
> err = add_to_swap_cache(page, entry,
> __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
> @@ -243,6 +243,7 @@ int add_to_swap(struct page *page)
> */
> goto fail;
>
> + SetPageDirty(page);
> return 1;
>
> fail:
>
> To me, it would be simpler and more readable than introducing a new
> branch in the already complicated shrink_page_list.
This is neat, thanks for the suggestion! I'll use set_page_dirty(),
because for a swap cache page set_page_dirty() does more than just set
the dirty bit.
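Roughly the distinction (a sketch of the 4.13-era behavior; see
swap_aops in mm/swap_state.c and swap_set_page_dirty() in mm/page_io.c
for the real code):

    SetPageDirty(page);     /* just sets the PG_dirty flag */

    set_page_dirty(page);   /*
                             * dispatches through mapping->a_ops; a swap
                             * cache page uses swap_aops, whose
                             * ->set_page_dirty (swap_set_page_dirty)
                             * also handles swap-over-file bookkeeping
                             * before dirtying the page
                             */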
> And I don't see why we cannot merge [1/2] and [2/2].
I feel two separate patches are clearer, but I'll let Andrew decide.
Thanks,
Shaohua
Thread overview: 6+ messages
2017-09-21 20:27 [PATCH 0/2] mm: fix race condition in MADV_FREE Shaohua Li
2017-09-21 20:27 ` [PATCH 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
2017-09-22 1:34 ` Rik van Riel
2017-09-21 20:27 ` [PATCH 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
2017-09-22 6:01 ` Minchan Kim
2017-09-22 18:45 ` Shaohua Li