From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shaohua Li
Subject: [PATCH V3 0/2] mm: fix race condition in MADV_FREE
Date: Tue, 26 Sep 2017 10:26:24 -0700
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li

From: Shaohua Li

Artem Savkov reported a race condition[1] in MADV_FREE. MADV_FREE clears
the pte dirty bit and then marks the page lazyfree. There is no lock to
prevent page reclaim from adding the page to the swap cache between
these two steps. There are two problems:
- A page in swapcache is marked lazyfree (clear SwapBacked). This
  confuses some code paths, like page fault handling.
- The page is added into swapcache and freed, but the page isn't swapped
  out because the pte isn't dirty. This will cause data corruption.

The patches fix these issues. I know Minchan suggested these should be
combined into one patch, but I really think the separation makes things
clearer because these are two issues, even though they stem from the
same race.

Thanks,
Shaohua

V2->V3:
- reword patch log and code comments, no code change

V1->V2:
- dirty page in add_to_swap instead of in shrink_page_list as suggested
  by Minchan

Shaohua Li (2):
  mm: avoid marking swap cached page as lazyfree
  mm: fix data corruption caused by lazyfree page

 mm/swap.c       |  4 ++--
 mm/swap_state.c | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shaohua Li
Subject: [PATCH V3 1/2] mm: avoid marking swap cached page as lazyfree
Date: Tue, 26 Sep 2017 10:26:25 -0700
Message-Id: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Johannes Weiner, Michal Hocko, Hillf Danton,
 Minchan Kim, Hugh Dickins, Mel Gorman, Andrew Morton

From: Shaohua Li

MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap cache
between these two steps by page reclaim. Page reclaim could add the page
to swap cache and unmap the page. After page reclaim, the page is added
back to lru. At that time, we probably start draining per-cpu pagevec
and mark the page lazyfree. So the page could be in a state with
SwapBacked cleared and PG_swapcache set. Next time there is a refault in
the virtual address, do_swap_page can find the page from swap cache but
the page has PageSwapCache false because SwapBacked isn't set, so
do_swap_page will bail out and do nothing. The task will keep running
into fault handler.

Reported-and-tested-by: Artem Savkov
Fix: 802a3a92ad7a(mm: reclaim MADV_FREE pages)
Signed-off-by: Shaohua Li
Cc: stable@vger.kernel.org
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Hillf Danton
Cc: Minchan Kim
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Andrew Morton
Reviewed-by: Rik van Riel
---
 mm/swap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 9295ae9..a77d68f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
-	    !PageUnevictable(page)) {
+	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		bool active = PageActive(page);
 
 		del_page_from_lru_list(page, lruvec,
@@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
 void mark_page_lazyfree(struct page *page)
 {
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
-	    !PageUnevictable(page)) {
+	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
 
 		get_page(page);
-- 
2.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shaohua Li
Subject: [PATCH V3 2/2] mm: fix data corruption caused by lazyfree page
Date: Tue, 26 Sep 2017 10:26:26 -0700
Message-Id: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Johannes Weiner, Hillf Danton, Minchan Kim,
 Hugh Dickins, Rik van Riel, Mel Gorman, Andrew Morton

From: Shaohua Li

MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap cache
between these two steps by page reclaim. If page reclaim finds such
page, it will simply add the page to swap cache without pageout the page
to swap because the page is marked as clean. Next time, page fault will
read data from the swap slot which doesn't have the original data, so we
have a data corruption. To fix issue, we mark the page dirty and pageout
the page.

However, we shouldn't dirty all pages which is clean and in swap cache.
swapin page is swap cache and clean too. So we only dirty page which is
added into swap cache in page reclaim, which shouldn't be swapin page.
As Minchan suggested, simply dirty the page in add_to_swap can do the
job.

Reported-by: Artem Savkov
Fix: 802a3a92ad7a(mm: reclaim MADV_FREE pages)
Signed-off-by: Shaohua Li
Cc: stable@vger.kernel.org
Cc: Johannes Weiner
Cc: Hillf Danton
Cc: Minchan Kim
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Andrew Morton
Acked-by: Michal Hocko
---
 mm/swap_state.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 71ce2d1..ed91091 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -242,6 +242,17 @@ int add_to_swap(struct page *page)
 		 * clear SWAP_HAS_CACHE flag.
 		 */
 		goto fail;
+	/*
+	 * Normally the page will be dirtied in unmap because its pte should be
+	 * dirty. A special case is MADV_FREE page. The page's pte could have
+	 * dirty bit cleared but the page's SwapBacked bit is still set because
+	 * clearing the dirty bit and SwapBacked bit has no lock protected. For
+	 * such page, unmap will not set dirty bit for it, so page reclaim will
+	 * not write the page out. This can cause data corruption when the page
+	 * is swap in later. Always setting the dirty bit for the page solves
+	 * the problem.
+	 */
+	set_page_dirty(page);
 
 	return 1;
 
-- 
2.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 Sep 2017 15:25:24 -0400
From: Johannes Weiner
Subject: Re: [PATCH V3 1/2] mm: avoid marking swap cached page as lazyfree
Message-ID: <20170926192524.GA30943@cmpxchg.org>
References: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: Shaohua Li
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Michal Hocko, Hillf Danton, Minchan Kim,
 Hugh Dickins, Mel Gorman, Andrew Morton

On Tue, Sep 26, 2017 at 10:26:25AM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> SwapBacked). There is no lock to prevent the page is added to swap cache
> between these two steps by page reclaim. Page reclaim could add the page
> to swap cache and unmap the page. After page reclaim, the page is added
> back to lru. At that time, we probably start draining per-cpu pagevec
> and mark the page lazyfree. So the page could be in a state with
> SwapBacked cleared and PG_swapcache set. Next time there is a refault in
> the virtual address, do_swap_page can find the page from swap cache but
> the page has PageSwapCache false because SwapBacked isn't set, so
> do_swap_page will bail out and do nothing. The task will keep running
> into fault handler.

The patch lgtm, but for the changelog it probably makes sense to start
with the user-visible behavior, i.e. the endlessly looping swap fault
handler because it thinks it's racing with the swap slot being freed.
That makes it easier for other distro/vendor people to identify this
for backporting.

On that note, I think this should go into 4.13 and be tagged for 4.12
stable.

> Reported-and-tested-by: Artem Savkov
> Fix: 802a3a92ad7a(mm: reclaim MADV_FREE pages)
> Signed-off-by: Shaohua Li
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Hillf Danton
> Cc: Minchan Kim
> Cc: Hugh Dickins
> Cc: Mel Gorman
> Cc: Andrew Morton
> Reviewed-by: Rik van Riel

Acked-by: Johannes Weiner

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 Sep 2017 15:40:17 -0400
From: Johannes Weiner
Subject: Re: [PATCH V3 2/2] mm: fix data corruption caused by lazyfree page
Message-ID: <20170926194017.GB30943@cmpxchg.org>
References: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: Shaohua Li
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Hillf Danton, Minchan Kim, Hugh Dickins,
 Rik van Riel, Mel Gorman, Andrew Morton

On Tue, Sep 26, 2017 at 10:26:26AM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> SwapBacked). There is no lock to prevent the page is added to swap cache
> between these two steps by page reclaim. If page reclaim finds such
> page, it will simply add the page to swap cache without pageout the page
> to swap because the page is marked as clean. Next time, page fault will
> read data from the swap slot which doesn't have the original data, so we
> have a data corruption. To fix issue, we mark the page dirty and pageout
> the page.

Reclaim and MADV_FREE hold the page lock when manipulating the dirty
and the swapcache state.

Instead of undoing a racing MADV_FREE in reclaim, wouldn't it be safe
to check the dirty bit before add_to_swap() and skip clean pages?

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 Sep 2017 22:23:24 +0200
From: Michal Hocko
Subject: Re: [PATCH V3 1/2] mm: avoid marking swap cached page as lazyfree
Message-ID: <20170926202324.ay6ets5nke7h5yil@dhcp22.suse.cz>
References: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: Shaohua Li
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Johannes Weiner, Hillf Danton, Minchan Kim,
 Hugh Dickins, Mel Gorman, Andrew Morton

On Tue 26-09-17 10:26:25, Shaohua Li wrote:
> From: Shaohua Li
>
> MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> SwapBacked). There is no lock to prevent the page is added to swap cache
> between these two steps by page reclaim. Page reclaim could add the page
> to swap cache and unmap the page. After page reclaim, the page is added
> back to lru. At that time, we probably start draining per-cpu pagevec
> and mark the page lazyfree. So the page could be in a state with
> SwapBacked cleared and PG_swapcache set. Next time there is a refault in
> the virtual address, do_swap_page can find the page from swap cache but
> the page has PageSwapCache false because SwapBacked isn't set, so
> do_swap_page will bail out and do nothing. The task will keep running
> into fault handler.

Thanks for the clarification in the changelog. It is much more clear
now!

> Reported-and-tested-by: Artem Savkov
> Fix: 802a3a92ad7a(mm: reclaim MADV_FREE pages)
> Signed-off-by: Shaohua Li
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Hillf Danton
> Cc: Minchan Kim
> Cc: Hugh Dickins
> Cc: Mel Gorman
> Cc: Andrew Morton
> Reviewed-by: Rik van Riel

Marking for stable as suggested by Johannes makes perfect sense to me.

Acked-by: Michal Hocko

> ---
>  mm/swap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 9295ae9..a77d68f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  			    void *arg)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		bool active = PageActive(page);
>
>  		del_page_from_lru_list(page, lruvec,
> @@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
>  void mark_page_lazyfree(struct page *page)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
>
>  		get_page(page);
> --
> 2.9.5
>

-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 Sep 2017 12:46:28 -0700
From: Shaohua Li
Subject: Re: [PATCH V3 2/2] mm: fix data corruption caused by lazyfree page
Message-ID: <20170926194628.ii5ugcow7jcqdgqg@kernel.org>
References: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
 <20170926194017.GB30943@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170926194017.GB30943@cmpxchg.org>
Sender: owner-linux-mm@kvack.org
To: Johannes Weiner
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Hillf Danton, Minchan Kim, Hugh Dickins,
 Rik van Riel, Mel Gorman, Andrew Morton

On Tue, Sep 26, 2017 at 03:40:17PM -0400, Johannes Weiner wrote:
> On Tue, Sep 26, 2017 at 10:26:26AM -0700, Shaohua Li wrote:
> > From: Shaohua Li
> >
> > MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> > SwapBacked). There is no lock to prevent the page is added to swap cache
> > between these two steps by page reclaim. If page reclaim finds such
> > page, it will simply add the page to swap cache without pageout the page
> > to swap because the page is marked as clean. Next time, page fault will
> > read data from the swap slot which doesn't have the original data, so we
> > have a data corruption. To fix issue, we mark the page dirty and pageout
> > the page.
>
> Reclaim and MADV_FREE hold the page lock when manipulating the dirty
> and the swapcache state.
>
> Instead of undoing a racing MADV_FREE in reclaim, wouldn't it be safe
> to check the dirty bit before add_to_swap() and skip clean pages?

That would work, but I don't see an easy/clean way to check the dirty
bit. Since the race is rare, I don't think this optimization is worth
it.

Thanks,
Shaohua

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 27 Sep 2017 08:20:05 +0900
From: Minchan Kim
Subject: Re: [PATCH V3 1/2] mm: avoid marking swap cached page as lazyfree
Message-ID: <20170926232005.GA32370@bbox>
References: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: Shaohua Li
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Johannes Weiner, Michal Hocko, Hillf Danton,
 Hugh Dickins, Mel Gorman, Andrew Morton

On Tue, Sep 26, 2017 at 10:26:25AM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> SwapBacked). There is no lock to prevent the page is added to swap cache
> between these two steps by page reclaim. Page reclaim could add the page
> to swap cache and unmap the page. After page reclaim, the page is added
> back to lru. At that time, we probably start draining per-cpu pagevec
> and mark the page lazyfree. So the page could be in a state with
> SwapBacked cleared and PG_swapcache set. Next time there is a refault in
> the virtual address, do_swap_page can find the page from swap cache but
> the page has PageSwapCache false because SwapBacked isn't set, so
> do_swap_page will bail out and do nothing. The task will keep running
> into fault handler.

With the new description, I see why you want to separate this. Yup, it
should be separated. Sorry for the noise. What I was missing is the
change to PageSwapCache, which now checks PG_swapbacked as well as
PG_swapcache. I hadn't noticed that change.

Acked-by: Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 27 Sep 2017 08:20:46 +0900
From: Minchan Kim
Subject: Re: [PATCH V3 2/2] mm: fix data corruption caused by lazyfree page
Message-ID: <20170926232046.GB32370@bbox>
References: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com>
Sender: owner-linux-mm@kvack.org
To: Shaohua Li
Cc: linux-mm@kvack.org, asavkov@redhat.com, Kernel-team@fb.com, Shaohua Li,
 stable@vger.kernel.org, Johannes Weiner, Hillf Danton, Hugh Dickins,
 Rik van Riel, Mel Gorman, Andrew Morton

On Tue, Sep 26, 2017 at 10:26:26AM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
> SwapBacked). There is no lock to prevent the page is added to swap cache
> between these two steps by page reclaim. If page reclaim finds such
> page, it will simply add the page to swap cache without pageout the page
> to swap because the page is marked as clean. Next time, page fault will
> read data from the swap slot which doesn't have the original data, so we
> have a data corruption. To fix issue, we mark the page dirty and pageout
> the page.
>
> However, we shouldn't dirty all pages which is clean and in swap cache.
> swapin page is swap cache and clean too. So we only dirty page which is
> added into swap cache in page reclaim, which shouldn't be swapin page.
> As Minchan suggested, simply dirty the page in add_to_swap can do the
> job.
>
> Reported-by: Artem Savkov
> Fix: 802a3a92ad7a(mm: reclaim MADV_FREE pages)
> Signed-off-by: Shaohua Li

Acked-by: Minchan Kim
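
The flag interaction discussed in this thread can be modelled in
userspace. What follows is only a minimal sketch in C, not kernel code:
the struct, the MODEL_* flag names and the model_* helpers are all
hypothetical. It assumes just what the thread states: PageSwapCache()
reports true only when both PG_swapbacked and PG_swapcache are set,
patch 1/2 makes the lazyfree path skip pages already in the swap cache,
and patch 2/2 dirties the page in add_to_swap().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the page flags named in the thread. */
enum {
	MODEL_PG_SWAPBACKED = 1u << 0,
	MODEL_PG_SWAPCACHE  = 1u << 1,
	MODEL_PG_DIRTY      = 1u << 2,
};

/* Hypothetical, greatly simplified page: only a flags word. */
struct model_page {
	unsigned int flags;
};

/* As described by Minchan: true only if BOTH bits are set. */
static bool model_page_swapcache(const struct model_page *p)
{
	return (p->flags & MODEL_PG_SWAPBACKED) &&
	       (p->flags & MODEL_PG_SWAPCACHE);
}

/*
 * Reclaim side: put the page in the swap cache. With the 2/2 fix
 * modelled on, the page is also marked dirty so it gets written out.
 */
static void model_add_to_swap(struct model_page *p, bool with_fix)
{
	p->flags |= MODEL_PG_SWAPCACHE;
	if (with_fix)
		p->flags |= MODEL_PG_DIRTY;
}

/*
 * MADV_FREE side: clear SwapBacked to mark the page lazyfree. With the
 * 1/2 fix modelled on, pages already in the swap cache are skipped.
 * Returns true if the page was marked lazyfree.
 */
static bool model_mark_lazyfree(struct model_page *p, bool with_fix)
{
	if (!(p->flags & MODEL_PG_SWAPBACKED))
		return false;
	if (with_fix && model_page_swapcache(p))
		return false;
	p->flags &= ~MODEL_PG_SWAPBACKED;
	return true;
}

/*
 * The broken state from the 1/2 changelog: the raw swapcache bit is
 * set, yet the combined check reports the page as not in swap cache.
 */
static bool model_is_inconsistent(const struct model_page *p)
{
	return (p->flags & MODEL_PG_SWAPCACHE) && !model_page_swapcache(p);
}
```

Calling model_add_to_swap() and then model_mark_lazyfree() with the fix
flags off leaves a page for which model_is_inconsistent() is true, which
corresponds to the state that makes do_swap_page bail out. With the fix
flags on, the lazyfree step is refused and the page stays dirty. The
model is single-threaded and only replays one interleaving; it says
nothing about locking.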