linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: stable@vger.kernel.org,senozhatsky@chromium.org,ngupta@vflare.org,ivan@cloudflare.com,david@redhat.com,axboe@kernel.dk,minchan@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
Subject: [patch 08/14] mm: fix unexpected zeroed page mapping with zram swap
Date: Thu, 14 Apr 2022 19:13:46 -0700
Message-ID: <20220415021347.165E6C385A1@smtp.kernel.org>
In-Reply-To: <20220414191240.9f86d15a3e3afd848a9839a6@linux-foundation.org>

From: Minchan Kim <minchan@kernel.org>
Subject: mm: fix unexpected zeroed page mapping with zram swap

When two processes are cloned with CLONE_VM, a user process can be
corrupted by unexpectedly seeing a zeroed page.

    CPU A                        CPU B

do_swap_page                do_swap_page
SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
swap_readpage valid data
  swap_slot_free_notify
    delete zram entry
                            swap_readpage zeroed(invalid) data
                            pte_lock
                            map the *zero data* to userspace
                            pte_unlock
pte_lock
if (!pte_same)
  goto out_nomap;
pte_unlock
return and next refault will
read zeroed data
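The interleaving above can be replayed deterministically in user space. The sketch below is a hypothetical single-threaded model, not kernel code: `model_slot` stands in for a zram entry, `model_pte` for the one PTE both CLONE_VM threads fault on, and `model_read_and_notify()` mimics the pre-patch read path that deletes the entry right after reading it. The steps run in the diagram's order so the outcome is reproducible.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_BYTES 8 /* hypothetical: a tiny stand-in for PAGE_SIZE */

/* Hypothetical model of one zram entry backing a swap slot. */
struct model_slot {
	bool present; /* false once the zram entry has been deleted */
	unsigned char data[MODEL_PAGE_BYTES];
};

/* Hypothetical model of the single PTE shared by both CLONE_VM threads. */
struct model_pte {
	bool swapped; /* true while the PTE still holds the swap entry */
	unsigned char page[MODEL_PAGE_BYTES];
};

/* A read from a deleted zram entry comes back zero-filled. */
static void model_read(const struct model_slot *slot, unsigned char *out)
{
	if (slot->present)
		memcpy(out, slot->data, MODEL_PAGE_BYTES);
	else
		memset(out, 0, MODEL_PAGE_BYTES);
}

/* Pre-patch read path: after reading, swap_slot_free_notify() deletes
 * the entry because the swap count is 1. */
static void model_read_and_notify(struct model_slot *slot, unsigned char *out)
{
	model_read(slot, out);
	slot->present = false;
}

/* The pte_lock section of do_swap_page(): map only if the PTE still
 * holds the swap entry, otherwise take the out_nomap exit. */
static bool model_map(struct model_pte *pte, const unsigned char *page)
{
	if (!pte->swapped)
		return false; /* !pte_same: goto out_nomap */
	memcpy(pte->page, page, MODEL_PAGE_BYTES);
	pte->swapped = false;
	return true;
}

/* Replays the diagram and returns the first byte userspace ends up seeing. */
static unsigned char replay_buggy_interleaving(void)
{
	struct model_slot slot = { .present = true };
	struct model_pte pte = { .swapped = true };
	unsigned char page_a[MODEL_PAGE_BYTES], page_b[MODEL_PAGE_BYTES];

	memset(slot.data, 0xAB, sizeof(slot.data));
	model_read_and_notify(&slot, page_a); /* CPU A: valid data, entry deleted */
	model_read_and_notify(&slot, page_b); /* CPU B: zeroed (invalid) data */
	bool b_mapped = model_map(&pte, page_b); /* CPU B maps the zeroed page */
	bool a_mapped = model_map(&pte, page_a); /* CPU A: !pte_same, out_nomap */
	assert(b_mapped && !a_mapped);
	return pte.page[0];
}
```

In this model `replay_buggy_interleaving()` returns 0: the valid copy CPU A read is thrown away at out_nomap, while the PTE keeps CPU B's zero-filled page.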

swap_slot_free_notify is bogus for the CLONE_VM case.  With CLONE_VM,
copy_mm shares the mm instead of copying the page tables, so it does not
increase the swap slot's refcount, and the __swap_count(entry) == 1 check
cannot tell whether it is safe to discard the data from the backing
device.  In that case, the only lock that can synchronize swap slot
freeing is the page table lock.  Thus, this patch removes the
swap_slot_free_notify function.  With this patch, CPU A will see the
correct data.
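The refcount argument can be illustrated with a hypothetical, simplified model of the removed `__swap_count(entry) == 1` heuristic. The names `model_fork`, `model_clone_vm` and the `faulters` field are illustrative only. A plain fork copies page tables and so takes an extra reference on each swap slot, while a CLONE_VM child shares the parent's page tables and takes none, even though it can now fault on the same PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one swap slot and the tasks that may read it. */
struct model_swap_slot {
	int swap_count; /* references held by page tables, as __swap_count() reports */
	int faulters;   /* tasks that can run do_swap_page() against this slot */
};

/* fork() without CLONE_VM copies the page tables, so the child's PTE
 * takes its own reference on the slot. */
static void model_fork(struct model_swap_slot *s)
{
	assert(s->swap_count > 0); /* only an in-use slot can be duplicated */
	s->swap_count++;
	s->faulters++;
}

/* clone(CLONE_VM) shares the parent's mm: no page tables are copied,
 * so the count is unchanged although one more task can fault on it. */
static void model_clone_vm(struct model_swap_slot *s)
{
	s->faulters++;
}

/* The removed heuristic: discard the backing data when the count is 1. */
static bool old_check_allows_discard(const struct model_swap_slot *s)
{
	return s->swap_count == 1;
}

/* Simplification: discarding is safe only if no other task may still read it. */
static bool discard_is_safe(const struct model_swap_slot *s)
{
	return s->faulters == 1;
}
```

After `model_fork()` the count is 2 and the old check keeps the data; after `model_clone_vm()` the count stays 1, so the old check allows a discard that `discard_is_safe()` rejects.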

    CPU A                        CPU B

do_swap_page                do_swap_page
SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                            swap_readpage original data
                            pte_lock
                            map the original data
                            swap_free
                              swap_range_free
                                bd_disk->fops->swap_slot_free_notify
swap_readpage read zeroed data
                            pte_unlock
pte_lock
if (!pte_same)
  goto out_nomap;
pte_unlock
return
the next refault will see the data mapped by CPU B
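The fixed ordering can be replayed deterministically with a hypothetical single-threaded model. Here the read path never deletes the zram entry; the entry is dropped only inside `model_map_and_swap_free()`, which runs under the modelled PTE lock after the page is mapped and stands in for swap_free -> swap_range_free -> swap_slot_free_notify. All names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_BYTES 8 /* hypothetical: a tiny stand-in for PAGE_SIZE */

/* Hypothetical state: one zram entry and the one PTE both threads share. */
struct model_state {
	bool slot_present; /* false once the zram entry has been deleted */
	unsigned char slot_data[MODEL_PAGE_BYTES];
	bool pte_swapped;  /* true while the PTE still holds the swap entry */
	unsigned char mapped[MODEL_PAGE_BYTES];
};

/* Post-patch read path: read only, never delete the zram entry. */
static void model_read(const struct model_state *st, unsigned char *out)
{
	if (st->slot_present)
		memcpy(out, st->slot_data, MODEL_PAGE_BYTES);
	else
		memset(out, 0, MODEL_PAGE_BYTES); /* deleted entry reads as zeros */
}

/* pte_lock section: if pte_same, map the page and then free the slot. */
static bool model_map_and_swap_free(struct model_state *st,
				    const unsigned char *page)
{
	if (!st->pte_swapped)
		return false; /* !pte_same: goto out_nomap */
	memcpy(st->mapped, page, MODEL_PAGE_BYTES);
	st->pte_swapped = false;
	st->slot_present = false; /* swap_free() finally deletes the entry */
	return true;
}

/* Replays the diagram and returns the first byte userspace ends up seeing. */
static unsigned char replay_fixed_interleaving(void)
{
	struct model_state st = { .slot_present = true, .pte_swapped = true };
	unsigned char page_a[MODEL_PAGE_BYTES], page_b[MODEL_PAGE_BYTES];

	memset(st.slot_data, 0xAB, sizeof(st.slot_data));
	model_read(&st, page_b);                              /* CPU B: original data */
	bool b_mapped = model_map_and_swap_free(&st, page_b); /* CPU B maps, frees */
	model_read(&st, page_a);                              /* CPU A: zeroed data */
	bool a_mapped = model_map_and_swap_free(&st, page_a); /* CPU A: !pte_same */
	assert(b_mapped && !a_mapped);
	return st.mapped[0];
}
```

Here `replay_fixed_interleaving()` returns 0xAB: CPU A still reads zeros, but because the slot is freed only after CPU B has mapped the page, the pte_same check discards CPU A's copy instead of letting it win.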

The concern with this patch is increased memory consumption, since a page
could be kept both in compressed form in zram and in uncompressed form in
the address space.  However, most zram setups use no readahead, and
do_swap_page is followed by swap_free, so the compressed copy in zram is
freed quickly.

Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_io.c |   54 -------------------------------------------------
 1 file changed, 54 deletions(-)

--- a/mm/page_io.c~mm-fix-unexpected-zeroed-page-mapping-with-zram-swap
+++ a/mm/page_io.c
@@ -51,54 +51,6 @@ void end_swap_bio_write(struct bio *bio)
 	bio_put(bio);
 }
 
-static void swap_slot_free_notify(struct page *page)
-{
-	struct swap_info_struct *sis;
-	struct gendisk *disk;
-	swp_entry_t entry;
-
-	/*
-	 * There is no guarantee that the page is in swap cache - the software
-	 * suspend code (at least) uses end_swap_bio_read() against a non-
-	 * swapcache page.  So we must check PG_swapcache before proceeding with
-	 * this optimization.
-	 */
-	if (unlikely(!PageSwapCache(page)))
-		return;
-
-	sis = page_swap_info(page);
-	if (data_race(!(sis->flags & SWP_BLKDEV)))
-		return;
-
-	/*
-	 * The swap subsystem performs lazy swap slot freeing,
-	 * expecting that the page will be swapped out again.
-	 * So we can avoid an unnecessary write if the page
-	 * isn't redirtied.
-	 * This is good for real swap storage because we can
-	 * reduce unnecessary I/O and enhance wear-leveling
-	 * if an SSD is used as the as swap device.
-	 * But if in-memory swap device (eg zram) is used,
-	 * this causes a duplicated copy between uncompressed
-	 * data in VM-owned memory and compressed data in
-	 * zram-owned memory.  So let's free zram-owned memory
-	 * and make the VM-owned decompressed page *dirty*,
-	 * so the page should be swapped out somewhere again if
-	 * we again wish to reclaim it.
-	 */
-	disk = sis->bdev->bd_disk;
-	entry.val = page_private(page);
-	if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
-		unsigned long offset;
-
-		offset = swp_offset(entry);
-
-		SetPageDirty(page);
-		disk->fops->swap_slot_free_notify(sis->bdev,
-				offset);
-	}
-}
-
 static void end_swap_bio_read(struct bio *bio)
 {
 	struct page *page = bio_first_page_all(bio);
@@ -114,7 +66,6 @@ static void end_swap_bio_read(struct bio
 	}
 
 	SetPageUptodate(page);
-	swap_slot_free_notify(page);
 out:
 	unlock_page(page);
 	WRITE_ONCE(bio->bi_private, NULL);
@@ -394,11 +345,6 @@ int swap_readpage(struct page *page, boo
 	if (sis->flags & SWP_SYNCHRONOUS_IO) {
 		ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
 		if (!ret) {
-			if (trylock_page(page)) {
-				swap_slot_free_notify(page);
-				unlock_page(page);
-			}
-
 			count_vm_event(PSWPIN);
 			goto out;
 		}
_

