stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ivan Babrou <ivan@cloudflare.com>,
	Minchan Kim <minchan@kernel.org>, Nitin Gupta <ngupta@vflare.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Jens Axboe <axboe@kernel.dk>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.4 41/52] mm: fix unexpected zeroed page mapping with zram swap
Date: Tue, 10 May 2022 15:08:10 +0200	[thread overview]
Message-ID: <20220510130731.053838711@linuxfoundation.org> (raw)
In-Reply-To: <20220510130729.852544477@linuxfoundation.org>

From: Minchan Kim <minchan@kernel.org>

commit e914d8f00391520ecc4495dd0ca0124538ab7119 upstream.

Two processes under CLONE_VM cloning, user process can be corrupted by
seeing zeroed page unexpectedly.

      CPU A                        CPU B

  do_swap_page                do_swap_page
  SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
  swap_readpage valid data
    swap_slot_free_notify
      delete zram entry
                              swap_readpage zeroed(invalid) data
                              pte_lock
                              map the *zero data* to userspace
                              pte_unlock
  pte_lock
  if (!pte_same)
    goto out_nomap;
  pte_unlock
  return and next refault will
  read zeroed data

The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
increase the refcount of swap slot at copy_mm so it couldn't catch up
whether it's safe or not to discard data from backing device.  In the
case, only the lock it could rely on to synchronize swap slot freeing is
page table lock.  Thus, this patch gets rid of the swap_slot_free_notify
function.  With this patch, CPU A will see correct data.

      CPU A                        CPU B

  do_swap_page                do_swap_page
  SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                              swap_readpage original data
                              pte_lock
                              map the original data
                              swap_free
                                swap_range_free
                                  bd_disk->fops->swap_slot_free_notify
  swap_readpage read zeroed data
                              pte_unlock
  pte_lock
  if (!pte_same)
    goto out_nomap;
  pte_unlock
  return
  on next refault will see mapped data by CPU B

The concern of the patch would increase memory consumption since it
could keep wasted memory with compressed form in zram as well as
uncompressed form in address space.  However, most of cases of zram uses
no readahead and do_swap_page is followed by swap_free so it will free
the compressed form from in zram quickly.

Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_io.c |   54 ------------------------------------------------------
 1 file changed, 54 deletions(-)

--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -69,54 +69,6 @@ void end_swap_bio_write(struct bio *bio)
 	bio_put(bio);
 }
 
-static void swap_slot_free_notify(struct page *page)
-{
-	struct swap_info_struct *sis;
-	struct gendisk *disk;
-	swp_entry_t entry;
-
-	/*
-	 * There is no guarantee that the page is in swap cache - the software
-	 * suspend code (at least) uses end_swap_bio_read() against a non-
-	 * swapcache page.  So we must check PG_swapcache before proceeding with
-	 * this optimization.
-	 */
-	if (unlikely(!PageSwapCache(page)))
-		return;
-
-	sis = page_swap_info(page);
-	if (!(sis->flags & SWP_BLKDEV))
-		return;
-
-	/*
-	 * The swap subsystem performs lazy swap slot freeing,
-	 * expecting that the page will be swapped out again.
-	 * So we can avoid an unnecessary write if the page
-	 * isn't redirtied.
-	 * This is good for real swap storage because we can
-	 * reduce unnecessary I/O and enhance wear-leveling
-	 * if an SSD is used as the as swap device.
-	 * But if in-memory swap device (eg zram) is used,
-	 * this causes a duplicated copy between uncompressed
-	 * data in VM-owned memory and compressed data in
-	 * zram-owned memory.  So let's free zram-owned memory
-	 * and make the VM-owned decompressed page *dirty*,
-	 * so the page should be swapped out somewhere again if
-	 * we again wish to reclaim it.
-	 */
-	disk = sis->bdev->bd_disk;
-	entry.val = page_private(page);
-	if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
-		unsigned long offset;
-
-		offset = swp_offset(entry);
-
-		SetPageDirty(page);
-		disk->fops->swap_slot_free_notify(sis->bdev,
-				offset);
-	}
-}
-
 static void end_swap_bio_read(struct bio *bio)
 {
 	struct page *page = bio_first_page_all(bio);
@@ -132,7 +84,6 @@ static void end_swap_bio_read(struct bio
 	}
 
 	SetPageUptodate(page);
-	swap_slot_free_notify(page);
 out:
 	unlock_page(page);
 	WRITE_ONCE(bio->bi_private, NULL);
@@ -371,11 +322,6 @@ int swap_readpage(struct page *page, boo
 
 	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
 	if (!ret) {
-		if (trylock_page(page)) {
-			swap_slot_free_notify(page);
-			unlock_page(page);
-		}
-
 		count_vm_event(PSWPIN);
 		return 0;
 	}



  parent reply	other threads:[~2022-05-10 13:35 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-10 13:07 [PATCH 5.4 00/52] 5.4.193-rc1 review Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 01/52] MIPS: Fix CP0 counter erratum detection for R4k CPUs Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 02/52] parisc: Merge model and model name into one line in /proc/cpuinfo Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 03/52] ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 04/52] gpiolib: of: fix bounds check for gpio-reserved-ranges Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 05/52] Revert "SUNRPC: attempt AF_LOCAL connect on setup" Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 06/52] firewire: fix potential uaf in outbound_phy_packet_callback() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 07/52] firewire: remove check of list iterator against head past the loop body Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 08/52] firewire: core: extend card->lock in fw_core_handle_bus_reset Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 09/52] ACPICA: Always create namespace nodes using acpi_ns_create_node() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 10/52] genirq: Synchronize interrupt thread startup Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 11/52] ASoC: da7219: Fix change notifications for tone generator frequency Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 12/52] ASoC: wm8958: Fix change notifications for DSP controls Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 13/52] ASoC: meson: Fix event generation for G12A tohdmi mux Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 14/52] s390/dasd: fix data corruption for ESE devices Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 15/52] s390/dasd: prevent double format of tracks " Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 16/52] s390/dasd: Fix read for ESE with blksize < 4k Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 17/52] s390/dasd: Fix read inconsistency for ESE DASD devices Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 18/52] can: grcan: grcan_close(): fix deadlock Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 19/52] can: grcan: use ofdev->dev when allocating DMA memory Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 20/52] nfc: replace improper check device_is_registered() in netlink related functions Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 21/52] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 22/52] NFC: netlink: fix sleep in atomic bug when firmware download timeout Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 23/52] hwmon: (adt7470) Fix warning on module removal Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 24/52] ASoC: dmaengine: Restore NULL prepare_slave_config() callback Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 25/52] RDMA/siw: Fix a condition race issue in MPA request processing Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 26/52] net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 27/52] net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 28/52] net: emaclite: Add error handling for of_address_to_resource() Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 29/52] selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational Greg Kroah-Hartman
2022-05-10 13:07 ` [PATCH 5.4 30/52] bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 31/52] smsc911x: allow using IRQ0 Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 32/52] btrfs: always log symlinks in full mode Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 33/52] net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 34/52] drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 35/52] NFSv4: Dont invalidate inode attributes on delegation return Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 36/52] kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 37/52] x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 38/52] KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 39/52] net: ipv6: ensure we call ipv6_mc_down() at most once Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 40/52] block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern Greg Kroah-Hartman
2022-05-10 13:08 ` Greg Kroah-Hartman [this message]
2022-05-10 13:08 ` [PATCH 5.4 42/52] ALSA: pcm: Fix races among concurrent hw_params and hw_free calls Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 43/52] ALSA: pcm: Fix races among concurrent read/write and buffer changes Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 44/52] ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 45/52] ALSA: pcm: Fix races among concurrent prealloc proc writes Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 46/52] ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 47/52] tcp: make sure treq->af_specific is initialized Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 48/52] dm: fix mempool NULL pointer race when completing IO Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 49/52] dm: interlock pending dm_io and dm_wait_for_bios_completion Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 50/52] PCI: aardvark: Clear all MSIs at setup Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 51/52] PCI: aardvark: Fix reading MSI interrupt number Greg Kroah-Hartman
2022-05-10 13:08 ` [PATCH 5.4 52/52] mmc: rtsx: add 74 Clocks in power on flow Greg Kroah-Hartman
2022-05-10 17:09 ` [PATCH 5.4 00/52] 5.4.193-rc1 review Florian Fainelli
2022-05-10 22:43 ` Shuah Khan
2022-05-11  1:11 ` Guenter Roeck
2022-05-11  1:58 ` Samuel Zou
2022-05-11  5:44 ` Naresh Kamboju
2022-05-11  9:19 ` Jon Hunter
2022-05-11  9:59 ` Sudip Mukherjee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220510130731.053838711@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=david@redhat.com \
    --cc=ivan@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=senozhatsky@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).