From: "Huang, Ying" <ying.huang@intel.com>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,  <linux-mm@kvack.org>,
	 <linux-kernel@vger.kernel.org>,
	 Rafael Aquini <aquini@redhat.com>,
	 Carlos Maiolino <cmaiolino@redhat.com>,
	 Eric Sandeen <esandeen@redhat.com>,
	 stable <stable@vger.kernel.org>
Subject: Re: [PATCH] mm, THP, swap: fix allocating cluster for swapfile by mistake
Date: Thu, 20 Aug 2020 12:36:08 +0800
Message-ID: <871rk2x7bb.fsf@yhuang-dev.intel.com>
In-Reply-To: <20200819195613.24269-1-hsiangkao@redhat.com> (Gao Xiang's message of "Thu, 20 Aug 2020 03:56:13 +0800")

Gao Xiang <hsiangkao@redhat.com> writes:

> SWP_FS doesn't mean the device is file-backed swap device,
> which just means each writeback request should go through fs
> by DIO. Or it'll just use extents added by .swap_activate(),
> but it also works as file-backed swap device.
>
> So in order to achieve the goal of the original patch,
> SWP_BLKDEV should be used instead.
>
> FS corruption can be observed with SSD device + XFS +
> fragmented swapfile due to CONFIG_THP_SWAP=y.
>
> Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
> Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Good catch!  The fix itself looks good to me, although the description
is a little confusing.

After some digging, it seems that SWP_FS is set on swap devices whose
swap entry reads/writes go through the file-system-specific callback
(currently used only by swap over NFS).
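
To make the difference between the two checks concrete, here is a
minimal userspace sketch, not kernel code.  The DEMO_* bit values and
the two helper names are made up for illustration; the real flag
definitions live in include/linux/swap.h.  It assumes, as the patch
description says, that a regular swapfile (e.g. on XFS) has neither
SWP_BLKDEV nor SWP_FS set.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; not the kernel's. */
#define DEMO_SWP_BLKDEV (1u << 0) /* swap area is a block device */
#define DEMO_SWP_FS     (1u << 1) /* swap I/O goes through fs callbacks */

/* Check before the patch: anything without SWP_FS may get a huge cluster. */
static bool old_allows_cluster(unsigned int flags)
{
	return !(flags & DEMO_SWP_FS);
}

/* Check after the patch: only block devices may get a huge cluster. */
static bool new_allows_cluster(unsigned int flags)
{
	return (flags & DEMO_SWP_BLKDEV) != 0;
}
```

With flags == 0 (the assumed state of a swapfile on a local file
system), old_allows_cluster() returns true while new_allows_cluster()
returns false, which is the case the patch is meant to close.  For a
block device (DEMO_SWP_BLKDEV set) both helpers return true, and for
swap over NFS (DEMO_SWP_FS set) both return false.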

Best Regards,
Huang, Ying

> ---
>
> I reproduced the issue with the following details:
>
> Environment:
> QEMU + upstream kernel + buildroot + NVMe (2 GB)
>
> Kernel config:
> CONFIG_BLK_DEV_NVME=y
> CONFIG_THP_SWAP=y
>
> Some reproducible steps:
> mkfs.xfs -f /dev/nvme0n1
> mkdir /tmp/mnt
> mount /dev/nvme0n1 /tmp/mnt
> bs="32k"
> sz="1024m"    # doesn't matter too much, I also tried 16m
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
>
> mkswap /tmp/mnt/sw
> swapon /tmp/mnt/sw
>
> stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well
>
> Symptoms:
>  - FS corruption (e.g. checksum failure)
>  - memory corruption at: 0xd2808010
>  - segfault
>  ... 
>
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6c26916e95fd..2937daf3ca02 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1074,7 +1074,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  			goto nextsi;
>  		}
>  		if (size == SWAPFILE_CLUSTER) {
> -			if (!(si->flags & SWP_FS))
> +			if (si->flags & SWP_BLKDEV)
>  				n_ret = swap_alloc_cluster(si, swp_entries);
>  		} else
>  			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,

