All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v1] mm: swap: Fix race between free_swap_and_cache() and swapoff()
Date: Tue, 5 Mar 2024 21:19:50 -0500	[thread overview]
Message-ID: <20240306021950.GA801254@cmpxchg.org> (raw)
In-Reply-To: <20240305151349.3781428-1-ryan.roberts@arm.com>

Hi Ryan,

On Tue, Mar 05, 2024 at 03:13:49PM +0000, Ryan Roberts wrote:
> There was previously a theoretical window where swapoff() could run and
> teardown a swap_info_struct while a call to free_swap_and_cache() was
> running in another thread. This could cause, amongst other bad
> possibilities, swap_page_trans_huge_swapped() (called by
> free_swap_and_cache()) to access the freed memory for swap_map.
> 
> This is a theoretical problem and I haven't been able to provoke it from
> a test case. But there has been agreement based on code review that this
> is possible (see link below).
> 
> Fix it by using get_swap_device()/put_swap_device(), which will stall
> swapoff(). There was an extra check in _swap_info_get() to confirm that
> the swap entry was valid. This wasn't present in get_swap_device() so
> I've added it. I couldn't find any existing get_swap_device() call sites
> where this extra check would cause any false alarms.

Unfortunately, I found one, testing current mm-everything:

[  189.420777] get_swap_device: Unused swap offset entry 000641ae
[  189.426648] ------------[ cut here ]------------
[  189.431290] WARNING: CPU: 3 PID: 369 at mm/swapfile.c:1301 get_swap_device+0x2da/0x3f0
[  189.439242] CPU: 3 PID: 369 Comm: cachish Not tainted 6.8.0-rc5-00527-g19d98776f227-dirty #30
[  189.447791] Hardware name: Micro-Star International Co., Ltd. MS-7B98/Z390-A PRO (MS-7B98), BIOS 1.80 12/25/2019
[  189.457998] RIP: 0010:get_swap_device+0x2da/0x3f0
[  189.462721] Code: a8 03 75 2a 65 48 ff 08 e9 36 ff ff ff 4c 89 e9 48 c7 c2 40 fd 91 83 48 c7 c6 c0 f9 91 83 48 c7 c7 60 ee 91 83 e8 26 2f af ff <0f> 0b eb af 4c 8d 6b 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48
[  189.481497] RSP: 0000:ffffc90000cff8a8 EFLAGS: 00010282
[  189.486760] RAX: 0000000000000032 RBX: ffff8881262eee00 RCX: 0000000000000000
[  189.493909] RDX: 0000000000000001 RSI: ffffffff83a1e620 RDI: 0000000000000001
[  189.501054] RBP: 1ffff9200019ff15 R08: 0000000000000001 R09: fffff5200019fee1
[  189.508202] R10: ffffc90000cff70f R11: 0000000000000001 R12: ffffc900018d51ae
[  189.515346] R13: 00000000000641ae R14: 0000000000000000 R15: 00000000000641af
[  189.522494] FS:  00007f7120263680(0000) GS:ffff88841c380000(0000) knlGS:0000000000000000
[  189.530591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  189.536373] CR2: 00007f6e659a2ea3 CR3: 0000000046860004 CR4: 00000000003706f0
[  189.543516] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  189.550661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  189.557811] Call Trace:
[  189.560276]  <TASK>
[  189.562393]  ? __warn+0xc4/0x250
[  189.565647]  ? get_swap_device+0x2da/0x3f0
[  189.569761]  ? report_bug+0x348/0x440
[  189.573444]  ? handle_bug+0x6d/0x90
[  189.576951]  ? exc_invalid_op+0x13/0x40
[  189.580810]  ? asm_exc_invalid_op+0x16/0x20
[  189.585019]  ? get_swap_device+0x2da/0x3f0
[  189.589142]  ? get_swap_device+0x2da/0x3f0
[  189.593255]  ? __pfx_get_swap_device+0x10/0x10
[  189.597717]  __read_swap_cache_async+0x9f/0x630
[  189.602281]  ? __pfx___read_swap_cache_async+0x10/0x10
[  189.607439]  ? __mod_memcg_lruvec_state+0x238/0x4f0
[  189.612344]  ? __pfx_swp_swap_info+0x10/0x10
[  189.616652]  swap_cluster_readahead+0x2cd/0x510
[  189.621206]  ? __pfx_swap_cluster_readahead+0x10/0x10
[  189.626279]  ? swap_cache_get_folio+0xcd/0x360
[  189.630760]  ? __count_memcg_events+0x10a/0x370
[  189.635318]  shmem_swapin_folio+0x2f2/0xc60
[  189.639525]  ? __pfx__raw_spin_lock+0x10/0x10
[  189.643908]  ? __pte_offset_map+0x19/0x1d0
[  189.648024]  shmem_get_folio_gfp+0x307/0xe30
[  189.652323]  ? __schedule+0x9f0/0x1fe0
[  189.656096]  ? __pfx_shmem_get_folio_gfp+0x10/0x10
[  189.660923]  ? filemap_map_pages+0x999/0xe60
[  189.665211]  shmem_fault+0x1d9/0x810
[  189.668834]  ? __pfx_shmem_fault+0x10/0x10
[  189.672954]  ? __pfx_filemap_map_pages+0x10/0x10
[  189.677590]  __do_fault+0xed/0x390
[  189.681012]  __handle_mm_fault+0x1ba1/0x2e80
[  189.685297]  ? __pfx___handle_mm_fault+0x10/0x10
[  189.689933]  ? __pfx_down_read_trylock+0x10/0x10
[  189.694570]  ? __pfx_hrtimer_nanosleep+0x10/0x10
[  189.699215]  handle_mm_fault+0xe0/0x560
[  189.703074]  ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
[  189.708620]  do_user_addr_fault+0x2ba/0x9d0
[  189.712828]  exc_page_fault+0x54/0x90
[  189.716508]  asm_exc_page_fault+0x22/0x30
[  189.720535] RIP: 0033:0x5640dc2d72b5
[  189.724131] Code: 98 48 ba 00 00 00 00 03 00 00 00 48 89 d6 48 2b 75 a0 ba 00 00 00 00 48 f7 f6 48 89 d1 48 89 ca 48 8b 45 a0 48 01 d0 48 01 d8 <0f> b6 00 bf e8 03 00 00 e8 1e fe ff ff 48 83 45 a8 01 48 8d 45 d0
[  189.742922] RSP: 002b:00007ffc227e3f60 EFLAGS: 00010206
[  189.748165] RAX: 00007f6e659a2ea3 RBX: 00007f6e2007e000 RCX: 0000000045924ea3
[  189.755311] RDX: 0000000045924ea3 RSI: 0000000300000000 RDI: 00007f71202586a0
[  189.762483] RBP: 00007ffc227e3fe0 R08: 00007f7120258074 R09: 00007f71202580a0
[  189.769633] R10: 0000000000019458 R11: 00000000008aa400 R12: 0000000000000000
[  189.776781] R13: 00007ffc227e4128 R14: 00007f712029d000 R15: 00005640dc2d9dd8
[  189.783928]  </TASK>
[  189.786126] ---[ end trace 0000000000000000 ]---
[  285.827888] get_swap_device: Unused swap offset entry 0018403f
[  320.699306] get_swap_device: Unused swap offset entry 000b001b
[  354.031339] get_swap_device: Unused swap offset entry 000681a9
[  364.958435] get_swap_device: Unused swap offset entry 001f4055
[  364.976235] get_swap_device: Unused swap offset entry 001f4057
[  365.530174] get_swap_device: Unused swap offset entry 000d415c
[  394.223792] get_swap_device: Unused swap offset entry 001540d0
[  394.317299] get_swap_device: Unused swap offset entry 000341d9
[  394.341727] get_swap_device: Unused swap offset entry 0006c07e
[  396.062365] get_swap_device: Unused swap offset entry 000541a4
[  396.068262] get_swap_device: Unused swap offset entry 000541a7
[  402.629551] get_swap_device: Unused swap offset entry 00294021
[  436.740622] get_swap_device: Unused swap offset entry 00334155
[  436.758527] get_swap_device: Unused swap offset entry 001b417c

swap_cluster_readahead() calls __read_swap_cache_async() on a range of
made-up swap entries around the faulting slot. The device and the
range (si->max) are valid, but the specific entry might not be in
use. __read_swap_cache_async() instead relies on swapcache_prepare()
returning -ENOENT to catch this and skip gracefully.

Confirmed that reverting the patch makes the warnings go away.

  parent reply	other threads:[~2024-03-06  2:19 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 15:13 [PATCH v1] mm: swap: Fix race between free_swap_and_cache() and swapoff() Ryan Roberts
2024-03-05 15:50 ` David Hildenbrand
2024-03-05 16:33   ` Ryan Roberts
2024-03-05 22:05     ` David Hildenbrand
2024-03-06  2:39       ` Huang, Ying
2024-03-06  8:10       ` Ryan Roberts
2024-03-06  2:19 ` Johannes Weiner [this message]
2024-03-06  8:23   ` Ryan Roberts
2024-03-06  2:52 ` Huang, Ying
2024-03-06  8:51   ` Miaohe Lin
2024-03-06  9:31     ` Ryan Roberts
2024-03-07  2:38       ` Miaohe Lin
2024-03-07  5:56         ` Huang, Ying
2024-03-07  6:50           ` Miaohe Lin
2024-03-07  7:34             ` Huang, Ying
2024-03-07  7:48               ` Ryan Roberts
2024-03-07  8:54                 ` Huang, Ying
2024-03-07  9:19                   ` Ryan Roberts
2024-03-08  0:55                     ` Huang, Ying
2024-03-07  8:50               ` Miaohe Lin
2024-03-07  4:37       ` Huang, Ying
2024-03-07  4:39     ` Huang, Ying
2024-03-06  9:03   ` Ryan Roberts
2024-03-07  5:48     ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240306021950.GA801254@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.