From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 3 Oct 2025 11:08:00 +0900
From: Byungchul Park
To: Yeoreum Yun
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
    matthew.brost@intel.com, joshua.hahnjy@gmail.com, rakie.kim@sk.com,
    gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
    clameter@sgi.com, kravetz@us.ibm.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, max.byungchul.park@gmail.com,
    kernel_team@skhynix.com, harry.yoo@oracle.com, gwan-gyeong.mun@intel.com,
    syzkaller@googlegroups.com, ysk@kzalloc.com
Subject: Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
Message-ID: <20251003020800.GA29748@system.software.com>
References: <20251002081612.53281-1-byungchul@sk.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.9.4 (2018-02-28)

On Thu, Oct 02, 2025 at 12:49:23PM +0100, Yeoreum Yun wrote:
> Sorry code was wrong.
> > Hi Byoungchul,
> >
> > > DEPT(Dependency Tracker) reported a deadlock:
> > >
> > > ===================================================
> > > DEPT: Circular dependency has been detected.
> > > 6.15.11-00046-g2c223fa7bd9a-dirty #13 Not tainted
> > > ---------------------------------------------------
> > > summary
> > > ---------------------------------------------------
> > > *** DEADLOCK ***
> > >
> > > context A
> > >     [S] (unknown)(pg_locked_map:0)
> > >     [W] dept_page_wait_on_bit(pg_writeback_map:0)
> > >     [E] dept_page_clear_bit(pg_locked_map:0)
> > >
> > > context B
> > >     [S] (unknown)(pg_writeback_map:0)
> > >     [W] dept_page_wait_on_bit(pg_locked_map:0)
> > >     [E] dept_page_clear_bit(pg_writeback_map:0)
> > >
> > > [S]: start of the event context
> > > [W]: the wait blocked
> > > [E]: the event not reachable
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > >     [S] (unknown)(pg_locked_map:0)
> > >     [W] dept_page_wait_on_bit(pg_writeback_map:0)
> > >     [E] dept_page_clear_bit(pg_locked_map:0)
> > >
> > > [S] (unknown)(pg_locked_map:0):
> > > (N/A)
> > >
> > > [W] dept_page_wait_on_bit(pg_writeback_map:0):
> > > [] folio_wait_bit+0x2c/0x38
> > > stacktrace:
> > >     folio_wait_bit_common+0x824/0x8b8
> > >     folio_wait_bit+0x2c/0x38
> > >     folio_wait_writeback+0x5c/0xa4
> > >     migrate_pages_batch+0x5e4/0x1788
> > >     migrate_pages+0x15c4/0x1840
> > >     compact_zone+0x9c8/0x1d20
> > >     compact_node+0xd4/0x27c
> > >     sysctl_compaction_handler+0x104/0x194
> > >     proc_sys_call_handler+0x25c/0x3f8
> > >     proc_sys_write+0x20/0x2c
> > >     do_iter_readv_writev+0x350/0x448
> > >     vfs_writev+0x1ac/0x44c
> > >     do_pwritev+0x100/0x15c
> > >     __arm64_sys_pwritev2+0x6c/0xcc
> > >     invoke_syscall.constprop.0+0x64/0x18c
> > >     el0_svc_common.constprop.0+0x80/0x198
> > >
> > > [E] dept_page_clear_bit(pg_locked_map:0):
> > > [] migrate_folio_undo_src+0x1b4/0x200
> > > stacktrace:
> > >     migrate_folio_undo_src+0x1b4/0x200
> > >     migrate_pages_batch+0x1578/0x1788
> > >     migrate_pages+0x15c4/0x1840
> > >     compact_zone+0x9c8/0x1d20
> > >     compact_node+0xd4/0x27c
> > >     sysctl_compaction_handler+0x104/0x194
> > >     proc_sys_call_handler+0x25c/0x3f8
> > >     proc_sys_write+0x20/0x2c
> > >     do_iter_readv_writev+0x350/0x448
> > >     vfs_writev+0x1ac/0x44c
> > >     do_pwritev+0x100/0x15c
> > >     __arm64_sys_pwritev2+0x6c/0xcc
> > >     invoke_syscall.constprop.0+0x64/0x18c
> > >     el0_svc_common.constprop.0+0x80/0x198
> > >     do_el0_svc+0x28/0x3c
> > >     el0_svc+0x50/0x220
> > > ---------------------------------------------------
> > > context B's detail
> > > ---------------------------------------------------
> > > context B
> > >     [S] (unknown)(pg_writeback_map:0)
> > >     [W] dept_page_wait_on_bit(pg_locked_map:0)
> > >     [E] dept_page_clear_bit(pg_writeback_map:0)
> > >
> > > [S] (unknown)(pg_writeback_map:0):
> > > (N/A)
> > >
> > > [W] dept_page_wait_on_bit(pg_locked_map:0):
> > > [] bdev_getblk+0x58/0x120
> > > stacktrace:
> > >     find_get_block_common+0x224/0xbc4
> > >     bdev_getblk+0x58/0x120
> > >     __ext4_get_inode_loc+0x194/0x98c
> > >     ext4_get_inode_loc+0x4c/0xcc
> > >     ext4_reserve_inode_write+0x74/0x158
> > >     __ext4_mark_inode_dirty+0xd4/0x4e0
> > >     __ext4_ext_dirty+0x118/0x164
> > >     ext4_ext_map_blocks+0x1578/0x2ca8
> > >     ext4_map_blocks+0x2a4/0xa60
> > >     ext4_convert_unwritten_extents+0x1b0/0x3c0
> > >     ext4_convert_unwritten_io_end_vec+0x90/0x1a0
> > >     ext4_end_io_end+0x58/0x194
> > >     ext4_end_io_rsv_work+0xc4/0x150
> > >     process_one_work+0x3b4/0xac0
> > >     worker_thread+0x2b0/0x53c
> > >     kthread+0x1a0/0x33c
> > >
> > > [E] dept_page_clear_bit(pg_writeback_map:0):
> > > [] ext4_finish_bio+0x638/0x820
> > > stacktrace:
> > >     folio_end_writeback+0x140/0x488
> > >     ext4_finish_bio+0x638/0x820
> > >     ext4_release_io_end+0x74/0x188
> > >     ext4_end_io_end+0xa0/0x194
> > >     ext4_end_io_rsv_work+0xc4/0x150
> > >     process_one_work+0x3b4/0xac0
> > >     worker_thread+0x2b0/0x53c
> > >     kthread+0x1a0/0x33c
> > >     ret_from_fork+0x10/0x20
> > >
> > > To simplify the scenario:
> > >
> > >    context X (wq worker)                    context Y (process context)
> > >
> > >                                             migrate_pages_batch()
> > >    ext4_end_io_end()                        ...
> > >    ...                                      migrate_folio_unmap()
> > >    ext4_get_inode_loc()                     ...
> > >    ...                                      folio_lock() // hold the folio lock
> > >    bdev_getblk()                            ...
> > >    ...                                      folio_wait_writeback() // wait forever
> > >    __find_get_block_slow()
> > >    ...                                      ...
> > >    folio_lock() // wait forever
> > >    folio_unlock()                           migrate_folio_undo_src()
> > >    ...
> > >    ...                                      folio_unlock() // never reachable
> > >    ext4_finish_bio()
> > >    ...
> > >    folio_end_writeback() // never reachable
> > >
> > > context X is waiting for the folio lock to be released by context Y,
> > > while context Y is waiting for the writeback to end in context X.
> > > Ultimately, two contexts are waiting for the event that will never
> > > happen, say, deadlock.
> > >
> > > *Only one* of the following two conditions should be allowed, or we
> > > cannot avoid this kind of deadlock:
> > >
> > >    1. while holding a folio lock (and heading for folio_unlock()),
> > >       waiting for a writeback to end,
> > >    2. while heading for the writeback end, waiting for the folio lock to
> > >       be released,
> > >
> > > Since allowing 2 and avoiding 1 sound more sensible than the other,
> > > remove the first condition by making sure folio_unlock() before
> > > folio_wait_writeback() in migrate_folio_unmap().
> > >
> > > Fixes: 49d2e9cc45443 ("[PATCH] Swap Migration V5: migrate_pages() function")
> > > Reported-by: Yunseong Kim
> > > Signed-off-by: Byungchul Park
> > > Tested-by: Yunseong Kim
> > > ---
> > >
> > > Hi,
> > >
> > > Thanks to Yunseong for reporting the issue, testing, and confirming if
> > > this patch can resolve the issue. We used the latest version of DEPT
> > > to detect the issue:
> > >
> > >    https://lore.kernel.org/all/20251002081247.51255-1-byungchul@sk.com/
> > >
> > > I mentioned in the commit message above like:
> > >
> > >    *Only one* of the following two conditions should be allowed, or we
> > >    cannot avoid this kind of deadlock:
> > >
> > >    1. while holding a folio lock (and heading for folio_unlock()),
> > >       waiting for a writeback to end,
> > >    2. while heading for the writeback end, waiting for the folio lock
> > >       to be released,
> > >
> > > Honestly, I'm not convinced which one we should choose between the
> > > two, but I chose 'allowing 2 and avoiding 1' to resolve this issue.
> > >
> > > However, please let me know if I was wrong and we should go for
> > > 'allowing 1 and avoiding 2'. If so, I should try a different approach,
> > > for example, to fix by preventing folio_lock() or using folio_try_lock()
> > > while heading for writeback end in ext4_end_io_end() or something.
> > >
> > > To Yunseong,
> > >
> > > The link you shared for a system hang is:
> > >
> > >    https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
> > >
> > > I think an important stacktrace for this issue, that is, waiting for
> > > PG_writeback, was missed in the log.
> > >
> > > Byungchul
> > >
> > > ---
> > >  mm/migrate.c | 57 +++++++++++++++++++++++++++++++++++++---------------
> > >  1 file changed, 41 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index 9e5ef39ce73a..60b0b054f27a 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -1215,6 +1215,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> > >
> > >         dst->private = NULL;
> > >
> > > +retry_wait_writeback:
> > > +       /*
> > > +        * Only in the case of a full synchronous migration is it
> > > +        * necessary to wait for PageWriteback. In the async case, the
> > > +        * retry loop is too short and in the sync-light case, the
> > > +        * overhead of stalling is too much. Plus, do not write-back if
> > > +        * it's in the middle of direct compaction
> > > +        */
> > > +       if (folio_test_writeback(src) && mode == MIGRATE_SYNC)
> > > +               folio_wait_writeback(src);
> > > +
> > >         if (!folio_trylock(src)) {
> > >                 if (mode == MIGRATE_ASYNC)
> > >                         goto out;
> > > @@ -1245,27 +1256,41 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> > >
> > >                 folio_lock(src);
> > >         }
> > > -       locked = true;
> > > -       if (folio_test_mlocked(src))
> > > -               old_page_state |= PAGE_WAS_MLOCKED;
> > >
> > >         if (folio_test_writeback(src)) {
> > > -               /*
> > > -                * Only in the case of a full synchronous migration is it
> > > -                * necessary to wait for PageWriteback. In the async case,
> > > -                * the retry loop is too short and in the sync-light case,
> > > -                * the overhead of stalling is too much
> > > -                */
> > > -               switch (mode) {
> > > -               case MIGRATE_SYNC:
> > > -                       break;
> > > -               default:
> > > -                       rc = -EBUSY;
> > > -                       goto out;
> > > +               if (mode == MIGRATE_SYNC) {
> > > +                       /*
> > > +                        * folio_unlock() is required before trying
> > > +                        * folio_wait_writeback(). Or it leads a
> > > +                        * deadlock like:
> > > +                        *
> > > +                        * context x                 context y
> > > +                        * in XXX_io_end()           in migrate_folio_unmap()
> > > +                        *
> > > +                        *    ...                       ...
> > > +                        *    bdev_getblk();            folio_lock();
> > > +                        *
> > > +                        *    // wait forever           // wait forever
> > > +                        *    folio_lock();             folio_wait_writeback();
> > > +                        *
> > > +                        *    ...                       ...
> > > +                        *                              folio_unlock();
> > > +                        *    ...                       // never reachable
> > > +                        *    folio_unlock();
> > > +                        *    // never reachable
> > > +                        *    folio_end_writeback();
> > > +                        */
> > > +                       folio_unlock(src);
> > > +                       goto retry_wait_writeback;
> > >                 }
> > > -               folio_wait_writeback(src);
> > > +               rc = -EBUSY;
> > > +               goto out;
> > >         }
> > >
> > > +       locked = true;
> > > +       if (folio_test_mlocked(src))
> > > +               old_page_state |= PAGE_WAS_MLOCKED;
> > > +
> > >         /*
> > >          * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
> > >          * we cannot notice that anon_vma is freed while we migrate a page.
> >
> > Hmm, I still have concerns about this change.
> > (1) seems to imply that the use of WB_SYNC_ALL by
> > mpage_writebacks() is also incorrect. In addition,
> > this change could introduce another theoretical livelock
> > when the folio enters writeback frequently.
> >
> > AFAIK, while a folio is under writeback,
> > its related buffers won’t be freed by migration, and
> > since try_free_buffer() checks the writeback state first,
> > taking folio_lock() shouldn’t be necessary while bdev_getblk().
> >
> > Therefore, it seems sufficient to check whether
> > the folio is under writeback in __find_get_block_slow(), e.g.:
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 6a8752f7bbed..804d33df6b0f 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -194,6 +194,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
> >         if (IS_ERR(folio))
> >                 goto out;
> >
> > +       if (folio_test_writeback(folio))
> > +               return true;
> > +
> >         /*
> >          * Folio lock protects the buffers. Callers that cannot block
> >          * will fallback to serializing vs try_to_free_buffers() via
> >
> > Am I missing something?
>
> Sorry, the code was wrong. the suggestion is:
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 6a8752f7bbed..804d33df6b0f 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -194,6 +194,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
>         if (IS_ERR(folio))
>                 goto out;
>
> +       if (folio_test_writeback(folio))
> +               atomic = true;
> +

Looks much better to me. Or, make sure that atomic (= true) gets passed
whenever folio_test_writeback(folio). Either way, a rough sketch of how
the final hunk could read is appended at the end of this mail.

Byungchul

>         /*
>          * Folio lock protects the buffers. Callers that cannot block
>          * will fallback to serializing vs try_to_free_buffers() via
>
> --
> Sincerely,
> Yeoreum Yun
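
To make that concrete, here is a rough, untested sketch of how the final
hunk in __find_get_block_slow() could read, with the reasoning above
recorded as a comment. It only reuses the names already shown in the
diff; the exact placement, and whether every caller is fine with the
non-blocking path in this case, are assumptions that still need to be
checked:

        if (IS_ERR(folio))
                goto out;

        /*
         * Untested sketch: buffers of a folio under writeback won't be
         * freed by migration, so blocking on folio_lock() here only to
         * serialize against the buffer-freeing path is not needed; take
         * the non-blocking path instead.
         */
        if (folio_test_writeback(folio))
                atomic = true;

Spelling the invariant out next to the check would also make it easier
to review whether the non-blocking fallback is safe for all callers.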