From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 21 Feb 2023 14:25:41 -0800 (PST)
From: Hugh Dickins
To: "Huang, Ying"
cc: Hugh Dickins, Andrew Morton, Jan Kara, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Zi Yan,
    Yang Shi, Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
    Alistair Popple, Xin Hao, Minchan Kim, Mike Kravetz,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Xu, Pengfei", Christoph Hellwig,
    Stefan Roesch, Tejun Heo
Subject: Re: [PATCH -v5 0/9] migrate_pages(): batch TLB flushing
In-Reply-To: <871qmjdsj0.fsf@yhuang6-desk2.ccr.corp.intel.com>
Message-ID: <20f1628e-96a7-3a5d-fef5-dae31f8eb196@google.com>
References: <20230213123444.155149-1-ying.huang@intel.com>
    <87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com>
    <874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <2ab4b33e-f570-a6ff-6315-7d5a4614a7bd@google.com>
    <871qmjdsj0.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Tue, 21 Feb 2023, Huang, Ying wrote:
> 
> On second thought, I think that it may be better to provide a fix as
> simple as possible firstly. Then we can work on a more complex fix as
> we discussed above. The simple fix is easy to review now. And, we will
> have more time to test and review the complex fix.
> 
> In the following fix, I disabled the migration batching except for the
> MIGRATE_ASYNC mode, or the split folios of a THP folio. After that, I
> will work on the complex fix to enable migration batching for all modes.
> 
> What do you think about that?

I don't think there's a need to rush in the wrong fix so quickly.
Your series was in (though sometimes out of) linux-next for some while,
without causing any widespread problems. Andrew did send it to Linus
yesterday, I expect he'll be pushing it out later today or tomorrow,
but I don't think it's going to cause big problems. Aiming for a fix
in -rc2 would be good. Why would it be complex?

Hugh

> 
> Best Regards,
> Huang, Ying
> 
> -------------------------------8<---------------------------------
> From 8e475812eacd9f2eeac76776c2b1a17af3e59b89 Mon Sep 17 00:00:00 2001
> From: Huang Ying
> Date: Tue, 21 Feb 2023 16:37:50 +0800
> Subject: [PATCH] migrate_pages: fix deadlock in batched migration
> 
> Two deadlock bugs were reported for the migrate_pages() batching
> series. Thanks Hugh and Pengfei! For example, in the following
> deadlock trace snippet,
> 
>  INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
>        Not tainted 6.2.0-rc4-kvm+ #1314
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:kworker/u4:0    state:D stack:0     pid:9     ppid:2      flags:0x00004000
>  Workqueue: loop4 loop_rootcg_workfn
>  Call Trace:
> 
>   __schedule+0x43b/0xd00
>   schedule+0x6a/0xf0
>   io_schedule+0x4a/0x80
>   folio_wait_bit_common+0x1b5/0x4e0
>   ? __pfx_wake_page_function+0x10/0x10
>   __filemap_get_folio+0x73d/0x770
>   shmem_get_folio_gfp+0x1fd/0xc80
>   shmem_write_begin+0x91/0x220
>   generic_perform_write+0x10e/0x2e0
>   __generic_file_write_iter+0x17e/0x290
>   ? generic_write_checks+0x12b/0x1a0
>   generic_file_write_iter+0x97/0x180
>   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>   do_iter_readv_writev+0x13c/0x210
>   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>   do_iter_write+0xf6/0x330
>   vfs_iter_write+0x46/0x70
>   loop_process_work+0x723/0xfe0
>   loop_rootcg_workfn+0x28/0x40
>   process_one_work+0x3cc/0x8d0
>   worker_thread+0x66/0x630
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0x153/0x190
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x29/0x50
> 
>  INFO: task repro:1023 blocked for more than 147 seconds.
>        Not tainted 6.2.0-rc4-kvm+ #1314
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:repro           state:D stack:0     pid:1023  ppid:360    flags:0x00004004
>  Call Trace:
> 
>   __schedule+0x43b/0xd00
>   schedule+0x6a/0xf0
>   io_schedule+0x4a/0x80
>   folio_wait_bit_common+0x1b5/0x4e0
>   ? compaction_alloc+0x77/0x1150
>   ? __pfx_wake_page_function+0x10/0x10
>   folio_wait_bit+0x30/0x40
>   folio_wait_writeback+0x2e/0x1e0
>   migrate_pages_batch+0x555/0x1ac0
>   ? __pfx_compaction_alloc+0x10/0x10
>   ? __pfx_compaction_free+0x10/0x10
>   ? __this_cpu_preempt_check+0x17/0x20
>   ? lock_is_held_type+0xe6/0x140
>   migrate_pages+0x100e/0x1180
>   ? __pfx_compaction_free+0x10/0x10
>   ? __pfx_compaction_alloc+0x10/0x10
>   compact_zone+0xe10/0x1b50
>   ? lock_is_held_type+0xe6/0x140
>   ? check_preemption_disabled+0x80/0xf0
>   compact_node+0xa3/0x100
>   ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
>   ? _find_first_bit+0x7b/0x90
>   sysctl_compaction_handler+0x5d/0xb0
>   proc_sys_call_handler+0x29d/0x420
>   proc_sys_write+0x2b/0x40
>   vfs_write+0x3a3/0x780
>   ksys_write+0xb7/0x180
>   __x64_sys_write+0x26/0x30
>   do_syscall_64+0x3b/0x90
>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>  RIP: 0033:0x7f3a2471f59d
>  RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
>  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
>  RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
>  R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
>  R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
> 
> The page migration task has taken the lock of the shmem folio A, and is
> waiting for the writeback of folio B of the file system on the loop
> block device to complete. Meanwhile, the loop worker task which writes
> back folio B is waiting to lock the shmem folio A, because folio A
> backs folio B in the loop device. Thus a deadlock is triggered.
> 
> In general, if we have locked some other folios besides the one we are
> migrating, it's not safe to wait synchronously, for example, to wait
> for writeback to complete or to wait to lock a buffer head.
> 
> To fix the deadlock, in this patch we avoid batching the page
> migration except in MIGRATE_ASYNC mode, or for the split folios of a
> THP folio. In MIGRATE_ASYNC mode, synchronous waiting is avoided. And
> there isn't any dependency relationship among the split folios of a
> THP folio.
> 
> The fix can be improved by converting the migration mode from
> synchronous to asynchronous if we have locked some other folios besides
> the one we are migrating. We will do that in the near future.
> 
> Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
> Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
> Signed-off-by: "Huang, Ying"
> Reported-by: Hugh Dickins
> Reported-by: "Xu, Pengfei"
> Cc: Christoph Hellwig
> Cc: Stefan Roesch
> Cc: Tejun Heo
> Cc: Xin Hao
> Cc: Zi Yan
> Cc: Yang Shi
> Cc: Baolin Wang
> Cc: Matthew Wilcox
> Cc: Mike Kravetz
> ---
>  mm/migrate.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ef68a1aff35c..bc04c34543f3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1937,7 +1937,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>  {
>  	int rc, rc_gather;
> -	int nr_pages;
> +	int nr_pages, batch;
>  	struct folio *folio, *folio2;
>  	LIST_HEAD(folios);
>  	LIST_HEAD(ret_folios);
> @@ -1951,6 +1951,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				     mode, reason, &stats, &ret_folios);
>  		if (rc_gather < 0)
>  			goto out;
> +
> +	if (mode == MIGRATE_ASYNC)
> +		batch = NR_MAX_BATCHED_MIGRATION;
> +	else
> +		batch = 1;
>  again:
>  	nr_pages = 0;
>  	list_for_each_entry_safe(folio, folio2, from, lru) {
> @@ -1961,11 +1966,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		}
> 
>  		nr_pages += folio_nr_pages(folio);
> -		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> +		if (nr_pages >= batch)
>  			break;
>  	}
> -	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> -		list_cut_before(&folios, from, &folio->lru);
> +	if (nr_pages >= batch)
> +		list_cut_before(&folios, from, &folio2->lru);
>  	else
>  		list_splice_init(from, &folios);
>  	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> -- 
> 2.39.1
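[Editorial note: the quoted patch's batching policy and list-cut behavior can be sketched as a standalone userspace model. This is illustrative only, not the kernel code: the constant value and the reduced enum are hypothetical stand-ins, and a plain array of folio sizes replaces the kernel's list_head list.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions: the real
 * NR_MAX_BATCHED_MIGRATION is defined in mm/migrate.c, and
 * migrate_mode has more members than shown here. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };
#define NR_MAX_BATCHED_MIGRATION 512

/* The patch's policy: batch many folios only in MIGRATE_ASYNC, where
 * migrate_pages_batch() never waits synchronously; in the synchronous
 * modes migrate one folio at a time, so no extra folio locks are held
 * while waiting on writeback or buffer-head locks. */
static int batch_limit(enum migrate_mode mode)
{
	return mode == MIGRATE_ASYNC ? NR_MAX_BATCHED_MIGRATION : 1;
}

/* Model of the list_for_each_entry_safe() loop: accumulate folio
 * sizes until the running total reaches the limit, then cut the batch
 * *after* the folio that reached it -- which is why the patch cuts at
 * folio2 (the successor), not folio.  Returns how many folios from
 * the head of the list form one batch. */
static size_t take_batch(const int *folio_pages, size_t n,
			 enum migrate_mode mode)
{
	int limit = batch_limit(mode);
	int nr_pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		nr_pages += folio_pages[i];
		if (nr_pages >= limit)
			return i + 1;	/* include the folio that hit the limit */
	}
	return n;	/* whole remaining list fits in one batch */
}
```

With batch = 1, the very first folio always reaches the limit, so each synchronous-mode batch holds exactly one folio: a synchronous wait can then never close a lock cycle like the shmem/loop one in the traces above.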