Date: Fri, 28 Jun 2024 13:42:41 -0700
From: Andrew Morton
To: yangge1116@126.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, peterx@redhat.com, yang@os.amperecomputing.com, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn
Subject: Re: [PATCH V2] mm/gup: Fix longterm pin on slow gup regression
Message-Id: <20240628134241.53c5f68f936efe0aa8f0b789@linux-foundation.org>
In-Reply-To: <1719554518-11006-1-git-send-email-yangge1116@126.com>
References: <1719554518-11006-1-git-send-email-yangge1116@126.com>

On Fri, 28 Jun 2024 14:01:58 +0800 yangge1116@126.com wrote:

> From: yangge
>
> If a large amount of CMA memory is configured in the system (for
> example, CMA memory accounts for 50% of system memory), starting a
> SEV virtual machine will fail. While starting, the SEV virtual
> machine calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin
> memory. Normally, if a page is present and in a CMA area,
> pin_user_pages_fast() will first call __get_user_pages_locked() to
> pin the page in the CMA area, and then call
> check_and_migrate_movable_pages() to migrate the page from the CMA
> area to a non-CMA area. But with the current code, the call to
> __get_user_pages_locked() fails, because it calls try_grab_folio()
> to pin the page in the gup slow path.
>
> Commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages
> != NULL"") uses try_grab_folio() in the gup slow path, which seems
> to be problematic because try_grab_folio() checks whether the page
> can be longterm pinned. This check may fail and cause
> __get_user_pages_locked() to fail. However, these checks are not
> required in the gup slow path, so it seems we can use
> try_grab_page() instead of try_grab_folio(). In addition, the
> current try_grab_page() can only add 1 to the page's refcount. We
> extend this function so that the page's refcount can be increased
> by an amount passed in as a parameter.
>
> The following log shows the failure:
>
> [ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
> [ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
> [ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
> [ 464.325515] Call Trace:
> [ 464.325520]
> [ 464.325523]  ? __get_user_pages+0x423/0x520
> [ 464.325528]  ? __warn+0x81/0x130
> [ 464.325536]  ? __get_user_pages+0x423/0x520
> [ 464.325541]  ? report_bug+0x171/0x1a0
> [ 464.325549]  ? handle_bug+0x3c/0x70
> [ 464.325554]  ? exc_invalid_op+0x17/0x70
> [ 464.325558]  ? asm_exc_invalid_op+0x1a/0x20
> [ 464.325567]  ? __get_user_pages+0x423/0x520
> [ 464.325575]  __gup_longterm_locked+0x212/0x7a0
> [ 464.325583]  internal_get_user_pages_fast+0xfb/0x190
> [ 464.325590]  pin_user_pages_fast+0x47/0x60
> [ 464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
> [ 464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
>

Well, we also have Yang Shi's patch
(https://lkml.kernel.org/r/20240627231601.1713119-1-yang@os.amperecomputing.com)
which takes a significantly different approach.  Which way should we go?
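The refcount behaviour described in the quoted patch can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `struct model_page`, `model_add_refs()`, `model_grab_checked()`, `model_grab_unchecked()` and the `MODEL_*` flags are invented for illustration. It assumes a pin adds a fixed bias per reference, loosely in the spirit of the kernel's GUP_PIN_COUNTING_BIAS for small folios, and that only the checked variant refuses a long-term pin on a CMA page, as the patch describes for try_grab_folio().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for this model; not the kernel's FOLL_* values. */
#define MODEL_FOLL_PIN      (1u << 0) /* pin rather than a plain get */
#define MODEL_FOLL_LONGTERM (1u << 1) /* caller wants a long-term pin */
#define MODEL_PIN_BIAS      1024      /* per-pin bias, in the spirit of GUP_PIN_COUNTING_BIAS */

/* Hypothetical stand-in for a page: just a refcount and a CMA marker. */
struct model_page {
	int refcount;
	bool in_cma; /* page currently lives in a CMA area */
};

/* Take @refs references at once; each pin counts MODEL_PIN_BIAS. */
static void model_add_refs(struct model_page *page, int refs, unsigned int flags)
{
	assert(refs > 0);
	if (flags & MODEL_FOLL_PIN)
		page->refcount += refs * MODEL_PIN_BIAS;
	else
		page->refcount += refs;
}

/*
 * Grab with a long-term check, as the patch describes try_grab_folio():
 * a long-term request on a CMA page is refused and no reference is taken.
 */
static bool model_grab_checked(struct model_page *page, int refs, unsigned int flags)
{
	if ((flags & MODEL_FOLL_LONGTERM) && page->in_cma)
		return false;
	model_add_refs(page, refs, flags);
	return true;
}

/*
 * Grab without the long-term check but taking a @refs argument, as the
 * patch proposes for an extended try_grab_page(): the CMA page is pinned
 * here and is expected to be migrated out of CMA by a later step.
 */
static bool model_grab_unchecked(struct model_page *page, int refs, unsigned int flags)
{
	model_add_refs(page, refs, flags);
	return true;
}
```

With a CMA page and `MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM`, `model_grab_checked()` returns false and leaves the refcount untouched, which corresponds to the slow-path failure in the log; `model_grab_unchecked()` succeeds and adds `refs * MODEL_PIN_BIAS`, leaving the move out of CMA to a later step, the role check_and_migrate_movable_pages() plays in the real flow.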