From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED3B4C52D6D for ; Fri, 2 Aug 2024 12:38:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 662986B007B; Fri, 2 Aug 2024 08:38:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 612A16B0083; Fri, 2 Aug 2024 08:38:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4DA5E6B0085; Fri, 2 Aug 2024 08:38:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 2E8806B007B for ; Fri, 2 Aug 2024 08:38:38 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7ECA8C0C4E for ; Fri, 2 Aug 2024 12:38:37 +0000 (UTC) X-FDA: 82407259074.21.18F28D4 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf22.hostedemail.com (Postfix) with ESMTP id C75F6C001C for ; Fri, 2 Aug 2024 12:38:33 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=EZAy3bMq; spf=none (imf22.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722602287; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=a2xbx+l3XorIuUZYCisP3R92j0F82O08k/og0JN1IR0=; b=lVlq4dGFahe3nXkILRwZbdf0DJUjINn8hZ0QCZYCVP0FAfviFsdOV7UiiEQKZkuIVFMhX6 jjXucAwlp1vDuHeOhUs06m9TJEBkGAEk/3vTMWFhNgADfil2vKpydQCwoV+5MpxldWw//n CUq+pEmehvh4Qjus4vxbpX+01sC4BeU= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=EZAy3bMq; spf=none (imf22.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722602287; a=rsa-sha256; cv=none; b=yavy1DpHtdv1g0aoTZM6rsJ75YlTF2ph8/fKuO/zgppfGzjoIiZnp1f5Td1X7zTuj3H4QC tgh7w4VDJSsok87VjUED3pS6KwriBwa+ojwFi2bh//gX0mUD/JM/eSJJ4/2sJlD4rYroby rF+/TqGy3K6YMF8IpQXNl1rNtre2I4U= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=a2xbx+l3XorIuUZYCisP3R92j0F82O08k/og0JN1IR0=; b=EZAy3bMqVSECnfwHSQ4Wu0uA6q xV0U4klQWNuJ8HJ4o4DJkHV3i7ZQsgJzvcTdomKwrDwjn2GkObh0pkOu11soxlKpqx3InSYdzUbc0 ay5YHfmCZSZVuNqcUWBfSyaERGbgEEv9Y0OP0F+J/u9qUNLsx0s0lbOJ+Eid1eps6v7xkgmH391Wf 7jj0j+l+ntzSByqfmqqT5rPB86spd7D5USB8d2viqfuVLFIjZPjBEl6A+dfhYSBgbNQHt+wf8/CUr OcEacAF/XFzMHrzu6dMueYEpjoGdsB4A3yrtFZhRuc55k4AWNoeJ8WDkP4MEx5zI/ZkyBkz5cJ2mH vBWBeFXA==; Received: from willy by casper.infradead.org with local (Exim 4.97.1 #2 (Red Hat Linux)) id 1sZrY3-0000000112T-0NMz; Fri, 02 Aug 2024 12:38:19 +0000 Date: Fri, 2 Aug 2024 13:38:18 +0100 From: Matthew Wilcox To: David Woodhouse Cc: Carsten Stollmaier , Sean Christopherson , Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , nh-open-source@amazon.com, Peter Xu , Sebastian Biemueller , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , "linux-mm@kvack.org" , Andrea Arcangeli Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time Message-ID: References: <20240802114402.96669-1-stollmc@amazon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: C75F6C001C X-Stat-Signature: ywt5788agg4xe69cy1gxqrt33yz5c1ux X-HE-Tag: 1722602313-515911 X-HE-Meta: U2FsdGVkX1/F0N38axARcA9i0m5odula6Fn6VNivrVyhgZVZAf/zF+V4cvuh1KhJgZlnNblKFMhnCVyIP5u33I0lIJ/Gt+CD/T5gZTVzcglHDuFdTWjKZ26rEO7+2vHqWURSOOHZFEGT3SDgCE63uqWaQci3BFS/tIK0ZkCdrqhStg9xFRa77ZUnlRKEzJb8IMHYrlFdI46JQJ4jKkAKaC9TLHIi16OckRiRfMHqR6okZV5KJucl/cA6bkEWfMUJOcGvWs1P8Strung1xnCSXJDat7jdjJrkk7PecZx8AW8wMl5tYHo5SDq03YsyzEJzhy908LITQwASBM5vResH2wq0+VmbLPJZtJgreFCUOC6w0lpG4BQiJzgCILgwFMLJSGoDMkSSXjiyD8ZXtq6Iqc0PLopftGJzO3lHaZcESbv/J1m54i32EJsfFAs3owlPldo1ALuaTvcBjIJZIWNXEbAx9FebWPUBYnwbKwjwNTM89N2hQ7XkfBaZJI5UoQdoFpaQOEemCDGzq2es7ar8YAIHGjWGYGXY79oMC1fkhRgZEWIVNAKJsJZWxVJhk2kxQbWCwy25GKz56spfINAQxQqinLajcXi/u10eKsLlugT3FNBKBnS7sat5jaGfKVmrqDoTG3QVFSLrjn8l6axADD7VFgFMDc7CIfFeU/QaFEj/h169uVmVEmeYGImJDwhqFAxs4/LGepFy2PSBCSdTKoUTM3JT80ZhdQoXk7zsLz8i8tiYpux8vnHPZVqyxHp5miT/f2D7QTtRbu0mxt0rORDdAdMXkl08eJW+b8LglwHlG0CwMLhLzzEWQrHnmfVGG/jcZbaoQeVqlIFJk66RRuHCri0UnA5mtBNyXUITpbOtUE72itW7BjIjBaTT9oiNZTNbyK6n7k0nYUuEjCBA9xYXiu13KWLDg+VjYzGgMpuS/NJkMkgbUIWXvwH6C46imv5X/TnCIG5JjcNSYbw Lgquh5Vb 96bSEIJKX8G2pfnHoCi8xEMMjnPJBzJwwtS4khfnr3d1+YEH7PGFu7pdmFDeW5iXT4CttZFOZXpQYmCzmW+dOT6/prEtGxtd2YJEiuQt2XKX8PYGqhSdaT0ZkzDJRQs1fL4FWTpvjR1Bdx3STDH0K2iVBuBuAP0G/MzNWpNHJC6sPdKpebWMOey7kBz3asfBl8DC2ibUohkhAnGLHGDR5/uk/W3AfPIR2j0Hdc7VFfLtR94gecjGsRdpl64qhMUOftxStOZ7SdHcP8736n/Pak5GnULqKekJHXMnNGF0OjRLPzp8XhfnzVS1BiH7VpXsKYHWM9kVGqEmQcNEID98OoLz6KQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Aug 02, 2024 at 01:03:16PM +0100, David Woodhouse wrote: > On Fri, 2024-08-02 at 11:44 +0000, Carsten Stollmaier wrote: > > handle_userfault uses TASK_INTERRUPTIBLE, so it is interruptible by > > signals. do_user_addr_fault then busy-retries it if the pending signal > > is non-fatal. This leads to contention of the mmap_lock. Why does handle_userfault use TASK_INTERRUPTIBLE? We really don't want to stop handling a page fault just because somebody resized a window or a timer went off. TASK_KILLABLE, sure. This goes all the way back to Andreas' terse "add new syscall" patch, so there's no justification for it in the commit logs. > The busy-loop causes so much contention on mmap_lock that post-copy > live migration fails to make progress, and is leading to failures. Yes? > > > This patch replaces the use of gfn_to_hva_cache with gfn_to_pfn_cache, > > as gfn_to_pfn_cache ensures page presence for the memory access, > > preventing the contention of the mmap_lock. > > > > Signed-off-by: Carsten Stollmaier > > Reviewed-by: David Woodhouse > > I think this makes sense on its own, as it addresses the specific case > where KVM is *likely* to be touching a userfaulted (guest) page. And it > allows us to ditch yet another explicit asm exception handler. > > We should note, though, that in terms of the original problem described > above, it's a bit of a workaround. It just means that by using > kvm_gpc_refresh() to obtain the user page, we end up in > handle_userfault() without the FAULT_FLAG_INTERRUPTIBLE flag. > > (Note to self: should kvm_gpc_refresh() take fault flags, to allow > interruptible and killable modes to be selected by its caller?) > > > An alternative workaround (which perhaps we should *also* consider) > looked like this (plus some suitable code comment, of course): > > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs, > */ > if (user_mode(regs)) > flags |= FAULT_FLAG_USER; > + else > + flags &= ~FAULT_FLAG_INTERRUPTIBLE; > > #ifdef CONFIG_X86_64 > /* > > > That would *also* handle arbitrary copy_to_user/copy_from_user() to > userfault pages, which could theoretically hit the same busy loop. > > I'm actually tempted to make user access *interruptible* though, and > either add copy_{from,to}_user_interruptible() or change the semantics > of the existing ones (which I believe are already killable). > > That would require each architecture implementing interruptible > exceptions, by doing an extable lookup before the retry. Not overly > complex, but needs to be done for all architectures (although not at > once; we could live with not-yet-done architectures just remaining > killable). > > Thoughts? >