Date: Mon, 25 Nov 2024 16:06:52 -0800
References: <20241118130403.23184-1-kalyazin@amazon.com>
Subject: Re: [PATCH] KVM: x86: async_pf: check earlier if can deliver async pf
From: Sean Christopherson
To: Nikita Kalyazin
Cc: pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, david@redhat.com, peterx@redhat.com, oleg@redhat.com, vkuznets@redhat.com, gshan@redhat.com, graf@amazon.de, jgowans@amazon.com, roypat@amazon.co.uk, derekmn@amazon.com, nsaenz@amazon.es, xmarcalx@amazon.com

On Mon, Nov 25, 2024, Nikita Kalyazin wrote:
> On 21/11/2024 21:05, Sean Christopherson wrote:
> > On Thu, Nov 21, 2024, Nikita Kalyazin wrote:
> > > On 19/11/2024 13:24, Sean Christopherson wrote:
> > > > None of this justifies breaking host-side, non-paravirt async page faults. If a
> > > > vCPU hits a missing page, KVM can schedule out the vCPU and let something else
> > > > run on the pCPU, or enter idle and let the SMT sibling get more cycles, or maybe
> > > > even enter a low enough sleep state to let other cores turbo a wee bit.
> > > > 
> > > > I have no objection to disabling host async page faults, e.g. it's probably a net
> > > > negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an opt-in from
> > > > userspace.
> > > 
> > > That's a good point, I didn't think about it. The async work would still
> > > need to execute somewhere in that case (or sleep in GUP until the page is
> > > available).
> > 
> > The "async work" is often an I/O operation, e.g. to pull in the page from disk,
> > or over the network from the source. The *CPU* doesn't need to actively do
> > anything for those operations. The I/O is initiated, so the CPU can do something
> > else, or go idle if there's no other work to be done.
> > 
> > > If processing the fault synchronously, the vCPU thread can also sleep in the
> > > same way freeing the pCPU for something else,
> > 
> > If and only if the vCPU can handle a PV async #PF. E.g. if the guest kernel flat
> > out doesn't support PV async #PF, or the fault happened while the guest was in an
> > incompatible mode, etc.
> > 
> > If KVM doesn't do async #PFs of any kind, the vCPU will spin on the fault until
> > the I/O completes and the page is ready.
> 
> I ran a little experiment to see that by backing guest memory by a file on
> FUSE and delaying response to one of the read operations to emulate a delay
> in fault processing.
...
> In both cases the fault handling code is blocked and the pCPU is free for
> other tasks. I can't see the vCPU spinning on the IO to get completed if
> the async task isn't created. I tried that with and without async PF
> enabled by the guest (MSR_KVM_ASYNC_PF_EN).
> 
> What am I missing?

Ah, I was wrong about the vCPU spinning.

The goal is specifically to schedule() from KVM context, i.e. from kvm_vcpu_block(),
so that if a virtual interrupt arrives for the guest, KVM can wake the vCPU and
deliver the IRQ, e.g. to reduce latency for interrupt delivery, and possibly even
to let the guest schedule in a different task if the IRQ is the guest's tick.

Letting mm/ or fs/ do schedule() means the only wake event for the vCPU task
is the completion of the I/O (or whatever the fault is waiting on).
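The point above is about which events can end the vCPU task's sleep, not about spinning versus sleeping. Below is a hypothetical user-space model of that distinction. It is not KVM code, and the real kvm_vcpu_block() path has considerably more to it. A pthread condition variable stands in for the vCPU's wait, and every name in it (struct vcpu_model, post_irq(), complete_io(), wait_in_fault_path(), block_in_kvm_context()) is made up for illustration. In both variants the thread sleeps in pthread_cond_wait() rather than spinning, consistent with the FUSE experiment; the only difference is the wake predicate.

```c
/*
 * Hypothetical user-space model (NOT KVM code). A "vCPU" thread waits for a
 * faulted page, either in a wait that only I/O completion can end (like
 * sleeping in mm/ or fs/) or in a wait that a virtual IRQ can also end
 * (loosely like blocking from KVM context).
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum wake_reason { WAKE_PAGE_READY, WAKE_IRQ_PENDING };

struct vcpu_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool page_ready;  /* the I/O backing the fault has completed */
	bool irq_pending; /* a virtual interrupt was posted for the guest */
};

static void vcpu_model_init(struct vcpu_model *v)
{
	pthread_mutex_init(&v->lock, NULL);
	pthread_cond_init(&v->cond, NULL);
	v->page_ready = false;
	v->irq_pending = false;
}

/* Sleeping in the fault path: only I/O completion ends the wait. */
static bool fault_path_should_wake(const struct vcpu_model *v)
{
	return v->page_ready;
}

/* Blocking from "KVM context": a pending IRQ also ends the wait. */
static bool kvm_block_should_wake(const struct vcpu_model *v)
{
	return v->page_ready || v->irq_pending;
}

/* Caller holds v->lock; sleeps (never spins) until should_wake() holds. */
static void wait_until(struct vcpu_model *v,
		       bool (*should_wake)(const struct vcpu_model *))
{
	while (!should_wake(v))
		pthread_cond_wait(&v->cond, &v->lock);
}

static void wait_in_fault_path(struct vcpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	wait_until(v, fault_path_should_wake);
	pthread_mutex_unlock(&v->lock);
}

static enum wake_reason block_in_kvm_context(struct vcpu_model *v)
{
	enum wake_reason r;

	pthread_mutex_lock(&v->lock);
	wait_until(v, kvm_block_should_wake);
	/* This model reports the IRQ first when both flags are set. */
	r = v->irq_pending ? WAKE_IRQ_PENDING : WAKE_PAGE_READY;
	pthread_mutex_unlock(&v->lock);
	return r;
}

/* Event sources: set the flag under the lock, then wake any waiter. */
static void post_irq(struct vcpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->irq_pending = true;
	pthread_cond_broadcast(&v->cond);
	pthread_mutex_unlock(&v->lock);
}

static void complete_io(struct vcpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->page_ready = true;
	pthread_cond_broadcast(&v->cond);
	pthread_mutex_unlock(&v->lock);
}
```

For example, after post_irq() a caller of block_in_kvm_context() returns WAKE_IRQ_PENDING without waiting for complete_io(), whereas a thread in wait_in_fault_path() keeps sleeping until complete_io() runs, so the interrupt cannot be delivered any earlier.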