Date: Tue, 10 Oct 2023 17:17:36 +0800
From: Xu Yilun
To: isaku.yamahata@intel.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com,
    Michael Roth, Paolo Bonzini, Sean Christopherson, erdemaktas@google.com,
    Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo@intel.com,
    linux-coco@lists.linux.dev, Chao Peng, Ackerley Tng, Vishal Annapurve,
    Yuan Yao, Jarkko Sakkinen, Quentin Perret, wei.w.wang@intel.com, Fuad Tabba
Subject: Re: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
In-Reply-To: <8c9f0470ba6e5dc122f3f4e37c4dcfb6fb97b184.1692119201.git.isaku.yamahata@intel.com>

On 2023-08-15 at 10:18:53 -0700, isaku.yamahata@intel.com wrote:
> From: Michael Roth
>
> TODO: add a CONFIG option that can be to completely skip arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
>
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
>
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
>
> Signed-off-by: Michael Roth
> Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com
> [...]
> +/* Handle arch-specific hooks needed before releasing guarded pages. */
> +static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
> +                                           pgoff_t start, pgoff_t end)
> +{
> +        pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
> +        pgoff_t index = start;
> +
> +        end = min(end, file_end);
> +
> +        while (index < end) {
> +                struct folio *folio;
> +                unsigned int order;
> +                struct page *page;
> +                kvm_pfn_t pfn;
> +
> +                folio = __filemap_get_folio(file->f_mapping, index,
> +                                            FGP_LOCK, 0);
> +                if (!folio) {
> +                        index++;
> +                        continue;
> +                }
> +
> +                page = folio_file_page(folio, index);
> +                pfn = page_to_pfn(page);
> +                order = folio_order(folio);
> +
> +                kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));

I observed an issue here. The page looked up for this index may not be
the first page in the folio, in which case the range
[pfn, pfn + (1ul << order)) extends into the next folio. Those pages
then get invalidated a second time when the loop advances to the next
folio. On TDX, this makes TDH_PHYMEM_PAGE_WBINVD fail.

> +
> +                index = folio_next_index(folio);
> +                folio_unlock(folio);
> +                folio_put(folio);
> +
> +                cond_resched();
> +        }
> +}

My fix would be:

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index e629782d73d5..3665003c3746 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -155,7 +155,7 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
 
         while (index < end) {
                 struct folio *folio;
-                unsigned int order;
+                pgoff_t ntails;
                 struct page *page;
                 kvm_pfn_t pfn;
 
@@ -168,9 +168,9 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
 
                 page = folio_file_page(folio, index);
                 pfn = page_to_pfn(page);
-                order = folio_order(folio);
+                ntails = folio_nr_pages(folio) - folio_page_idx(folio, page);
 
-                kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+                kvm_arch_gmem_invalidate(kvm, pfn, pfn + min(ntails, end - index));
 
                 index = folio_next_index(folio);
                 folio_unlock(folio);

Thanks,
Yilun
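
P.S. To sanity-check the bound arithmetic, here is a small standalone
model (plain userspace C with made-up pfn/index values, not kernel
code) that reproduces the overrun when the loop enters a folio
mid-way, and shows the ntails bound stopping exactly at the folio
boundary:

/*
 * Standalone model of the invalidation-range arithmetic above.
 * Not kernel code; the folio is simulated by (first_pfn, nr_pages,
 * first_index) and all values are hypothetical.
 */
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
        /* Hypothetical order-4 folio: 16 pages backing file indices 0..15. */
        unsigned long folio_nr_pages  = 16;   /* stands in for folio_nr_pages() */
        unsigned long folio_order     = 4;    /* stands in for folio_order()    */
        unsigned long folio_first_idx = 0;
        unsigned long folio_first_pfn = 0x1000;

        /* Punched range starts mid-folio, e.g. PUNCH_HOLE from page 5 to 32. */
        unsigned long index = 5, end = 32;

        unsigned long page_idx = index - folio_first_idx; /* folio_page_idx() */
        unsigned long pfn      = folio_first_pfn + page_idx;

        /* Buggy bound: a full folio's worth of pfns from a mid-folio pfn. */
        unsigned long bad_end  = pfn + MIN(1ul << folio_order, end - index);

        /* Fixed bound: only the pages remaining in this folio. */
        unsigned long ntails   = folio_nr_pages - page_idx;
        unsigned long good_end = pfn + MIN(ntails, end - index);

        printf("folio pfns:  [0x%lx, 0x%lx)\n",
               folio_first_pfn, folio_first_pfn + folio_nr_pages);
        printf("buggy range: [0x%lx, 0x%lx)  <- 5 pfns past the folio end\n",
               pfn, bad_end);
        printf("fixed range: [0x%lx, 0x%lx)  <- stops at the folio end\n",
               pfn, good_end);
        return 0;
}

With these values the buggy bound covers [0x1005, 0x1015) even though
the folio ends at 0x1010; the loop then revisits [0x1010, 0x1015) as
part of the next folio, which is the double invalidation that trips
TDH_PHYMEM_PAGE_WBINVD.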