From: Dave Hansen
Date: Fri, 24 Oct 2025 13:13:40 -0700
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support
To: dan.j.williams@intel.com, Chao Gao
Cc: Vishal Annapurve, "Reshetova, Elena", "linux-coco@lists.linux.dev",
    "linux-kernel@vger.kernel.org", "x86@kernel.org", "Chatre, Reinette",
    "Weiny, Ira", "Huang, Kai", "yilun.xu@linux.intel.com",
    "sagis@google.com", "paulmck@kernel.org", "nik.borisov@suse.com",
    Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar,
    "Kirill A. Shutemov", Paolo Bonzini, "Edgecombe, Rick P",
    Thomas Gleixner
Message-ID: <2e49e80f-fab0-4248-8dae-76543e3c6ae3@intel.com>
References: <5b4c2bb3-cfde-4559-a59d-0ff9f2a250b4@intel.com>
    <68fbd63450c7c_10e910021@dwillia2-mobl4.notmuch>
In-Reply-To: <68fbd63450c7c_10e910021@dwillia2-mobl4.notmuch>
X-Mailing-List: linux-coco@lists.linux.dev

On 10/24/25 12:40, dan.j.williams@intel.com wrote:
> Dave Hansen wrote:
>> On 10/24/25 00:43, Chao Gao wrote:
>> ...
>>> Beyond "the kvm_tdx object gets torn down during a build," I see two potential
>>> issues:
>>>
>>> 1. TD Build and TDX migration aren't purely kernel processes -- they span multiple
>>>    KVM ioctls. Holding a read-write lock throughout the entire process would
>>>    require exiting to userspace while the lock is held.
>>>    I think this is irregular, but I'm not sure if it's acceptable for
>>>    read-write semaphores.
>>
>> Sure, I guess it's irregular. But look at it this way: let's say we
>> concocted some scheme to use a TD build refcount and a module update
>> flag, had them both wait_event_interruptible() on each other, and then
>> did wakeups. That would get the same semantics without an rwsem.
>
> This sounds unworkable to me.
>
> First, you cannot return to userspace while holding a lock. Lockdep will
> rightfully scream:
>
> "WARNING: lock held when returning to user space!"

Well, yup, it sure does look that way for normal lockdep-annotated lock
types. It does seem like a sane rule to have for most things.

But, just to be clear, this is a lockdep thing and a good, solid
semantic to have. It's not a rule that no kernel locking structure can
ever be held when returning to userspace.

> The complexity of ensuring that a multi-stage ABI transaction completes
> from the kernel side is painful. If that process dies in the middle of
> its ABI sequence who cleans up these references?

The 'struct kvm_tdx' has to get destroyed at some point. It also has a
'kvm_tdx_state' field that could be tied very tightly to the build
status. The reference gets cleaned up before the point when the
kvm_tdx->state memory is freed.

> The operational mechanism to make sure that one process flow does not
> mess up another process flow is for those process to communicate with
> *userspace* file locks, or for those process to check for failures after
> the fact and retry. Unless you can make the build side an atomic ABI,
> this is a documentation + userspace problem, not a kernel problem.

Yeah, that's a totally valid take on it. My only worry is that the
module update is going to be off in another world from the thing
building TDs.

We had a similar set of challenges around microcode updates, CPUSVN and
SGX enclaves.
The guy doing "echo 1 > /sys/.../whatever" wasn't coordinating
with every entity on the system that might run an SGX enclave. It
certainly didn't help that enclave creation is typically done by
unprivileged users.

Maybe the KVM/TDX world is a _bit_ more narrow and they will be talking
to each other, or the /dev/kvm permissions will be a nice funnel to get
them talking to each other.

The SGX solution, btw, was to at least ensure forward progress (CPUSVN
update) when the last enclave goes away. So new enclaves aren't
*prevented* from starting but the window when the first one starts
(enclave count going from 0->1) is leveraged to do the update.
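
For illustration only, here is a rough userspace model of that "do the
update on the 0->1 transition" idea. It is a sketch under assumptions,
not the SGX or TDX code: struct update_gate, gate_request_update(),
gate_get(), gate_put() and the 'version' counter (standing in for
CPUSVN or a module version) are all made-up names, and a pthread mutex
stands in for whatever locking the kernel side would really use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct update_gate {
	pthread_mutex_t lock;
	unsigned int users;	/* live enclaves (or TDs) */
	bool update_pending;	/* someone asked for an update */
	unsigned int version;	/* stand-in for CPUSVN / module version */
};

#define UPDATE_GATE_INIT { .lock = PTHREAD_MUTEX_INITIALIZER }

/* The "echo 1 > ..." side: only records that an update is wanted. */
void gate_request_update(struct update_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->update_pending = true;
	pthread_mutex_unlock(&g->lock);
}

/*
 * A new user shows up. It is never refused. If it is the first one
 * (count going 0->1) and an update is pending, the update runs before
 * the user is counted, so nothing is live while it happens.
 */
void gate_get(struct update_gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (g->users == 0 && g->update_pending) {
		g->version++;
		g->update_pending = false;
	}
	g->users++;
	pthread_mutex_unlock(&g->lock);
}

/* A user goes away. The pending update waits for the next 0->1. */
void gate_put(struct update_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->users > 0);
	g->users--;
	pthread_mutex_unlock(&g->lock);
}
```

The point of the model is that no lock is held across calls: the
requester and the users only share a counter and a flag, and the
update can only make progress once the count has dropped to zero.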