Date: Wed, 31 Jan 2024 23:57:10 -0500
From: "Theodore Ts'o"
To: "Jason A. Donenfeld"
Cc: "Reshetova, Elena", "Kirill A. Shutemov", Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
	"x86@kernel.org", Kuppuswamy Sathyanarayanan, "Nakajima, Jun",
	Tom Lendacky, "Kalra, Ashish", Sean Christopherson,
	"linux-coco@lists.linux.dev", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 1/2] x86/random: Retry on RDSEED failure
Message-ID: <20240201045710.GD2356784@mit.edu>
References: <20240131140756.GB2356784@mit.edu>
	<20240131171042.GA2371371@mit.edu>
X-Mailing-List: linux-coco@lists.linux.dev

On Wed, Jan 31, 2024 at 07:01:01PM +0100, Jason A. Donenfeld wrote:
> So if this is what we're congealing around, I guess we can:
>
> 0) Leave RDSEED alone and focus on RDRAND.
> 1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND
>    (and simply hope this doesn't get exploited for guest-guest boot DoS).
> 2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments
>    and variable naming making it clear that this is a hardware bug
>    workaround, not a "feature" added for "extra security".
> 3) Complain loudly to Intel and get them to fix the hardware.
>
> Though, a large part of me would really like to skip that step (2),
> first because it's a pretty gross bandaid that adds lots of
> complexity, and second because it'll make (3) less poignant

If we need to loop more than, say, 10 seconds in a CoCo VM, I'd just
panic with a repeated RDRAND failure message. This makes the point of
(3) that much more pointed, and it's better than having a CoCo VM
mysteriously hang in the face of a DoS attack.

I'll note that it should be relatively easy for Intel to make sure
that, if there is an undue draw on RDRAND, the hardware at that point
enforces a "fair share" mode where each of the N cores gets at most
1/N of the available entropy. So suppose a single-core CoCo VM is
trying to boot on a 256-core machine, and the evil attacker has
purchased 255 cores' worth of VMs, all of which are busy-looping on
RDRAND. While the CoCo VM is booting, if it is looping on RDRAND, it
should be getting 1/256th of the available RDRAND output. Since it is
only trying to grab enough randomness to seed the /dev/random CRNG, if
it can't get enough randomness in 10 seconds, well, Intel's customers
should be finding another vendor's CPU that can do a better job.

- Ted
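
For illustration only, and not taken from any patch in this thread: a
minimal userspace C sketch of the "retry for a bounded time, then fail
loudly" idea described above. Every name here (seed_with_deadline,
rng_try_fn, clock_ms_fn, fake_hw) is hypothetical. The RDRAND attempt
and the clock are passed in as callbacks so that the logic can be
exercised without real hardware; an in-kernel version would presumably
call panic() where this sketch returns SEED_TIMED_OUT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks; these are not kernel APIs. */
typedef bool (*rng_try_fn)(uint64_t *out, void *ctx); /* one RDRAND-like attempt */
typedef uint64_t (*clock_ms_fn)(void *ctx);           /* monotonic milliseconds */

enum seed_status { SEED_OK = 0, SEED_TIMED_OUT = 1 };

/*
 * Fill buf[0..nwords) from try_once, retrying failed attempts until
 * budget_ms has elapsed since the call started. On SEED_TIMED_OUT the
 * caller is expected to fail loudly rather than hang.
 */
static enum seed_status seed_with_deadline(uint64_t *buf, size_t nwords,
                                           rng_try_fn try_once,
                                           clock_ms_fn now_ms, void *ctx,
                                           uint64_t budget_ms)
{
    const uint64_t start = now_ms(ctx);
    size_t filled = 0;

    while (filled < nwords) {
        if (try_once(&buf[filled], ctx)) {
            filled++;
            continue;
        }
        /* Attempt failed: give up once the time budget is spent. */
        if (now_ms(ctx) - start >= budget_ms)
            return SEED_TIMED_OUT;
    }
    return SEED_OK;
}

/*
 * Demo stand-in for contended hardware (an assumption for testing):
 * each attempt costs 1 ms and the first failures_left attempts fail.
 */
struct fake_hw {
    uint64_t now_ms;
    uint64_t failures_left;
};

static bool fake_try(uint64_t *out, void *ctx)
{
    struct fake_hw *hw = ctx;

    hw->now_ms += 1;
    if (hw->failures_left > 0) {
        hw->failures_left--;
        return false;
    }
    *out = 0x1234u; /* placeholder value, not random */
    return true;
}

static uint64_t fake_now(void *ctx)
{
    const struct fake_hw *hw = ctx;

    return hw->now_ms;
}
```

With a 10000 ms budget, a fake_hw that fails its first 100 attempts
still seeds successfully, while one that keeps failing makes
seed_with_deadline() return SEED_TIMED_OUT once the budget is spent,
which is the point at which the caller would report the repeated
RDRAND failure.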