From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 693E820459B for ; Tue, 1 Apr 2025 17:56:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743530198; cv=none; b=bl+ggG2/8g8wHXlf3Kawo4pTpqBVFoLAd6MWqSqYmvHPGgB0/Z+G8cNgTEvPTfbGr/joAB6HRF+yHMXDVm2gRdEF81SvpZm3ZBEcsNvqwW5LnPGAKGIbahddwFWgy4kJZggMiPDVXonNyDlOg7EwzpfwnhsUgfKrTkE54i+hPok= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743530198; c=relaxed/simple; bh=JY/McJ0AfPTWrwEuCB0Y4tX5FQiivcdCF5ulAN/9/k0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=bgVoVlu0U+ErfmIiQDzonLZKUa5bjy9QFgjYsXhv5EyyZ72Qh1Zn+VNxbt6SRqF8Ip8O8bgahlRJiItlbbls4NhvUUxgHLHN2H/mN3ouPU0vw1IwN70VqyBdcj7OfEurm+sBo4yjSwi07e+gRZq+tMkpsO2V7+r1vDgRc48/vRg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=snGltB86; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="snGltB86" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2ff854a2541so9677981a91.0 for ; Tue, 01 Apr 2025 10:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1743530196; x=1744134996; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=Qoqh5w7v6sEF/zJAyXyVye9BB0n7cEU53dS3BSwT4ds=; b=snGltB86sVlySJqfmWZX2jUih07kkvUDvyRj6i5mR7+u+LFxLzL67PycRsan6Y4IGE Od3Qmt+I00gdNB100T9LmSu15wIgDUo2n9lQ/8I/JdsbSwoX3aHc9aKbC1yLaskBrgok vJolC45lEEOFc/Rdt5bJYqD6BM966xNQONtjHBhEpWxO51SxJjxmHuXpj7nNQG1vl55B UNw0AXm5V52Fwio/wX56hZFDmH3mzCWx4Tyi2pllsOfUfDoRfXBZAHjaY+WK5rw6Dm0J n58VfGaXm/POONdXmAQWkHyTBsnXkN0CBotKfXotoaWOFJCiZtp81+PxWnoGA7JjcI1e g3Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743530196; x=1744134996; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=Qoqh5w7v6sEF/zJAyXyVye9BB0n7cEU53dS3BSwT4ds=; b=i0tCGr2UyyJI56KmSRVhyETzLUvbvYlWz/5VC+TQ86TT9dunGcIjjSquDuzF1fFAnV Jdv6ZEH45+X39QwQwHkZn8M/LnQ7ALeVOjDknZBCBiAIgu2HOkrrbzGiiLAnlIEVeLTM PRuRLDz5m56MJry1UJN8ZVmw8G3rmQVuSTj6s9IFWJb/Ll9zbt6OUJcv7ey1AIZxFtZC ZVGy6wsItyzUPT9G96BrRRqrKPDUJU4ZqfcZD+9SyhULCyrVuYzNTbUZyw3dMVIUN48e 4dnybpx4EvhFYRF7MX8is8AE8ll/T43SlgO3ypjv6uMa+32vdVgW4FVcRuONWaD+LomC u0ig== X-Forwarded-Encrypted: i=1; AJvYcCU/vGAWSpIEc4uNiRx+efqS7xRHPYgsS5BQgdM4vBueYeZSOwfSRcQYRDqt8xQnKgorFSR5tGjO5/FeiZQ=@vger.kernel.org X-Gm-Message-State: AOJu0YzfFZluU1P8BkbFQ16L6vtgsoAoyTnteYxsxck+Z8yVx8lb857k vGqBtj8FZsAohuu9vXbrBHfMbmu3A5xpwKXOaauPZtO2El6gPGQuzV1m9rydlIrdUQVHyuTpJCc 1cQ== X-Google-Smtp-Source: AGHT+IH+MvNYmYhEz8ZKuhfDMkVTnE4dMnSFnMG49oA9TpYeZSiC5C72BuaWvZUrJMa4rmOOs2d7G4zOFVI= X-Received: from pfbit9.prod.google.com ([2002:a05:6a00:4589:b0:736:5b36:db8f]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:aa7:88c9:0:b0:732:5276:4ac9 with SMTP id d2e1a72fcca58-7398037e929mr18764281b3a.9.1743530195575; Tue, 01 Apr 2025 10:56:35 -0700 (PDT) Date: Tue, 1 Apr 2025 10:56:34 -0700 In-Reply-To: <37b9903a-e8fc-4d57-a1ae-2bd2f26a9974@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250318153316.1970147-1-chao.gao@intel.com> <20250318153316.1970147-2-chao.gao@intel.com> <37b9903a-e8fc-4d57-a1ae-2bd2f26a9974@intel.com> Message-ID: Subject: Re: [PATCH v4 1/8] x86/fpu/xstate: Always preserve non-user xfeatures/flags in __state_perm From: Sean Christopherson To: "Chang S. Bae" Cc: Chao Gao , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, tglx@linutronix.de, dave.hansen@intel.com, pbonzini@redhat.com, peterz@infradead.org, rick.p.edgecombe@intel.com, weijiang.yang@intel.com, john.allen@amd.com, bp@alien8.de, xin3.li@intel.com, Maxim Levitsky , Ingo Molnar , Dave Hansen , "H. Peter Anvin" , Mitchell Levy , Samuel Holland , Li RongQing , Adamos Ttofari , Vignesh Balasubramanian , Aruna Ramakrishna Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Tue, Apr 01, 2025, Chang S. Bae wrote: > On 3/18/2025 8:31 AM, Chao Gao wrote: > >=20 > > When granting userspace or a KVM guest access to an xfeature, preserve = the > > entity's existing supervisor and software-defined permissions as tracke= d > > by __state_perm, i.e. use __state_perm to track *all* permissions even > > though all supported supervisor xfeatures are granted to all FPUs and > > FPU_GUEST_PERM_LOCKED disallows changing permissions. > >=20 > > Effectively clobbering supervisor permissions results in inconsistent > > behavior, as xstate_get_group_perm() will report supervisor features fo= r > > process that do NOT request access to dynamic user xfeatures, whereas a= ny > > and all supervisor features will be absent from the set of permissions = for > > any process that is granted access to one or more dynamic xfeatures (wh= ich > > right now means AMX). > >=20 > > The inconsistency isn't problematic because fpu_xstate_prctl() already > > strips out everything except user xfeatures: > >=20 > > case ARCH_GET_XCOMP_PERM: > > /* > > * Lockless snapshot as it can also change right after= the > > * dropping the lock. > > */ > > permitted =3D xstate_get_host_group_perm(); > > permitted &=3D XFEATURE_MASK_USER_SUPPORTED; > > return put_user(permitted, uptr); > >=20 > > case ARCH_GET_XCOMP_GUEST_PERM: > > permitted =3D xstate_get_guest_group_perm(); > > permitted &=3D XFEATURE_MASK_USER_SUPPORTED; > > return put_user(permitted, uptr); > >=20 > > and similarly KVM doesn't apply the __state_perm to supervisor states > > (kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()): > >=20 > > case 0xd: { > > u64 permitted_xcr0 =3D kvm_get_filtered_xcr0(); > > u64 permitted_xss =3D kvm_caps.supported_xss; > >=20 > > But if KVM in particular were to ever change, dropping supervisor > > permissions would result in subtle bugs in KVM's reporting of supported > > CPUID settings. And the above behavior also means that having supervis= or > > xfeatures in __state_perm is correctly handled by all users. > >=20 > > Dropping supervisor permissions also creates another landmine for KVM. = If > > more dynamic user xfeatures are ever added, requesting access to multip= le > > xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in th= e > > second invocation of __xstate_request_perm() computing the wrong ksize,= as > > as the mask passed to xstate_calculate_size() would not contain *any* > > supervisor features. > >=20 > > Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTAT= E > > permissions") fudged around the size issue for userspace FPUs, but for > > reasons unknown skipped guest FPUs. Lack of a fix for KVM "works" only > > because KVM doesn't yet support virtualizing features that have supervi= sor > > xfeatures, i.e. as of today, KVM guest FPUs will never need the relevan= t > > xfeatures. > >=20 > > Simply extending the hack-a-fix for guests would temporarily solve the > > ksize issue, but wouldn't address the inconsistency issue and would lea= ve > > another lurking pitfall for KVM. KVM support for virtualizing CET will > > likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will no= t > > be set in xfeatures_mask_supervisor() and would again be dropped when > > granting access to dynamic xfeatures. > >=20 > > Note, the existing clobbering behavior is rather subtle. The @permitte= d > > parameter to __xstate_request_perm() comes from: > >=20 > > permitted =3D xstate_get_group_perm(guest); > >=20 > > which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm, > > where __state_perm is initialized to: > >=20 > > fpu->perm.__state_perm =3D fpu_kernel_cfg.default_fea= tures; > >=20 > > and copied to the guest side of things: > >=20 > > /* Same defaults for guests */ > > fpu->guest_perm =3D fpu->perm; > >=20 > > fpu_kernel_cfg.default_features contains everything except the dynamic > > xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA: > >=20 > > fpu_kernel_cfg.default_features =3D fpu_kernel_cfg.max_feature= s; > > fpu_kernel_cfg.default_features &=3D ~XFEATURE_MASK_USER_DYNAM= IC; > >=20 > > When __xstate_request_perm() restricts the local "mask" variable to > > compute the user state size: > >=20 > > mask &=3D XFEATURE_MASK_USER_SUPPORTED; > > usize =3D xstate_calculate_size(mask, false); > >=20 > > it subtly overwrites the target __state_perm with "mask" containing onl= y > > user xfeatures: > >=20 > > perm =3D guest ? &fpu->guest_perm : &fpu->perm; > > /* Pairs with the READ_ONCE() in xstate_get_group_perm() */ > > WRITE_ONCE(perm->__state_perm, mask); >=20 > This changelog appears to be largely derived from Sean=E2=80=99s previous= email. FWIW, I wrote the changelog. I'm sure I derived many of the details from m= y original mail, but I would rather not redirect future readers to lore to fu= lly understand the change. https://lore.kernel.org/all/20231103224402.347278-1-seanjc@google.com > I think it can be significantly shortened to focus on the key points, suc= h > as: I strongly prefer the extremely verbose version. I wrote the code, and in = all honesty, the below short version doesn't help me understand the full scope = of the change (I have long since paged out the context). From a KVM perspecti= ve, capturing why this flaw isn't problematic (yet!) is just as important as fi= xing the issue. > x86/fpu/xstate: Preserve non-user bits in permission handling >=20 > When granting userspace or a KVM guest access to an xfeature, the task > leader=E2=80=99s perm->__state_perm (host or guest) is overwritten, unint= entionally > discarding non-user bits. Additionally, supervisor state permissions are > always granted. >=20 > The current behavior presents the following issues: >=20 > * Inconsistencies in permission handling =E2=80=93 Supervisor permissio= ns are > universally granted, and the FPU_GUEST_PERM_LOCKED bit is explicitly > set to prevent permission changes. >=20 > * Redundant permission setting =E2=80=93 Since supervisor state permiss= ions > are always granted, the permitted mask already includes them, making > it unnecessary to set them again. >=20 > Ensure that __xstate_request_perm() does not inadvertently drop > supervisor and software-defined permissions. Also, avoid redundantly > granting supervisor state permissions, and document this behavior in the > code comments. >=20 > Clarify the presence of non-user feature and flag bits in the field > description. >=20