From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ej1-f53.google.com (mail-ej1-f53.google.com [209.85.218.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED73C36C0A1 for ; Tue, 13 Jan 2026 19:27:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.53 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768332477; cv=none; b=rWyDWFZ7PKqccRMSsuDEUOf168kkegDuwRkZKcKlq5Z1W7bPmXF63QJT4NkIVJzYH2sGjZX2v3OgrywtkDqFDTSMGjeQeE6JMMUmA/tdKEpbNVU4ugfTCLl1CNlxqKrjndGaykP62JyJv2K9pn+6E5v0FlENwUsFk6WY7E9ISSY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768332477; c=relaxed/simple; bh=j3MTvudFYp/Z3BA8v0Nf5anA/VDSXeRdzFYQIHWSPAc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=eBbntptN0QaSBRGLoyZP7wZ0udTwSgS6Cjsj3OZrTBtWfwGsiK/xdGwraHr8Ttr6icg48QNvjbaryniPOa9XdvHe2wPd4sxmxazMz/XBQ4uLOe463z//YhoAsRzzmLNPGD8GjjVELVcIWQIDs07HdauDA6YTY5EH1nbaqi483rU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org; spf=pass smtp.mailfrom=chromium.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b=DNd2ZMxf; arc=none smtp.client-ip=209.85.218.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=chromium.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="DNd2ZMxf" Received: by mail-ej1-f53.google.com with SMTP id a640c23a62f3a-b86f69bbe60so410505766b.1 for ; Tue, 13 Jan 2026 11:27:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1768332474; x=1768937274; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=LZBaZL2bW3E22EnT/tukcFNcW6Np3cq6LKK5NCta40E=; b=DNd2ZMxf8ee2YEmUorVDEpObdW237Ree7ApsW0zcztpWBq8l0j/I81K+HEqxOztAET aiMXxXL5QVPBzB7d0gB2l+W7sr8EXEY9AHy5OLrEgZStabohy+KDjnIYp+Pp7OveWVnv Lc9WUy3+2U87qsDxtY4egHqeu9l0GZC54truA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768332474; x=1768937274; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LZBaZL2bW3E22EnT/tukcFNcW6Np3cq6LKK5NCta40E=; b=Gu4bj9UR2LIYylM9nEk0UF4qBTQLdTsixdAgh+bff00SPBRSXaUEzfv+1cpYNZjXRi PEw7+Kl6dkn+kABdpQU0oE/jBR1EKfihcfWIj3yHX0bcNjC3beuKKpqpVcTqH0pvMYns M//RsS3PuBzCiUMq/p7P58tav/CIHh9q/t2LwPH7H36q1w4So/g52Fm0sUcYVZZWsfYJ a915ifyCfhOyC3wYWUwQxYNZJ9qHiZMzuwdqJ5Y4DDhM0yXXsiIpdSy+Kz2s7vMX+W/O lQEjKUth6BEZd1txYXlHBlIaSUOiFcAtkPygF08MFrMGMpCrpADfASS696zB7SlMKI+u zT7Q== X-Forwarded-Encrypted: i=1; AJvYcCV9/WTjSPedj8REIg9ZmuAPWjq/vYAnrPSTVhMLaeSZSI35qQxj5Czk82wAa/N++7eUSfOuCm8NLpwQOBk=@vger.kernel.org X-Gm-Message-State: AOJu0Yx7Wk0FoV/LAoROQWTXLEwVpnBab4x5G/oOKLYR6gIWO3VdY0+/ 3QU13nO8HydNbCdBBq+a3uIegceDCYF1A04ZGp6NjmicOw5LK8OK55XpBZAngiMpRw== X-Gm-Gg: AY/fxX4yF+Jid7nGgt25nnMFx0gk9tzVFu35FimJyKNUVBEVJ2smaHgblagIjyqV4TA Er2yvbvWU8i3MOBLWVMkRE2HwCnwQHXKJscp3NF4fAlt7f4d3EdcXw/ghc7hZqvUMvoBLTyY0AK QcLEz/IejTdLvhDYTPJfax4NiJXBkYsUueAs/859BDlWPMq+D9kVLOmAoFjxkM5VzWhsN2SW64O hthjbyCG/tnUVSm8XW7QahPEiSxgIeJV4lPf3trgYm8/ZsfB4OT6K4OrAzUqlegpOx28iWsJIcy sPl7lepTVWUE6S2gV50RY/uRnbu4FIWJ4r8y66y6sWVz6N2lYsFrUQqGrblI6EsYIgu+nmaD4uf d19tgBAS7UjaEvk3dZ9l1wzirIxzsj4na9pZpdIXxW6RbQ6iKGjqliQGc791zCYeLQE2x8DIk9N syyoMe X-Received: by 2002:a17:906:c141:b0:b87:4f80:f4ca with SMTP id a640c23a62f3a-b87612c3b87mr19870766b.56.1768332474242; Tue, 13 Jan 2026 11:27:54 -0800 (PST) Received: from google.com ([2a00:79e0:a:200:e42d:53cb:2a0f:636a]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-b8724197145sm564109066b.11.2026.01.13.11.27.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jan 2026 11:27:53 -0800 (PST) Date: Tue, 13 Jan 2026 20:27:46 +0100 From: Dmytro Maluka To: Lu Baolu Cc: Joerg Roedel , Will Deacon , Robin Murphy , Kevin Tian , Jason Gunthorpe , Samiullah Khawaja , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, "Vineeth Pillai (Google)" , Aashish Sharma Subject: Re: [PATCH 1/3] iommu/vt-d: Use 128-bit atomic updates for context entries Message-ID: References: <20260113030052.977366-1-baolu.lu@linux.intel.com> <20260113030052.977366-2-baolu.lu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260113030052.977366-2-baolu.lu@linux.intel.com> On Tue, Jan 13, 2026 at 11:00:46AM +0800, Lu Baolu wrote: > On Intel IOMMU, device context entries are accessed by hardware in > 128-bit chunks. Currently, the driver updates these entries by > programming the 'lo' and 'hi' 64-bit fields individually. > > This creates a potential race condition where the IOMMU hardware may fetch > a context entry while the CPU has only completed one of the two 64-bit > writes. This "torn" entry — consisting of half-old and half-new data — > could lead to unpredictable hardware behavior, especially when > transitioning the 'Present' bit or changing translation types. > > To ensure the IOMMU hardware always observes a consistent state, use > 128-bit atomic updates for context entries. This is achieved by building > context entries on the stack and write them to the table in a single > operation. > > As this relies on arch_cmpxchg128_local(), restrict INTEL_IOMMU > dependencies to X86_64. > > Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver") > Reported-by: Dmytro Maluka FWIW, Jason and Kevin contributed to this discovery more than I did. :) > Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/ > Signed-off-by: Lu Baolu > --- > drivers/iommu/intel/Kconfig | 2 +- > drivers/iommu/intel/iommu.h | 22 ++++++++++++++++++---- > drivers/iommu/intel/iommu.c | 30 +++++++++++++++--------------- > drivers/iommu/intel/pasid.c | 18 +++++++++--------- > 4 files changed, 43 insertions(+), 29 deletions(-) > > diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig > index 5471f814e073..efda19820f95 100644 > --- a/drivers/iommu/intel/Kconfig > +++ b/drivers/iommu/intel/Kconfig > @@ -11,7 +11,7 @@ config DMAR_DEBUG > > config INTEL_IOMMU > bool "Support for Intel IOMMU using DMA Remapping Devices" > - depends on PCI_MSI && ACPI && X86 > + depends on PCI_MSI && ACPI && X86_64 > select IOMMU_API > select GENERIC_PT > select IOMMU_PT > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h > index 25c5e22096d4..b8999802f401 100644 > --- a/drivers/iommu/intel/iommu.h > +++ b/drivers/iommu/intel/iommu.h > @@ -546,6 +546,16 @@ struct pasid_entry; > struct pasid_state_entry; > struct page_req_dsc; > > +static __always_inline void intel_iommu_atomic128_set(u128 *ptr, u128 val) > +{ > + /* > + * Use the cmpxchg16b instruction for 128-bit atomicity. As updates > + * are serialized by a spinlock, we use the local (unlocked) variant > + * to avoid unnecessary bus locking overhead. > + */ > + arch_cmpxchg128_local(ptr, *ptr, val); Any reason why not cmpxchg128_local()? (except following the AMD driver) Otherwise, Reviewed-by: Dmytro Maluka > +} > + > /* > * 0: Present > * 1-11: Reserved > @@ -569,8 +579,13 @@ struct root_entry { > * 8-23: domain id > */ > struct context_entry { > - u64 lo; > - u64 hi; > + union { > + struct { > + u64 lo; > + u64 hi; > + }; > + u128 val128; > + }; > }; > > struct iommu_domain_info { > @@ -946,8 +961,7 @@ static inline int context_domain_id(struct context_entry *c) > > static inline void context_clear_entry(struct context_entry *context) > { > - context->lo = 0; > - context->hi = 0; > + intel_iommu_atomic128_set(&context->val128, 0); > } > > #ifdef CONFIG_INTEL_IOMMU > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 134302fbcd92..d721061ebda2 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -1147,8 +1147,8 @@ static int domain_context_mapping_one(struct dmar_domain *domain, > domain_lookup_dev_info(domain, iommu, bus, devfn); > u16 did = domain_id_iommu(domain, iommu); > int translation = CONTEXT_TT_MULTI_LEVEL; > + struct context_entry *context, new = {0}; > struct pt_iommu_vtdss_hw_info pt_info; > - struct context_entry *context; > int ret; > > if (WARN_ON(!intel_domain_is_ss_paging(domain))) > @@ -1170,19 +1170,19 @@ static int domain_context_mapping_one(struct dmar_domain *domain, > goto out_unlock; > > copied_context_tear_down(iommu, context, bus, devfn); > - context_clear_entry(context); > - context_set_domain_id(context, did); > + context_set_domain_id(&new, did); > > if (info && info->ats_supported) > translation = CONTEXT_TT_DEV_IOTLB; > else > translation = CONTEXT_TT_MULTI_LEVEL; > > - context_set_address_root(context, pt_info.ssptptr); > - context_set_address_width(context, pt_info.aw); > - context_set_translation_type(context, translation); > - context_set_fault_enable(context); > - context_set_present(context); > + context_set_address_root(&new, pt_info.ssptptr); > + context_set_address_width(&new, pt_info.aw); > + context_set_translation_type(&new, translation); > + context_set_fault_enable(&new); > + context_set_present(&new); > + intel_iommu_atomic128_set(&context->val128, new.val128); > if (!ecap_coherent(iommu->ecap)) > clflush_cache_range(context, sizeof(*context)); > context_present_cache_flush(iommu, did, bus, devfn); > @@ -3771,8 +3771,8 @@ static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain, > static int context_setup_pass_through(struct device *dev, u8 bus, u8 devfn) > { > struct device_domain_info *info = dev_iommu_priv_get(dev); > + struct context_entry *context, new = {0}; > struct intel_iommu *iommu = info->iommu; > - struct context_entry *context; > > spin_lock(&iommu->lock); > context = iommu_context_addr(iommu, bus, devfn, 1); > @@ -3787,17 +3787,17 @@ static int context_setup_pass_through(struct device *dev, u8 bus, u8 devfn) > } > > copied_context_tear_down(iommu, context, bus, devfn); > - context_clear_entry(context); > - context_set_domain_id(context, FLPT_DEFAULT_DID); > + context_set_domain_id(&new, FLPT_DEFAULT_DID); > > /* > * In pass through mode, AW must be programmed to indicate the largest > * AGAW value supported by hardware. And ASR is ignored by hardware. > */ > - context_set_address_width(context, iommu->msagaw); > - context_set_translation_type(context, CONTEXT_TT_PASS_THROUGH); > - context_set_fault_enable(context); > - context_set_present(context); > + context_set_address_width(&new, iommu->msagaw); > + context_set_translation_type(&new, CONTEXT_TT_PASS_THROUGH); > + context_set_fault_enable(&new); > + context_set_present(&new); > + intel_iommu_atomic128_set(&context->val128, new.val128); > if (!ecap_coherent(iommu->ecap)) > clflush_cache_range(context, sizeof(*context)); > context_present_cache_flush(iommu, FLPT_DEFAULT_DID, bus, devfn); > diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c > index 3e2255057079..298a39183996 100644 > --- a/drivers/iommu/intel/pasid.c > +++ b/drivers/iommu/intel/pasid.c > @@ -978,23 +978,23 @@ static int context_entry_set_pasid_table(struct context_entry *context, > struct device_domain_info *info = dev_iommu_priv_get(dev); > struct pasid_table *table = info->pasid_table; > struct intel_iommu *iommu = info->iommu; > + struct context_entry new = {0}; > unsigned long pds; > > - context_clear_entry(context); > - > pds = context_get_sm_pds(table); > - context->lo = (u64)virt_to_phys(table->table) | context_pdts(pds); > - context_set_sm_rid2pasid(context, IOMMU_NO_PASID); > + new.lo = (u64)virt_to_phys(table->table) | context_pdts(pds); > + context_set_sm_rid2pasid(&new, IOMMU_NO_PASID); > > if (info->ats_supported) > - context_set_sm_dte(context); > + context_set_sm_dte(&new); > if (info->pasid_supported) > - context_set_pasid(context); > + context_set_pasid(&new); > if (info->pri_supported) > - context_set_sm_pre(context); > + context_set_sm_pre(&new); > > - context_set_fault_enable(context); > - context_set_present(context); > + context_set_fault_enable(&new); > + context_set_present(&new); > + intel_iommu_atomic128_set(&context->val128, new.val128); > __iommu_flush_cache(iommu, context, sizeof(*context)); > > return 0; > -- > 2.43.0 >