public inbox for linux-kernel@vger.kernel.org
* [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
@ 2008-07-10 18:16 Suresh Siddha
  2008-07-10 18:16 ` [patch 01/26] x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization Suresh Siddha
                   ` (27 more replies)
  0 siblings, 28 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel

The x2APIC architecture provides a new x2apic mode, which extends the
range of processor addressability (APIC IDs wider than 8 bits) and offers
MSR access to the APIC registers, among other things. The x2apic
specification can be found at
http://download.intel.com/design/processor/specupdt/318148.pdf
(located under http://developer.intel.com/products/processor/manuals/index.htm )
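
As an illustration of the MSR access mentioned above: in x2apic mode the
APIC registers live in the MSR range starting at 0x800, and a register at
legacy MMIO offset X is reached through MSR 0x800 + (X >> 4). The helper
below is only a sketch of that arithmetic; its name and the macro names are
made up for this note and do not come from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the x2APIC MSR range (0x800-0x8FF). */
#define X2APIC_MSR_BASE_SKETCH	0x800u

/* Legacy xAPIC MMIO offsets of a few well-known registers. */
#define APIC_ID_OFFSET_SKETCH	0x020u
#define APIC_TPR_OFFSET_SKETCH	0x080u
#define APIC_EOI_OFFSET_SKETCH	0x0B0u
#define APIC_ICR_OFFSET_SKETCH	0x300u

/*
 * Hypothetical helper: translate an xAPIC MMIO register offset into the
 * corresponding x2APIC MSR number. MMIO registers sit on 16-byte
 * boundaries, so the MSR index is simply offset / 16.
 */
static inline uint32_t x2apic_msr_for_offset(uint32_t mmio_offset)
{
	/* Only 16-byte aligned offsets inside the 4K APIC page make sense. */
	assert((mmio_offset & 0xFu) == 0 && mmio_offset < 0x1000u);
	return X2APIC_MSR_BASE_SKETCH + (mmio_offset >> 4);
}
```

One caveat the sketch does not model: in x2apic mode the ICR is a single
64-bit MSR, so the high half at MMIO offset 0x310 has no MSR of its own.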

Interrupt-remapping is part of the Intel Virtualization Technology for
Directed I/O architecture, and the specification can be found at
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
(the above link seems to be broken at the moment, but in general it should be
found under http://www.intel.com/technology/virtualization/ )

The interrupt-remapping architecture enables Extended Interrupt Mode on x86
platforms supporting 32-bit APIC IDs. This infrastructure allows
the existing interrupt sources, such as I/OxAPICs and MSI/MSI-X devices, to
work seamlessly with APIC IDs wider than 8 bits. As such, it is a prerequisite
for enabling x2apic mode in the CPU.

This patchset adds 64-bit support for interrupt-remapping and x2apic. It
introduces apic_ops for the basic APIC operations (uncached memory vs. MSR
accesses, etc.), new irq_chips for supporting interrupt-remapping, and a new
genapic for supporting IPIs in the logical cluster and physical x2apic modes.
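
To make the apic_ops idea concrete, here is a minimal userspace sketch of an
ops table that hides whether a register access goes through uncached memory
or through MSRs. Everything in it (the struct layout, the *_sketch names, the
fake register arrays) is an assumption made for illustration; the actual
apic_ops in the later patches may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ops table: one read/write pair per access method. */
struct apic_ops_sketch {
	uint32_t (*read)(uint32_t reg);
	void (*write)(uint32_t reg, uint32_t val);
};

/* Stand-ins for the real hardware: a 4K MMIO page and the MSR range. */
static uint32_t fake_mmio[0x1000 / 4];
static uint64_t fake_msr[0x100];

/* xAPIC style: the register offset indexes the memory-mapped page. */
static uint32_t mem_read(uint32_t reg)
{
	return fake_mmio[reg / 4];
}

static void mem_write(uint32_t reg, uint32_t val)
{
	fake_mmio[reg / 4] = val;
}

/* x2APIC style: the same offset selects an MSR at index (reg >> 4). */
static uint32_t msr_read(uint32_t reg)
{
	return (uint32_t)fake_msr[reg >> 4];
}

static void msr_write(uint32_t reg, uint32_t val)
{
	fake_msr[reg >> 4] = val;
}

static const struct apic_ops_sketch xapic_ops_sketch = {
	.read = mem_read,
	.write = mem_write,
};

static const struct apic_ops_sketch x2apic_ops_sketch = {
	.read = msr_read,
	.write = msr_write,
};

/* Callers go through this pointer and never care which mode is active. */
static const struct apic_ops_sketch *apic_ops_ptr = &xapic_ops_sketch;

static uint32_t apic_read_sketch(uint32_t reg)
{
	return apic_ops_ptr->read(reg);
}

static void apic_write_sketch(uint32_t reg, uint32_t val)
{
	apic_ops_ptr->write(reg, val);
}
```

Switching modes is then a single pointer assignment
(apic_ops_ptr = &x2apic_ops_sketch), and callers of apic_read_sketch() and
apic_write_sketch() stay unchanged.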

IRQ migration in the presence of interrupt-remapping is done from
process context as opposed to interrupt context. The interrupt-remapping
infrastructure allows us to do this migration in a simple fashion (at least
for edge-triggered interrupts).

Interrupt-remapping (CONFIG_INTR_REMAP) and DMA-remapping (CONFIG_DMAR)
can be enabled separately.

More details in the individual patches that follow.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
-- 



* [patch 01/26] x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 02/26] x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus Suresh Siddha
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: cleanup_intel_iommu_dmar_code.patch --]
[-- Type: text/plain, Size: 16911 bytes --]

Reorganize the code into the generic Intel VT-d parsing routines and the
Linux IOMMU routines specific to Intel VT-d.

drivers/pci/dmar.c now contains the generic VT-d parsing routines
drivers/pci/intel-iommu.c contains the IOMMU routines specific to VT-d

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
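
For readers unfamiliar with the matching logic that moves into dmar.c: a
device is considered covered by a remapping unit if the device itself, or any
bridge above it, appears in that unit's device scope. The userspace sketch
below mirrors that walk; struct dev_sketch and its parent_bridge field are
stand-ins invented for this note (the patch itself follows dev->bus->self).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct dev_sketch {
	const char *name;
	struct dev_sketch *parent_bridge;	/* plays the role of dev->bus->self */
};

/*
 * Return 1 if dev, or any bridge on the path from dev up to the root,
 * is listed in devices[]; return 0 otherwise.
 */
static int device_match_sketch(struct dev_sketch *devices[], int cnt,
			       struct dev_sketch *dev)
{
	int index;

	while (dev) {
		for (index = 0; index < cnt; index++)
			if (dev == devices[index])
				return 1;

		/* Not listed directly: check our parent bridge next. */
		dev = dev->parent_bridge;
	}

	return 0;
}
```

The walk ends when the parent pointer is NULL, which in this sketch marks a
device sitting directly on the root bus.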

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:46.000000000 -0700
@@ -19,9 +19,11 @@
  * Author: Shaohua Li <shaohua.li@intel.com>
  * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  *
- * This file implements early detection/parsing of DMA Remapping Devices
+ * This file implements early detection/parsing of Remapping Devices
  * reported to OS through BIOS via DMA remapping reporting (DMAR) ACPI
  * tables.
+ *
+ * These routines are used by both DMA-remapping and Interrupt-remapping
  */
 
 #include <linux/pci.h>
@@ -300,6 +302,37 @@
 	return ret;
 }
 
+int dmar_pci_device_match(struct pci_dev *devices[], int cnt,
+			  struct pci_dev *dev)
+{
+	int index;
+
+	while (dev) {
+		for (index = 0; index < cnt; index++)
+			if (dev == devices[index])
+				return 1;
+
+		/* Check our parent */
+		dev = dev->bus->self;
+	}
+
+	return 0;
+}
+
+struct dmar_drhd_unit *
+dmar_find_matched_drhd_unit(struct pci_dev *dev)
+{
+	struct dmar_drhd_unit *drhd = NULL;
+
+	list_for_each_entry(drhd, &dmar_drhd_units, list) {
+		if (drhd->include_all || dmar_pci_device_match(drhd->devices,
+						drhd->devices_cnt, dev))
+			return drhd;
+	}
+
+	return NULL;
+}
+
 
 int __init dmar_table_init(void)
 {
@@ -343,3 +376,58 @@
 
 	return (ACPI_SUCCESS(status) ? 1 : 0);
 }
+
+struct intel_iommu *alloc_iommu(struct intel_iommu *iommu,
+				struct dmar_drhd_unit *drhd)
+{
+	int map_size;
+	u32 ver;
+
+	iommu->reg = ioremap(drhd->reg_base_addr, PAGE_SIZE_4K);
+	if (!iommu->reg) {
+		printk(KERN_ERR "IOMMU: can't map the region\n");
+		goto error;
+	}
+	iommu->cap = dmar_readq(iommu->reg + DMAR_CAP_REG);
+	iommu->ecap = dmar_readq(iommu->reg + DMAR_ECAP_REG);
+
+	/* the registers might be more than one page */
+	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
+		cap_max_fault_reg_offset(iommu->cap));
+	map_size = PAGE_ALIGN_4K(map_size);
+	if (map_size > PAGE_SIZE_4K) {
+		iounmap(iommu->reg);
+		iommu->reg = ioremap(drhd->reg_base_addr, map_size);
+		if (!iommu->reg) {
+			printk(KERN_ERR "IOMMU: can't map the region\n");
+			goto error;
+		}
+	}
+
+	ver = readl(iommu->reg + DMAR_VER_REG);
+	pr_debug("IOMMU %llx: ver %d:%d cap %llx ecap %llx\n",
+		drhd->reg_base_addr, DMAR_VER_MAJOR(ver), DMAR_VER_MINOR(ver),
+		iommu->cap, iommu->ecap);
+
+	spin_lock_init(&iommu->register_lock);
+
+	drhd->iommu = iommu;
+	return iommu;
+error:
+	kfree(iommu);
+	return NULL;
+}
+
+void free_iommu(struct intel_iommu *iommu)
+{
+	if (!iommu)
+		return;
+
+#ifdef CONFIG_DMAR
+	free_dmar_iommu(iommu);
+#endif
+
+	if (iommu->reg)
+		iounmap(iommu->reg);
+	kfree(iommu);
+}
Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:51:46.000000000 -0700
@@ -990,6 +990,8 @@
 		return -ENOMEM;
 	}
 
+	spin_lock_init(&iommu->lock);
+
 	/*
 	 * if Caching mode is set, then invalid translations are tagged
 	 * with domainid 0. Hence we need to pre-allocate it.
@@ -998,62 +1000,15 @@
 		set_bit(0, iommu->domain_ids);
 	return 0;
 }
-static struct intel_iommu *alloc_iommu(struct intel_iommu *iommu,
-					struct dmar_drhd_unit *drhd)
-{
-	int ret;
-	int map_size;
-	u32 ver;
-
-	iommu->reg = ioremap(drhd->reg_base_addr, PAGE_SIZE_4K);
-	if (!iommu->reg) {
-		printk(KERN_ERR "IOMMU: can't map the region\n");
-		goto error;
-	}
-	iommu->cap = dmar_readq(iommu->reg + DMAR_CAP_REG);
-	iommu->ecap = dmar_readq(iommu->reg + DMAR_ECAP_REG);
-
-	/* the registers might be more than one page */
-	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
-		cap_max_fault_reg_offset(iommu->cap));
-	map_size = PAGE_ALIGN_4K(map_size);
-	if (map_size > PAGE_SIZE_4K) {
-		iounmap(iommu->reg);
-		iommu->reg = ioremap(drhd->reg_base_addr, map_size);
-		if (!iommu->reg) {
-			printk(KERN_ERR "IOMMU: can't map the region\n");
-			goto error;
-		}
-	}
 
-	ver = readl(iommu->reg + DMAR_VER_REG);
-	pr_debug("IOMMU %llx: ver %d:%d cap %llx ecap %llx\n",
-		drhd->reg_base_addr, DMAR_VER_MAJOR(ver), DMAR_VER_MINOR(ver),
-		iommu->cap, iommu->ecap);
-	ret = iommu_init_domains(iommu);
-	if (ret)
-		goto error_unmap;
-	spin_lock_init(&iommu->lock);
-	spin_lock_init(&iommu->register_lock);
-
-	drhd->iommu = iommu;
-	return iommu;
-error_unmap:
-	iounmap(iommu->reg);
-error:
-	kfree(iommu);
-	return NULL;
-}
 
 static void domain_exit(struct dmar_domain *domain);
-static void free_iommu(struct intel_iommu *iommu)
+
+void free_dmar_iommu(struct intel_iommu *iommu)
 {
 	struct dmar_domain *domain;
 	int i;
 
-	if (!iommu)
-		return;
-
 	i = find_first_bit(iommu->domain_ids, cap_ndoms(iommu->cap));
 	for (; i < cap_ndoms(iommu->cap); ) {
 		domain = iommu->domains[i];
@@ -1078,10 +1033,6 @@
 
 	/* free context mapping */
 	free_context_table(iommu);
-
-	if (iommu->reg)
-		iounmap(iommu->reg);
-	kfree(iommu);
 }
 
 static struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu)
@@ -1426,37 +1377,6 @@
 	return NULL;
 }
 
-static int dmar_pci_device_match(struct pci_dev *devices[], int cnt,
-     struct pci_dev *dev)
-{
-	int index;
-
-	while (dev) {
-		for (index = 0; index < cnt; index++)
-			if (dev == devices[index])
-				return 1;
-
-		/* Check our parent */
-		dev = dev->bus->self;
-	}
-
-	return 0;
-}
-
-static struct dmar_drhd_unit *
-dmar_find_matched_drhd_unit(struct pci_dev *dev)
-{
-	struct dmar_drhd_unit *drhd = NULL;
-
-	list_for_each_entry(drhd, &dmar_drhd_units, list) {
-		if (drhd->include_all || dmar_pci_device_match(drhd->devices,
-						drhd->devices_cnt, dev))
-			return drhd;
-	}
-
-	return NULL;
-}
-
 /* domain is initialized */
 static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 {
@@ -1764,6 +1684,10 @@
 			goto error;
 		}
 
+		ret = iommu_init_domains(iommu);
+		if (ret)
+			goto error;
+
 		/*
 		 * TBD:
 		 * we could share the same root & context tables
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:51:46.000000000 -0700
@@ -27,19 +27,7 @@
 #include <linux/sysdev.h>
 #include "iova.h"
 #include <linux/io.h>
-
-/*
- * We need a fixed PAGE_SIZE of 4K irrespective of
- * arch PAGE_SIZE for IOMMU page tables.
- */
-#define PAGE_SHIFT_4K		(12)
-#define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
-#define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
-#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
-
-#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
-#define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
-#define DMA_64BIT_PFN		IOVA_PFN(DMA_64BIT_MASK)
+#include "dma_remapping.h"
 
 /*
  * Intel IOMMU register specification per version 1.0 public spec.
@@ -187,158 +175,31 @@
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
-/*
- * 0: Present
- * 1-11: Reserved
- * 12-63: Context Ptr (12 - (haw-1))
- * 64-127: Reserved
- */
-struct root_entry {
-	u64	val;
-	u64	rsvd1;
-};
-#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
-static inline bool root_present(struct root_entry *root)
-{
-	return (root->val & 1);
-}
-static inline void set_root_present(struct root_entry *root)
-{
-	root->val |= 1;
-}
-static inline void set_root_value(struct root_entry *root, unsigned long value)
-{
-	root->val |= value & PAGE_MASK_4K;
-}
-
-struct context_entry;
-static inline struct context_entry *
-get_context_addr_from_root(struct root_entry *root)
-{
-	return (struct context_entry *)
-		(root_present(root)?phys_to_virt(
-		root->val & PAGE_MASK_4K):
-		NULL);
-}
-
-/*
- * low 64 bits:
- * 0: present
- * 1: fault processing disable
- * 2-3: translation type
- * 12-63: address space root
- * high 64 bits:
- * 0-2: address width
- * 3-6: aval
- * 8-23: domain id
- */
-struct context_entry {
-	u64 lo;
-	u64 hi;
-};
-#define context_present(c) ((c).lo & 1)
-#define context_fault_disable(c) (((c).lo >> 1) & 1)
-#define context_translation_type(c) (((c).lo >> 2) & 3)
-#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
-#define context_address_width(c) ((c).hi &  7)
-#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
-
-#define context_set_present(c) do {(c).lo |= 1;} while (0)
-#define context_set_fault_enable(c) \
-	do {(c).lo &= (((u64)-1) << 2) | 1;} while (0)
-#define context_set_translation_type(c, val) \
-	do { \
-		(c).lo &= (((u64)-1) << 4) | 3; \
-		(c).lo |= ((val) & 3) << 2; \
-	} while (0)
-#define CONTEXT_TT_MULTI_LEVEL 0
-#define context_set_address_root(c, val) \
-	do {(c).lo |= (val) & PAGE_MASK_4K;} while (0)
-#define context_set_address_width(c, val) do {(c).hi |= (val) & 7;} while (0)
-#define context_set_domain_id(c, val) \
-	do {(c).hi |= ((val) & ((1 << 16) - 1)) << 8;} while (0)
-#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while (0)
-
-/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-11: available
- * 12-63: Host physcial address
- */
-struct dma_pte {
-	u64 val;
-};
-#define dma_clear_pte(p)	do {(p).val = 0;} while (0)
-
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-
-#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while (0)
-#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while (0)
-#define dma_set_pte_prot(p, prot) \
-		do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
-#define dma_pte_addr(p) ((p).val & PAGE_MASK_4K)
-#define dma_set_pte_addr(p, addr) do {\
-		(p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & 3) != 0)
-
-struct intel_iommu;
-
-struct dmar_domain {
-	int	id;			/* domain id */
-	struct intel_iommu *iommu;	/* back pointer to owning iommu */
-
-	struct list_head devices; 	/* all devices' list */
-	struct iova_domain iovad;	/* iova's that belong to this domain */
-
-	struct dma_pte	*pgd;		/* virtual address */
-	spinlock_t	mapping_lock;	/* page table lock */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
-
-#define DOMAIN_FLAG_MULTIPLE_DEVICES 1
-	int		flags;
-};
-
-/* PCI domain-device relationship */
-struct device_domain_info {
-	struct list_head link;	/* link to domain siblings */
-	struct list_head global; /* link to global list */
-	u8 bus;			/* PCI bus numer */
-	u8 devfn;		/* PCI devfn number */
-	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
-	struct dmar_domain *domain; /* pointer to domain */
-};
-
-extern int init_dmars(void);
-
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64		cap;
 	u64		ecap;
-	unsigned long 	*domain_ids; /* bitmap of domains */
-	struct dmar_domain **domains; /* ptr to domains */
 	int		seg;
 	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
-	spinlock_t	lock; /* protect context, domain ids */
 	spinlock_t	register_lock; /* protect register handling */
+
+#ifdef CONFIG_DMAR
+	unsigned long 	*domain_ids; /* bitmap of domains */
+	struct dmar_domain **domains; /* ptr to domains */
+	spinlock_t	lock; /* protect context, domain ids */
 	struct root_entry *root_entry; /* virtual address */
 
 	unsigned int irq;
 	unsigned char name[7];    /* Device Name */
 	struct msi_msg saved_msg;
 	struct sys_device sysdev;
+#endif
 };
 
-#ifndef CONFIG_DMAR_GFX_WA
-static inline void iommu_prepare_gfx_mapping(void)
-{
-	return;
-}
-#endif /* !CONFIG_DMAR_GFX_WA */
+extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
+
+extern struct intel_iommu *alloc_iommu(struct intel_iommu *iommu,
+				       struct dmar_drhd_unit *drhd);
+extern void free_iommu(struct intel_iommu *iommu);
 
 #endif
Index: tree-x86/drivers/pci/dma_remapping.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/drivers/pci/dma_remapping.h	2008-07-10 09:51:46.000000000 -0700
@@ -0,0 +1,155 @@
+#ifndef _DMA_REMAPPING_H
+#define _DMA_REMAPPING_H
+
+/*
+ * We need a fixed PAGE_SIZE of 4K irrespective of
+ * arch PAGE_SIZE for IOMMU page tables.
+ */
+#define PAGE_SHIFT_4K		(12)
+#define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
+#define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
+#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
+
+#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
+#define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
+#define DMA_64BIT_PFN		IOVA_PFN(DMA_64BIT_MASK)
+
+
+/*
+ * 0: Present
+ * 1-11: Reserved
+ * 12-63: Context Ptr (12 - (haw-1))
+ * 64-127: Reserved
+ */
+struct root_entry {
+	u64	val;
+	u64	rsvd1;
+};
+#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
+static inline bool root_present(struct root_entry *root)
+{
+	return (root->val & 1);
+}
+static inline void set_root_present(struct root_entry *root)
+{
+	root->val |= 1;
+}
+static inline void set_root_value(struct root_entry *root, unsigned long value)
+{
+	root->val |= value & PAGE_MASK_4K;
+}
+
+struct context_entry;
+static inline struct context_entry *
+get_context_addr_from_root(struct root_entry *root)
+{
+	return (struct context_entry *)
+		(root_present(root)?phys_to_virt(
+		root->val & PAGE_MASK_4K):
+		NULL);
+}
+
+/*
+ * low 64 bits:
+ * 0: present
+ * 1: fault processing disable
+ * 2-3: translation type
+ * 12-63: address space root
+ * high 64 bits:
+ * 0-2: address width
+ * 3-6: aval
+ * 8-23: domain id
+ */
+struct context_entry {
+	u64 lo;
+	u64 hi;
+};
+#define context_present(c) ((c).lo & 1)
+#define context_fault_disable(c) (((c).lo >> 1) & 1)
+#define context_translation_type(c) (((c).lo >> 2) & 3)
+#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
+#define context_address_width(c) ((c).hi &  7)
+#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
+
+#define context_set_present(c) do {(c).lo |= 1;} while (0)
+#define context_set_fault_enable(c) \
+	do {(c).lo &= (((u64)-1) << 2) | 1;} while (0)
+#define context_set_translation_type(c, val) \
+	do { \
+		(c).lo &= (((u64)-1) << 4) | 3; \
+		(c).lo |= ((val) & 3) << 2; \
+	} while (0)
+#define CONTEXT_TT_MULTI_LEVEL 0
+#define context_set_address_root(c, val) \
+	do {(c).lo |= (val) & PAGE_MASK_4K;} while (0)
+#define context_set_address_width(c, val) do {(c).hi |= (val) & 7;} while (0)
+#define context_set_domain_id(c, val) \
+	do {(c).hi |= ((val) & ((1 << 16) - 1)) << 8;} while (0)
+#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while (0)
+
+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-11: available
+ * 12-63: Host physcial address
+ */
+struct dma_pte {
+	u64 val;
+};
+#define dma_clear_pte(p)	do {(p).val = 0;} while (0)
+
+#define DMA_PTE_READ (1)
+#define DMA_PTE_WRITE (2)
+
+#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while (0)
+#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while (0)
+#define dma_set_pte_prot(p, prot) \
+		do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
+#define dma_pte_addr(p) ((p).val & PAGE_MASK_4K)
+#define dma_set_pte_addr(p, addr) do {\
+		(p).val |= ((addr) & PAGE_MASK_4K); } while (0)
+#define dma_pte_present(p) (((p).val & 3) != 0)
+
+struct intel_iommu;
+
+struct dmar_domain {
+	int	id;			/* domain id */
+	struct intel_iommu *iommu;	/* back pointer to owning iommu */
+
+	struct list_head devices; 	/* all devices' list */
+	struct iova_domain iovad;	/* iova's that belong to this domain */
+
+	struct dma_pte	*pgd;		/* virtual address */
+	spinlock_t	mapping_lock;	/* page table lock */
+	int		gaw;		/* max guest address width */
+
+	/* adjusted guest address width, 0 is level 2 30-bit */
+	int		agaw;
+
+#define DOMAIN_FLAG_MULTIPLE_DEVICES 1
+	int		flags;
+};
+
+/* PCI domain-device relationship */
+struct device_domain_info {
+	struct list_head link;	/* link to domain siblings */
+	struct list_head global; /* link to global list */
+	u8 bus;			/* PCI bus numer */
+	u8 devfn;		/* PCI devfn number */
+	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
+	struct dmar_domain *domain; /* pointer to domain */
+};
+
+extern int init_dmars(void);
+extern void free_dmar_iommu(struct intel_iommu *iommu);
+
+#ifndef CONFIG_DMAR_GFX_WA
+static inline void iommu_prepare_gfx_mapping(void)
+{
+	return;
+}
+#endif /* !CONFIG_DMAR_GFX_WA */
+
+#endif

-- 



* [patch 02/26] x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
  2008-07-10 18:16 ` [patch 01/26] x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 03/26] x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping Suresh Siddha
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: cleanup_dmar_g_iommu_logic.patch --]
[-- Type: text/plain, Size: 4035 bytes --]

Clean up the intel-iommu code related to the deferred iommu flush logic. There
is no need to allocate all the iommus as a sequential array.

This will be used later in the interrupt-remapping patch series to
allocate each iommu much earlier, individually for each device remapping
hardware unit.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
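
The core of the change, sketched in userspace: instead of deriving a table
index from pointer arithmetic on one contiguous array (dom->iommu - g_iommus),
each individually allocated iommu carries a sequence id assigned at allocation
time, and that id indexes the per-iommu deferred flush table. The *_sketch
names, the fixed table size and the use of calloc() are assumptions made for
this illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_IOMMUS_SKETCH 8

/* Hypothetical, stripped-down iommu: only the sequence id matters here. */
struct iommu_sketch {
	int seq_id;
};

/* Per-iommu bookkeeping, indexed by seq_id. */
struct flush_table_sketch {
	int next;	/* number of queued unmaps */
};

static struct flush_table_sketch deferred_flush_sketch[MAX_IOMMUS_SKETCH];
static int iommu_allocated_sketch;

/* Each iommu is allocated on its own and numbered in allocation order. */
static struct iommu_sketch *alloc_iommu_sketch(void)
{
	struct iommu_sketch *iommu = calloc(1, sizeof(*iommu));

	if (!iommu)
		return NULL;

	iommu->seq_id = iommu_allocated_sketch++;
	return iommu;
}

/* Queue one deferred unmap against the table slot owned by this iommu. */
static void add_unmap_sketch(const struct iommu_sketch *iommu)
{
	assert(iommu->seq_id >= 0 && iommu->seq_id < MAX_IOMMUS_SKETCH);
	deferred_flush_sketch[iommu->seq_id].next++;
}
```

Because the index no longer depends on where the structure lives in memory,
the iommus can be allocated at different times and in different places.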

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:46.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:49.000000000 -0700
@@ -377,11 +377,18 @@
 	return (ACPI_SUCCESS(status) ? 1 : 0);
 }
 
-struct intel_iommu *alloc_iommu(struct intel_iommu *iommu,
-				struct dmar_drhd_unit *drhd)
+struct intel_iommu *alloc_iommu(struct dmar_drhd_unit *drhd)
 {
+	struct intel_iommu *iommu;
 	int map_size;
 	u32 ver;
+	static int iommu_allocated = 0;
+
+	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		return NULL;
+
+	iommu->seq_id = iommu_allocated++;
 
 	iommu->reg = ioremap(drhd->reg_base_addr, PAGE_SIZE_4K);
 	if (!iommu->reg) {
Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:51:46.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:51:49.000000000 -0700
@@ -58,8 +58,6 @@
 
 DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
 
-static struct intel_iommu *g_iommus;
-
 #define HIGH_WATER_MARK 250
 struct deferred_flush_tables {
 	int next;
@@ -1649,8 +1647,6 @@
 	 * endfor
 	 */
 	for_each_drhd_unit(drhd) {
-		if (drhd->ignored)
-			continue;
 		g_num_of_iommus++;
 		/*
 		 * lock not needed as this is only incremented in the single
@@ -1659,26 +1655,17 @@
 		 */
 	}
 
-	g_iommus = kzalloc(g_num_of_iommus * sizeof(*iommu), GFP_KERNEL);
-	if (!g_iommus) {
-		ret = -ENOMEM;
-		goto error;
-	}
-
 	deferred_flush = kzalloc(g_num_of_iommus *
 		sizeof(struct deferred_flush_tables), GFP_KERNEL);
 	if (!deferred_flush) {
-		kfree(g_iommus);
 		ret = -ENOMEM;
 		goto error;
 	}
 
-	i = 0;
 	for_each_drhd_unit(drhd) {
 		if (drhd->ignored)
 			continue;
-		iommu = alloc_iommu(&g_iommus[i], drhd);
-		i++;
+		iommu = alloc_iommu(drhd);
 		if (!iommu) {
 			ret = -ENOMEM;
 			goto error;
@@ -1770,7 +1757,6 @@
 		iommu = drhd->iommu;
 		free_iommu(iommu);
 	}
-	kfree(g_iommus);
 	return ret;
 }
 
@@ -1927,7 +1913,10 @@
 	/* just flush them all */
 	for (i = 0; i < g_num_of_iommus; i++) {
 		if (deferred_flush[i].next) {
-			iommu_flush_iotlb_global(&g_iommus[i], 0);
+			struct intel_iommu *iommu =
+				deferred_flush[i].domain[0]->iommu;
+
+			iommu_flush_iotlb_global(iommu, 0);
 			for (j = 0; j < deferred_flush[i].next; j++) {
 				__free_iova(&deferred_flush[i].domain[j]->iovad,
 						deferred_flush[i].iova[j]);
@@ -1957,7 +1946,8 @@
 	if (list_size == HIGH_WATER_MARK)
 		flush_unmaps();
 
-	iommu_id = dom->iommu - g_iommus;
+	iommu_id = dom->iommu->seq_id;
+
 	next = deferred_flush[iommu_id].next;
 	deferred_flush[iommu_id].domain[next] = dom;
 	deferred_flush[iommu_id].iova[next] = iova;
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:46.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:51:49.000000000 -0700
@@ -182,6 +182,7 @@
 	int		seg;
 	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
 	spinlock_t	register_lock; /* protect register handling */
+	int		seq_id;	/* sequence id of the iommu */
 
 #ifdef CONFIG_DMAR
 	unsigned long 	*domain_ids; /* bitmap of domains */
@@ -198,8 +199,7 @@
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 
-extern struct intel_iommu *alloc_iommu(struct intel_iommu *iommu,
-				       struct dmar_drhd_unit *drhd);
+extern struct intel_iommu *alloc_iommu(struct dmar_drhd_unit *drhd);
 extern void free_iommu(struct intel_iommu *iommu);
 
 #endif

-- 



* [patch 03/26] x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
  2008-07-10 18:16 ` [patch 01/26] x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization Suresh Siddha
  2008-07-10 18:16 ` [patch 02/26] x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 04/26] x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code Suresh Siddha
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: parse_iommus_early.patch --]
[-- Type: text/plain, Size: 6926 bytes --]

Allocate the iommu while parsing the DMA remapping hardware
definition structures. Also introduce routines for device
scope initialization, which will be called explicitly during
DMA-remapping initialization.

These will be used to enable interrupt remapping separately from the
existing DMA-remapping enabling sequence.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
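
Since the table parsing may now be requested from more than one
initialization path, dmar_table_init() gains a run-once guard. A minimal
userspace sketch of that pattern follows; parse_table_sketch() and the call
counter are placeholders invented for this note, not the real parsing code.

```c
#include <assert.h>

/* Counts how often the (placeholder) parser actually ran. */
static int parse_calls_sketch;

/* Placeholder for the real table parser; always succeeds here. */
static int parse_table_sketch(void)
{
	parse_calls_sketch++;
	return 0;
}

/*
 * Run-once initialization: the first caller does the parsing, any later
 * caller returns immediately with success.
 */
static int table_init_sketch(void)
{
	static int initialized;

	if (initialized)
		return 0;

	initialized = 1;

	return parse_table_sketch();
}
```

Note that in this sketch, as in the hunk as posted, the flag is set before
parsing, so a second call returns 0 even if the first parse had failed.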

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:49.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:51.000000000 -0700
@@ -174,19 +174,37 @@
 	struct acpi_dmar_hardware_unit *drhd;
 	struct dmar_drhd_unit *dmaru;
 	int ret = 0;
-	static int include_all;
 
 	dmaru = kzalloc(sizeof(*dmaru), GFP_KERNEL);
 	if (!dmaru)
 		return -ENOMEM;
 
+	dmaru->hdr = header;
 	drhd = (struct acpi_dmar_hardware_unit *)header;
 	dmaru->reg_base_addr = drhd->address;
 	dmaru->include_all = drhd->flags & 0x1; /* BIT0: INCLUDE_ALL */
 
+	ret = alloc_iommu(dmaru);
+	if (ret) {
+		kfree(dmaru);
+		return ret;
+	}
+	dmar_register_drhd_unit(dmaru);
+	return 0;
+}
+
+static int __init
+dmar_parse_dev(struct dmar_drhd_unit *dmaru)
+{
+	struct acpi_dmar_hardware_unit *drhd;
+	static int include_all;
+	int ret;
+
+	drhd = (struct acpi_dmar_hardware_unit *) dmaru->hdr;
+
 	if (!dmaru->include_all)
 		ret = dmar_parse_dev_scope((void *)(drhd + 1),
-				((void *)drhd) + header->length,
+				((void *)drhd) + drhd->header.length,
 				&dmaru->devices_cnt, &dmaru->devices,
 				drhd->segment);
 	else {
@@ -199,10 +217,10 @@
 		include_all = 1;
 	}
 
-	if (ret || (dmaru->devices_cnt == 0 && !dmaru->include_all))
+	if (ret || (dmaru->devices_cnt == 0 && !dmaru->include_all)) {
+		list_del(&dmaru->list);
 		kfree(dmaru);
-	else
-		dmar_register_drhd_unit(dmaru);
+	}
 	return ret;
 }
 
@@ -211,23 +229,35 @@
 {
 	struct acpi_dmar_reserved_memory *rmrr;
 	struct dmar_rmrr_unit *rmrru;
-	int ret = 0;
 
 	rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
 	if (!rmrru)
 		return -ENOMEM;
 
+	rmrru->hdr = header;
 	rmrr = (struct acpi_dmar_reserved_memory *)header;
 	rmrru->base_address = rmrr->base_address;
 	rmrru->end_address = rmrr->end_address;
+
+	dmar_register_rmrr_unit(rmrru);
+	return 0;
+}
+
+static int __init
+rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
+{
+	struct acpi_dmar_reserved_memory *rmrr;
+	int ret;
+
+	rmrr = (struct acpi_dmar_reserved_memory *) rmrru->hdr;
 	ret = dmar_parse_dev_scope((void *)(rmrr + 1),
-		((void *)rmrr) + header->length,
+		((void *)rmrr) + rmrr->header.length,
 		&rmrru->devices_cnt, &rmrru->devices, rmrr->segment);
 
-	if (ret || (rmrru->devices_cnt == 0))
+	if (ret || (rmrru->devices_cnt == 0)) {
+		list_del(&rmrru->list);
 		kfree(rmrru);
-	else
-		dmar_register_rmrr_unit(rmrru);
+	}
 	return ret;
 }
 
@@ -333,15 +363,42 @@
 	return NULL;
 }
 
+int __init dmar_dev_scope_init(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct dmar_rmrr_unit *rmrr;
+	int ret = -ENODEV;
+
+	for_each_drhd_unit(drhd) {
+		ret = dmar_parse_dev(drhd);
+		if (ret)
+			return ret;
+	}
+
+	for_each_rmrr_units(rmrr) {
+		ret = rmrr_parse_dev(rmrr);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
 
 int __init dmar_table_init(void)
 {
-
+	static int dmar_table_initialized;
 	int ret;
 
+	if (dmar_table_initialized)
+		return 0;
+
+	dmar_table_initialized = 1;
+
 	ret = parse_dmar_table();
 	if (ret) {
-		printk(KERN_INFO PREFIX "parse DMAR table failure.\n");
+		if (ret != -ENODEV)
+			printk(KERN_INFO PREFIX "parse DMAR table failure.\n");
 		return ret;
 	}
 
@@ -377,7 +434,7 @@
 	return (ACPI_SUCCESS(status) ? 1 : 0);
 }
 
-struct intel_iommu *alloc_iommu(struct dmar_drhd_unit *drhd)
+int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
 	struct intel_iommu *iommu;
 	int map_size;
@@ -386,7 +443,7 @@
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
-		return NULL;
+		return -ENOMEM;
 
 	iommu->seq_id = iommu_allocated++;
 
@@ -419,10 +476,10 @@
 	spin_lock_init(&iommu->register_lock);
 
 	drhd->iommu = iommu;
-	return iommu;
+	return 0;
 error:
 	kfree(iommu);
-	return NULL;
+	return -1;
 }
 
 void free_iommu(struct intel_iommu *iommu)
Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:51:49.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:51:51.000000000 -0700
@@ -1665,11 +1665,8 @@
 	for_each_drhd_unit(drhd) {
 		if (drhd->ignored)
 			continue;
-		iommu = alloc_iommu(drhd);
-		if (!iommu) {
-			ret = -ENOMEM;
-			goto error;
-		}
+
+		iommu = drhd->iommu;
 
 		ret = iommu_init_domains(iommu);
 		if (ret)
@@ -2324,6 +2321,9 @@
 	if (dmar_table_init())
 		return 	-ENODEV;
 
+	if (dmar_dev_scope_init())
+		return 	-ENODEV;
+
 	iommu_init_mempool();
 	dmar_init_reserved_ranges();
 
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:51:51.000000000 -0700
@@ -46,12 +46,14 @@
 
 extern int dmar_table_init(void);
 extern int early_dmar_detect(void);
+extern int dmar_dev_scope_init(void);
 
 extern struct list_head dmar_drhd_units;
 extern struct list_head dmar_rmrr_units;
 
 struct dmar_drhd_unit {
 	struct list_head list;		/* list of drhd units	*/
+	struct  acpi_dmar_header *hdr;	/* ACPI header		*/
 	u64	reg_base_addr;		/* register base address*/
 	struct	pci_dev **devices; 	/* target device array	*/
 	int	devices_cnt;		/* target device count	*/
@@ -62,6 +64,7 @@
 
 struct dmar_rmrr_unit {
 	struct list_head list;		/* list of rmrr units	*/
+	struct acpi_dmar_header *hdr;	/* ACPI header		*/
 	u64	base_address;		/* reserved base address*/
 	u64	end_address;		/* reserved end address */
 	struct pci_dev **devices;	/* target devices */
@@ -72,6 +75,8 @@
 	list_for_each_entry(drhd, &dmar_drhd_units, list)
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
+
+extern int alloc_iommu(struct dmar_drhd_unit *);
 #else
 static inline void detect_intel_iommu(void)
 {
@@ -81,6 +86,9 @@
 {
 	return -ENODEV;
 }
-
+static inline int dmar_table_init(void)
+{
+	return -ENODEV;
+}
 #endif /* !CONFIG_DMAR */
 #endif /* __DMAR_H__ */
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:49.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:51:51.000000000 -0700
@@ -199,7 +199,7 @@
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 
-extern struct intel_iommu *alloc_iommu(struct dmar_drhd_unit *drhd);
+extern int alloc_iommu(struct dmar_drhd_unit *drhd);
 extern void free_iommu(struct intel_iommu *iommu);
 
 #endif

-- 



* [patch 04/26] x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (2 preceding siblings ...)
  2008-07-10 18:16 ` [patch 03/26] x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 05/26] x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection Suresh Siddha
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: config_dmar_for_dma_remapping_specific_code.patch --]
[-- Type: text/plain, Size: 2424 bytes --]

Cover the DMA-remapping specific code in the generic code with CONFIG_DMAR.
The generic code will also be used later for enabling Interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:51.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:53.000000000 -0700
@@ -39,7 +39,6 @@
  * these units are not supported by the architecture.
  */
 LIST_HEAD(dmar_drhd_units);
-LIST_HEAD(dmar_rmrr_units);
 
 static struct acpi_table_header * __initdata dmar_tbl;
 
@@ -55,11 +54,6 @@
 		list_add(&drhd->list, &dmar_drhd_units);
 }
 
-static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
-{
-	list_add(&rmrr->list, &dmar_rmrr_units);
-}
-
 static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
 					   struct pci_dev **dev, u16 segment)
 {
@@ -224,6 +218,15 @@
 	return ret;
 }
 
+#ifdef CONFIG_DMAR
+LIST_HEAD(dmar_rmrr_units);
+
+static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
+{
+	list_add(&rmrr->list, &dmar_rmrr_units);
+}
+
+
 static int __init
 dmar_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -260,6 +263,7 @@
 	}
 	return ret;
 }
+#endif
 
 static void __init
 dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
@@ -284,6 +288,7 @@
 	}
 }
 
+
 /**
  * parse_dmar_table - parses the DMA reporting table
  */
@@ -316,7 +321,9 @@
 			ret = dmar_parse_one_drhd(entry_header);
 			break;
 		case ACPI_DMAR_TYPE_RESERVED_MEMORY:
+#ifdef CONFIG_DMAR
 			ret = dmar_parse_one_rmrr(entry_header);
+#endif
 			break;
 		default:
 			printk(KERN_WARNING PREFIX
@@ -366,7 +373,6 @@
 int __init dmar_dev_scope_init(void)
 {
 	struct dmar_drhd_unit *drhd;
-	struct dmar_rmrr_unit *rmrr;
 	int ret = -ENODEV;
 
 	for_each_drhd_unit(drhd) {
@@ -375,11 +381,16 @@
 			return ret;
 	}
 
-	for_each_rmrr_units(rmrr) {
-		ret = rmrr_parse_dev(rmrr);
-		if (ret)
-			return ret;
+#ifdef CONFIG_DMAR
+	{
+		struct dmar_rmrr_unit *rmrr;
+		for_each_rmrr_units(rmrr) {
+			ret = rmrr_parse_dev(rmrr);
+			if (ret)
+				return ret;
+		}
 	}
+#endif
 
 	return ret;
 }
@@ -407,10 +418,12 @@
 		return -ENODEV;
 	}
 
+#ifdef CONFIG_DMAR
 	if (list_empty(&dmar_rmrr_units)) {
 		printk(KERN_INFO PREFIX "No RMRR found\n");
 		return -ENODEV;
 	}
+#endif
 
 	return 0;
 }

-- 



* [patch 05/26] x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (3 preceding siblings ...)
  2008-07-10 18:16 ` [patch 04/26] x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 06/26] x64, x2apic/intr-remap: parse ioapic scope under vt-d structures Suresh Siddha
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha, Yong Y Wang

[-- Attachment #1: fix_dmar_table_init.patch --]
[-- Type: text/plain, Size: 651 bytes --]

Presence of RMRR structures is not compulsory for enabling DMA-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Yong Y Wang <yong.y.wang@intel.com>
---

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:53.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:55.000000000 -0700
@@ -419,10 +419,8 @@
 	}
 
 #ifdef CONFIG_DMAR
-	if (list_empty(&dmar_rmrr_units)) {
+	if (list_empty(&dmar_rmrr_units))
 		printk(KERN_INFO PREFIX "No RMRR found\n");
-		return -ENODEV;
-	}
 #endif
 
 	return 0;

-- 



* [patch 06/26] x64, x2apic/intr-remap: parse ioapic scope under vt-d structures
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (4 preceding siblings ...)
  2008-07-10 18:16 ` [patch 05/26] x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 07/26] x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h Suresh Siddha
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: intr_remapping_ioapic_parse.patch --]
[-- Type: text/plain, Size: 4433 bytes --]

Parse the VT-d device scope structures to find the mapping between IO-APICs
and the interrupt remapping hardware units.

This will be used later for enabling Interrupt-remapping for IOAPIC devices.
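
As a rough illustration of the parsing idea described above, the following
user-space sketch walks a buffer of variable-length scope entries and collects
the ids of the IOAPIC entries. The three-byte entry layout (type, length, id),
the SCOPE_TYPE_IOAPIC value and the function name are simplifying assumptions
made for this example; they are not the ACPI structures or the kernel code in
the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCOPE_TYPE_IOAPIC 3	/* illustrative value, assumed for this sketch */

/*
 * Hypothetical entry layout: byte 0 = type, byte 1 = total entry length
 * in bytes, byte 2 = enumeration id, any remaining bytes are ignored.
 *
 * Walk the entries packed in buf[0..len) and store the id of every
 * IOAPIC entry in ids[]. Returns the number of ids stored, or -1 if an
 * entry is malformed or more than max IOAPIC entries are present.
 */
static int collect_ioapic_ids(const uint8_t *buf, size_t len,
			      uint8_t *ids, int max)
{
	size_t off = 0;
	int n = 0;

	assert(buf && ids);
	while (off + 3 <= len) {
		uint8_t type = buf[off];
		uint8_t elen = buf[off + 1];

		/* a too-short or overlong entry would loop forever or overrun */
		if (elen < 3 || off + elen > len)
			return -1;
		if (type == SCOPE_TYPE_IOAPIC) {
			if (n == max)
				return -1;
			ids[n++] = buf[off + 2];
		}
		off += elen;	/* entries are variable length */
	}
	return n;
}
```

Unlike the loop in the patch, this sketch rejects entries whose length field
is too small, which keeps a malformed table from stalling the walk.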

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/drivers/pci/Makefile
===================================================================
--- tree-x86.orig/drivers/pci/Makefile	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/drivers/pci/Makefile	2008-07-10 09:51:57.000000000 -0700
@@ -26,6 +26,8 @@
 # Build Intel IOMMU support
 obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o
 
+obj-$(CONFIG_INTR_REMAP) += dmar.o intr_remapping.o
+
 #
 # Some architectures use the generic PCI setup functions
 #
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:51.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:51:57.000000000 -0700
@@ -114,6 +114,8 @@
 #define ecap_max_iotlb_offset(e) \
 	(ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
 #define ecap_coherent(e)	((e) & 0x1)
+#define ecap_eim_support(e)	((e >> 4) & 0x1)
+#define ecap_ir_support(e)	((e >> 3) & 0x1)
 
 
 /* IOTLB_REG */
Index: tree-x86/drivers/pci/intr_remapping.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/drivers/pci/intr_remapping.c	2008-07-10 09:51:57.000000000 -0700
@@ -0,0 +1,70 @@
+#include <linux/dmar.h>
+#include <asm/io_apic.h>
+#include "intel-iommu.h"
+#include "intr_remapping.h"
+
+static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
+static int ir_ioapic_num;
+
+static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
+				 struct intel_iommu *iommu)
+{
+	struct acpi_dmar_hardware_unit *drhd;
+	struct acpi_dmar_device_scope *scope;
+	void *start, *end;
+
+	drhd = (struct acpi_dmar_hardware_unit *)header;
+
+	start = (void *)(drhd + 1);
+	end = ((void *)drhd) + header->length;
+
+	while (start < end) {
+		scope = start;
+		if (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_IOAPIC) {
+			if (ir_ioapic_num == MAX_IO_APICS) {
+				printk(KERN_WARNING "Exceeded Max IO APICS\n");
+				return -1;
+			}
+
+			printk(KERN_INFO "IOAPIC id %d under DRHD base"
+			       " 0x%Lx\n", scope->enumeration_id,
+			       drhd->address);
+
+			ir_ioapic[ir_ioapic_num].iommu = iommu;
+			ir_ioapic[ir_ioapic_num].id = scope->enumeration_id;
+			ir_ioapic_num++;
+		}
+		start += scope->length;
+	}
+
+	return 0;
+}
+
+/*
+ * Finds the association between IOAPICs and their Interrupt-remapping
+ * hardware unit.
+ */
+int __init parse_ioapics_under_ir(void)
+{
+	struct dmar_drhd_unit *drhd;
+	int ir_supported = 0;
+
+	for_each_drhd_unit(drhd) {
+		struct intel_iommu *iommu = drhd->iommu;
+
+		if (ecap_ir_support(iommu->ecap)) {
+			if (ir_parse_ioapic_scope(drhd->hdr, iommu))
+				return -1;
+
+			ir_supported = 1;
+		}
+	}
+
+	if (ir_supported && ir_ioapic_num != nr_ioapics) {
+		printk(KERN_WARNING
+		       "Not all IO-APIC's listed under remapping hardware\n");
+		return -1;
+	}
+
+	return ir_supported;
+}
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:51:51.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:51:57.000000000 -0700
@@ -47,6 +47,7 @@
 extern int dmar_table_init(void);
 extern int early_dmar_detect(void);
 extern int dmar_dev_scope_init(void);
+extern int parse_ioapics_under_ir(void);
 
 extern struct list_head dmar_drhd_units;
 extern struct list_head dmar_rmrr_units;
Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:55.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:51:57.000000000 -0700
@@ -423,6 +423,9 @@
 		printk(KERN_INFO PREFIX "No RMRR found\n");
 #endif
 
+#ifdef CONFIG_INTR_REMAP
+	parse_ioapics_under_ir();
+#endif
 	return 0;
 }
 
Index: tree-x86/drivers/pci/intr_remapping.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/drivers/pci/intr_remapping.h	2008-07-10 09:51:57.000000000 -0700
@@ -0,0 +1,6 @@
+#include "intel-iommu.h"
+
+struct ioapic_scope {
+	struct intel_iommu *iommu;
+	unsigned int id;
+};

-- 



* [patch 07/26] x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (5 preceding siblings ...)
  2008-07-10 18:16 ` [patch 06/26] x64, x2apic/intr-remap: parse ioapic scope under vt-d structures Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 08/26] x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d) Suresh Siddha
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: move_iommu_wait_io_macro.patch --]
[-- Type: text/plain, Size: 1955 bytes --]

Move the IOMMU_WAIT_OP() macro to the header file.

This will be used by both DMA-remapping and Intr-remapping.
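
For readers unfamiliar with the pattern, the macro being moved is a
poll-until-condition loop with a timeout. A minimal user-space sketch of the
same idea follows. It is an assumption-laden illustration: it counts polls
instead of TSC cycles, returns an error instead of calling panic(), and the
names wait_for_bits and fake_status_read are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*read_fn)(void *ctx);

/*
 * Poll read(ctx) until (value & mask) is non-zero or max_polls reads
 * have been made. On success, returns 0 with the last value read in
 * *sts; on timeout, returns -1.
 */
static int wait_for_bits(read_fn read, void *ctx, uint32_t mask,
			 unsigned long max_polls, uint32_t *sts)
{
	unsigned long i;

	assert(read && sts);
	for (i = 0; i < max_polls; i++) {
		*sts = read(ctx);
		if (*sts & mask)
			return 0;
	}
	return -1;
}

/* Fake status register for illustration: bit 26 is set from the third read on. */
static uint32_t fake_status_read(void *ctx)
{
	unsigned int *reads = ctx;

	return ++(*reads) >= 3 ? (1u << 26) : 0;
}
```

Passing the read routine and the mask as parameters mirrors how the macro
takes `op` and `cond`, which is what lets both remapping paths share it.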

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:51:51.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:51:59.000000000 -0700
@@ -49,8 +49,6 @@
 
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 
-#define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) /* 10sec */
-
 #define DOMAIN_MAX_ADDR(gaw) ((((u64)1) << gaw) - 1)
 
 
@@ -486,19 +484,6 @@
 	return 0;
 }
 
-#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
-{\
-	cycles_t start_time = get_cycles();\
-	while (1) {\
-		sts = op (iommu->reg + offset);\
-		if (cond)\
-			break;\
-		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time))\
-			panic("DMAR hardware is malfunctioning\n");\
-		cpu_relax();\
-	}\
-}
-
 static void iommu_set_root_entry(struct intel_iommu *iommu)
 {
 	void *addr;
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:57.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:51:59.000000000 -0700
@@ -177,6 +177,21 @@
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
+#define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) /* 10sec */
+
+#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
+{\
+	cycles_t start_time = get_cycles();\
+	while (1) {\
+		sts = op (iommu->reg + offset);\
+		if (cond)\
+			break;\
+		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time))\
+			panic("DMAR hardware is malfunctioning\n");\
+		cpu_relax();\
+	}\
+}
+
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64		cap;

-- 



* [patch 08/26] x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d)
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (6 preceding siblings ...)
  2008-07-10 18:16 ` [patch 07/26] x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 09/26] x64, x2apic/intr-remap: Interrupt remapping infrastructure Suresh Siddha
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: qi_infrastructure.patch --]
[-- Type: text/plain, Size: 8646 bytes --]

Queued invalidation (part of Intel Virtualization Technology for
Directed I/O architecture) infrastructure.

This will be used for invalidating the interrupt entry cache in the
case of Interrupt-remapping and IOTLB invalidation in the case
of DMA-remapping.
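
The slot bookkeeping behind the queue can be sketched in isolation. The
following user-space example models only the free_head/free_tail/free_cnt
accounting: each submission takes a request slot plus a wait slot, and
completed slots are reclaimed in order from the tail. The ring length of 8,
the type and function names, and the absence of locking and hardware access
are all assumptions made for illustration; the patch itself uses QI_LENGTH
(256) and holds q_lock around these updates.

```c
#include <assert.h>

#define RING_LEN 8	/* illustrative size, assumed for this sketch */

enum slot_state { SLOT_FREE, SLOT_IN_USE, SLOT_DONE };

struct ring {
	enum slot_state status[RING_LEN];
	int free_head;	/* next slot to hand out */
	int free_tail;	/* oldest slot not yet reclaimed */
	int free_cnt;	/* number of free slots */
};

static void ring_init(struct ring *r)
{
	int i;

	assert(r);
	for (i = 0; i < RING_LEN; i++)
		r->status[i] = SLOT_FREE;
	r->free_head = r->free_tail = 0;
	r->free_cnt = RING_LEN;
}

/*
 * Reserve a request slot and the wait slot that follows it.
 * Returns the request index, or -1 if fewer than 3 slots are free.
 */
static int ring_reserve_pair(struct ring *r)
{
	int index, wait_index;

	if (r->free_cnt < 3)
		return -1;
	index = r->free_head;
	wait_index = (index + 1) % RING_LEN;
	r->status[index] = r->status[wait_index] = SLOT_IN_USE;
	r->free_head = (r->free_head + 2) % RING_LEN;
	r->free_cnt -= 2;
	return index;
}

/* Mark both slots of a pair done, then reclaim finished slots from the tail. */
static void ring_complete_pair(struct ring *r, int index)
{
	r->status[index] = SLOT_DONE;
	r->status[(index + 1) % RING_LEN] = SLOT_DONE;
	while (r->status[r->free_tail] == SLOT_DONE) {
		r->status[r->free_tail] = SLOT_FREE;
		r->free_tail = (r->free_tail + 1) % RING_LEN;
		r->free_cnt++;
	}
}
```

Reclaiming only from the tail means a pair that finishes out of order stays
marked done until every older pair has finished as well.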

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:51:57.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:52:01.000000000 -0700
@@ -28,6 +28,7 @@
 
 #include <linux/pci.h>
 #include <linux/dmar.h>
+#include <linux/timer.h>
 #include "iova.h"
 #include "intel-iommu.h"
 
@@ -509,3 +510,152 @@
 		iounmap(iommu->reg);
 	kfree(iommu);
 }
+
+/*
+ * Reclaim all the submitted descriptors which have completed their work.
+ */
+static inline void reclaim_free_desc(struct q_inval *qi)
+{
+	while (qi->desc_status[qi->free_tail] == QI_DONE) {
+		qi->desc_status[qi->free_tail] = QI_FREE;
+		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
+		qi->free_cnt++;
+	}
+}
+
+/*
+ * Submit the queued invalidation descriptor to the remapping
+ * hardware unit and wait for its completion.
+ */
+void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+{
+	struct q_inval *qi = iommu->qi;
+	struct qi_desc *hw, wait_desc;
+	int wait_index, index;
+	unsigned long flags;
+
+	if (!qi)
+		return;
+
+	hw = qi->desc;
+
+	spin_lock(&qi->q_lock);
+	while (qi->free_cnt < 3) {
+		spin_unlock(&qi->q_lock);
+		cpu_relax();
+		spin_lock(&qi->q_lock);
+	}
+
+	index = qi->free_head;
+	wait_index = (index + 1) % QI_LENGTH;
+
+	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
+
+	hw[index] = *desc;
+
+	wait_desc.low = QI_IWD_STATUS_DATA(2) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
+
+	hw[wait_index] = wait_desc;
+
+	__iommu_flush_cache(iommu, &hw[index], sizeof(struct qi_desc));
+	__iommu_flush_cache(iommu, &hw[wait_index], sizeof(struct qi_desc));
+
+	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
+	qi->free_cnt -= 2;
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+	/*
+	 * update the HW tail register indicating the presence of
+	 * new descriptors.
+	 */
+	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	while (qi->desc_status[wait_index] != QI_DONE) {
+		spin_unlock(&qi->q_lock);
+		cpu_relax();
+		spin_lock(&qi->q_lock);
+	}
+
+	qi->desc_status[index] = QI_DONE;
+
+	reclaim_free_desc(qi);
+	spin_unlock(&qi->q_lock);
+}
+
+/*
+ * Flush the global interrupt entry cache.
+ */
+void qi_global_iec(struct intel_iommu *iommu)
+{
+	struct qi_desc desc;
+
+	desc.low = QI_IEC_TYPE;
+	desc.high = 0;
+
+	qi_submit_sync(&desc, iommu);
+}
+
+/*
+ * Enable Queued Invalidation interface. This is a must to support
+ * interrupt-remapping. Also used by DMA-remapping, which replaces
+ * register based IOTLB invalidation.
+ */
+int dmar_enable_qi(struct intel_iommu *iommu)
+{
+	u32 cmd, sts;
+	unsigned long flags;
+	struct q_inval *qi;
+
+	if (!ecap_qis(iommu->ecap))
+		return -ENOENT;
+
+	/*
+	 * queued invalidation is already setup and enabled.
+	 */
+	if (iommu->qi)
+		return 0;
+
+	iommu->qi = kmalloc(sizeof(*qi), GFP_KERNEL);
+	if (!iommu->qi)
+		return -ENOMEM;
+
+	qi = iommu->qi;
+
+	qi->desc = (void *)(get_zeroed_page(GFP_KERNEL));
+	if (!qi->desc) {
+		kfree(qi);
+		iommu->qi = 0;
+		return -ENOMEM;
+	}
+
+	qi->desc_status = kmalloc(QI_LENGTH * sizeof(int), GFP_KERNEL);
+	if (!qi->desc_status) {
+		free_page((unsigned long) qi->desc);
+		kfree(qi);
+		iommu->qi = 0;
+		return -ENOMEM;
+	}
+
+	qi->free_head = qi->free_tail = 0;
+	qi->free_cnt = QI_LENGTH;
+
+	spin_lock_init(&qi->q_lock);
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+	/* write zero to the tail reg */
+	writel(0, iommu->reg + DMAR_IQT_REG);
+
+	dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi->desc));
+
+	cmd = iommu->gcmd | DMA_GCMD_QIE;
+	iommu->gcmd |= DMA_GCMD_QIE;
+	writel(cmd, iommu->reg + DMAR_GCMD_REG);
+
+	/* Make sure the hardware completes it */
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, readl, (sts & DMA_GSTS_QIES), sts);
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	return 0;
+}
Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:51:59.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:52:01.000000000 -0700
@@ -181,13 +181,6 @@
 	kmem_cache_free(iommu_iova_cache, iova);
 }
 
-static inline void __iommu_flush_cache(
-	struct intel_iommu *iommu, void *addr, int size)
-{
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(addr, size);
-}
-
 /* Gets context entry for a given bus and devfn */
 static struct context_entry * device_to_context_entry(struct intel_iommu *iommu,
 		u8 bus, u8 devfn)
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:51:59.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:52:01.000000000 -0700
@@ -27,6 +27,7 @@
 #include <linux/sysdev.h>
 #include "iova.h"
 #include <linux/io.h>
+#include <asm/cacheflush.h>
 #include "dma_remapping.h"
 
 /*
@@ -51,6 +52,10 @@
 #define	DMAR_PLMLIMIT_REG 0x6c	/* PMRR low limit */
 #define	DMAR_PHMBASE_REG 0x70	/* pmrr high base addr */
 #define	DMAR_PHMLIMIT_REG 0x78	/* pmrr high limit */
+#define DMAR_IQH_REG	0x80	/* Invalidation queue head register */
+#define DMAR_IQT_REG	0x88	/* Invalidation queue tail register */
+#define DMAR_IQA_REG	0x90	/* Invalidation queue addr register */
+#define DMAR_ICS_REG	0x98	/* Invalidation complete status register */
 
 #define OFFSET_STRIDE		(9)
 /*
@@ -114,6 +119,7 @@
 #define ecap_max_iotlb_offset(e) \
 	(ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
 #define ecap_coherent(e)	((e) & 0x1)
+#define ecap_qis(e)		((e) & 0x2)
 #define ecap_eim_support(e)	((e >> 4) & 0x1)
 #define ecap_ir_support(e)	((e >> 3) & 0x1)
 
@@ -131,6 +137,17 @@
 #define DMA_TLB_IH_NONLEAF (((u64)1) << 6)
 #define DMA_TLB_MAX_SIZE (0x3f)
 
+/* INVALID_DESC */
+#define DMA_ID_TLB_GLOBAL_FLUSH	(((u64)1) << 3)
+#define DMA_ID_TLB_DSI_FLUSH	(((u64)2) << 3)
+#define DMA_ID_TLB_PSI_FLUSH	(((u64)3) << 3)
+#define DMA_ID_TLB_READ_DRAIN	(((u64)1) << 7)
+#define DMA_ID_TLB_WRITE_DRAIN	(((u64)1) << 6)
+#define DMA_ID_TLB_DID(id)	(((u64)((id & 0xffff) << 16)))
+#define DMA_ID_TLB_IH_NONLEAF	(((u64)1) << 6)
+#define DMA_ID_TLB_ADDR(addr)	(addr)
+#define DMA_ID_TLB_ADDR_MASK(mask)	(mask)
+
 /* PMEN_REG */
 #define DMA_PMEN_EPM (((u32)1)<<31)
 #define DMA_PMEN_PRS (((u32)1)<<0)
@@ -140,6 +157,7 @@
 #define DMA_GCMD_SRTP (((u32)1) << 30)
 #define DMA_GCMD_SFL (((u32)1) << 29)
 #define DMA_GCMD_EAFL (((u32)1) << 28)
+#define DMA_GCMD_QIE (((u32)1) << 26)
 #define DMA_GCMD_WBF (((u32)1) << 27)
 
 /* GSTS_REG */
@@ -147,6 +165,7 @@
 #define DMA_GSTS_RTPS (((u32)1) << 30)
 #define DMA_GSTS_FLS (((u32)1) << 29)
 #define DMA_GSTS_AFLS (((u32)1) << 28)
+#define DMA_GSTS_QIES (((u32)1) << 26)
 #define DMA_GSTS_WBFS (((u32)1) << 27)
 
 /* CCMD_REG */
@@ -192,6 +211,40 @@
 	}\
 }
 
+#define QI_LENGTH	256	/* queue length */
+
+enum {
+	QI_FREE,
+	QI_IN_USE,
+	QI_DONE
+};
+
+#define QI_CC_TYPE		0x1
+#define QI_IOTLB_TYPE		0x2
+#define QI_DIOTLB_TYPE		0x3
+#define QI_IEC_TYPE		0x4
+#define QI_IWD_TYPE		0x5
+
+#define QI_IEC_SELECTIVE	(((u64)1) << 4)
+#define QI_IEC_IIDEX(idx)	(((u64)(idx & 0xffff) << 32))
+#define QI_IEC_IM(m)		(((u64)(m & 0x1f) << 27))
+
+#define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
+#define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+
+struct qi_desc {
+	u64 low, high;
+};
+
+struct q_inval {
+	spinlock_t      q_lock;
+	struct qi_desc  *desc;          /* invalidation queue */
+	int             *desc_status;   /* desc status */
+	int             free_head;      /* first free entry */
+	int             free_tail;      /* last free entry */
+	int             free_cnt;
+};
+
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64		cap;
@@ -212,8 +265,16 @@
 	struct msi_msg saved_msg;
 	struct sys_device sysdev;
 #endif
+	struct q_inval  *qi;            /* Queued invalidation info */
 };
 
+static inline void __iommu_flush_cache(
+	struct intel_iommu *iommu, void *addr, int size)
+{
+	if (!ecap_coherent(iommu->ecap))
+		clflush_cache_range(addr, size);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 
 extern int alloc_iommu(struct dmar_drhd_unit *drhd);

-- 



* [patch 09/26] x64, x2apic/intr-remap: Interrupt remapping infrastructure
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (7 preceding siblings ...)
  2008-07-10 18:16 ` [patch 08/26] x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d) Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 10/26] x64, x2apic/intr-remap: routines managing Interrupt remapping table entries Suresh Siddha
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: intr_remapping_infrastructure.patch --]
[-- Type: text/plain, Size: 12763 bytes --]

Interrupt remapping (part of Intel Virtualization Technology for Directed I/O)
infrastructure.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:52:01.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:52:03.000000000 -0700
@@ -56,6 +56,7 @@
 #define DMAR_IQT_REG	0x88	/* Invalidation queue tail register */
 #define DMAR_IQA_REG	0x90	/* Invalidation queue addr register */
 #define DMAR_ICS_REG	0x98	/* Invalidation complete status register */
+#define DMAR_IRTA_REG	0xb8    /* Interrupt remapping table addr register */
 
 #define OFFSET_STRIDE		(9)
 /*
@@ -157,16 +158,20 @@
 #define DMA_GCMD_SRTP (((u32)1) << 30)
 #define DMA_GCMD_SFL (((u32)1) << 29)
 #define DMA_GCMD_EAFL (((u32)1) << 28)
-#define DMA_GCMD_QIE (((u32)1) << 26)
 #define DMA_GCMD_WBF (((u32)1) << 27)
+#define DMA_GCMD_QIE (((u32)1) << 26)
+#define DMA_GCMD_SIRTP (((u32)1) << 24)
+#define DMA_GCMD_IRE (((u32) 1) << 25)
 
 /* GSTS_REG */
 #define DMA_GSTS_TES (((u32)1) << 31)
 #define DMA_GSTS_RTPS (((u32)1) << 30)
 #define DMA_GSTS_FLS (((u32)1) << 29)
 #define DMA_GSTS_AFLS (((u32)1) << 28)
-#define DMA_GSTS_QIES (((u32)1) << 26)
 #define DMA_GSTS_WBFS (((u32)1) << 27)
+#define DMA_GSTS_QIES (((u32)1) << 26)
+#define DMA_GSTS_IRTPS (((u32)1) << 24)
+#define DMA_GSTS_IRES (((u32)1) << 25)
 
 /* CCMD_REG */
 #define DMA_CCMD_ICC (((u64)1) << 63)
@@ -245,6 +250,16 @@
 	int             free_cnt;
 };
 
+#ifdef CONFIG_INTR_REMAP
+/* 1MB - maximum possible interrupt remapping table size */
+#define INTR_REMAP_PAGE_ORDER	8
+#define INTR_REMAP_TABLE_REG_SIZE	0xf
+
+struct ir_table {
+	struct irte *base;
+};
+#endif
+
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64		cap;
@@ -266,6 +281,9 @@
 	struct sys_device sysdev;
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
+#ifdef CONFIG_INTR_REMAP
+	struct ir_table *ir_table;	/* Interrupt remapping info */
+#endif
 };
 
 static inline void __iommu_flush_cache(
@@ -279,5 +297,7 @@
 
 extern int alloc_iommu(struct dmar_drhd_unit *drhd);
 extern void free_iommu(struct intel_iommu *iommu);
+extern int dmar_enable_qi(struct intel_iommu *iommu);
+extern void qi_global_iec(struct intel_iommu *iommu);
 
 #endif
Index: tree-x86/drivers/pci/intr_remapping.c
===================================================================
--- tree-x86.orig/drivers/pci/intr_remapping.c	2008-07-10 09:51:57.000000000 -0700
+++ tree-x86/drivers/pci/intr_remapping.c	2008-07-10 09:52:03.000000000 -0700
@@ -1,10 +1,147 @@
 #include <linux/dmar.h>
+#include <linux/spinlock.h>
+#include <linux/jiffies.h>
+#include <linux/pci.h>
 #include <asm/io_apic.h>
 #include "intel-iommu.h"
 #include "intr_remapping.h"
 
 static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
 static int ir_ioapic_num;
+int intr_remapping_enabled;
+
+static void iommu_set_intr_remapping(struct intel_iommu *iommu, int mode)
+{
+	u64 addr;
+	u32 cmd, sts;
+	unsigned long flags;
+
+	addr = virt_to_phys((void *)iommu->ir_table->base);
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+
+	dmar_writeq(iommu->reg + DMAR_IRTA_REG,
+		    (addr) | IR_X2APIC_MODE(mode) | INTR_REMAP_TABLE_REG_SIZE);
+
+	/* Set interrupt-remapping table pointer */
+	cmd = iommu->gcmd | DMA_GCMD_SIRTP;
+	writel(cmd, iommu->reg + DMAR_GCMD_REG);
+
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG,
+		      readl, (sts & DMA_GSTS_IRTPS), sts);
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	/*
+	 * global invalidation of interrupt entry cache before enabling
+	 * interrupt-remapping.
+	 */
+	qi_global_iec(iommu);
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+
+	/* Enable interrupt-remapping */
+	cmd = iommu->gcmd | DMA_GCMD_IRE;
+	iommu->gcmd |= DMA_GCMD_IRE;
+	writel(cmd, iommu->reg + DMAR_GCMD_REG);
+
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG,
+		      readl, (sts & DMA_GSTS_IRES), sts);
+
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+
+static int setup_intr_remapping(struct intel_iommu *iommu, int mode)
+{
+	struct ir_table *ir_table;
+	struct page *pages;
+
+	ir_table = iommu->ir_table = kzalloc(sizeof(struct ir_table),
+					     GFP_KERNEL);
+
+	if (!iommu->ir_table)
+		return -ENOMEM;
+
+	pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, INTR_REMAP_PAGE_ORDER);
+
+	if (!pages) {
+		printk(KERN_ERR "failed to allocate pages of order %d\n",
+		       INTR_REMAP_PAGE_ORDER);
+		kfree(iommu->ir_table);
+		return -ENOMEM;
+	}
+
+	ir_table->base = page_address(pages);
+
+	iommu_set_intr_remapping(iommu, mode);
+	return 0;
+}
+
+int __init enable_intr_remapping(int eim)
+{
+	struct dmar_drhd_unit *drhd;
+	int setup = 0;
+
+	/*
+	 * check for the Interrupt-remapping support
+	 */
+	for_each_drhd_unit(drhd) {
+		struct intel_iommu *iommu = drhd->iommu;
+
+		if (!ecap_ir_support(iommu->ecap))
+			continue;
+
+		if (eim && !ecap_eim_support(iommu->ecap)) {
+			printk(KERN_INFO "DRHD %Lx: EIM not supported by DRHD, "
+			       "ecap %Lx\n", drhd->reg_base_addr, iommu->ecap);
+			return -1;
+		}
+	}
+
+	/*
+	 * Enable queued invalidation for all the DRHD's.
+	 */
+	for_each_drhd_unit(drhd) {
+		int ret;
+		struct intel_iommu *iommu = drhd->iommu;
+		ret = dmar_enable_qi(iommu);
+
+		if (ret) {
+			printk(KERN_ERR "DRHD %Lx: failed to enable queued "
+			       "invalidation, ecap %Lx, ret %d\n",
+			       drhd->reg_base_addr, iommu->ecap, ret);
+			return -1;
+		}
+	}
+
+	/*
+	 * Setup Interrupt-remapping for all the DRHD's now.
+	 */
+	for_each_drhd_unit(drhd) {
+		struct intel_iommu *iommu = drhd->iommu;
+
+		if (!ecap_ir_support(iommu->ecap))
+			continue;
+
+		if (setup_intr_remapping(iommu, eim))
+			goto error;
+
+		setup = 1;
+	}
+
+	if (!setup)
+		goto error;
+
+	intr_remapping_enabled = 1;
+
+	return 0;
+
+error:
+	/*
+	 * handle error condition gracefully here!
+	 */
+	return -1;
+}
 
 static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
 				 struct intel_iommu *iommu)
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:51:57.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:52:03.000000000 -0700
@@ -25,9 +25,85 @@
 #include <linux/types.h>
 #include <linux/msi.h>
 
-#ifdef CONFIG_DMAR
+#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
 struct intel_iommu;
 
+struct dmar_drhd_unit {
+	struct list_head list;		/* list of drhd units	*/
+	struct  acpi_dmar_header *hdr;	/* ACPI header		*/
+	u64	reg_base_addr;		/* register base address*/
+	struct	pci_dev **devices; 	/* target device array	*/
+	int	devices_cnt;		/* target device count	*/
+	u8	ignored:1; 		/* ignore drhd		*/
+	u8	include_all:1;
+	struct intel_iommu *iommu;
+};
+
+extern struct list_head dmar_drhd_units;
+
+#define for_each_drhd_unit(drhd) \
+	list_for_each_entry(drhd, &dmar_drhd_units, list)
+
+extern int dmar_table_init(void);
+extern int early_dmar_detect(void);
+extern int dmar_dev_scope_init(void);
+
+/* Intel IOMMU detection */
+extern void detect_intel_iommu(void);
+
+
+extern int parse_ioapics_under_ir(void);
+extern int alloc_iommu(struct dmar_drhd_unit *);
+#else
+static inline void detect_intel_iommu(void)
+{
+	return;
+}
+
+static inline int dmar_table_init(void)
+{
+	return -ENODEV;
+}
+#endif /* !CONFIG_DMAR && !CONFIG_INTR_REMAP */
+
+#ifdef CONFIG_INTR_REMAP
+extern int intr_remapping_enabled;
+extern int enable_intr_remapping(int);
+
+struct irte {
+	union {
+		struct {
+			__u64	present 	: 1,
+				fpd		: 1,
+				dst_mode	: 1,
+				redir_hint	: 1,
+				trigger_mode	: 1,
+				dlvry_mode	: 3,
+				avail		: 4,
+				__reserved_1	: 4,
+				vector		: 8,
+				__reserved_2	: 8,
+				dest_id		: 32;
+		};
+		__u64 low;
+	};
+
+	union {
+		struct {
+			__u64	sid		: 16,
+				sq		: 2,
+				svt		: 2,
+				__reserved_3	: 44;
+		};
+		__u64 high;
+	};
+};
+#else
+#define enable_intr_remapping(mode)	(-1)
+#define intr_remapping_enabled		(0)
+#endif
+
+#ifdef CONFIG_DMAR
 extern const char *dmar_get_fault_reason(u8 fault_reason);
 
 /* Can't use the common MSI interrupt functions
@@ -40,29 +116,8 @@
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
 extern int arch_setup_dmar_msi(unsigned int irq);
 
-/* Intel IOMMU detection and initialization functions */
-extern void detect_intel_iommu(void);
-extern int intel_iommu_init(void);
-
-extern int dmar_table_init(void);
-extern int early_dmar_detect(void);
-extern int dmar_dev_scope_init(void);
-extern int parse_ioapics_under_ir(void);
-
-extern struct list_head dmar_drhd_units;
+extern int iommu_detected, no_iommu;
 extern struct list_head dmar_rmrr_units;
-
-struct dmar_drhd_unit {
-	struct list_head list;		/* list of drhd units	*/
-	struct  acpi_dmar_header *hdr;	/* ACPI header		*/
-	u64	reg_base_addr;		/* register base address*/
-	struct	pci_dev **devices; 	/* target device array	*/
-	int	devices_cnt;		/* target device count	*/
-	u8	ignored:1; 		/* ignore drhd		*/
-	u8	include_all:1;
-	struct intel_iommu *iommu;
-};
-
 struct dmar_rmrr_unit {
 	struct list_head list;		/* list of rmrr units	*/
 	struct acpi_dmar_header *hdr;	/* ACPI header		*/
@@ -72,24 +127,19 @@
 	int	devices_cnt;		/* target device count */
 };
 
-#define for_each_drhd_unit(drhd) \
-	list_for_each_entry(drhd, &dmar_drhd_units, list)
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
-
-extern int alloc_iommu(struct dmar_drhd_unit *);
+/* Intel DMAR  initialization functions */
+extern int intel_iommu_init(void);
+extern int dmar_disabled;
 #else
-static inline void detect_intel_iommu(void)
-{
-	return;
-}
 static inline int intel_iommu_init(void)
 {
+#ifdef CONFIG_INTR_REMAP
+	return dmar_dev_scope_init();
+#else
 	return -ENODEV;
-}
-static inline int dmar_table_init(void)
-{
-	return -ENODEV;
+#endif
 }
 #endif /* !CONFIG_DMAR */
 #endif /* __DMAR_H__ */
Index: tree-x86/drivers/pci/dma_remapping.h
===================================================================
--- tree-x86.orig/drivers/pci/dma_remapping.h	2008-07-10 09:51:46.000000000 -0700
+++ tree-x86/drivers/pci/dma_remapping.h	2008-07-10 09:52:03.000000000 -0700
@@ -145,6 +145,8 @@
 extern int init_dmars(void);
 extern void free_dmar_iommu(struct intel_iommu *iommu);
 
+extern int dmar_disabled;
+
 #ifndef CONFIG_DMAR_GFX_WA
 static inline void iommu_prepare_gfx_mapping(void)
 {
Index: tree-x86/drivers/pci/intel-iommu.c
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.c	2008-07-10 09:52:01.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.c	2008-07-10 09:52:03.000000000 -0700
@@ -76,7 +76,7 @@
 
 static void domain_remove_dev_info(struct dmar_domain *domain);
 
-static int dmar_disabled;
+int dmar_disabled;
 static int __initdata dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
@@ -2238,15 +2238,6 @@
 
 }
 
-void __init detect_intel_iommu(void)
-{
-	if (swiotlb || no_iommu || iommu_detected || dmar_disabled)
-		return;
-	if (early_dmar_detect()) {
-		iommu_detected = 1;
-	}
-}
-
 static void __init init_no_remapping_devices(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -2293,15 +2284,19 @@
 {
 	int ret = 0;
 
-	if (no_iommu || swiotlb || dmar_disabled)
-		return -ENODEV;
-
 	if (dmar_table_init())
 		return 	-ENODEV;
 
 	if (dmar_dev_scope_init())
 		return 	-ENODEV;
 
+	/*
+	 * Check the need for DMA-remapping initialization now.
+	 * Above initialization will also be used by Interrupt-remapping.
+	 */
+	if (no_iommu || swiotlb || dmar_disabled)
+		return -ENODEV;
+
 	iommu_init_mempool();
 	dmar_init_reserved_ranges();
 
Index: tree-x86/drivers/pci/intr_remapping.h
===================================================================
--- tree-x86.orig/drivers/pci/intr_remapping.h	2008-07-10 09:51:57.000000000 -0700
+++ tree-x86/drivers/pci/intr_remapping.h	2008-07-10 09:52:03.000000000 -0700
@@ -4,3 +4,5 @@
 	struct intel_iommu *iommu;
 	unsigned int id;
 };
+
+#define IR_X2APIC_MODE(mode) (mode ? (1 << 11) : 0)
Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:52:01.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:52:03.000000000 -0700
@@ -449,6 +449,22 @@
 	return (ACPI_SUCCESS(status) ? 1 : 0);
 }
 
+void __init detect_intel_iommu(void)
+{
+	int ret;
+
+	ret = early_dmar_detect();
+
+#ifdef CONFIG_DMAR
+	{
+		if (ret && !no_iommu && !iommu_detected && !swiotlb &&
+		    !dmar_disabled)
+			iommu_detected = 1;
+	}
+#endif
+}
+
+
 int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
 	struct intel_iommu *iommu;

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 10/26] x64, x2apic/intr-remap: routines managing Interrupt remapping table entries.
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (8 preceding siblings ...)
  2008-07-10 18:16 ` [patch 09/26] x64, x2apic/intr-remap: Interrupt remapping infrastructure Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context Suresh Siddha
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: irte_management_routines.patch --]
[-- Type: text/plain, Size: 7772 bytes --]

Routines handling the management of interrupt remapping table entries.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
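
Reviewer note, not part of the patch: a minimal userspace sketch of the
allocation policy alloc_irte() follows, assuming a 16-entry table rather
than INTR_REMAP_TABLE_ENTRIES. The names DEMO_ENTRIES, demo_present,
demo_roundup_pow2() and demo_alloc() are hypothetical. Unlike the patch,
the sketch uses a bounded loop instead of a modulo wrap-around and takes
no lock.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ENTRIES 16		/* hypothetical; the patch uses 65536 */

static uint8_t demo_present[DEMO_ENTRIES];	/* stands in for irte.present */

/* Round count (>= 1) up to the next power of two. */
static unsigned int demo_roundup_pow2(unsigned int count)
{
	unsigned int n = 1;

	while (n < count)
		n <<= 1;
	return n;
}

/*
 * Find a free, naturally aligned block of roundup_pow2(count) entries,
 * mark every entry in it present and return the first index.
 * Returns -1 when no such block is free.
 */
static int demo_alloc(unsigned int count)
{
	unsigned int index, i;

	assert(count >= 1 && count <= DEMO_ENTRIES);
	count = demo_roundup_pow2(count);

	/* stepping by count keeps every candidate block aligned to count */
	for (index = 0; index + count <= DEMO_ENTRIES; index += count) {
		for (i = index; i < index + count; i++)
			if (demo_present[i])
				break;
		if (i == index + count) {
			/* whole block is free: claim it */
			for (i = index; i < index + count; i++)
				demo_present[i] = 1;
			return (int)index;
		}
	}
	return -1;
}
```

Rounding the request up to a power of two is what lets a single
invalidation with handle mask ilog2(count) cover the whole block.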

Index: tree-x86/drivers/pci/intr_remapping.c
===================================================================
--- tree-x86.orig/drivers/pci/intr_remapping.c	2008-07-10 09:52:03.000000000 -0700
+++ tree-x86/drivers/pci/intr_remapping.c	2008-07-10 09:52:05.000000000 -0700
@@ -2,6 +2,7 @@
 #include <linux/spinlock.h>
 #include <linux/jiffies.h>
 #include <linux/pci.h>
+#include <linux/irq.h>
 #include <asm/io_apic.h>
 #include "intel-iommu.h"
 #include "intr_remapping.h"
@@ -10,6 +11,248 @@
 static int ir_ioapic_num;
 int intr_remapping_enabled;
 
+static struct {
+	struct intel_iommu *iommu;
+	u16 irte_index;
+	u16 sub_handle;
+	u8  irte_mask;
+} irq_2_iommu[NR_IRQS];
+
+static DEFINE_SPINLOCK(irq_2_ir_lock);
+
+int irq_remapped(int irq)
+{
+	if (irq >= NR_IRQS)
+		return 0;
+
+	if (!irq_2_iommu[irq].iommu)
+		return 0;
+
+	return 1;
+}
+
+int get_irte(int irq, struct irte *entry)
+{
+	int index;
+
+	if (!entry || irq >= NR_IRQS)
+		return -1;
+
+	spin_lock(&irq_2_ir_lock);
+	if (!irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	index = irq_2_iommu[irq].irte_index + irq_2_iommu[irq].sub_handle;
+	*entry = *(irq_2_iommu[irq].iommu->ir_table->base + index);
+
+	spin_unlock(&irq_2_ir_lock);
+	return 0;
+}
+
+int alloc_irte(struct intel_iommu *iommu, int irq, u16 count)
+{
+	struct ir_table *table = iommu->ir_table;
+	u16 index, start_index;
+	unsigned int mask = 0;
+	int i;
+
+	if (!count)
+		return -1;
+
+	/*
+	 * start the IRTE search from index 0.
+	 */
+	index = start_index = 0;
+
+	if (count > 1) {
+		count = __roundup_pow_of_two(count);
+		mask = ilog2(count);
+	}
+
+	if (mask > ecap_max_handle_mask(iommu->ecap)) {
+		printk(KERN_ERR
+		       "Requested mask %x exceeds the max invalidation handle"
+		       " mask value %Lx\n", mask,
+		       ecap_max_handle_mask(iommu->ecap));
+		return -1;
+	}
+
+	spin_lock(&irq_2_ir_lock);
+	do {
+		for (i = index; i < index + count; i++)
+			if  (table->base[i].present)
+				break;
+		/* empty index found */
+		if (i == index + count)
+			break;
+
+		index = (index + count) % INTR_REMAP_TABLE_ENTRIES;
+
+		if (index == start_index) {
+			spin_unlock(&irq_2_ir_lock);
+			printk(KERN_ERR "can't allocate an IRTE\n");
+			return -1;
+		}
+	} while (1);
+
+	for (i = index; i < index + count; i++)
+		table->base[i].present = 1;
+
+	irq_2_iommu[irq].iommu = iommu;
+	irq_2_iommu[irq].irte_index =  index;
+	irq_2_iommu[irq].sub_handle = 0;
+	irq_2_iommu[irq].irte_mask = mask;
+
+	spin_unlock(&irq_2_ir_lock);
+
+	return index;
+}
+
+static void qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
+{
+	struct qi_desc desc;
+
+	desc.low = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
+		   | QI_IEC_SELECTIVE;
+	desc.high = 0;
+
+	qi_submit_sync(&desc, iommu);
+}
+
+int map_irq_to_irte_handle(int irq, u16 *sub_handle)
+{
+	int index;
+
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	*sub_handle = irq_2_iommu[irq].sub_handle;
+	index = irq_2_iommu[irq].irte_index;
+	spin_unlock(&irq_2_ir_lock);
+	return index;
+}
+
+int set_irte_irq(int irq, struct intel_iommu *iommu, u16 index, u16 subhandle)
+{
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	irq_2_iommu[irq].iommu = iommu;
+	irq_2_iommu[irq].irte_index = index;
+	irq_2_iommu[irq].sub_handle = subhandle;
+	irq_2_iommu[irq].irte_mask = 0;
+
+	spin_unlock(&irq_2_ir_lock);
+
+	return 0;
+}
+
+int clear_irte_irq(int irq, struct intel_iommu *iommu, u16 index)
+{
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	irq_2_iommu[irq].iommu = NULL;
+	irq_2_iommu[irq].irte_index = 0;
+	irq_2_iommu[irq].sub_handle = 0;
+	irq_2_iommu[irq].irte_mask = 0;
+
+	spin_unlock(&irq_2_ir_lock);
+
+	return 0;
+}
+
+int modify_irte(int irq, struct irte *irte_modified)
+{
+	int index;
+	struct irte *irte;
+	struct intel_iommu *iommu;
+
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	iommu = irq_2_iommu[irq].iommu;
+
+	index = irq_2_iommu[irq].irte_index + irq_2_iommu[irq].sub_handle;
+	irte = &iommu->ir_table->base[index];
+
+	set_64bit((unsigned long *)irte, irte_modified->low | (1 << 1));
+	__iommu_flush_cache(iommu, irte, sizeof(*irte));
+
+	qi_flush_iec(iommu, index, 0);
+
+	spin_unlock(&irq_2_ir_lock);
+	return 0;
+}
+
+int flush_irte(int irq)
+{
+	int index;
+	struct intel_iommu *iommu;
+
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	iommu = irq_2_iommu[irq].iommu;
+
+	index = irq_2_iommu[irq].irte_index + irq_2_iommu[irq].sub_handle;
+
+	qi_flush_iec(iommu, index, irq_2_iommu[irq].irte_mask);
+	spin_unlock(&irq_2_ir_lock);
+
+	return 0;
+}
+
+int free_irte(int irq)
+{
+	int index, i;
+	struct irte *irte;
+	struct intel_iommu *iommu;
+
+	spin_lock(&irq_2_ir_lock);
+	if (irq >= NR_IRQS || !irq_2_iommu[irq].iommu) {
+		spin_unlock(&irq_2_ir_lock);
+		return -1;
+	}
+
+	iommu = irq_2_iommu[irq].iommu;
+
+	index = irq_2_iommu[irq].irte_index + irq_2_iommu[irq].sub_handle;
+	irte = &iommu->ir_table->base[index];
+
+	if (!irq_2_iommu[irq].sub_handle) {
+		for (i = 0; i < (1 << irq_2_iommu[irq].irte_mask); i++)
+			set_64bit((unsigned long *)(irte + i), 0);
+		qi_flush_iec(iommu, index, irq_2_iommu[irq].irte_mask);
+	}
+
+	irq_2_iommu[irq].iommu = NULL;
+	irq_2_iommu[irq].irte_index = 0;
+	irq_2_iommu[irq].sub_handle = 0;
+	irq_2_iommu[irq].irte_mask = 0;
+
+	spin_unlock(&irq_2_ir_lock);
+
+	return 0;
+}
+
 static void iommu_set_intr_remapping(struct intel_iommu *iommu, int mode)
 {
 	u64 addr;
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:52:03.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:52:05.000000000 -0700
@@ -98,7 +98,19 @@
 		__u64 high;
 	};
 };
+extern int get_irte(int irq, struct irte *entry);
+extern int modify_irte(int irq, struct irte *irte_modified);
+extern int alloc_irte(struct intel_iommu *iommu, int irq, u16 count);
+extern int set_irte_irq(int irq, struct intel_iommu *iommu, u16 index,
+   			u16 sub_handle);
+extern int map_irq_to_irte_handle(int irq, u16 *sub_handle);
+extern int clear_irte_irq(int irq, struct intel_iommu *iommu, u16 index);
+extern int flush_irte(int irq);
+extern int free_irte(int irq);
+
+extern int irq_remapped(int irq);
 #else
+#define irq_remapped(irq)		(0)
 #define enable_intr_remapping(mode)	(-1)
 #define intr_remapping_enabled		(0)
 #endif
Index: tree-x86/drivers/pci/intel-iommu.h
===================================================================
--- tree-x86.orig/drivers/pci/intel-iommu.h	2008-07-10 09:52:03.000000000 -0700
+++ tree-x86/drivers/pci/intel-iommu.h	2008-07-10 09:52:05.000000000 -0700
@@ -123,6 +123,7 @@
 #define ecap_qis(e)		((e) & 0x2)
 #define ecap_eim_support(e)	((e >> 4) & 0x1)
 #define ecap_ir_support(e)	((e >> 3) & 0x1)
+#define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 
 
 /* IOTLB_REG */
@@ -255,6 +256,8 @@
 #define INTR_REMAP_PAGE_ORDER	8
 #define INTR_REMAP_TABLE_REG_SIZE	0xf
 
+#define INTR_REMAP_TABLE_ENTRIES	65536
+
 struct ir_table {
 	struct irte *base;
 };
@@ -300,4 +303,5 @@
 extern int dmar_enable_qi(struct intel_iommu *iommu);
 extern void qi_global_iec(struct intel_iommu *iommu);
 
+extern void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 #endif

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (9 preceding siblings ...)
  2008-07-10 18:16 ` [patch 10/26] x64, x2apic/intr-remap: routines managing Interrupt remapping table entries Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 23:08   ` Eric W. Biederman
  2008-07-10 18:16 ` [patch 12/26] x64, x2apic/intr-remap: 8259 specific mask/unmask routines Suresh Siddha
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: irq_migration_from_process_context.patch --]
[-- Type: text/plain, Size: 1628 bytes --]

Generic infrastructure for migrating the irq from the process context in the
presence of CONFIG_GENERIC_PENDING_IRQ.

This will be used later for migrating irq in the presence of
interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
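
Reviewer note, not part of the patch: the dispatch added to
kernel/irq/manage.c can be sketched in userspace as below. struct
demo_desc and demo_set_affinity() are hypothetical stand-ins for
irq_desc and the affinity path, and the deferred branch only
approximates what set_pending_irq() does. Locking is omitted.

```c
#include <assert.h>

#define DEMO_IRQ_MOVE_PCNTXT 0x01000000	/* same bit value as in the patch */

struct demo_desc {
	unsigned int status;
	unsigned long affinity;		/* mask currently programmed */
	unsigned long pending_mask;	/* mask waiting to be applied */
	int move_pending;		/* set when a deferred move is queued */
};

static void demo_set_affinity(struct demo_desc *desc, unsigned long mask)
{
	assert(desc);

	if (desc->status & DEMO_IRQ_MOVE_PCNTXT) {
		/* retarget immediately, from process context */
		desc->affinity = mask;
	} else {
		/* defer: the move is completed later from the irq path */
		desc->pending_mask = mask;
		desc->move_pending = 1;
	}
}
```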

Index: tree-x86/include/linux/irq.h
===================================================================
--- tree-x86.orig/include/linux/irq.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/linux/irq.h	2008-07-10 09:52:07.000000000 -0700
@@ -62,6 +62,7 @@
 #define IRQ_MOVE_PENDING	0x00200000	/* need to re-target IRQ destination */
 #define IRQ_NO_BALANCING	0x00400000	/* IRQ is excluded from balancing */
 #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
+#define IRQ_MOVE_PCNTXT	0x01000000	/* IRQ migration from process context */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: tree-x86/kernel/irq/manage.c
===================================================================
--- tree-x86.orig/kernel/irq/manage.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/kernel/irq/manage.c	2008-07-10 09:52:07.000000000 -0700
@@ -89,7 +89,14 @@
 	set_balance_irq_affinity(irq, cpumask);
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
-	set_pending_irq(irq, cpumask);
+	if (desc->status & IRQ_MOVE_PCNTXT) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&desc->lock, flags);
+		desc->chip->set_affinity(irq, cpumask);
+		spin_unlock_irqrestore(&desc->lock, flags);
+	} else
+		set_pending_irq(irq, cpumask);
 #else
 	desc->affinity = cpumask;
 	desc->chip->set_affinity(irq, cpumask);

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 12/26] x64, x2apic/intr-remap: 8259 specific mask/unmask routines
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (10 preceding siblings ...)
  2008-07-10 18:16 ` [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 13/26] x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup Suresh Siddha
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: 8259_routines.patch --]
[-- Type: text/plain, Size: 1510 bytes --]

8259 specific mask/unmask routines which will be used later while enabling
interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/kernel/i8259.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/i8259.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/i8259.c	2008-07-10 09:52:09.000000000 -0700
@@ -282,6 +282,30 @@
 
 device_initcall(i8259A_init_sysfs);
 
+void mask_8259A(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&i8259A_lock, flags);
+
+	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
+	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
+
+	spin_unlock_irqrestore(&i8259A_lock, flags);
+}
+
+void unmask_8259A(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&i8259A_lock, flags);
+
+	outb(cached_master_mask, PIC_MASTER_IMR); /* restore master IRQ mask */
+	outb(cached_slave_mask, PIC_SLAVE_IMR);	  /* restore slave IRQ mask */
+
+	spin_unlock_irqrestore(&i8259A_lock, flags);
+}
+
 void init_8259A(int auto_eoi)
 {
 	unsigned long flags;
Index: tree-x86/include/asm-x86/i8259.h
===================================================================
--- tree-x86.orig/include/asm-x86/i8259.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/i8259.h	2008-07-10 09:52:09.000000000 -0700
@@ -57,4 +57,7 @@
 
 extern struct irq_chip i8259A_chip;
 
+extern void mask_8259A(void);
+extern void unmask_8259A(void);
+
 #endif	/* __ASM_I8259_H__ */

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 13/26] x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (11 preceding siblings ...)
  2008-07-10 18:16 ` [patch 12/26] x64, x2apic/intr-remap: 8259 specific mask/unmask routines Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 14/26] x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines Suresh Siddha
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: ioapic_routines.patch --]
[-- Type: text/plain, Size: 3153 bytes --]

Generic ioapic specific routines which will be used later while enabling
interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
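
Reviewer note, not part of the patch: a minimal userspace sketch of the
save/mask/restore sequence follows. struct demo_rte and the two helpers
are hypothetical; the arrays stand in for the IO-APIC redirection table
and for early_ioapic_entries[].

```c
#include <assert.h>

struct demo_rte {
	unsigned int vector;
	unsigned int mask;	/* 1 = interrupt masked */
};

/* Snapshot every entry, then mask the ones that were unmasked. */
static void demo_save_and_mask(struct demo_rte *hw, struct demo_rte *saved,
			       int pins)
{
	int pin;

	assert(hw && saved);
	for (pin = 0; pin < pins; pin++) {
		saved[pin] = hw[pin];
		if (!hw[pin].mask)
			hw[pin].mask = 1;
	}
}

/* Write the boot-time snapshot back, mask bits included. */
static void demo_restore(struct demo_rte *hw, const struct demo_rte *saved,
			 int pins)
{
	int pin;

	assert(hw && saved);
	for (pin = 0; pin < pins; pin++)
		hw[pin] = saved[pin];
}
```

The snapshot is taken before masking, so the restore brings back the
original mask state as well as the routing information.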

Index: tree-x86/arch/x86/kernel/io_apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/io_apic_64.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:11.000000000 -0700
@@ -114,6 +114,9 @@
  */
 int nr_ioapic_registers[MAX_IO_APICS];
 
+/* I/O APIC RTE contents at the OS boot up */
+struct IO_APIC_route_entry *early_ioapic_entries[MAX_IO_APICS];
+
 /* I/O APIC entries */
 struct mp_config_ioapic mp_ioapics[MAX_IO_APICS];
 int nr_ioapics;
@@ -446,6 +449,69 @@
 			clear_IO_APIC_pin(apic, pin);
 }
 
+/*
+ * Saves and masks all the unmasked IO-APIC RTE's
+ */
+int save_mask_IO_APIC_setup(void)
+{
+	union IO_APIC_reg_01 reg_01;
+	unsigned long flags;
+	int apic, pin;
+
+	/*
+	 * The number of IO-APIC IRQ registers (== #pins):
+	 */
+	for (apic = 0; apic < nr_ioapics; apic++) {
+		spin_lock_irqsave(&ioapic_lock, flags);
+		reg_01.raw = io_apic_read(apic, 1);
+		spin_unlock_irqrestore(&ioapic_lock, flags);
+		nr_ioapic_registers[apic] = reg_01.bits.entries+1;
+	}
+
+	for (apic = 0; apic < nr_ioapics; apic++) {
+		early_ioapic_entries[apic] =
+			kzalloc(sizeof(struct IO_APIC_route_entry) *
+				nr_ioapic_registers[apic], GFP_KERNEL);
+		if (!early_ioapic_entries[apic])
+			return -ENOMEM;
+	}
+
+	for (apic = 0; apic < nr_ioapics; apic++)
+		for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+			struct IO_APIC_route_entry entry;
+
+			entry = early_ioapic_entries[apic][pin] =
+				ioapic_read_entry(apic, pin);
+			if (!entry.mask) {
+				entry.mask = 1;
+				ioapic_write_entry(apic, pin, entry);
+			}
+		}
+	return 0;
+}
+
+void restore_IO_APIC_setup(void)
+{
+	int apic, pin;
+
+	for (apic = 0; apic < nr_ioapics; apic++)
+		for (pin = 0; pin < nr_ioapic_registers[apic]; pin++)
+			ioapic_write_entry(apic, pin,
+					   early_ioapic_entries[apic][pin]);
+}
+
+void reinit_intr_remapped_IO_APIC(int intr_remapping)
+{
+	/*
+	 * for now plain restore of previous settings.
+	 * TBD: In the case of OS enabling interrupt-remapping,
+	 * IO-APIC RTE's need to be setup to point to interrupt-remapping
+	 * table entries. for now, do a plain restore, and wait for
+	 * the setup_IO_APIC_irqs() to do proper initialization.
+	 */
+	restore_IO_APIC_setup();
+}
+
 int skip_ioapic_setup;
 int ioapic_force;
 
Index: tree-x86/include/asm-x86/io_apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/io_apic.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/io_apic.h	2008-07-10 09:52:11.000000000 -0700
@@ -193,6 +193,12 @@
 extern int (*ioapic_renumber_irq)(int ioapic, int irq);
 extern void ioapic_init_mappings(void);
 
+#ifdef CONFIG_X86_64
+extern int save_mask_IO_APIC_setup(void);
+extern void restore_IO_APIC_setup(void);
+extern void reinit_intr_remapped_IO_APIC(int);
+#endif
+
 #else  /* !CONFIG_X86_IO_APIC */
 #define io_apic_assign_pci_irqs 0
 static const int timer_through_8259 = 0;

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 14/26] x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (12 preceding siblings ...)
  2008-07-10 18:16 ` [patch 13/26] x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 15/26] x64, x2apic/intr-remap: basic apic ops support Suresh Siddha
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: genapic_read_apic_id.patch --]
[-- Type: text/plain, Size: 6036 bytes --]

Move read_apic_id() into the genapic routines, so that each genapic
supplies its own implementation.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
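
Reviewer note, not part of the patch: the indirection can be sketched in
userspace as below. struct demo_genapic and the demo_* functions are
hypothetical. The real read_apic_id() takes no argument and reads the
register itself; the raw register value is passed in here only so the
sketch can run without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC keeps its 8-bit ID in bits 31:24 of the APIC_ID register. */
#define DEMO_GET_XAPIC_ID(x)	(((x) >> 24) & 0xFFu)

struct demo_genapic {
	unsigned int (*read_apic_id)(uint32_t raw_reg);
};

/* flat/physflat style: extract the 8-bit field */
static unsigned int demo_read_xapic_id(uint32_t raw)
{
	return DEMO_GET_XAPIC_ID(raw);
}

/* hypothetical wide-ID mode: the register value already is the ID */
static unsigned int demo_read_wide_id(uint32_t raw)
{
	return raw;
}

/* callers go through the genapic and need not know the ID width */
static unsigned int demo_hard_smp_processor_id(const struct demo_genapic *g,
					       uint32_t raw)
{
	assert(g && g->read_apic_id);
	return g->read_apic_id(raw);
}
```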

Index: tree-x86/arch/x86/kernel/genapic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_64.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:12.000000000 -0700
@@ -79,17 +79,6 @@
 	return 0;
 }
 
-unsigned int read_apic_id(void)
-{
-	unsigned int id;
-
-	WARN_ON(preemptible() && num_online_cpus() > 1);
-	id = apic_read(APIC_ID);
-	if (uv_system_type >= UV_X2APIC)
-		id  |= __get_cpu_var(x2apic_extra_bits);
-	return id;
-}
-
 enum uv_system_type get_uv_system_type(void)
 {
 	return uv_system_type;
Index: tree-x86/arch/x86/kernel/genapic_flat_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_flat_64.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_flat_64.c	2008-07-10 09:52:12.000000000 -0700
@@ -15,9 +15,11 @@
 #include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/init.h>
+#include <linux/hardirq.h>
 #include <asm/smp.h>
 #include <asm/ipi.h>
 #include <asm/genapic.h>
+#include <mach_apicdef.h>
 
 static cpumask_t flat_target_cpus(void)
 {
@@ -95,9 +97,17 @@
 		__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
 }
 
+static unsigned int read_xapic_id(void)
+{
+	unsigned int id;
+
+	id = GET_XAPIC_ID(apic_read(APIC_ID));
+	return id;
+}
+
 static int flat_apic_id_registered(void)
 {
-	return physid_isset(GET_APIC_ID(read_apic_id()), phys_cpu_present_map);
+	return physid_isset(read_xapic_id(), phys_cpu_present_map);
 }
 
 static unsigned int flat_cpu_mask_to_apicid(cpumask_t cpumask)
@@ -123,6 +133,7 @@
 	.send_IPI_mask = flat_send_IPI_mask,
 	.cpu_mask_to_apicid = flat_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,
+	.read_apic_id = read_xapic_id,
 };
 
 /*
@@ -187,4 +198,5 @@
 	.send_IPI_mask = physflat_send_IPI_mask,
 	.cpu_mask_to_apicid = physflat_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,
+	.read_apic_id = read_xapic_id,
 };
Index: tree-x86/arch/x86/kernel/genx2apic_uv_x.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genx2apic_uv_x.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/genx2apic_uv_x.c	2008-07-10 09:52:12.000000000 -0700
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/bootmem.h>
 #include <linux/module.h>
+#include <linux/hardirq.h>
 #include <asm/smp.h>
 #include <asm/ipi.h>
 #include <asm/genapic.h>
@@ -134,9 +135,19 @@
 		return BAD_APICID;
 }
 
+static unsigned int uv_read_apic_id(void)
+{
+	unsigned int id;
+
+	WARN_ON(preemptible() && num_online_cpus() > 1);
+	id = apic_read(APIC_ID) | __get_cpu_var(x2apic_extra_bits);
+
+	return id;
+}
+
 static unsigned int phys_pkg_id(int index_msb)
 {
-	return GET_APIC_ID(read_apic_id()) >> index_msb;
+	return uv_read_apic_id() >> index_msb;
 }
 
 #ifdef ZZZ		/* Needs x2apic patch */
@@ -159,6 +170,7 @@
 	/* ZZZ.send_IPI_self = uv_send_IPI_self, */
 	.cpu_mask_to_apicid = uv_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,	/* Fixme ZZZ */
+	.read_apic_id = uv_read_apic_id,
 };
 
 static __cpuinit void set_x2apic_extra_bits(int pnode)
Index: tree-x86/include/asm-x86/genapic_64.h
===================================================================
--- tree-x86.orig/include/asm-x86/genapic_64.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/genapic_64.h	2008-07-10 09:52:12.000000000 -0700
@@ -27,6 +27,7 @@
 	/* */
 	unsigned int (*cpu_mask_to_apicid)(cpumask_t cpumask);
 	unsigned int (*phys_pkg_id)(int index_msb);
+	unsigned int (*read_apic_id)(void);
 };
 
 extern struct genapic *genapic;
Index: tree-x86/include/asm-x86/mach-default/mach_apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/mach-default/mach_apic.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/mach-default/mach_apic.h	2008-07-10 09:52:12.000000000 -0700
@@ -30,6 +30,7 @@
 #define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
 #define phys_pkg_id	(genapic->phys_pkg_id)
 #define vector_allocation_domain    (genapic->vector_allocation_domain)
+#define read_apic_id  (genapic->read_apic_id)
 extern void setup_apic_routing(void);
 #else
 #define INT_DELIVERY_MODE dest_LowestPrio
Index: tree-x86/include/asm-x86/smp.h
===================================================================
--- tree-x86.orig/include/asm-x86/smp.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/smp.h	2008-07-10 09:52:12.000000000 -0700
@@ -176,12 +176,10 @@
 {
 	return *(u32 *)(APIC_BASE + APIC_ID);
 }
-#else
-extern unsigned int read_apic_id(void);
 #endif
 
 
-# ifdef APIC_DEFINITION
+# if defined(APIC_DEFINITION) || defined(CONFIG_X86_64)
 extern int hard_smp_processor_id(void);
 # else
 #  include <mach_apicdef.h>
Index: tree-x86/include/asm-x86/mach-default/mach_apicdef.h
===================================================================
--- tree-x86.orig/include/asm-x86/mach-default/mach_apicdef.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/mach-default/mach_apicdef.h	2008-07-10 09:52:12.000000000 -0700
@@ -5,8 +5,9 @@
 
 #ifdef CONFIG_X86_64
 #define	APIC_ID_MASK		(0xFFu<<24)
-#define GET_APIC_ID(x)          (((x)>>24)&0xFFu)
+#define GET_APIC_ID(x)          (x)
 #define	SET_APIC_ID(x)		(((x)<<24))
+#define GET_XAPIC_ID(x)		(((x) >> 24) & 0xFFu)
 #else
 #define		APIC_ID_MASK		(0xF<<24)
 static inline unsigned get_apic_id(unsigned long x) 
Index: tree-x86/arch/x86/kernel/apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_64.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_64.c	2008-07-10 09:52:12.000000000 -0700
@@ -1096,6 +1096,11 @@
 	cpu_set(cpu, cpu_present_map);
 }
 
+int hard_smp_processor_id(void)
+{
+	return read_apic_id();
+}
+
 /*
  * Power management
  */

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 15/26] x64, x2apic/intr-remap: basic apic ops support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (13 preceding siblings ...)
  2008-07-10 18:16 ` [patch 14/26] x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 16/26] x64, x2apic/intr-remap: cpuid bits for x2apic feature Suresh Siddha
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: apic_ops.patch --]
[-- Type: text/plain, Size: 11951 bytes --]

Introduce basic apic operations which handle the apic programming. This
will be used later to introduce another set of operations specific to x2apic.

For the performance-critical accesses like IPIs, EOI etc., we use the
native operations as they are already referenced by different
indirections like genapic, irq_chip etc.

64bit Paravirt ops can also define their apic operations accordingly.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
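
Reviewer note, not part of the patch: the ICR packing used by
xapic_icr_write() and xapic_icr_read() can be sketched in userspace as
below. struct demo_apic is a hypothetical stand-in for the two
memory-mapped ICR registers.

```c
#include <assert.h>
#include <stdint.h>

struct demo_apic {
	uint32_t icr;	/* low word: vector, delivery mode, ... */
	uint32_t icr2;	/* high word: destination ID in bits 31:24 */
};

/* Program the destination first, then the low word. */
static void demo_xapic_icr_write(struct demo_apic *a, uint32_t low,
				 uint32_t id)
{
	assert(a);
	a->icr2 = id << 24;
	a->icr = low;
}

/* Return both halves as one 64-bit value, ICR2 in the upper 32 bits. */
static uint64_t demo_xapic_icr_read(const struct demo_apic *a)
{
	assert(a);
	return a->icr | ((uint64_t)a->icr2 << 32);
}
```

The destination is written before the low word because, on xAPIC
hardware, the write to the low word is what sends the IPI.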

Index: tree-x86/arch/x86/kernel/apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_64.c	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_64.c	2008-07-10 09:52:14.000000000 -0700
@@ -119,13 +119,13 @@
 	return lapic_get_version() >= 0x14;
 }
 
-void apic_wait_icr_idle(void)
+void xapic_wait_icr_idle(void)
 {
 	while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
 		cpu_relax();
 }
 
-u32 safe_apic_wait_icr_idle(void)
+u32 safe_xapic_wait_icr_idle(void)
 {
 	u32 send_status;
 	int timeout;
@@ -141,6 +141,36 @@
 	return send_status;
 }
 
+void xapic_icr_write(u32 low, u32 id)
+{
+	apic_write(APIC_ICR2, id << 24);
+	apic_write(APIC_ICR, low);
+}
+
+u64 xapic_icr_read(void)
+{
+	u32 icr1, icr2;
+
+	icr2 = apic_read(APIC_ICR2);
+	icr1 = apic_read(APIC_ICR);
+
+	return (icr1 | ((u64)icr2 << 32));
+}
+
+static struct apic_ops xapic_ops = {
+	.read = native_apic_mem_read,
+	.write = native_apic_mem_write,
+	.write_atomic = native_apic_mem_write_atomic,
+	.icr_read = xapic_icr_read,
+	.icr_write = xapic_icr_write,
+	.wait_icr_idle = xapic_wait_icr_idle,
+	.safe_wait_icr_idle = safe_xapic_wait_icr_idle,
+};
+
+struct apic_ops __read_mostly *apic_ops = &xapic_ops;
+
+EXPORT_SYMBOL_GPL(apic_ops);
+
 /**
  * enable_NMI_through_LVT0 - enable NMI through local vector table 0
  */
Index: tree-x86/arch/x86/kernel/paravirt.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/paravirt.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/paravirt.c	2008-07-10 09:52:14.000000000 -0700
@@ -373,9 +373,11 @@
 
 struct pv_apic_ops pv_apic_ops = {
 #ifdef CONFIG_X86_LOCAL_APIC
-	.apic_write = native_apic_write,
-	.apic_write_atomic = native_apic_write_atomic,
-	.apic_read = native_apic_read,
+#ifndef CONFIG_X86_64
+	.apic_write = native_apic_mem_write,
+	.apic_write_atomic = native_apic_mem_write_atomic,
+	.apic_read = native_apic_mem_read,
+#endif
 	.setup_boot_clock = setup_boot_APIC_clock,
 	.setup_secondary_clock = setup_secondary_APIC_clock,
 	.startup_ipi_hook = paravirt_nop,
Index: tree-x86/include/asm-x86/apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/apic.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/apic.h	2008-07-10 09:52:14.000000000 -0700
@@ -47,32 +47,59 @@
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-#define apic_write native_apic_write
-#define apic_write_atomic native_apic_write_atomic
-#define apic_read native_apic_read
+#ifndef CONFIG_X86_64
+#define apic_write native_apic_mem_write
+#define apic_write_atomic native_apic_mem_write_atomic
+#define apic_read native_apic_mem_read
+#endif
 #define setup_boot_clock setup_boot_APIC_clock
 #define setup_secondary_clock setup_secondary_APIC_clock
 #endif
 
 extern int is_vsmp_box(void);
 
-static inline void native_apic_write(unsigned long reg, u32 v)
+static inline void native_apic_mem_write(u32 reg, u32 v)
 {
 	*((volatile u32 *)(APIC_BASE + reg)) = v;
 }
 
-static inline void native_apic_write_atomic(unsigned long reg, u32 v)
+static inline void native_apic_mem_write_atomic(u32 reg, u32 v)
 {
 	(void)xchg((u32 *)(APIC_BASE + reg), v);
 }
 
-static inline u32 native_apic_read(unsigned long reg)
+static inline u32 native_apic_mem_read(u32 reg)
 {
 	return *((volatile u32 *)(APIC_BASE + reg));
 }
 
+#ifdef CONFIG_X86_32
 extern void apic_wait_icr_idle(void);
 extern u32 safe_apic_wait_icr_idle(void);
+extern void apic_icr_write(u32 low, u32 id);
+#else
+
+struct apic_ops {
+	u32 (*read)(u32 reg);
+	void (*write)(u32 reg, u32 v);
+	void (*write_atomic)(u32 reg, u32 v);
+	u64 (*icr_read)(void);
+	void (*icr_write)(u32 low, u32 high);
+	void (*wait_icr_idle)(void);
+	u32 (*safe_wait_icr_idle)(void);
+};
+
+extern struct apic_ops *apic_ops;
+
+#define apic_read (apic_ops->read)
+#define apic_write (apic_ops->write)
+#define apic_write_atomic (apic_ops->write_atomic)
+#define apic_icr_read (apic_ops->icr_read)
+#define apic_icr_write (apic_ops->icr_write)
+#define apic_wait_icr_idle (apic_ops->wait_icr_idle)
+#define safe_apic_wait_icr_idle (apic_ops->safe_wait_icr_idle)
+#endif
+
 extern int get_physical_broadcast(void);
 
 #ifdef CONFIG_X86_GOOD_APIC
@@ -95,7 +122,11 @@
 	 */
 
 	/* Docs say use 0 for future compatibility */
+#ifdef CONFIG_X86_32
 	apic_write_around(APIC_EOI, 0);
+#else
+	native_apic_mem_write(APIC_EOI, 0);
+#endif
 }
 
 extern int lapic_get_maxlvt(void);
Index: tree-x86/include/asm-x86/paravirt.h
===================================================================
--- tree-x86.orig/include/asm-x86/paravirt.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/paravirt.h	2008-07-10 09:52:14.000000000 -0700
@@ -902,6 +902,7 @@
 /*
  * Basic functions accessing APICs.
  */
+#ifndef CONFIG_X86_64
 static inline void apic_write(unsigned long reg, u32 v)
 {
 	PVOP_VCALL2(pv_apic_ops.apic_write, reg, v);
@@ -916,6 +917,7 @@
 {
 	return PVOP_CALL1(unsigned long, pv_apic_ops.apic_read, reg);
 }
+#endif
 
 static inline void setup_boot_clock(void)
 {
Index: tree-x86/include/asm-x86/ipi.h
===================================================================
--- tree-x86.orig/include/asm-x86/ipi.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/ipi.h	2008-07-10 09:52:14.000000000 -0700
@@ -49,6 +49,12 @@
 	return SET_APIC_DEST_FIELD(mask);
 }
 
+static inline void __xapic_wait_icr_idle(void)
+{
+	while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
+		cpu_relax();
+}
+
 static inline void __send_IPI_shortcut(unsigned int shortcut, int vector,
 				       unsigned int dest)
 {
@@ -64,7 +70,7 @@
 	/*
 	 * Wait for idle.
 	 */
-	apic_wait_icr_idle();
+	__xapic_wait_icr_idle();
 
 	/*
 	 * No need to touch the target chip field
@@ -74,7 +80,7 @@
 	/*
 	 * Send the IPI. The write to APIC_ICR fires this off.
 	 */
-	apic_write(APIC_ICR, cfg);
+	native_apic_mem_write(APIC_ICR, cfg);
 }
 
 /*
@@ -92,13 +98,13 @@
 	if (unlikely(vector == NMI_VECTOR))
 		safe_apic_wait_icr_idle();
 	else
-		apic_wait_icr_idle();
+		__xapic_wait_icr_idle();
 
 	/*
 	 * prepare target chip field
 	 */
 	cfg = __prepare_ICR2(mask);
-	apic_write(APIC_ICR2, cfg);
+	native_apic_mem_write(APIC_ICR2, cfg);
 
 	/*
 	 * program the ICR
@@ -108,7 +114,7 @@
 	/*
 	 * Send the IPI. The write to APIC_ICR fires this off.
 	 */
-	apic_write(APIC_ICR, cfg);
+	native_apic_mem_write(APIC_ICR, cfg);
 }
 
 static inline void send_IPI_mask_sequence(cpumask_t mask, int vector)
Index: tree-x86/arch/x86/kernel/apic_32.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_32.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_32.c	2008-07-10 09:52:14.000000000 -0700
@@ -145,6 +145,12 @@
 	return lapic_get_version() >= 0x14;
 }
 
+void apic_icr_write(u32 low, u32 id)
+{
+	apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(id));
+	apic_write_around(APIC_ICR, low);
+}
+
 void apic_wait_icr_idle(void)
 {
 	while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
Index: tree-x86/arch/x86/kernel/smpboot.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/smpboot.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/smpboot.c	2008-07-10 09:52:14.000000000 -0700
@@ -123,7 +123,6 @@
 
 static atomic_t init_deasserted;
 
-static int boot_cpu_logical_apicid;
 
 /* representing cpus for which sibling maps can be computed */
 static cpumask_t cpu_sibling_setup_map;
@@ -165,6 +164,8 @@
 #endif
 
 #ifdef CONFIG_X86_32
+static int boot_cpu_logical_apicid;
+
 u8 cpu_2_logical_apicid[NR_CPUS] __read_mostly =
 					{ [0 ... NR_CPUS-1] = BAD_APICID };
 
@@ -546,8 +547,7 @@
 			printk(KERN_CONT
 			       "a previous APIC delivery may have failed\n");
 
-		apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(apicid));
-		apic_write_around(APIC_ICR, APIC_DM_REMRD | regs[i]);
+		apic_icr_write(APIC_DM_REMRD | regs[i], apicid);
 
 		timeout = 0;
 		do {
@@ -579,11 +579,9 @@
 	int maxlvt;
 
 	/* Target chip */
-	apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(logical_apicid));
-
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_write_around(APIC_ICR, APIC_DM_NMI | APIC_DEST_LOGICAL);
+	apic_icr_write(APIC_DM_NMI | APIC_DEST_LOGICAL, logical_apicid);
 
 	Dprintk("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -639,13 +637,11 @@
 	/*
 	 * Turn INIT on target chip
 	 */
-	apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(phys_apicid));
-
 	/*
 	 * Send IPI
 	 */
-	apic_write_around(APIC_ICR, APIC_INT_LEVELTRIG | APIC_INT_ASSERT
-				| APIC_DM_INIT);
+	apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
+		       phys_apicid);
 
 	Dprintk("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -655,10 +651,8 @@
 	Dprintk("Deasserting INIT.\n");
 
 	/* Target chip */
-	apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(phys_apicid));
-
 	/* Send IPI */
-	apic_write_around(APIC_ICR, APIC_INT_LEVELTRIG | APIC_DM_INIT);
+	apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
 
 	Dprintk("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -703,12 +697,10 @@
 		 */
 
 		/* Target chip */
-		apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(phys_apicid));
-
 		/* Boot on the stack */
 		/* Kick the second */
-		apic_write_around(APIC_ICR, APIC_DM_STARTUP
-					| (start_eip >> 12));
+		apic_icr_write(APIC_DM_STARTUP | (start_eip >> 12),
+			       phys_apicid);
 
 		/*
 		 * Give the other CPU some time to accept the IPI.
@@ -1147,7 +1139,9 @@
 	 * Setup boot CPU information
 	 */
 	smp_store_cpu_info(0); /* Final full version of the data */
+#ifdef CONFIG_X86_32
 	boot_cpu_logical_apicid = logical_smp_processor_id();
+#endif
 	current_thread_info()->cpu = 0;  /* needed? */
 	set_cpu_sibling_map(0);
 
Index: tree-x86/arch/x86/kernel/io_apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:11.000000000 -0700
+++ tree-x86/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:14.000000000 -0700
@@ -1157,6 +1157,7 @@
 void __apicdebuginit print_local_APIC(void * dummy)
 {
 	unsigned int v, ver, maxlvt;
+	unsigned long icr;
 
 	if (apic_verbosity == APIC_QUIET)
 		return;
@@ -1200,10 +1201,9 @@
 	v = apic_read(APIC_ESR);
 	printk(KERN_DEBUG "... APIC ESR: %08x\n", v);
 
-	v = apic_read(APIC_ICR);
-	printk(KERN_DEBUG "... APIC ICR: %08x\n", v);
-	v = apic_read(APIC_ICR2);
-	printk(KERN_DEBUG "... APIC ICR2: %08x\n", v);
+	icr = apic_icr_read();
+	printk(KERN_DEBUG "... APIC ICR: %08x\n", (u32)icr);
+	printk(KERN_DEBUG "... APIC ICR2: %08x\n", (u32)(icr >> 32));
 
 	v = apic_read(APIC_LVTT);
 	printk(KERN_DEBUG "... APIC LVTT: %08x\n", v);
Index: tree-x86/include/asm-x86/smp.h
===================================================================
--- tree-x86.orig/include/asm-x86/smp.h	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/include/asm-x86/smp.h	2008-07-10 09:52:14.000000000 -0700
@@ -165,13 +165,13 @@
 
 #ifdef CONFIG_X86_LOCAL_APIC
 
+#ifndef CONFIG_X86_64
 static inline int logical_smp_processor_id(void)
 {
 	/* we don't want to mark this access volatile - bad code generation */
 	return GET_APIC_LOGICAL_ID(*(u32 *)(APIC_BASE + APIC_LDR));
 }
 
-#ifndef CONFIG_X86_64
 static inline unsigned int read_apic_id(void)
 {
 	return *(u32 *)(APIC_BASE + APIC_ID);

-- 



* [patch 16/26] x64, x2apic/intr-remap: cpuid bits for x2apic feature
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (14 preceding siblings ...)
  2008-07-10 18:16 ` [patch 15/26] x64, x2apic/intr-remap: basic apic ops support Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 17/26] x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk) Suresh Siddha
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: x2apic_feature.patch --]
[-- Type: text/plain, Size: 1805 bytes --]

Add the cpuid feature bit for x2apic (CPUID.1:ECX bit 21).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
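For readers unfamiliar with the cpufeature numbering, here is a small
sketch of how (4*32+21) decodes: word 4 holds the ECX output of CPUID
leaf 1, and bit 21 of that word is the x2APIC bit. The helper names are
made up for illustration and are not kernel API.

```c
#include <stdint.h>

#define X86_FEATURE_X2APIC	(4*32+21)	/* same encoding as in the hunk below */

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a raw CPUID.1:ECX value for the x2APIC bit. */
static int ecx_has_x2apic(uint32_t ecx)
{
	return (ecx >> feature_bit(X86_FEATURE_X2APIC)) & 1;
}
```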

Index: tree-x86/include/asm-x86/cpufeature.h
===================================================================
--- tree-x86.orig/include/asm-x86/cpufeature.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/cpufeature.h	2008-07-10 09:52:16.000000000 -0700
@@ -90,6 +90,7 @@
 #define X86_FEATURE_CX16	(4*32+13) /* CMPXCHG16B */
 #define X86_FEATURE_XTPR	(4*32+14) /* Send Task Priority Messages */
 #define X86_FEATURE_DCA		(4*32+18) /* Direct Cache Access */
+#define X86_FEATURE_X2APIC	(4*32+21) /* x2APIC */
 
 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE	(5*32+ 2) /* on-CPU RNG present (xstore insn) */
@@ -188,6 +189,7 @@
 #define cpu_has_gbpages		boot_cpu_has(X86_FEATURE_GBPAGES)
 #define cpu_has_arch_perfmon	boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
+#define cpu_has_x2apic		boot_cpu_has(X86_FEATURE_X2APIC)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
Index: tree-x86/arch/x86/kernel/cpu/feature_names.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/cpu/feature_names.c	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/cpu/feature_names.c	2008-07-10 09:52:16.000000000 -0700
@@ -45,7 +45,7 @@
 	/* Intel-defined (#2) */
 	"pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est",
 	"tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL,
-	NULL, NULL, "dca", "sse4_1", "sse4_2", NULL, NULL, "popcnt",
+	NULL, NULL, "dca", "sse4_1", "sse4_2", "x2apic", NULL, "popcnt",
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 
 	/* VIA/Cyrix/Centaur-defined */

-- 



* [patch 17/26] x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk)
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (15 preceding siblings ...)
  2008-07-10 18:16 ` [patch 16/26] x64, x2apic/intr-remap: cpuid bits for x2apic feature Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 18/26] x64, x2apic/intr-remap: x2apic ops for x2apic mode support Suresh Siddha
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: disable_dmar_if_intr_remapping.patch --]
[-- Type: text/plain, Size: 1984 bytes --]

Interrupt-remapping enables queued invalidation. Once queued invalidation
is enabled, IOTLB invalidation must also use the queued invalidation
mechanism; register-based IOTLB invalidation no longer works.

For now, support for IOTLB invalidation using queued invalidation is
missing. Until it is added, disable DMA-remapping if interrupt-remapping
support is detected.

In the meantime, anyone who really wants DMA-remapping can boot with
nox2apic, which disables interrupt-remapping and as such doesn't enable
queued invalidation.

Given that none of the released platforms support intr-remapping yet,
this temporary hack should be ok.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
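The condition added below can be read as the following predicate (a
sketch only). DMAR_FLAG_INTR_REMAP is an illustrative name for bit 0 of
the DMAR table flags, and "ret" is assumed to be nonzero when a DMAR
table was detected, judging from how the surrounding code uses it. Note
that in the hunk's expression "&" binds tighter than "&&", so the flag
test is evaluated first; the parentheses here only make that explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative name: bit 0 of the DMAR flags = interrupt remapping supported. */
#define DMAR_FLAG_INTR_REMAP	0x1

/*
 * True when DMA-remapping should be disabled: a DMAR table was found,
 * the cpu advertises x2apic, and the platform reports intr-remapping.
 */
static bool dmar_quirk_applies(int ret, bool has_x2apic, uint8_t dmar_flags)
{
	return ret && has_x2apic && (dmar_flags & DMAR_FLAG_INTR_REMAP);
}
```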

Index: tree-x86/drivers/pci/dmar.c
===================================================================
--- tree-x86.orig/drivers/pci/dmar.c	2008-07-10 09:52:03.000000000 -0700
+++ tree-x86/drivers/pci/dmar.c	2008-07-10 09:52:19.000000000 -0700
@@ -457,6 +457,31 @@
 
 #ifdef CONFIG_DMAR
 	{
+		struct acpi_table_dmar *dmar;
+		/*
+		 * for now we will disable dma-remapping when interrupt
+		 * remapping is enabled.
+		 * When support for queued invalidation for IOTLB invalidation
+		 * is added, we will not need this any more.
+		 */
+		dmar = (struct acpi_table_dmar *) dmar_tbl;
+		if (ret && cpu_has_x2apic && dmar->flags & 0x1) {
+			printk(KERN_INFO
+			       "Queued invalidation will be enabled to support "
+			       "x2apic and Intr-remapping.\n");
+			printk(KERN_INFO
+			       "Disabling IOMMU detection, because of missing "
+			       "queued invalidation support for IOTLB "
+			       "invalidation\n");
+			printk(KERN_INFO
+			       "Use \"nox2apic\", if you want to use Intel "
+			       "IOMMU for DMA-remapping and don't care about "
+			       "x2apic support\n");
+
+			dmar_disabled = 1;
+			return;
+		}
+
 		if (ret && !no_iommu && !iommu_detected && !swiotlb &&
 		    !dmar_disabled)
 			iommu_detected = 1;

-- 



* [patch 18/26] x64, x2apic/intr-remap: x2apic ops for x2apic mode support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (16 preceding siblings ...)
  2008-07-10 18:16 ` [patch 17/26] x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk) Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines Suresh Siddha
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: basic_x2apic_ops.patch --]
[-- Type: text/plain, Size: 3051 bytes --]

Add x2apic ops for x2apic mode support. These use the MSR interface, whose
register layout differs slightly from the xapic one (for example, the ICR
is accessed as a single 64-bit MSR).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
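To make the MSR mapping used below concrete, here is a small sketch of
the address arithmetic: each 16-byte-aligned xapic register offset maps
to MSR 0x800 + (offset >> 4), and the ICR value carries the destination
id in its upper 32 bits. The helper names x2apic_msr() and
x2apic_icr_value() are made up for this note.

```c
#include <stdint.h>

#define APIC_BASE_MSR	0x800
#define APIC_ID		0x20
#define APIC_ICR	0x300
#define APIC_SELF_IPI	0x3F0

/* MSR number that backs a given xapic register offset in x2apic mode. */
static uint32_t x2apic_msr(uint32_t reg)
{
	return APIC_BASE_MSR + (reg >> 4);
}

/* 64-bit ICR value: destination id in bits 63:32, command in bits 31:0. */
static uint64_t x2apic_icr_value(uint32_t low, uint32_t id)
{
	return ((uint64_t)id << 32) | low;
}
```

So APIC_ICR (0x300) becomes MSR 0x830, and a single wrmsrl() replaces the
separate ICR2 and ICR writes of the xapic path.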

Index: tree-x86/arch/x86/kernel/apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_64.c	2008-07-10 09:52:14.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_64.c	2008-07-10 09:52:22.000000000 -0700
@@ -171,6 +171,41 @@
 
 EXPORT_SYMBOL_GPL(apic_ops);
 
+static void x2apic_wait_icr_idle(void)
+{
+	/* no need to wait for icr idle in x2apic */
+	return;
+}
+
+static u32 safe_x2apic_wait_icr_idle(void)
+{
+	/* no need to wait for icr idle in x2apic */
+	return 0;
+}
+
+void x2apic_icr_write(u32 low, u32 id)
+{
+	wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low);
+}
+
+u64 x2apic_icr_read(void)
+{
+	unsigned long val;
+
+	rdmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), val);
+	return val;
+}
+
+static struct apic_ops x2apic_ops = {
+	.read = native_apic_msr_read,
+	.write = native_apic_msr_write,
+	.write_atomic = native_apic_msr_write,
+	.icr_read = x2apic_icr_read,
+	.icr_write = x2apic_icr_write,
+	.wait_icr_idle = x2apic_wait_icr_idle,
+	.safe_wait_icr_idle = safe_x2apic_wait_icr_idle,
+};
+
 /**
  * enable_NMI_through_LVT0 - enable NMI through local vector table 0
  */
Index: tree-x86/include/asm-x86/apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/apic.h	2008-07-10 09:52:14.000000000 -0700
+++ tree-x86/include/asm-x86/apic.h	2008-07-10 09:52:22.000000000 -0700
@@ -7,6 +7,8 @@
 #include <asm/apicdef.h>
 #include <asm/processor.h>
 #include <asm/system.h>
+#include <asm/cpufeature.h>
+#include <asm/msr.h>
 
 #define ARCH_APICTIMER_STOPS_ON_C3	1
 
@@ -73,6 +75,26 @@
 	return *((volatile u32 *)(APIC_BASE + reg));
 }
 
+static inline void native_apic_msr_write(u32 reg, u32 v)
+{
+	if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
+	    reg == APIC_LVR)
+		return;
+
+	wrmsr(APIC_BASE_MSR + (reg >> 4), v, 0);
+}
+
+static inline u32 native_apic_msr_read(u32 reg)
+{
+	u32 low, high;
+
+	if (reg == APIC_DFR)
+		return -1;
+
+	rdmsr(APIC_BASE_MSR + (reg >> 4), low, high);
+	return low;
+}
+
 #ifdef CONFIG_X86_32
 extern void apic_wait_icr_idle(void);
 extern u32 safe_apic_wait_icr_idle(void);
Index: tree-x86/include/asm-x86/apicdef.h
===================================================================
--- tree-x86.orig/include/asm-x86/apicdef.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/apicdef.h	2008-07-10 09:52:22.000000000 -0700
@@ -105,6 +105,7 @@
 #define	APIC_TMICT	0x380
 #define	APIC_TMCCT	0x390
 #define	APIC_TDCR	0x3E0
+#define APIC_SELF_IPI	0x3F0
 #define		APIC_TDR_DIV_TMBASE	(1 << 2)
 #define		APIC_TDR_DIV_1		0xB
 #define		APIC_TDR_DIV_2		0x0
@@ -128,6 +129,8 @@
 #define	APIC_EILVT3     0x530
 
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
+#define APIC_BASE_MSR	0x800
+#define X2APIC_ENABLE	(1UL << 10)
 
 #ifdef CONFIG_X86_32
 # define MAX_IO_APICS 64

-- 



* [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (17 preceding siblings ...)
  2008-07-10 18:16 ` [patch 18/26] x64, x2apic/intr-remap: x2apic ops for x2apic mode support Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 23:34   ` Eric W. Biederman
  2008-07-10 18:16 ` [patch 20/26] x64, x2apic/intr-remap: x2apic cluster mode support Suresh Siddha
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: self_ipi.patch --]
[-- Type: text/plain, Size: 3447 bytes --]

Introduce self IPI op for genapic.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
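A sketch of why a per-genapic self IPI op is useful: in xapic mode the
self IPI is an ICR write using the "self" destination shorthand, while
the x2apic cluster driver later in this series simply writes the vector
to APIC_SELF_IPI. The constants are the usual apicdef.h values as I
understand them; the two helper functions are hypothetical and only
compute the value that would be written.

```c
#include <stdint.h>

#define APIC_DEST_SELF		0x40000u	/* destination shorthand: self */
#define APIC_DEST_PHYSICAL	0x00000u
#define APIC_DM_FIXED		0x00000u	/* fixed delivery mode */

/* Value written to the ICR low word for an xapic self IPI. */
static uint32_t xapic_self_ipi_icr(uint8_t vector)
{
	return APIC_DEST_SELF | APIC_DEST_PHYSICAL | APIC_DM_FIXED | vector;
}

/* Value written to the SELF_IPI register in x2apic mode: just the vector. */
static uint32_t x2apic_self_ipi_value(uint8_t vector)
{
	return vector;
}
```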

Index: tree-x86/arch/x86/kernel/genapic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:24.000000000 -0700
@@ -61,7 +61,7 @@
 
 /* Same for both flat and physical. */
 
-void send_IPI_self(int vector)
+void apic_send_IPI_self(int vector)
 {
 	__send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
 }
Index: tree-x86/arch/x86/kernel/genapic_flat_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_flat_64.c	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_flat_64.c	2008-07-10 09:52:24.000000000 -0700
@@ -131,6 +131,7 @@
 	.send_IPI_all = flat_send_IPI_all,
 	.send_IPI_allbutself = flat_send_IPI_allbutself,
 	.send_IPI_mask = flat_send_IPI_mask,
+	.send_IPI_self = apic_send_IPI_self,
 	.cpu_mask_to_apicid = flat_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,
 	.read_apic_id = read_xapic_id,
@@ -196,6 +197,7 @@
 	.send_IPI_all = physflat_send_IPI_all,
 	.send_IPI_allbutself = physflat_send_IPI_allbutself,
 	.send_IPI_mask = physflat_send_IPI_mask,
+	.send_IPI_self = apic_send_IPI_self,
 	.cpu_mask_to_apicid = physflat_cpu_mask_to_apicid,
 	.phys_pkg_id = phys_pkg_id,
 	.read_apic_id = read_xapic_id,
Index: tree-x86/include/asm-x86/genapic_64.h
===================================================================
--- tree-x86.orig/include/asm-x86/genapic_64.h	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/include/asm-x86/genapic_64.h	2008-07-10 09:52:24.000000000 -0700
@@ -24,6 +24,7 @@
 	void (*send_IPI_mask)(cpumask_t mask, int vector);
 	void (*send_IPI_allbutself)(int vector);
 	void (*send_IPI_all)(int vector);
+	void (*send_IPI_self)(int vector);
 	/* */
 	unsigned int (*cpu_mask_to_apicid)(cpumask_t cpumask);
 	unsigned int (*phys_pkg_id)(int index_msb);
@@ -36,6 +37,7 @@
 extern struct genapic apic_physflat;
 extern int acpi_madt_oem_check(char *, char *);
 
+extern void apic_send_IPI_self(int vector);
 enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
 extern enum uv_system_type get_uv_system_type(void);
 extern int is_uv_system(void);
Index: tree-x86/include/asm-x86/hw_irq.h
===================================================================
--- tree-x86.orig/include/asm-x86/hw_irq.h	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/include/asm-x86/hw_irq.h	2008-07-10 09:52:24.000000000 -0700
@@ -73,7 +73,9 @@
 #endif
 
 /* IPI functions */
+#ifdef CONFIG_X86_32
 extern void send_IPI_self(int vector);
+#endif
 extern void send_IPI(int dest, int vector);
 
 /* Statistics */
Index: tree-x86/include/asm-x86/mach-default/mach_apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/mach-default/mach_apic.h	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/include/asm-x86/mach-default/mach_apic.h	2008-07-10 09:52:24.000000000 -0700
@@ -31,6 +31,7 @@
 #define phys_pkg_id	(genapic->phys_pkg_id)
 #define vector_allocation_domain    (genapic->vector_allocation_domain)
 #define read_apic_id  (genapic->read_apic_id)
+#define send_IPI_self (genapic->send_IPI_self)
 extern void setup_apic_routing(void);
 #else
 #define INT_DELIVERY_MODE dest_LowestPrio

-- 



* [patch 20/26] x64, x2apic/intr-remap: x2apic cluster mode support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (18 preceding siblings ...)
  2008-07-10 18:16 ` [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV Suresh Siddha
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: x2apic_cluster.patch --]
[-- Type: text/plain, Size: 5137 bytes --]

x2apic cluster mode support.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
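Some background on the logical ids this driver reads from APIC_LDR, as a
sketch: per the x2apic specification the logical id is derived from the
apic id, with the cluster number in the upper 16 bits and a one-hot cpu
bit in the lower 16 bits (hence 16 cpus per cluster). The code below only
models that derivation and the destination merging hinted at in the TBD
comment; it is not part of the patch and the function names are made up.

```c
#include <stdint.h>

/* Logical x2apic id: (apicid[19:4] << 16) | (1 << apicid[3:0]). */
static uint32_t x2apic_logical_id(uint32_t apicid)
{
	uint32_t cluster = apicid >> 4;
	uint32_t cpu_bit = 1u << (apicid & 0xf);

	return (cluster << 16) | cpu_bit;
}

/* Two logical ids can share one IPI only if they are in the same cluster. */
static int same_cluster(uint32_t ldr_a, uint32_t ldr_b)
{
	return (ldr_a >> 16) == (ldr_b >> 16);
}

/* Combined logical destination for two cpus of the same cluster. */
static uint32_t merge_cluster_dest(uint32_t ldr_a, uint32_t ldr_b)
{
	return ldr_a | ldr_b;
}
```

With this layout, one ICR write with a merged destination could reach
several cpus of a cluster, which is the optimization the comment in
x2apic_send_IPI_mask() leaves for later.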

Index: tree-x86/arch/x86/kernel/genx2apic_cluster.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/arch/x86/kernel/genx2apic_cluster.c	2008-07-10 09:52:27.000000000 -0700
@@ -0,0 +1,135 @@
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/ctype.h>
+#include <linux/init.h>
+#include <asm/smp.h>
+#include <asm/ipi.h>
+#include <asm/genapic.h>
+
+DEFINE_PER_CPU(u32, x86_cpu_to_logical_apicid);
+
+/* Start with all IRQs pointing to boot CPU.  IRQ balancing will shift them. */
+
+static cpumask_t x2apic_target_cpus(void)
+{
+	return cpumask_of_cpu(0);
+}
+
+/*
+ * for now each logical cpu is in its own vector allocation domain.
+ */
+static cpumask_t x2apic_vector_allocation_domain(int cpu)
+{
+	cpumask_t domain = CPU_MASK_NONE;
+	cpu_set(cpu, domain);
+	return domain;
+}
+
+static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
+				   unsigned int dest)
+{
+	unsigned long cfg;
+
+	cfg = __prepare_ICR(0, vector, dest);
+
+	/*
+	 * send the IPI.
+	 */
+	x2apic_icr_write(cfg, apicid);
+}
+
+/*
+ * for now, we send the IPI's one by one in the cpumask.
+ * TBD: Based on the cpu mask, we can send the IPI's to the cluster group
+ * at once. We have 16 cpu's in a cluster. This will minimize IPI register
+ * writes.
+ */
+static void x2apic_send_IPI_mask(cpumask_t mask, int vector)
+{
+	unsigned long flags;
+	unsigned long query_cpu;
+
+	local_irq_save(flags);
+	for_each_cpu_mask(query_cpu, mask) {
+		__x2apic_send_IPI_dest(per_cpu(x86_cpu_to_logical_apicid, query_cpu),
+				       vector, APIC_DEST_LOGICAL);
+	}
+	local_irq_restore(flags);
+}
+
+static void x2apic_send_IPI_allbutself(int vector)
+{
+	cpumask_t mask = cpu_online_map;
+
+	cpu_clear(smp_processor_id(), mask);
+
+	if (!cpus_empty(mask))
+		x2apic_send_IPI_mask(mask, vector);
+}
+
+static void x2apic_send_IPI_all(int vector)
+{
+	x2apic_send_IPI_mask(cpu_online_map, vector);
+}
+
+static int x2apic_apic_id_registered(void)
+{
+	return 1;
+}
+
+static unsigned int x2apic_cpu_mask_to_apicid(cpumask_t cpumask)
+{
+	int cpu;
+
+	/*
+	 * We're using fixed IRQ delivery, can only return one logical APIC ID.
+	 * May as well be the first.
+	 */
+	cpu = first_cpu(cpumask);
+	if ((unsigned)cpu < NR_CPUS)
+		return per_cpu(x86_cpu_to_logical_apicid, cpu);
+	else
+		return BAD_APICID;
+}
+
+static unsigned int x2apic_read_id(void)
+{
+	return apic_read(APIC_ID);
+}
+
+static unsigned int phys_pkg_id(int index_msb)
+{
+	return x2apic_read_id() >> index_msb;
+}
+
+static void x2apic_send_IPI_self(int vector)
+{
+	apic_write(APIC_SELF_IPI, vector);
+}
+
+static void init_x2apic_ldr(void)
+{
+	int cpu = smp_processor_id();
+
+	per_cpu(x86_cpu_to_logical_apicid, cpu) = apic_read(APIC_LDR);
+	return;
+}
+
+struct genapic apic_x2apic_cluster = {
+	.name = "cluster x2apic",
+	.int_delivery_mode = dest_LowestPrio,
+	.int_dest_mode = (APIC_DEST_LOGICAL != 0),
+	.target_cpus = x2apic_target_cpus,
+	.vector_allocation_domain = x2apic_vector_allocation_domain,
+	.apic_id_registered = x2apic_apic_id_registered,
+	.init_apic_ldr = init_x2apic_ldr,
+	.send_IPI_all = x2apic_send_IPI_all,
+	.send_IPI_allbutself = x2apic_send_IPI_allbutself,
+	.send_IPI_mask = x2apic_send_IPI_mask,
+	.send_IPI_self = x2apic_send_IPI_self,
+	.cpu_mask_to_apicid = x2apic_cpu_mask_to_apicid,
+	.phys_pkg_id = phys_pkg_id,
+	.read_apic_id = x2apic_read_id,
+};
Index: tree-x86/arch/x86/kernel/Makefile
===================================================================
--- tree-x86.orig/arch/x86/kernel/Makefile	2008-07-10 09:51:45.000000000 -0700
+++ tree-x86/arch/x86/kernel/Makefile	2008-07-10 09:52:27.000000000 -0700
@@ -104,6 +104,7 @@
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
         obj-y				+= genapic_64.o genapic_flat_64.o genx2apic_uv_x.o tlb_uv.o
+        obj-y				+= genx2apic_cluster.o
         obj-$(CONFIG_X86_PM_TIMER)	+= pmtimer_64.o
         obj-$(CONFIG_AUDIT)		+= audit_64.o
 
Index: tree-x86/arch/x86/kernel/genapic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:24.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:27.000000000 -0700
@@ -38,6 +38,8 @@
 {
 	if (uv_system_type == UV_NON_UNIQUE_APIC)
 		genapic = &apic_x2apic_uv_x;
+	else if (cpu_has_x2apic && intr_remapping_enabled)
+		genapic = &apic_x2apic_cluster;
 	else
 #ifdef CONFIG_ACPI
 	/*
Index: tree-x86/include/asm-x86/genapic_64.h
===================================================================
--- tree-x86.orig/include/asm-x86/genapic_64.h	2008-07-10 09:52:24.000000000 -0700
+++ tree-x86/include/asm-x86/genapic_64.h	2008-07-10 09:52:27.000000000 -0700
@@ -35,6 +35,7 @@
 
 extern struct genapic apic_flat;
 extern struct genapic apic_physflat;
+extern struct genapic apic_x2apic_cluster;
 extern int acpi_madt_oem_check(char *, char *);
 
 extern void apic_send_IPI_self(int vector);

-- 



* [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (19 preceding siblings ...)
  2008-07-10 18:16 ` [patch 20/26] x64, x2apic/intr-remap: x2apic cluster mode support Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-11  0:14   ` Andrew Morton
  2008-07-10 18:16 ` [patch 22/26] x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping Suresh Siddha
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: uv_init_apic_ldr.patch --]
[-- Type: text/plain, Size: 897 bytes --]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
---

Index: tree-x86/arch/x86/kernel/genx2apic_uv_x.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genx2apic_uv_x.c	2008-07-10 09:52:12.000000000 -0700
+++ tree-x86/arch/x86/kernel/genx2apic_uv_x.c	2008-07-10 09:52:29.000000000 -0700
@@ -120,6 +120,10 @@
 	return 1;
 }
 
+static inline void uv_init_apic_ldr(void)
+{
+}
+
 static unsigned int uv_cpu_mask_to_apicid(cpumask_t cpumask)
 {
 	int cpu;
@@ -164,6 +168,7 @@
 	.target_cpus = uv_target_cpus,
 	.vector_allocation_domain = uv_vector_allocation_domain,/* Fixme ZZZ */
 	.apic_id_registered = uv_apic_id_registered,
+	.init_apic_ldr = uv_init_apic_ldr,
 	.send_IPI_all = uv_send_IPI_all,
 	.send_IPI_allbutself = uv_send_IPI_allbutself,
 	.send_IPI_mask = uv_send_IPI_mask,

-- 



* [patch 22/26] x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (20 preceding siblings ...)
  2008-07-10 18:16 ` [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure Suresh Siddha
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: ioapic_support_for_ir.patch --]
[-- Type: text/plain, Size: 15012 bytes --]

IO-APIC support in the presence of interrupt-remapping infrastructure.

The IO-APIC RTE will be programmed with the interrupt-remapping table entry
(IRTE) index, and the IRTE will contain the vector, cpu destination, trigger
mode, etc., which traditionally were present in the IO-APIC RTE.
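
As a rough illustration of the remappable RTE layout described above, the
sketch below packs a 16-bit IRTE index into a 64-bit RTE value using explicit
shifts. The bit positions are read off struct IR_IO_APIC_route_entry in this
patch, assuming the usual LSB-first bitfield allocation on x86; the helper
names (ir_rte_pack, ir_rte_index) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions derived from struct IR_IO_APIC_route_entry (assumed LSB-first). */
#define IR_RTE_VECTOR_SHIFT	0	/* bits 0-7: vector */
#define IR_RTE_INDEX2_SHIFT	11	/* bit 11: IRTE index bit 15 */
#define IR_RTE_POLARITY_SHIFT	13
#define IR_RTE_TRIGGER_SHIFT	15
#define IR_RTE_MASK_SHIFT	16
#define IR_RTE_FORMAT_SHIFT	48	/* 1 = remappable format */
#define IR_RTE_INDEX_SHIFT	49	/* bits 49-63: IRTE index bits 0-14 */

/* Hypothetical helper: build a remappable-format RTE as a raw 64-bit value. */
static uint64_t ir_rte_pack(unsigned int irte_index, unsigned int vector,
			    int trigger, int polarity)
{
	uint64_t rte = 0;

	assert(irte_index <= 0xffff);
	assert(vector <= 0xff);

	rte |= (uint64_t)vector << IR_RTE_VECTOR_SHIFT;
	rte |= (uint64_t)((irte_index >> 15) & 0x1) << IR_RTE_INDEX2_SHIFT;
	rte |= (uint64_t)(polarity ? 1 : 0) << IR_RTE_POLARITY_SHIFT;
	rte |= (uint64_t)(trigger ? 1 : 0) << IR_RTE_TRIGGER_SHIFT;
	/* level triggered entries start out masked, as in setup_ioapic_entry() */
	rte |= (uint64_t)(trigger ? 1 : 0) << IR_RTE_MASK_SHIFT;
	rte |= (uint64_t)1 << IR_RTE_FORMAT_SHIFT;
	rte |= (uint64_t)(irte_index & 0x7fff) << IR_RTE_INDEX_SHIFT;
	return rte;
}

/* Hypothetical helper: recover the 16-bit IRTE index from a packed RTE. */
static unsigned int ir_rte_index(uint64_t rte)
{
	unsigned int low = (unsigned int)((rte >> IR_RTE_INDEX_SHIFT) & 0x7fff);
	unsigned int high = (unsigned int)((rte >> IR_RTE_INDEX2_SHIFT) & 0x1);

	return low | (high << 15);
}
```

Note that the destination and delivery mode do not appear in the packed value
at all; with remapping enabled they come from the IRTE.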

Introduce a new irq_chip for cleaner irq migration: migration is done in
process context, as opposed to the current irq migration in the context of an
interrupt. The interrupt-remapping infrastructure helps us achieve this cleanly.

For edge triggered, irq migration is a simple atomic update (of vector
and cpu destination) of the IRTE, followed by a flush of the hardware cache.

For level triggered, we need to modify the io-apic RTE as well with the updated
vector information, along with modifying the IRTE with vector and cpu
destination. So irq migration for level triggered is a little bit more complex
than edge triggered migration. But the good news is, we use the same algorithm
for level triggered migration as we have today, the only difference being
that we now initiate the irq migration from process context instead of the
interrupt context.
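
The level triggered case can be pictured with the toy model below. It is only
a sketch of the control flow that migrate_irq_remapped_level() follows in the
diff (mask, check for a pending ack, either migrate or defer, unmask); the
struct and its field names are hypothetical and no real hardware state is
touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one level triggered irq; all names here are hypothetical. */
struct level_irq_model {
	bool masked;		/* masked at the IO-APIC */
	bool ack_pending;	/* an EOI for the old vector is outstanding */
	bool move_pending;	/* a migration request is waiting */
	int dest;		/* current destination */
	int pending_dest;	/* requested destination */
};

/*
 * Try to migrate from process context. Returns 0 when the move was done,
 * -1 when it had to be deferred because an interrupt is still in flight.
 */
static int level_irq_try_migrate(struct level_irq_model *irq)
{
	int ret = -1;

	assert(irq != NULL);

	irq->masked = true;			/* stop new interrupts */
	if (!irq->ack_pending) {
		irq->dest = irq->pending_dest;	/* stands in for RTE + IRTE update */
		irq->move_pending = false;
		ret = 0;
	}
	/* otherwise move_pending stays set so that a later retry picks it up */
	irq->masked = false;
	return ret;
}
```

In the patch, the deferred case is retried from a delayed work item rather
than from the interrupt handler.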

In future, when we do a directed EOI (combined with cpu EOI broadcast
suppression) to the IO-APIC, level triggered irq migration will also be
as simple as edge triggered migration and we can do the irq migration
with a simple atomic update to IO-APIC RTE.

TBD: some tests/changes needed in the presence of fixup_irqs() for
level triggered irq migration.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/kernel/io_apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:14.000000000 -0700
+++ tree-x86/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:31.000000000 -0700
@@ -37,6 +37,7 @@
 #include <acpi/acpi_bus.h>
 #endif
 #include <linux/bootmem.h>
+#include <linux/dmar.h>
 
 #include <asm/idle.h>
 #include <asm/io.h>
@@ -48,6 +49,7 @@
 #include <asm/nmi.h>
 #include <asm/msidef.h>
 #include <asm/hypertransport.h>
+#include <asm/irq_remapping.h>
 
 #include <mach_ipi.h>
 #include <mach_apic.h>
@@ -312,7 +314,12 @@
 		pin = entry->pin;
 		if (pin == -1)
 			break;
-		io_apic_write(apic, 0x11 + pin*2, dest);
+		/*
+		 * With interrupt-remapping, destination information comes
+		 * from interrupt-remapping table entry.
+		 */
+		if (!irq_remapped(irq))
+			io_apic_write(apic, 0x11 + pin*2, dest);
 		reg = io_apic_read(apic, 0x10 + pin*2);
 		reg &= ~IO_APIC_REDIR_VECTOR_MASK;
 		reg |= vector;
@@ -906,18 +913,98 @@
 
 
 static struct irq_chip ioapic_chip;
+#ifdef CONFIG_INTR_REMAP
+static struct irq_chip ir_ioapic_chip;
+#endif
 
 static void ioapic_register_intr(int irq, unsigned long trigger)
 {
-	if (trigger) {
+	if (trigger)
 		irq_desc[irq].status |= IRQ_LEVEL;
-		set_irq_chip_and_handler_name(irq, &ioapic_chip,
-					      handle_fasteoi_irq, "fasteoi");
-	} else {
+	else
 		irq_desc[irq].status &= ~IRQ_LEVEL;
+
+#ifdef CONFIG_INTR_REMAP
+	if (irq_remapped(irq)) {
+		irq_desc[irq].status |= IRQ_MOVE_PCNTXT;
+		if (trigger)
+			set_irq_chip_and_handler_name(irq, &ir_ioapic_chip,
+						      handle_fasteoi_irq,
+						     "fasteoi");
+		else
+			set_irq_chip_and_handler_name(irq, &ir_ioapic_chip,
+						      handle_edge_irq, "edge");
+		return;
+	}
+#endif
+	if (trigger)
+		set_irq_chip_and_handler_name(irq, &ioapic_chip,
+					      handle_fasteoi_irq,
+					      "fasteoi");
+	else
 		set_irq_chip_and_handler_name(irq, &ioapic_chip,
 					      handle_edge_irq, "edge");
+}
+
+static int setup_ioapic_entry(int apic, int irq,
+			      struct IO_APIC_route_entry *entry,
+			      unsigned int destination, int trigger,
+			      int polarity, int vector)
+{
+	/*
+	 * add it to the IO-APIC irq-routing table:
+	 */
+	memset(entry,0,sizeof(*entry));
+
+#ifdef CONFIG_INTR_REMAP
+	if (intr_remapping_enabled) {
+		struct intel_iommu *iommu = map_ioapic_to_ir(apic);
+		struct irte irte;
+		struct IR_IO_APIC_route_entry *ir_entry =
+			(struct IR_IO_APIC_route_entry *) entry;
+		int index;
+
+		if (!iommu)
+			panic("No mapping iommu for ioapic %d\n", apic);
+
+		index = alloc_irte(iommu, irq, 1);
+		if (index < 0)
+			panic("Failed to allocate IRTE for ioapic %d\n", apic);
+
+		memset(&irte, 0, sizeof(irte));
+
+		irte.present = 1;
+		irte.dst_mode = INT_DEST_MODE;
+		irte.trigger_mode = trigger;
+		irte.dlvry_mode = INT_DELIVERY_MODE;
+		irte.vector = vector;
+		irte.dest_id = IRTE_DEST(destination);
+
+		modify_irte(irq, &irte);
+
+		ir_entry->index2 = (index >> 15) & 0x1;
+		ir_entry->zero = 0;
+		ir_entry->format = 1;
+		ir_entry->index = (index & 0x7fff);
+	} else
+#endif
+	{
+		entry->delivery_mode = INT_DELIVERY_MODE;
+		entry->dest_mode = INT_DEST_MODE;
+		entry->dest = destination;
 	}
+
+	entry->mask = 0;				/* enable IRQ */
+	entry->trigger = trigger;
+	entry->polarity = polarity;
+	entry->vector = vector;
+
+	/* Mask level triggered irqs.
+	 * Use IRQ_DELAYED_DISABLE for edge triggered irqs.
+	 */
+	if (trigger)
+		entry->mask = 1;
+	return 0;
 }
 
 static void setup_IO_APIC_irq(int apic, int pin, unsigned int irq,
@@ -942,24 +1029,15 @@
 		    apic, mp_ioapics[apic].mp_apicid, pin, cfg->vector,
 		    irq, trigger, polarity);
 
-	/*
-	 * add it to the IO-APIC irq-routing table:
-	 */
-	memset(&entry,0,sizeof(entry));
-
-	entry.delivery_mode = INT_DELIVERY_MODE;
-	entry.dest_mode = INT_DEST_MODE;
-	entry.dest = cpu_mask_to_apicid(mask);
-	entry.mask = 0;				/* enable IRQ */
-	entry.trigger = trigger;
-	entry.polarity = polarity;
-	entry.vector = cfg->vector;
 
-	/* Mask level triggered irqs.
-	 * Use IRQ_DELAYED_DISABLE for edge triggered irqs.
-	 */
-	if (trigger)
-		entry.mask = 1;
+	if (setup_ioapic_entry(mp_ioapics[apic].mp_apicid, irq, &entry,
+			       cpu_mask_to_apicid(mask), trigger, polarity,
+			       cfg->vector)) {
+		printk("Failed to setup ioapic entry for ioapic  %d, pin %d\n",
+		       mp_ioapics[apic].mp_apicid, pin);
+		__clear_irq_vector(irq);
+		return;
+	}
 
 	ioapic_register_intr(irq, trigger);
 	if (irq < 16)
@@ -1011,6 +1089,9 @@
 {
 	struct IO_APIC_route_entry entry;
 
+	if (intr_remapping_enabled)
+		return;
+
 	memset(&entry, 0, sizeof(entry));
 
 	/*
@@ -1466,6 +1547,147 @@
  */
 
 #ifdef CONFIG_SMP
+
+#ifdef CONFIG_INTR_REMAP
+static void ir_irq_migration(struct work_struct *work);
+
+static DECLARE_DELAYED_WORK(ir_migration_work, ir_irq_migration);
+
+/*
+ * Migrate the IO-APIC irq in the presence of intr-remapping.
+ *
+ * For edge triggered, irq migration is a simple atomic update (of vector
+ * and cpu destination) of the IRTE, followed by a hardware cache flush.
+ *
+ * For level triggered, we need to modify the io-apic RTE as well with the
+ * updated vector information, along with modifying the IRTE with vector and
+ * destination. So irq migration for level triggered is a little bit more
+ * complex than edge triggered migration. But the good news is, we use the
+ * same algorithm for level triggered migration as we have today, the only
+ * difference being that we now initiate the irq migration from process
+ * context instead of the interrupt context.
+ *
+ * In future, when we do a directed EOI (combined with cpu EOI broadcast
+ * suppression) to the IO-APIC, level triggered irq migration will also be
+ * as simple as edge triggered migration and we can do the irq migration
+ * with a simple atomic update to IO-APIC RTE.
+ */
+static void migrate_ioapic_irq(int irq, cpumask_t mask)
+{
+	struct irq_cfg *cfg = irq_cfg + irq;
+	struct irq_desc *desc = irq_desc + irq;
+	cpumask_t tmp, cleanup_mask;
+	struct irte irte;
+	int modify_ioapic_rte = desc->status & IRQ_LEVEL;
+	unsigned int dest;
+	unsigned long flags;
+
+	cpus_and(tmp, mask, cpu_online_map);
+	if (cpus_empty(tmp))
+		return;
+
+	if (get_irte(irq, &irte))
+		return;
+
+	if (assign_irq_vector(irq, mask))
+		return;
+
+	cpus_and(tmp, cfg->domain, mask);
+	dest = cpu_mask_to_apicid(tmp);
+
+	if (modify_ioapic_rte) {
+		spin_lock_irqsave(&ioapic_lock, flags);
+		__target_IO_APIC_irq(irq, dest, cfg->vector);
+		spin_unlock_irqrestore(&ioapic_lock, flags);
+	}
+
+	irte.vector = cfg->vector;
+	irte.dest_id = IRTE_DEST(dest);
+
+	/*
+	 * Modify the IRTE and flush the interrupt entry cache.
+	 */
+	modify_irte(irq, &irte);
+
+	if (cfg->move_in_progress) {
+		cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
+		cfg->move_cleanup_count = cpus_weight(cleanup_mask);
+		send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+		cfg->move_in_progress = 0;
+	}
+
+	irq_desc[irq].affinity = mask;
+}
+
+static int migrate_irq_remapped_level(int irq)
+{
+	int ret = -1;
+
+	mask_IO_APIC_irq(irq);
+
+	if (io_apic_level_ack_pending(irq)) {
+		/*
+	 	 * Interrupt in progress. Migrating irq now will change the
+		 * vector information in the IO-APIC RTE and that will confuse
+		 * the EOI broadcast performed by cpu.
+		 * So, delay the irq migration to the next instance.
+		 */
+		schedule_delayed_work(&ir_migration_work, 1);
+		goto unmask;
+	}
+
+	/* everything is clear. we have right of way */
+	migrate_ioapic_irq(irq, irq_desc[irq].pending_mask);
+
+	ret = 0;
+	irq_desc[irq].status &= ~IRQ_MOVE_PENDING;
+	cpus_clear(irq_desc[irq].pending_mask);
+
+unmask:
+	unmask_IO_APIC_irq(irq);
+	return ret;
+}
+
+static void ir_irq_migration(struct work_struct *work)
+{
+	int irq;
+
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		struct irq_desc *desc = irq_desc + irq;
+		if (desc->status & IRQ_MOVE_PENDING) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&desc->lock, flags);
+			if (!desc->chip->set_affinity ||
+			    !(desc->status & IRQ_MOVE_PENDING)) {
+				desc->status &= ~IRQ_MOVE_PENDING;
+				spin_unlock_irqrestore(&desc->lock, flags);
+				continue;
+			}
+
+			desc->chip->set_affinity(irq,
+					         irq_desc[irq].pending_mask);
+			spin_unlock_irqrestore(&desc->lock, flags);
+		}
+	}
+}
+
+/*
+ * Migrates the IRQ destination in the process context.
+ */
+static void set_ir_ioapic_affinity_irq(unsigned int irq, cpumask_t mask)
+{
+	if (irq_desc[irq].status & IRQ_LEVEL) {
+		irq_desc[irq].status |= IRQ_MOVE_PENDING;
+		irq_desc[irq].pending_mask = mask;
+		migrate_irq_remapped_level(irq);
+		return;
+	}
+
+	migrate_ioapic_irq(irq, mask);
+}
+#endif
+
 asmlinkage void smp_irq_move_cleanup_interrupt(void)
 {
 	unsigned vector, me;
@@ -1522,6 +1744,17 @@
 #else
 static inline void irq_complete_move(unsigned int irq) {}
 #endif
+#ifdef CONFIG_INTR_REMAP
+static void ack_x2apic_level(unsigned int irq)
+{
+	ack_x2APIC_irq();
+}
+
+static void ack_x2apic_edge(unsigned int irq)
+{
+	ack_x2APIC_irq();
+}
+#endif
 
 static void ack_apic_edge(unsigned int irq)
 {
@@ -1596,6 +1829,21 @@
 	.retrigger	= ioapic_retrigger_irq,
 };
 
+#ifdef CONFIG_INTR_REMAP
+static struct irq_chip ir_ioapic_chip __read_mostly = {
+	.name 		= "IR-IO-APIC",
+	.startup 	= startup_ioapic_irq,
+	.mask	 	= mask_IO_APIC_irq,
+	.unmask	 	= unmask_IO_APIC_irq,
+	.ack 		= ack_x2apic_edge,
+	.eoi 		= ack_x2apic_level,
+#ifdef CONFIG_SMP
+	.set_affinity 	= set_ir_ioapic_affinity_irq,
+#endif
+	.retrigger	= ioapic_retrigger_irq,
+};
+#endif
+
 static inline void init_IO_APIC_traps(void)
 {
 	int irq;
@@ -1782,6 +2030,8 @@
 	 * 8259A.
 	 */
 	if (pin1 == -1) {
+		if (intr_remapping_enabled)
+			panic("BIOS bug: timer not connected to IO-APIC");
 		pin1 = pin2;
 		apic1 = apic2;
 		no_pin1 = 1;
@@ -1808,6 +2058,8 @@
 				clear_IO_APIC_pin(0, pin1);
 			goto out;
 		}
+		if (intr_remapping_enabled)
+			panic("timer doesn't work through Interrupt-remapped IO-APIC");
 		clear_IO_APIC_pin(apic1, pin1);
 		if (!no_pin1)
 			apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: "
@@ -2393,6 +2645,10 @@
 				setup_IO_APIC_irq(ioapic, pin, irq,
 						  irq_trigger(irq_entry),
 						  irq_polarity(irq_entry));
+#ifdef CONFIG_INTR_REMAP
+			else if (intr_remapping_enabled)
+				set_ir_ioapic_affinity_irq(irq, TARGET_CPUS);
+#endif
 			else
 				set_ioapic_affinity_irq(irq, TARGET_CPUS);
 		}
Index: tree-x86/drivers/pci/intr_remapping.c
===================================================================
--- tree-x86.orig/drivers/pci/intr_remapping.c	2008-07-10 09:52:05.000000000 -0700
+++ tree-x86/drivers/pci/intr_remapping.c	2008-07-10 09:52:31.000000000 -0700
@@ -220,6 +220,16 @@
 	return 0;
 }
 
+struct intel_iommu *map_ioapic_to_ir(int apic)
+{
+	int i;
+
+	for (i = 0; i < MAX_IO_APICS; i++)
+		if (ir_ioapic[i].id == apic)
+			return ir_ioapic[i].iommu;
+	return NULL;
+}
+
 int free_irte(int irq)
 {
 	int index, i;
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:52:05.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:52:31.000000000 -0700
@@ -109,6 +109,7 @@
 extern int free_irte(int irq);
 
 extern int irq_remapped(int irq);
+extern struct intel_iommu *map_ioapic_to_ir(int apic);
 #else
 #define irq_remapped(irq)		(0)
 #define enable_intr_remapping(mode)	(-1)
Index: tree-x86/arch/x86/kernel/apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_64.c	2008-07-10 09:52:22.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_64.c	2008-07-10 09:52:31.000000000 -0700
@@ -46,6 +46,7 @@
 static int disable_apic_timer __cpuinitdata;
 static int apic_calibrate_pmtmr __initdata;
 int disable_apic;
+int x2apic;
 
 /* Local APIC timer works in C2 */
 int local_apic_timer_c2_ok;
Index: tree-x86/include/asm-x86/io_apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/io_apic.h	2008-07-10 09:52:11.000000000 -0700
+++ tree-x86/include/asm-x86/io_apic.h	2008-07-10 09:52:31.000000000 -0700
@@ -107,6 +107,20 @@
 
 } __attribute__ ((packed));
 
+struct IR_IO_APIC_route_entry {
+	__u64	vector		: 8,
+		zero		: 3,
+		index2		: 1,
+		delivery_status : 1,
+		polarity	: 1,
+		irr		: 1,
+		trigger		: 1,
+		mask		: 1,
+		reserved	: 31,
+		format		: 1,
+		index		: 15;
+} __attribute__ ((packed));
+
 #ifdef CONFIG_X86_IO_APIC
 
 /*
Index: tree-x86/include/asm-x86/irq_remapping.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/include/asm-x86/irq_remapping.h	2008-07-10 09:52:31.000000000 -0700
@@ -0,0 +1,8 @@
+#ifndef _ASM_IRQ_REMAPPING_H
+#define _ASM_IRQ_REMAPPING_H
+
+extern int x2apic;
+
+#define IRTE_DEST(dest) ((x2apic) ? (dest) : (dest) << 8)
+
+#endif
Index: tree-x86/include/asm-x86/apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/apic.h	2008-07-10 09:52:22.000000000 -0700
+++ tree-x86/include/asm-x86/apic.h	2008-07-10 09:52:31.000000000 -0700
@@ -134,6 +134,15 @@
 # define apic_write_around(x, y) apic_write_atomic((x), (y))
 #endif
 
+#ifdef CONFIG_X86_64
+static inline void ack_x2APIC_irq(void)
+{
+	/* Docs say use 0 for future compatibility */
+	native_apic_msr_write(APIC_EOI, 0);
+}
+#endif
+
+
 static inline void ack_APIC_irq(void)
 {
 	/*

-- 



* [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (21 preceding siblings ...)
  2008-07-10 18:16 ` [patch 22/26] x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-11  1:22   ` Eric W. Biederman
  2008-07-10 18:16 ` [patch 24/26] x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping Suresh Siddha
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: msi_intr_remap_support.patch --]
[-- Type: text/plain, Size: 8938 bytes --]

MSI and MSI-X support for interrupt remapping infrastructure.

The MSI address register will be programmed with the interrupt-remapping table
entry (IRTE) index, and the IRTE will contain information about the vector,
cpu destination, etc.

For MSI-X, all the IRTEs will be consecutively allocated in the table,
and the address registers will contain the starting index of the block
and the data register will contain the subindex within that block.
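
A minimal sketch of that encoding follows. The MSI_ADDR_IR_* bit positions
are the ones this patch adds to msidef.h; the value used for MSI_ADDR_BASE_LO
(0xfee00000) and the "base index plus subhandle" lookup are assumptions based
on the usual x86 MSI address window and on how the sub-handle-valid (SHV) bit
is described in the VT-d specification. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_BASE_LO		0xfee00000u	/* assumed x86 MSI window */
#define MSI_ADDR_IR_EXT_INT		(1u << 4)
#define MSI_ADDR_IR_SHV			(1u << 3)
#define MSI_ADDR_IR_INDEX1(index)	(((index) & 0x8000u) >> 13)
#define MSI_ADDR_IR_INDEX2(index)	(((index) & 0x7fffu) << 5)

/* Hypothetical stand-in for the address_lo/data part of struct msi_msg. */
struct ir_msi_msg {
	uint32_t address_lo;
	uint32_t data;
};

/* Compose a remappable-format message for one vector of an MSI-X block. */
static struct ir_msi_msg ir_msi_compose(unsigned int base_index,
					unsigned int sub_handle)
{
	struct ir_msi_msg msg;

	assert(base_index <= 0xffff);
	assert(sub_handle <= 0xffff);

	msg.address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
			 MSI_ADDR_IR_SHV |
			 MSI_ADDR_IR_INDEX1(base_index) |
			 MSI_ADDR_IR_INDEX2(base_index);
	msg.data = sub_handle;
	return msg;
}

/* IRTE slot the hardware is assumed to pick when SHV is set. */
static unsigned int ir_msi_effective_index(struct ir_msi_msg msg)
{
	unsigned int base = ((msg.address_lo >> 5) & 0x7fff) |
			    (((msg.address_lo >> 2) & 0x1) << 15);

	return base + (msg.data & 0xffff);
}
```

With a block starting at index 0x8010, the fourth vector (subhandle 3) would
resolve to IRTE 0x8013 under these assumptions.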

This also introduces a new irq_chip for cleaner irq migration: migration is
done in process context, as opposed to the current irq migration in the
context of an interrupt. The interrupt-remapping infrastructure helps us
achieve this.

As MSI is edge triggered, irq migration is a simple atomic update (of vector
and cpu destination) of the IRTE, followed by a flush of the hardware cache.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/kernel/io_apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:31.000000000 -0700
+++ tree-x86/arch/x86/kernel/io_apic_64.c	2008-07-10 09:52:34.000000000 -0700
@@ -2289,6 +2289,9 @@
 
 	dynamic_irq_cleanup(irq);
 
+#ifdef CONFIG_INTR_REMAP
+	free_irte(irq);
+#endif
 	spin_lock_irqsave(&vector_lock, flags);
 	__clear_irq_vector(irq);
 	spin_unlock_irqrestore(&vector_lock, flags);
@@ -2307,11 +2310,42 @@
 
 	tmp = TARGET_CPUS;
 	err = assign_irq_vector(irq, tmp);
-	if (!err) {
-		cpus_and(tmp, cfg->domain, tmp);
-		dest = cpu_mask_to_apicid(tmp);
+	if (err)
+		return err;
+
+	cpus_and(tmp, cfg->domain, tmp);
+	dest = cpu_mask_to_apicid(tmp);
+
+#ifdef CONFIG_INTR_REMAP
+	if (irq_remapped(irq)) {
+		struct irte irte;
+		int ir_index;
+		u16 sub_handle;
+
+		ir_index = map_irq_to_irte_handle(irq, &sub_handle);
+		BUG_ON(ir_index == -1);
+
+		memset (&irte, 0, sizeof(irte));
+
+		irte.present = 1;
+		irte.dst_mode = INT_DEST_MODE;
+		irte.trigger_mode = 0; /* edge */
+		irte.dlvry_mode = INT_DELIVERY_MODE;
+		irte.vector = cfg->vector;
+		irte.dest_id = IRTE_DEST(dest);
+
+		modify_irte(irq, &irte);
 
 		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->data = sub_handle;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_SHV |
+				  MSI_ADDR_IR_INDEX1(ir_index) |
+				  MSI_ADDR_IR_INDEX2(ir_index);
+	} else
+#endif
+	{
+		msg->address_hi = MSI_ADDR_BASE_HI;
 		msg->address_lo =
 			MSI_ADDR_BASE_LO |
 			((INT_DEST_MODE == 0) ?
@@ -2361,6 +2395,55 @@
 	write_msi_msg(irq, &msg);
 	irq_desc[irq].affinity = mask;
 }
+
+#ifdef CONFIG_INTR_REMAP
+/*
+ * Migrate the MSI irq to another cpumask. This migration is
+ * done in the process context using interrupt-remapping hardware.
+ */
+static void ir_set_msi_irq_affinity(unsigned int irq, cpumask_t mask)
+{
+	struct irq_cfg *cfg = irq_cfg + irq;
+	unsigned int dest;
+	cpumask_t tmp, cleanup_mask;
+	struct irte irte;
+
+	cpus_and(tmp, mask, cpu_online_map);
+	if (cpus_empty(tmp))
+		return;
+
+	if (get_irte(irq, &irte))
+		return;
+
+	if (assign_irq_vector(irq, mask))
+		return;
+
+	cpus_and(tmp, cfg->domain, mask);
+	dest = cpu_mask_to_apicid(tmp);
+
+	irte.vector = cfg->vector;
+	irte.dest_id = IRTE_DEST(dest);
+
+	/*
+	 * atomically update the IRTE with the new destination and vector.
+	 */
+	modify_irte(irq, &irte);
+
+	/*
+	 * After this point, all the interrupts will start arriving
+	 * at the new destination. So, time to cleanup the previous
+	 * vector allocation.
+	 */
+	if (cfg->move_in_progress) {
+		cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
+		cfg->move_cleanup_count = cpus_weight(cleanup_mask);
+		send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+		cfg->move_in_progress = 0;
+	}
+
+	irq_desc[irq].affinity = mask;
+}
+#endif
 #endif /* CONFIG_SMP */
 
 /*
@@ -2378,26 +2461,157 @@
 	.retrigger	= ioapic_retrigger_irq,
 };
 
-int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+#ifdef CONFIG_INTR_REMAP
+static struct irq_chip msi_ir_chip = {
+	.name		= "IR-PCI-MSI",
+	.unmask		= unmask_msi_irq,
+	.mask		= mask_msi_irq,
+	.ack		= ack_x2apic_edge,
+#ifdef CONFIG_SMP
+	.set_affinity	= ir_set_msi_irq_affinity,
+#endif
+	.retrigger	= ioapic_retrigger_irq,
+};
+
+/*
+ * Map the PCI dev to the corresponding remapping hardware unit
+ * and allocate 'nvec' consecutive interrupt-remapping table entries
+ * in it.
+ */
+static int msi_alloc_irte(struct pci_dev *dev, int irq, int nvec)
 {
+	struct intel_iommu *iommu;
+	int index;
+
+	iommu = map_dev_to_ir(dev);
+	if (!iommu) {
+		printk(KERN_ERR
+		       "Unable to map PCI %s to iommu\n", pci_name(dev));
+		return -ENOENT;
+	}
+
+	index = alloc_irte(iommu, irq, nvec);
+	if (index < 0) {
+		printk(KERN_ERR
+		       "Unable to allocate %d IRTE for PCI %s\n", nvec,
+		        pci_name(dev));
+		return -ENOSPC;
+	}
+	return index;
+}
+#endif
+
+static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc, int irq)
+{
+	int ret;
 	struct msi_msg msg;
+
+	ret = msi_compose_msg(dev, irq, &msg);
+	if (ret < 0)
+		return ret;
+
+	set_irq_msi(irq, desc);
+	write_msi_msg(irq, &msg);
+
+#ifdef CONFIG_INTR_REMAP
+	if (irq_remapped(irq)) {
+		struct irq_desc *desc = irq_desc + irq;
+		/*
+		 * irq migration in process context
+		 */
+		desc->status |= IRQ_MOVE_PCNTXT;
+		set_irq_chip_and_handler_name(irq, &msi_ir_chip, handle_edge_irq, "edge");
+	} else
+#endif
+		set_irq_chip_and_handler_name(irq, &msi_chip, handle_edge_irq, "edge");
+
+	return 0;
+}
+
+int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+{
 	int irq, ret;
+
 	irq = create_irq();
 	if (irq < 0)
 		return irq;
 
-	ret = msi_compose_msg(dev, irq, &msg);
+#ifdef CONFIG_INTR_REMAP
+	if (!intr_remapping_enabled)
+		goto no_ir;
+
+	ret = msi_alloc_irte(dev, irq, 1);
+	if (ret < 0)
+		goto error;
+no_ir:
+#endif
+	ret = setup_msi_irq(dev, desc, irq);
 	if (ret < 0) {
 		destroy_irq(irq);
 		return ret;
 	}
+	return 0;
 
-	set_irq_msi(irq, desc);
-	write_msi_msg(irq, &msg);
+#ifdef CONFIG_INTR_REMAP
+error:
+	destroy_irq(irq);
+	return ret;
+#endif
+}
 
-	set_irq_chip_and_handler_name(irq, &msi_chip, handle_edge_irq, "edge");
+int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+	int irq, ret, sub_handle;
+	struct msi_desc *desc;
+#ifdef CONFIG_INTR_REMAP
+	struct intel_iommu *iommu = NULL;
+	int index = 0;
+#endif
 
+	sub_handle = 0;
+	list_for_each_entry(desc, &dev->msi_list, list) {
+		irq = create_irq();
+		if (irq < 0)
+			return irq;
+#ifdef CONFIG_INTR_REMAP
+		if (!intr_remapping_enabled)
+			goto no_ir;
+
+		if (!sub_handle) {
+			/*
+			 * allocate the consecutive block of IRTE's
+			 * for 'nvec'
+			 */
+			index = msi_alloc_irte(dev, irq, nvec);
+			if (index < 0) {
+				ret = index;
+				goto error;
+			}
+		} else {
+			iommu = map_dev_to_ir(dev);
+			if (!iommu) {
+				ret = -ENOENT;
+				goto error;
+			}
+			/*
+			 * setup the mapping between the irq and the IRTE
+			 * base index, the sub_handle pointing to the
+			 * appropriate interrupt remap table entry.
+			 */
+			set_irte_irq(irq, iommu, index, sub_handle);
+		}
+no_ir:
+#endif
+		ret = setup_msi_irq(dev, desc, irq);
+		if (ret < 0)
+			goto error;
+		sub_handle++;
+	}
 	return 0;
+
+error:
+	destroy_irq(irq);
+	return ret;
 }
 
 void arch_teardown_msi_irq(unsigned int irq)
Index: tree-x86/drivers/pci/intr_remapping.c
===================================================================
--- tree-x86.orig/drivers/pci/intr_remapping.c	2008-07-10 09:52:31.000000000 -0700
+++ tree-x86/drivers/pci/intr_remapping.c	2008-07-10 09:52:34.000000000 -0700
@@ -230,6 +230,17 @@
 	return NULL;
 }
 
+struct intel_iommu *map_dev_to_ir(struct pci_dev *dev)
+{
+	struct dmar_drhd_unit *drhd;
+
+	drhd = dmar_find_matched_drhd_unit(dev);
+	if (!drhd)
+		return NULL;
+
+	return drhd->iommu;
+}
+
 int free_irte(int irq)
 {
 	int index, i;
Index: tree-x86/include/linux/dmar.h
===================================================================
--- tree-x86.orig/include/linux/dmar.h	2008-07-10 09:52:31.000000000 -0700
+++ tree-x86/include/linux/dmar.h	2008-07-10 09:52:34.000000000 -0700
@@ -109,6 +109,7 @@
 extern int free_irte(int irq);
 
 extern int irq_remapped(int irq);
+extern struct intel_iommu *map_dev_to_ir(struct pci_dev *dev);
 extern struct intel_iommu *map_ioapic_to_ir(int apic);
 #else
 #define irq_remapped(irq)		(0)
Index: tree-x86/include/asm-x86/msidef.h
===================================================================
--- tree-x86.orig/include/asm-x86/msidef.h	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/include/asm-x86/msidef.h	2008-07-10 09:52:34.000000000 -0700
@@ -48,4 +48,8 @@
 #define  MSI_ADDR_DEST_ID(dest)		(((dest) << MSI_ADDR_DEST_ID_SHIFT) & \
 					 MSI_ADDR_DEST_ID_MASK)
 
+#define MSI_ADDR_IR_EXT_INT		(1 << 4)
+#define MSI_ADDR_IR_SHV			(1 << 3)
+#define MSI_ADDR_IR_INDEX1(index)	(((index) & 0x8000) >> 13)
+#define MSI_ADDR_IR_INDEX2(index)	(((index) & 0x7fff) << 5)
 #endif /* ASM_MSIDEF_H */

-- 



* [patch 24/26] x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (22 preceding siblings ...)
  2008-07-10 18:16 ` [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:16 ` [patch 25/26] x64, x2apic/intr-remap: support for x2apic physical mode support Suresh Siddha
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: enable_x2apic.patch --]
[-- Type: text/plain, Size: 9400 bytes --]

x2apic support. Interrupt-remapping must be enabled before enabling x2apic;
this is needed to ensure that IO interrupts continue to work properly after the
cpu mode is changed to x2apic (which uses a 32-bit extended physical/cluster
apic id).
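
The difference in destination width shows up in how the IRTE destination
field is filled. The sketch below restates the IRTE_DEST() macro from patch
22 as a function; the function name and the explicit range check are
illustrative additions, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical function form of IRTE_DEST(): in x2apic mode the full 32-bit
 * apic id is used as is, while in legacy xapic mode the 8-bit apic id is
 * shifted into bits 15:8 of the destination field.
 */
static uint32_t irte_dest(int x2apic_mode, uint32_t apic_id)
{
	if (x2apic_mode)
		return apic_id;

	assert(apic_id <= 0xff);	/* xapic ids are 8 bits wide */
	return apic_id << 8;
}
```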

On systems where apic ids are > 255, the BIOS can hand over control to the OS
in x2apic mode. If the handover was in legacy xapic mode instead, check whether
the cpu is capable of x2apic mode; if we then succeed in enabling
interrupt-remapping, we can enable x2apic mode in the CPU.
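
The decision described above can be summarized as a small pure function. This
is only a sketch of the ordering of checks in enable_IR_x2apic() as posted;
the enum, the struct and its field names are hypothetical, and "FATAL" stands
in for the panic() paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum x2apic_decision { STAY_XAPIC, USE_X2APIC, FATAL };

/* Hypothetical bundle of the inputs that enable_IR_x2apic() looks at. */
struct x2apic_inputs {
	bool cpu_has_x2apic;	/* cpu advertises x2apic */
	bool preenabled;	/* BIOS handed over in x2apic mode */
	bool nox2apic;		/* "nox2apic" given on the command line */
	bool skip_ioapic_setup;	/* io-apic setup is being skipped */
	bool ir_enable_ok;	/* DMAR init and IR enabling both succeeded */
};

static enum x2apic_decision decide_x2apic(const struct x2apic_inputs *in)
{
	assert(in != NULL);

	if (!in->cpu_has_x2apic)
		return STAY_XAPIC;
	if (in->nox2apic)	/* cannot switch back once the BIOS enabled it */
		return in->preenabled ? FATAL : STAY_XAPIC;
	if (!in->preenabled && in->skip_ioapic_setup)
		return STAY_XAPIC;
	if (!in->ir_enable_ok)	/* x2apic without IR is not usable */
		return in->preenabled ? FATAL : STAY_XAPIC;
	return USE_X2APIC;
}
```

The key asymmetry is that a failure is only fatal when the BIOS has already
switched the cpu to x2apic mode; otherwise the kernel simply stays in legacy
xapic mode.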

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/kernel/apic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/apic_64.c	2008-07-10 09:52:31.000000000 -0700
+++ tree-x86/arch/x86/kernel/apic_64.c	2008-07-10 09:52:36.000000000 -0700
@@ -27,6 +27,7 @@
 #include <linux/clockchips.h>
 #include <linux/acpi_pmtmr.h>
 #include <linux/module.h>
+#include <linux/dmar.h>
 
 #include <asm/atomic.h>
 #include <asm/smp.h>
@@ -39,6 +40,7 @@
 #include <asm/proto.h>
 #include <asm/timex.h>
 #include <asm/apic.h>
+#include <asm/i8259.h>
 
 #include <mach_ipi.h>
 #include <mach_apic.h>
@@ -46,8 +48,12 @@
 static int disable_apic_timer __cpuinitdata;
 static int apic_calibrate_pmtmr __initdata;
 int disable_apic;
+int disable_x2apic;
 int x2apic;
 
+/* x2apic enabled before OS handover */
+int x2apic_preenabled;
+
 /* Local APIC timer works in C2 */
 int local_apic_timer_c2_ok;
 EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
@@ -896,6 +902,125 @@
 	apic_pm_activate();
 }
 
+void check_x2apic(void)
+{
+	int msr, msr2;
+
+	rdmsr(MSR_IA32_APICBASE, msr, msr2);
+
+	if (msr & X2APIC_ENABLE) {
+		printk("x2apic enabled by BIOS, switching to x2apic ops\n");
+		x2apic_preenabled = x2apic = 1;
+		apic_ops = &x2apic_ops;
+	}
+}
+
+void enable_x2apic(void)
+{
+	int msr, msr2;
+
+	rdmsr(MSR_IA32_APICBASE, msr, msr2);
+	if (!(msr & X2APIC_ENABLE)) {
+		printk("Enabling x2apic\n");
+		wrmsr(MSR_IA32_APICBASE, msr | X2APIC_ENABLE, 0);
+	}
+}
+
+void enable_IR_x2apic(void)
+{
+#ifdef CONFIG_INTR_REMAP
+	int ret;
+	unsigned long flags;
+
+	if (!cpu_has_x2apic)
+		return;
+
+	if (!x2apic_preenabled && disable_x2apic) {
+		printk(KERN_INFO
+		       "Skipped enabling x2apic and Interrupt-remapping "
+		       "because of nox2apic\n");
+		return;
+	}
+
+	if (x2apic_preenabled && disable_x2apic)
+		panic("Bios already enabled x2apic, can't enforce nox2apic");
+
+	if (!x2apic_preenabled && skip_ioapic_setup) {
+		printk(KERN_INFO
+		       "Skipped enabling x2apic and Interrupt-remapping "
+		       "because of skipping io-apic setup\n");
+		return;
+	}
+
+	ret = dmar_table_init();
+	if (ret) {
+		printk(KERN_INFO
+		       "dmar_table_init() failed with %d:\n", ret);
+
+		if (x2apic_preenabled)
+			panic("x2apic enabled by bios. But IR enabling failed");
+		else
+			printk(KERN_INFO
+			       "Not enabling x2apic,Intr-remapping\n");
+		return;
+	}
+
+	local_irq_save(flags);
+	mask_8259A();
+	save_mask_IO_APIC_setup();
+
+	ret = enable_intr_remapping(1);
+
+	if (ret && x2apic_preenabled) {
+		local_irq_restore(flags);
+		panic("x2apic enabled by bios. But IR enabling failed");
+	}
+
+	if (ret)
+		goto end;
+
+	if (!x2apic) {
+		x2apic = 1;
+		apic_ops = &x2apic_ops;
+		enable_x2apic();
+	}
+end:
+	if (ret)
+		/*
+		 * IR enabling failed
+		 */
+		restore_IO_APIC_setup();
+	else
+		reinit_intr_remapped_IO_APIC(x2apic_preenabled);
+
+	unmask_8259A();
+	local_irq_restore(flags);
+
+	if (!ret) {
+		if (!x2apic_preenabled)
+			printk(KERN_INFO
+			       "Enabled x2apic and interrupt-remapping\n");
+		else
+			printk(KERN_INFO
+			       "Enabled Interrupt-remapping\n");
+	} else
+		printk(KERN_ERR
+		       "Failed to enable Interrupt-remapping and x2apic\n");
+#else
+	if (!cpu_has_x2apic)
+		return;
+
+	if (x2apic_preenabled)
+		panic("x2apic enabled prior OS handover,"
+		      " enable CONFIG_INTR_REMAP");
+
+	printk(KERN_INFO "Enable CONFIG_INTR_REMAP for enabling intr-remapping "
+	       " and x2apic\n");
+#endif
+
+	return;
+}
+
 /*
  * Detect and enable local APICs on non-SMP boards.
  * Original code written by Keir Fraser.
@@ -943,6 +1068,11 @@
  */
 void __init init_apic_mappings(void)
 {
+	if (x2apic) {
+		boot_cpu_physical_apicid = GET_APIC_ID(read_apic_id());
+		return;
+	}
+
 	/*
 	 * If no local APIC can be found then set up a fake all
 	 * zeroes page to simulate the local APIC and another
@@ -981,6 +1111,9 @@
 		return -1;
 	}
 
+	enable_IR_x2apic();
+	setup_apic_routing();
+
 	verify_local_APIC();
 
 	connect_bsp_APIC();
@@ -1238,10 +1371,14 @@
 	maxlvt = lapic_get_maxlvt();
 
 	local_irq_save(flags);
-	rdmsr(MSR_IA32_APICBASE, l, h);
-	l &= ~MSR_IA32_APICBASE_BASE;
-	l |= MSR_IA32_APICBASE_ENABLE | mp_lapic_addr;
-	wrmsr(MSR_IA32_APICBASE, l, h);
+	if (!x2apic) {
+		rdmsr(MSR_IA32_APICBASE, l, h);
+		l &= ~MSR_IA32_APICBASE_BASE;
+		l |= MSR_IA32_APICBASE_ENABLE | mp_lapic_addr;
+		wrmsr(MSR_IA32_APICBASE, l, h);
+	} else
+		enable_x2apic();
+
 	apic_write(APIC_LVTERR, ERROR_APIC_VECTOR | APIC_LVT_MASKED);
 	apic_write(APIC_ID, apic_pm_state.apic_id);
 	apic_write(APIC_DFR, apic_pm_state.apic_dfr);
@@ -1381,6 +1518,15 @@
 	return (clusters > 2);
 }
 
+static __init int setup_nox2apic(char *str)
+{
+	disable_x2apic = 1;
+	clear_cpu_cap(&boot_cpu_data, X86_FEATURE_X2APIC);
+	return 0;
+}
+early_param("nox2apic", setup_nox2apic);
+
+
 /*
  * APIC command line parameters
  */
Index: tree-x86/arch/x86/kernel/genapic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:27.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:36.000000000 -0700
@@ -16,6 +16,7 @@
 #include <linux/ctype.h>
 #include <linux/init.h>
 #include <linux/hardirq.h>
+#include <linux/dmar.h>
 
 #include <asm/smp.h>
 #include <asm/ipi.h>
Index: tree-x86/arch/x86/kernel/setup.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/setup.c	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/arch/x86/kernel/setup.c	2008-07-10 09:52:36.000000000 -0700
@@ -729,6 +729,8 @@
 	num_physpages = max_pfn;
 
 	check_efer();
+	if (cpu_has_x2apic)
+		check_x2apic();
 
 	/* How many end-of-memory variables you have, grandma! */
 	/* need this before calling reserve_initrd */
Index: tree-x86/arch/x86/kernel/smpboot.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/smpboot.c	2008-07-10 09:52:14.000000000 -0700
+++ tree-x86/arch/x86/kernel/smpboot.c	2008-07-10 09:52:36.000000000 -0700
@@ -1145,6 +1145,11 @@
 	current_thread_info()->cpu = 0;  /* needed? */
 	set_cpu_sibling_map(0);
 
+#ifdef CONFIG_X86_64
+	enable_IR_x2apic();
+	setup_apic_routing();
+#endif
+
 	if (smp_sanity_check(max_cpus) < 0) {
 		printk(KERN_INFO "SMP disabled\n");
 		disable_smp();
Index: tree-x86/arch/x86/kernel/acpi/boot.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/acpi/boot.c	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/arch/x86/kernel/acpi/boot.c	2008-07-10 09:52:36.000000000 -0700
@@ -1347,7 +1347,9 @@
 				acpi_ioapic = 1;
 
 				smp_found_config = 1;
+#ifdef CONFIG_X86_32
 				setup_apic_routing();
+#endif
 			}
 		}
 		if (error == -EINVAL) {
Index: tree-x86/arch/x86/kernel/cpu/common_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/cpu/common_64.c	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/arch/x86/kernel/cpu/common_64.c	2008-07-10 09:52:36.000000000 -0700
@@ -603,6 +603,8 @@
 	barrier();
 
 	check_efer();
+	if (cpu != 0 && x2apic)
+		enable_x2apic();
 
 	/*
 	 * set up and load the per-CPU TSS
Index: tree-x86/arch/x86/kernel/mpparse.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/mpparse.c	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/arch/x86/kernel/mpparse.c	2008-07-10 09:52:36.000000000 -0700
@@ -545,7 +545,9 @@
        generic_bigsmp_probe();
 #endif
 
+#ifdef CONFIG_X86_32
 	setup_apic_routing();
+#endif
 	if (!num_processors)
 		printk(KERN_ERR "MPTABLE: no processors registered!\n");
 	return num_processors;
Index: tree-x86/Documentation/kernel-parameters.txt
===================================================================
--- tree-x86.orig/Documentation/kernel-parameters.txt	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/Documentation/kernel-parameters.txt	2008-07-10 09:52:36.000000000 -0700
@@ -1377,6 +1377,8 @@
 
 	nolapic_timer	[X86-32,APIC] Do not use the local APIC timer.
 
+	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
+
 	noltlbs		[PPC] Do not use large page/tlb entries for kernel
 			lowmem mapping on PPC40x.
 
Index: tree-x86/include/asm-x86/apic.h
===================================================================
--- tree-x86.orig/include/asm-x86/apic.h	2008-07-10 09:52:31.000000000 -0700
+++ tree-x86/include/asm-x86/apic.h	2008-07-10 09:52:36.000000000 -0700
@@ -100,6 +100,11 @@
 extern u32 safe_apic_wait_icr_idle(void);
 extern void apic_icr_write(u32 low, u32 id);
 #else
+extern int x2apic, x2apic_preenabled;
+extern void check_x2apic(void);
+extern void enable_x2apic(void);
+extern void enable_IR_x2apic(void);
+extern void x2apic_icr_write(u32 low, u32 id);
 
 struct apic_ops {
 	u32 (*read)(u32 reg);

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 25/26] x64, x2apic/intr-remap: support for x2apic physical mode support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (23 preceding siblings ...)
  2008-07-10 18:16 ` [patch 24/26] x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping Suresh Siddha
@ 2008-07-10 18:16 ` Suresh Siddha
  2008-07-10 18:17 ` [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP Suresh Siddha
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:16 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: x2apic_physical.patch --]
[-- Type: text/plain, Size: 5329 bytes --]

x2apic Physical mode  support. By default we will use x2apic cluster mode.
x2apic physical mode can be selected using "x2apic_phys" boot parameter.
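
For illustration only (not part of this patch): the two modes differ in what
is written as the IPI destination. In physical mode the destination is the
target CPU's x2APIC ID. In cluster mode it is a logical ID which, per the
x2APIC specification, is derived from the x2APIC ID: bits 31:16 hold the
cluster (ID[19:4]) and bits 15:0 hold a one-hot member bit (1 << ID[3:0]).
The helper names below are hypothetical, and phys_mode merely stands in for
the "x2apic_phys" boot parameter.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: derive the logical x2APIC ID
 * from the physical x2APIC ID.
 */
uint32_t sketch_x2apic_logical_id(uint32_t apicid)
{
	uint32_t cluster = (apicid >> 4) & 0xffff;	/* ID[19:4] */
	uint32_t member = 1u << (apicid & 0xf);		/* one-hot ID[3:0] */

	return (cluster << 16) | member;
}

/*
 * Hypothetical helper: the destination value for one target CPU.
 * phys_mode stands in for the x2apic_phys boot parameter.
 */
uint32_t sketch_x2apic_ipi_dest(uint32_t apicid, int phys_mode)
{
	return phys_mode ? apicid : sketch_x2apic_logical_id(apicid);
}
```

The physical-mode code in this patch issues one ICR write per target CPU
with APIC_DEST_PHYSICAL, using the per-cpu x86_cpu_to_apicid value as the
destination.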

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/kernel/Makefile
===================================================================
--- tree-x86.orig/arch/x86/kernel/Makefile	2008-07-10 09:52:27.000000000 -0700
+++ tree-x86/arch/x86/kernel/Makefile	2008-07-10 09:52:40.000000000 -0700
@@ -105,6 +105,7 @@
 ifeq ($(CONFIG_X86_64),y)
         obj-y				+= genapic_64.o genapic_flat_64.o genx2apic_uv_x.o tlb_uv.o
         obj-y				+= genx2apic_cluster.o
+        obj-y				+= genx2apic_phys.o
         obj-$(CONFIG_X86_PM_TIMER)	+= pmtimer_64.o
         obj-$(CONFIG_AUDIT)		+= audit_64.o
 
Index: tree-x86/include/asm-x86/genapic_64.h
===================================================================
--- tree-x86.orig/include/asm-x86/genapic_64.h	2008-07-10 09:52:27.000000000 -0700
+++ tree-x86/include/asm-x86/genapic_64.h	2008-07-10 09:52:40.000000000 -0700
@@ -36,6 +36,7 @@
 extern struct genapic apic_flat;
 extern struct genapic apic_physflat;
 extern struct genapic apic_x2apic_cluster;
+extern struct genapic apic_x2apic_phys;
 extern int acpi_madt_oem_check(char *, char *);
 
 extern void apic_send_IPI_self(int vector);
Index: tree-x86/arch/x86/kernel/genx2apic_phys.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ tree-x86/arch/x86/kernel/genx2apic_phys.c	2008-07-10 09:52:40.000000000 -0700
@@ -0,0 +1,122 @@
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/ctype.h>
+#include <linux/init.h>
+#include <asm/smp.h>
+#include <asm/ipi.h>
+#include <asm/genapic.h>
+
+
+/* Start with all IRQs pointing to boot CPU.  IRQ balancing will shift them. */
+
+static cpumask_t x2apic_target_cpus(void)
+{
+	return cpumask_of_cpu(0);
+}
+
+static cpumask_t x2apic_vector_allocation_domain(int cpu)
+{
+	cpumask_t domain = CPU_MASK_NONE;
+	cpu_set(cpu, domain);
+	return domain;
+}
+
+static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
+				   unsigned int dest)
+{
+	unsigned long cfg;
+
+	cfg = __prepare_ICR(0, vector, dest);
+
+	/*
+	 * send the IPI.
+	 */
+	x2apic_icr_write(cfg, apicid);
+}
+
+static void x2apic_send_IPI_mask(cpumask_t mask, int vector)
+{
+	unsigned long flags;
+	unsigned long query_cpu;
+
+	local_irq_save(flags);
+	for_each_cpu_mask(query_cpu, mask) {
+		__x2apic_send_IPI_dest(per_cpu(x86_cpu_to_apicid, query_cpu),
+				       vector, APIC_DEST_PHYSICAL);
+	}
+	local_irq_restore(flags);
+}
+
+static void x2apic_send_IPI_allbutself(int vector)
+{
+	cpumask_t mask = cpu_online_map;
+
+	cpu_clear(smp_processor_id(), mask);
+
+	if (!cpus_empty(mask))
+		x2apic_send_IPI_mask(mask, vector);
+}
+
+static void x2apic_send_IPI_all(int vector)
+{
+	x2apic_send_IPI_mask(cpu_online_map, vector);
+}
+
+static int x2apic_apic_id_registered(void)
+{
+	return 1;
+}
+
+static unsigned int x2apic_cpu_mask_to_apicid(cpumask_t cpumask)
+{
+	int cpu;
+
+	/*
+	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
+	 * May as well be the first.
+	 */
+	cpu = first_cpu(cpumask);
+	if ((unsigned)cpu < NR_CPUS)
+		return per_cpu(x86_cpu_to_apicid, cpu);
+	else
+		return BAD_APICID;
+}
+
+static unsigned int x2apic_read_id(void)
+{
+	return apic_read(APIC_ID);
+}
+
+static unsigned int phys_pkg_id(int index_msb)
+{
+	return x2apic_read_id() >> index_msb;
+}
+
+void x2apic_send_IPI_self(int vector)
+{
+	apic_write(APIC_SELF_IPI, vector);
+}
+
+void init_x2apic_ldr(void)
+{
+	return;
+}
+
+struct genapic apic_x2apic_phys = {
+	.name = "physical x2apic",
+	.int_delivery_mode = dest_Fixed,
+	.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
+	.target_cpus = x2apic_target_cpus,
+	.vector_allocation_domain = x2apic_vector_allocation_domain,
+	.apic_id_registered = x2apic_apic_id_registered,
+	.init_apic_ldr = init_x2apic_ldr,
+	.send_IPI_all = x2apic_send_IPI_all,
+	.send_IPI_allbutself = x2apic_send_IPI_allbutself,
+	.send_IPI_mask = x2apic_send_IPI_mask,
+	.send_IPI_self = x2apic_send_IPI_self,
+	.cpu_mask_to_apicid = x2apic_cpu_mask_to_apicid,
+	.phys_pkg_id = phys_pkg_id,
+	.read_apic_id = x2apic_read_id,
+};
Index: tree-x86/arch/x86/kernel/genapic_64.c
===================================================================
--- tree-x86.orig/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:36.000000000 -0700
+++ tree-x86/arch/x86/kernel/genapic_64.c	2008-07-10 09:52:40.000000000 -0700
@@ -30,6 +30,15 @@
 
 struct genapic __read_mostly *genapic = &apic_flat;
 
+static int x2apic_phys = 0;
+
+static int set_x2apic_phys_mode(char *arg)
+{
+	x2apic_phys = 1;
+	return 0;
+}
+early_param("x2apic_phys", set_x2apic_phys_mode);
+
 static enum uv_system_type uv_system_type;
 
 /*
@@ -39,9 +48,12 @@
 {
 	if (uv_system_type == UV_NON_UNIQUE_APIC)
 		genapic = &apic_x2apic_uv_x;
-	else if (cpu_has_x2apic && intr_remapping_enabled)
-		genapic = &apic_x2apic_cluster;
-	else
+	else if (cpu_has_x2apic && intr_remapping_enabled) {
+		if (x2apic_phys)
+			genapic = &apic_x2apic_phys;
+		else
+			genapic = &apic_x2apic_cluster;
+	} else
 #ifdef CONFIG_ACPI
 	/*
 	 * Quirk: some x86_64 machines can only use physical APIC mode

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (24 preceding siblings ...)
  2008-07-10 18:16 ` [patch 25/26] x64, x2apic/intr-remap: support for x2apic physical mode support Suresh Siddha
@ 2008-07-10 18:17 ` Suresh Siddha
  2008-07-10 23:29   ` Eric W. Biederman
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
  2008-07-10 20:05 ` Eric W. Biederman
  27 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 18:17 UTC (permalink / raw)
  To: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner
  Cc: linux-kernel, Suresh Siddha

[-- Attachment #1: introduce_config_intr_remap.patch --]
[-- Type: text/plain, Size: 848 bytes --]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: tree-x86/arch/x86/Kconfig
===================================================================
--- tree-x86.orig/arch/x86/Kconfig	2008-07-10 09:51:44.000000000 -0700
+++ tree-x86/arch/x86/Kconfig	2008-07-10 09:52:43.000000000 -0700
@@ -1657,6 +1657,14 @@
 	 workaround will setup a 1:1 mapping for the first
 	 16M to make floppy (an ISA device) work.
 
+config INTR_REMAP
+	bool "Support for Interrupt Remapping (EXPERIMENTAL)"
+	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI && EXPERIMENTAL
+	help
+	 Supports Interrupt remapping for IO-APIC and MSI devices.
+	 To use x2apic mode in the CPU's which support x2APIC enhancements or
+	 to support platforms with CPU's having > 8 bit APIC ID, say Y.
+
 source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"

-- 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (25 preceding siblings ...)
  2008-07-10 18:17 ` [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP Suresh Siddha
@ 2008-07-10 19:53 ` Ingo Molnar
  2008-07-10 20:22   ` Suresh Siddha
                     ` (3 more replies)
  2008-07-10 20:05 ` Eric W. Biederman
  27 siblings, 4 replies; 87+ messages in thread
From: Ingo Molnar @ 2008-07-10 19:53 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel


* Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> x2APIC architecture provides a new x2apic mode, which allows for the 
> increased range of processor addressability ( > 8 bit apic ID 
> support), MSR access to APIC registers, etc. x2apic specification can 
> be found at 
> http://download.intel.com/design/processor/specupdt/318148.pdf 
> (located under 
> http://developer.intel.com/products/processor/manuals/index.htm )
> 
> Interrupt-remapping is part of Intel Virtualization Technology for 
> Directed I/O architecture and the specification can be found from 
> http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf 
> (above link seems to be broken for the moment, but in general it 
> should be found under http://www.intel.com/technology/virtualization/ 
> )
> 
> Interrupt-remapping architecture enables extended Interrupt Mode on 
> x86 platforms supporting 32-bit APIC-IDs. This infrastructure allows 
> the existing interrupt sources such as I/OxAPICs and MSI/MSI-X devices 
> work seamlessly with apic-id's > 8 bits. As such, this is a 
> pre-requisite for enabling x2apic mode in the CPU.
> 
> This patchset adds 64-bit support for interrupt-remapping and x2apic, 
> which introduces apic_ops for basic APIC ops(uncached memory Vs MSR 
> accesses etc), new irq_chip's for supporting interrupt-remapping and 
> new genapic for supporting IPI's, logical cluster/physical x2apic 
> modes.
> 
> irq migration in the presence of interrupt-remapping is done from the 
> process-context as opposed to interrupt-context. Interrupt-remapping 
> infrastructure allows us to do this migration in a simple fashion 
> (at least for edge triggered interrupts).
> 
> Interrupt-remapping (CONFIG_INTR_REMAP) and DMA-remapping 
> (CONFIG_DMAR) can be enabled separately.
> 
> More details in the individual patches that follow.

quite some stuff!

For review and testing purposes i've created a new topic branch for 
this: tip/x86/x2apic and have picked up your patches into it.

I've pushed it out, but it's not merged into tip/master yet (obviously, 
you sent this just a few minutes ago :)

It integrates fine with tip/master. If you do this:

  git-checkout tip/master
  git-merge tip/x86/x2apic

you'll get a clean merge.

Btw., i threw it at the -tip test-cluster and got back a quick build 
bugreport:

arch/x86/xen/enlighten.c: In function 'xen_patch':
arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
arch/x86/xen/enlighten.c: At top level:
arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive #ifnded
arch/x86/kernel/paravirt.c:384:2: error: #endif without #if

with this config:

  http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

bisection shows that it was caused by:

| 38c56e6b674074f8ec98722ceca3d15771e17abe is first bad commit
| commit 38c56e6b674074f8ec98722ceca3d15771e17abe
| Author: Suresh Siddha <suresh.b.siddha@intel.com>
| Date:   Thu Jul 10 11:16:49 2008 -0700
|
|    x64, x2apic/intr-remap: basic apic ops support

(that's all for tonight - will have another look tomorrow.)

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
                   ` (26 preceding siblings ...)
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
@ 2008-07-10 20:05 ` Eric W. Biederman
  2008-07-10 20:18   ` Ingo Molnar
                     ` (2 more replies)
  27 siblings, 3 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 20:05 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, akpm, arjan, andi, jbarnes, steiner,
	linux-kernel


At a quick skim nothing looks really bad,
but you are dealing in an area of code that could be made much nicer
and if we are going to support a noticeably different style of irq
management we need to get in some of those pending cleanups so the
code does not fall down under its own weight.

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> irq migration in the presence of interrupt-remapping is done from the
> process-context as opposed to interrupt-context. Interrupt-remapping
> infrastructure allows us to do this migration in a simple fashion (at least for
> edge triggered interrupts).

Unless I have misread things this irq migration remains racy, as I did not
see any instructions that would guarantee that in-flight irqs were flushed
to the cpus' local apics before we cleaned up the destination.

You are sizing an array by NR_IRQS; there should be sufficient existing
infrastructure to avoid this.  Arrays sized by NR_IRQS are a significant
problem both for scaling the system up and down, so ultimately we need to kill
them.  For now we should not introduce any new such arrays.

A lot of your code is generic, and some of it is for just x86_64.  Since the
cpus are capable of running in 32bit mode, we really need to implement x86_32
and x86_64 support in the same code base, which I believe means factoring out
pieces of io_apic_N.c into things such as msi.c that can be shared between the
two architectures.

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 20:05 ` Eric W. Biederman
@ 2008-07-10 20:18   ` Ingo Molnar
  2008-07-10 21:07     ` Eric W. Biederman
  2008-07-10 21:15   ` Suresh Siddha
  2008-07-10 22:09   ` Arjan van de Ven
  2 siblings, 1 reply; 87+ messages in thread
From: Ingo Molnar @ 2008-07-10 20:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, hpa, tglx, akpm, arjan, andi, jbarnes, steiner,
	linux-kernel


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> A lot of your code is generic, and some of it is for just x86_64.
> Since the cpus are capable of running in 32bit mode, we really need
> to implement x86_32 and x86_64 support in the same code base, which I
> believe means factoring out pieces of io_apic_N.c into things such as
> msi.c that can be shared between the two architectures.

i think the APIC code should be fully unified down the line - the 
APIC/IOAPIC knows little about the mode the CPU is running in and has to 
be programmed the same way independent of which mode the CPU is in. The 
current fork between the 32-bit and 64-bit APIC code is in good part 
artificial.

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
@ 2008-07-10 20:22   ` Suresh Siddha
  2008-07-10 21:56   ` Suresh Siddha
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 20:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
> quite some stuff!
> 
> For review and testing purposes i've created a new topic branch for
> this: tip/x86/x2apic and have picked up your patches into it.
> 
> I've pushed it out, but it's not merged into tip/master yet (obviously,
> you sent this just a few minutes ago :)
> 
> It integrates fine with tip/master. If you do this:
> 
>   git-checkout tip/master
>   git-merge tip/x86/x2apic
> 
> you'll get a clean merge.
> 
> Btw., i threw it at the -tip test-cluster and got back a quick build
> bugreport:
> 
> arch/x86/xen/enlighten.c: In function 'xen_patch':
> arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> arch/x86/xen/enlighten.c: At top level:
> arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive #ifnded
> arch/x86/kernel/paravirt.c:384:2: error: #endif without #if
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

Yes. It is conflicting with Jeremy's recent changes. I will post the fixes
soon, meanwhile everything should be ok with !CONFIG_PARAVIRT

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 20:18   ` Ingo Molnar
@ 2008-07-10 21:07     ` Eric W. Biederman
  0 siblings, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 21:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Suresh Siddha, hpa, tglx, akpm, arjan, andi, jbarnes, steiner,
	linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> A lot of your code is generic, and some of it is for just x86_64.
>> Since the cpus are capable of running in 32bit mode, we really need
>> to implement x86_32 and x86_64 support in the same code base, which I
>> believe means factoring out pieces of io_apic_N.c into things such as
>> msi.c that can be shared between the two architectures.
>
> i think the APIC code should be fully unified down the line - the 
> APIC/IOAPIC knows little about the mode the CPU is running in and has to 
> be programmed the same way independent of which mode the CPU is in. The 
> current fork between the 32-bit and 64-bit APIC code is in good part 
> artificial.

I completely agree.  However:
1) There is a fair amount of work involved in the unification, so taking it
   in small pieces is good.
2) We are doing much more than we should in io_apic_N.c anyway.

So it makes sense to grab the pieces we are actively working on, factor them
out, and unify them first.

The basic model I expect will end up looking something like:
Type of cpu irq reception.  PIC mode, local APIC mode, ???? Virtualized modes????
Type of irq source.  ioapic, msi, htirq, ??? Virtualized source ????
Type of configuration mptable, acpi mps table.

Some of this we have split out today and nicely factored.  Other parts we haven't.

In particular for setting up msi and ioapics we use exactly the same mapping
of bits.  So if we describe things a little differently from the irq reception
layer to the irq sending layer, we should be able to reuse exactly the same msi
and ioapic code instead of having their setup methods test for irq_remapping().
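
A minimal sketch of that layering, under loud assumptions: every name here is
hypothetical, and the 64-bit packing is made up for illustration (it is not
the real MSI or IO-APIC entry layout). The point is only that the source-side
code takes whatever description the reception layer hands it and never tests
for remapping itself.

```c
#include <stdint.h>

/* All names below are hypothetical; the packing is made up for illustration. */
struct sketch_irq_route {
	uint32_t dest;		/* APIC ID, or table index when remapped */
	uint8_t vector;		/* unused (0) when remapped */
	uint8_t remapped;	/* 1 if dest is a remapping-table index */
};

/* Reception layer: decides how an interrupt is described to its source. */
struct sketch_reception_ops {
	struct sketch_irq_route (*route)(uint32_t apicid, uint8_t vector,
					 uint32_t index);
};

static struct sketch_irq_route route_direct(uint32_t apicid, uint8_t vector,
					    uint32_t index)
{
	struct sketch_irq_route r = { apicid, vector, 0 };

	(void)index;	/* no remapping table in this mode */
	return r;
}

static struct sketch_irq_route route_remapped(uint32_t apicid, uint8_t vector,
					      uint32_t index)
{
	struct sketch_irq_route r = { index, 0, 1 };

	(void)apicid;	/* destination and vector live in the table entry */
	(void)vector;
	return r;
}

const struct sketch_reception_ops sketch_direct_ops = { route_direct };
const struct sketch_reception_ops sketch_remapped_ops = { route_remapped };

/* Source layer: one code path, no test for remapping. */
uint64_t sketch_source_program(const struct sketch_reception_ops *ops,
			       uint32_t apicid, uint8_t vector, uint32_t index)
{
	struct sketch_irq_route r = ops->route(apicid, vector, index);

	return ((uint64_t)r.remapped << 63) | ((uint64_t)r.dest << 8) | r.vector;
}
```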

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 20:05 ` Eric W. Biederman
  2008-07-10 20:18   ` Ingo Molnar
@ 2008-07-10 21:15   ` Suresh Siddha
  2008-07-10 22:52     ` Eric W. Biederman
  2008-07-10 22:09   ` Arjan van de Ven
  2 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 21:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 01:05:50PM -0700, Eric W. Biederman wrote:
> 
> At a quick skim nothing looks really bad,
> but you are dealing in an area of code that could be made much nicer
> and if we are going to support a noticeably different style of irq
> management we need to get in some of those pending cleanups so the
> code does not fall down under its own weight.
> 
> Suresh Siddha <suresh.b.siddha@intel.com> writes:
> 
> > irq migration in the presence of interrupt-remapping is done from the
> > process-context as opposed to interrupt-context. Interrupt-remapping
> > infrastructure allows us to do this migration in a simple fashion (at least for
> > edge triggered interrupts).
> 
> Unless I have misread things this irq migration remains racy, as I did not
> see any instructions that would guarantee that in-flight irqs were flushed
> to the cpus' local apics before we cleaned up the destination.

Flushing the interrupt entry cache will take care of this. We modify the IRTE
and then flush the interrupt entry cache before cleaning up the original
vector allocated.

Any new interrupts from the device will see the new entry. Old in flight
interrupts will be registered at the CPU before the flush of the cache is
complete.
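
The ordering described above can be sketched with a toy model. This is an
assumption-laden illustration, not the VT-d data structures or the code in
this series: all names are hypothetical, "cached" stands for whatever copy of
the entry the remapping hardware may still be using, and the model says
nothing about what the hardware guarantees for in-flight interrupts.

```c
#include <stdint.h>

/* Hypothetical, simplified model; not the VT-d data structures. */
struct sketch_irte {
	uint32_t dest;
	uint8_t vector;
};

struct sketch_ir_state {
	struct sketch_irte table;	/* entry in memory */
	struct sketch_irte cached;	/* copy the hardware may still use */
	int old_vector_allocated;	/* 1 until the old vector is released */
};

/* Step 1: rewrite the entry in memory; the cached copy is untouched. */
void sketch_modify_irte(struct sketch_ir_state *s, uint32_t dest, uint8_t vector)
{
	s->table.dest = dest;
	s->table.vector = vector;
}

/* Step 2: flush, so later interrupts are translated with the new entry. */
void sketch_flush_cache(struct sketch_ir_state *s)
{
	s->cached = s->table;
}

/* Vector a newly arriving interrupt would be delivered with. */
uint8_t sketch_delivered_vector(const struct sketch_ir_state *s)
{
	return s->cached.vector;
}

/* Full sequence: modify, flush, and only then release the old vector. */
void sketch_migrate(struct sketch_ir_state *s, uint32_t dest, uint8_t vector)
{
	sketch_modify_irte(s, dest, vector);
	sketch_flush_cache(s);
	s->old_vector_allocated = 0;
}
```

In this model an interrupt arriving between the modify and the flush is still
delivered with the old vector, which is why the old vector is released last.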

> 
> You are sizing an array by NR_IRQS; there should be sufficient existing
> infrastructure to avoid this.  Arrays sized by NR_IRQS are a significant
> problem both for scaling the system up and down, so ultimately we need to kill
> them.  For now we should not introduce any new such arrays.

Ok. Ideally dynamic_irq_init()/cleanup() can take care of this, or
create_irq()/destroy_irq(), with this embedded as a pointer somewhere inside
irq_desc. I need to take a closer look at this and post a fix-up patch.
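
A rough userspace sketch of that idea, assuming hypothetical names throughout
(the struct and field names are invented here, and calloc()/free() stand in
for kernel allocation): the per-irq remapping data is allocated when the irq
is set up and hangs off the descriptor, so no table sized by NR_IRQS is
needed.

```c
#include <stdlib.h>

/* Hypothetical per-irq remapping data; field names are illustrative. */
struct sketch_irq_remap {
	unsigned int irte_index;
	unsigned int sub_handle;
};

/* Hypothetical stand-in for a descriptor carrying a pointer to that data. */
struct sketch_irq_desc {
	unsigned int irq;
	struct sketch_irq_remap *remap_data;
};

/* Allocate the per-irq data at setup time. Returns 0 on success, -1 on failure. */
int sketch_irq_init(struct sketch_irq_desc *desc, unsigned int irte_index)
{
	struct sketch_irq_remap *d = calloc(1, sizeof(*d));

	if (!d)
		return -1;
	d->irte_index = irte_index;
	desc->remap_data = d;
	return 0;
}

/* Release the per-irq data at teardown time. */
void sketch_irq_cleanup(struct sketch_irq_desc *desc)
{
	free(desc->remap_data);
	desc->remap_data = NULL;
}
```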

> 
> A lot of your code is generic, and some of it is for just x86_64.  Since the
> cpus are capable of running in 32bit mode, we really need to implement x86_32
> and x86_64 support in the same code base, which I believe means factoring out
> pieces of io_apic_N.c into things such as msi.c that can be shared between the
> two architectures.

Yes, as you and Ingo mentioned, there is nothing 64bit specific and one
can easily add the 32bit support. But before that we need some more
x86 unification, and I am very short on resources currently :(

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
  2008-07-10 20:22   ` Suresh Siddha
@ 2008-07-10 21:56   ` Suresh Siddha
  2008-07-11 10:28     ` Ingo Molnar
  2008-07-16 14:37   ` Yong Wang
  2008-07-22 20:49   ` Andrew Morton
  3 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-10 21:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy

On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
> 
> Btw., i threw it at the -tip test-cluster and got back a quick build
> bugreport:
> 
> arch/x86/xen/enlighten.c: In function 'xen_patch':
> arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> arch/x86/xen/enlighten.c: At top level:
> arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive #ifnded
> arch/x86/kernel/paravirt.c:384:2: error: #endif without #if
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad

Ingo, that was my stupid typo. Please apply this patch. BTW, we
need some more xen64 paravirt fixes in this area. I will look at it
as soon as possible.
---

[patch] compile fix with x2apic patch and CONFIG_PARAVIRT

fix the typo.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index b80105a..4f29ff8 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -360,7 +360,7 @@ struct pv_cpu_ops pv_cpu_ops = {
 
 struct pv_apic_ops pv_apic_ops = {
 #ifdef CONFIG_X86_LOCAL_APIC
-#ifnded CONFIG_X86_64
+#ifndef CONFIG_X86_64
 	.apic_write = native_apic_mem_write,
 	.apic_write_atomic = native_apic_mem_write_atomic,
 	.apic_read = native_apic_mem_read,

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 20:05 ` Eric W. Biederman
  2008-07-10 20:18   ` Ingo Molnar
  2008-07-10 21:15   ` Suresh Siddha
@ 2008-07-10 22:09   ` Arjan van de Ven
  2008-07-10 22:54     ` Eric W. Biederman
  2 siblings, 1 reply; 87+ messages in thread
From: Arjan van de Ven @ 2008-07-10 22:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, andi, jbarnes, steiner,
	linux-kernel

Eric W. Biederman wrote:
> A lot of your code is generic, and some of it is for just x86_64.  Since the
> cpus are capable of running in 32bit mode, we really need to implement x86_32
> and x86_64 support in the same code base, which I believe means factoring out
> pieces of io_apic_N.c into things such as msi.c that can be shared between the
> two architectures.

in general the purpose of this is to support 256 and more logical threads....
.... not going to happen on 32 bit


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 21:15   ` Suresh Siddha
@ 2008-07-10 22:52     ` Eric W. Biederman
  2008-07-11  2:35       ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 22:52 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> Flushing the interrupt entry cache will take care of this. We modify the IRTE
> and then flush the interrupt entry cache before cleaning up the original
> vector allocated.
>
> Any new interrupts from the device will see the new entry. Old in flight
> interrupts will be registered at the CPU before the flush of the cache is
> complete.

That sounds nice in principle.  I saw cpu cache flushes, I saw writes.
I did not see any reads, which are necessary to get that behavior with
the standard pci transaction rules.

Having seen enough little races and misbehaving hardware I'm very paranoid
about irq migration.  The current implementation is belt and suspenders
and I still think there are races that I have missed.

>> You are sizing an array as NR_IRQS this is something there should be sufficient
>> existing infrastructure to avoid.  Arrays sized by NR_IRQS is a significant
>> problem both for scaling the system up and down so ultimately we need to kill
>> this.  For now we should not introduce any new arrays.
>
> Ok. Ideally dynamic_irq_init()/cleanup() can take care of this. or
> create_irq()/destroy_irq() and embed this as a pointer somewhere inside
> irq_desc. I need to take a look at this more closer and post a fix up patch.

Sounds good.  Ultimately we are looking at handler_data or chip_data.
There are very specific rules that meant I could not use them for
the msi data but otherwise I don't remember exactly what they are for.
IOMMUs are covered though.

>> A lot of your code is generic, and some of it is for just x86_64.  Since the
>> cpus are capable of running in 32bit mode.  We really need to implement x86_32
>> and x86_64 support in the same code base.  Which I believe means factoring out
>> pieces of io_apic_N.c into things such as msi.c that can be shared between the
>> two architectures.
>
> Yes, As you and Ingo mentioned, there is nothing 64bit specific and one
> can easily add the 32bit support. But before that we need, some more
> x86 unification and I am very short on resources currently :(

At least for msi the code you are working on was essentially unified
when it was written, it just happened to have two copies.  I don't
think I'm asking for heavy lifting.  Mostly just putting code that
is touched into something other than the growing monstrosity that is
ioapic.c

Further can we please see some better abstractions.  In particular can
we generate a token for the irq destination.  And have the msi and
ioapic setup read that token and program it into the hardware.  The
rules for which bits go where is exactly the same both with and
without irq_remapping so having an if statement there seems to obscure
what is really happening.  Especially if, as it appears, we may be using
the new token format with x2apics without remapping.

My primary concern is that the end result be well factored irq handling code
so it is possible to get in there and look at the code and maintain it.

A small part of that is the 32bit support.  Another part is the missing
abstractions I described.  I don't know what else, since I have barely
scratched the surface patch review wise.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 22:09   ` Arjan van de Ven
@ 2008-07-10 22:54     ` Eric W. Biederman
  0 siblings, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 22:54 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, andi, jbarnes, steiner,
	linux-kernel

Arjan van de Ven <arjan@linux.intel.com> writes:

> in general the purpose of this is to support 256 and more logical threads....
> .... not going to happen on 32 bit

Huh?

Intel is still making new 32bit processors with otherwise modern chipsets
like tolapai so I don't plan to count anything out.  Especially things like
the iommu support for irqs may be interesting.  Further the iommu irq
support is interesting to hypervisors so I would not at all be surprised
to see the code getting reused in that context, and with 32bit kernels.

I'm not asking for anyone to unify code they are not touching but
rather to code things so unification becomes trivial.  That is the entire
purpose of having arch/x86.  We have had this discussion and the
viewpoint that we won't add new hardware features to x86_32 and just
put it in maintenance mode lost, because the hardware manufacturers,
including Intel, have not put x86_32 into strictly maintenance mode.

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context
  2008-07-10 18:16 ` [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context Suresh Siddha
@ 2008-07-10 23:08   ` Eric W. Biederman
  2008-07-11  5:41     ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 23:08 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> Generic infrastructure for migrating the irq from the process context in the
> presence of CONFIG_GENERIC_PENDING_IRQ.
>
> This will be used later for migrating irq in the presence of
> interrupt-remapping.

Why the API difference between IRQ_MOVE_PCNTXT set affinity handlers and
!CONFIG_GENERIC_PENDING_IRQ handlers?


>  #ifdef CONFIG_GENERIC_PENDING_IRQ
> -	set_pending_irq(irq, cpumask);
> +	if (desc->status & IRQ_MOVE_PCNTXT) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +		desc->chip->set_affinity(irq, cpumask);
> +		spin_unlock_irqrestore(&desc->lock, flags);
> +	} else
> +		set_pending_irq(irq, cpumask);
>  #else
>  	desc->affinity = cpumask;
>  	desc->chip->set_affinity(irq, cpumask);
>
> -- 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
  2008-07-10 18:17 ` [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP Suresh Siddha
@ 2008-07-10 23:29   ` Eric W. Biederman
  2008-07-10 23:37     ` Yong Wang
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 23:29 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> ---
>
> Index: tree-x86/arch/x86/Kconfig
> ===================================================================
> --- tree-x86.orig/arch/x86/Kconfig	2008-07-10 09:51:44.000000000 -0700
> +++ tree-x86/arch/x86/Kconfig	2008-07-10 09:52:43.000000000 -0700
> @@ -1657,6 +1657,14 @@
>  	 workaround will setup a 1:1 mapping for the first
>  	 16M to make floppy (an ISA device) work.
>  
> +config INTR_REMAP
> +	bool "Support for Interrupt Remapping (EXPERIMENTAL)"
> +	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI && EXPERIMENTAL
> +	help
> +	 Supports Interrupt remapping for IO-APIC and MSI devices.
> +	 To use x2apic mode in the CPU's which support x2APIC enhancements or
> +	 to support platforms with CPU's having > 8 bit APIC ID, say Y.
> +

This has got to be the strangest config option.
X86_IO_APIC makes sense.  Fundamentally it is the wrong config option but that
isn't the fault of your code.

PCI_MSI should not matter.

Is it possible to run in x2apic mode without running in interrupt remapping mode?
Just send the irqs directly to the cpu?  It feels like it should be if an iommu
doesn't get in the way.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines
  2008-07-10 18:16 ` [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines Suresh Siddha
@ 2008-07-10 23:34   ` Eric W. Biederman
  2008-07-11  2:29     ` Mike Travis
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-10 23:34 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel

> Index: tree-x86/include/asm-x86/hw_irq.h
> ===================================================================
> --- tree-x86.orig/include/asm-x86/hw_irq.h 2008-07-10 09:51:45.000000000 -0700
> +++ tree-x86/include/asm-x86/hw_irq.h	2008-07-10 09:52:24.000000000 -0700
> @@ -73,7 +73,9 @@
>  #endif
>  
>  /* IPI functions */
> +#ifdef CONFIG_X86_32
>  extern void send_IPI_self(int vector);
> +#endif
>  extern void send_IPI(int dest, int vector);

Cute undoing unification.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
  2008-07-10 23:29   ` Eric W. Biederman
@ 2008-07-10 23:37     ` Yong Wang
  2008-07-11  1:50       ` Suresh Siddha
  2008-07-11  1:53       ` Eric W. Biederman
  0 siblings, 2 replies; 87+ messages in thread
From: Yong Wang @ 2008-07-10 23:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, arjan, andi, jbarnes,
	steiner, linux-kernel

> Is it possible to run in x2apic mode without running in interrupt remapping mode?
> Just send the irqs directly to the cpu?  It feels like it should be if an iommu
> doesn't get in the way.
> 

Yes, it is possible to run in x2apic mode without intr-remap. However, the extended
cpu addressability will not be fully utilized that way.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV
  2008-07-10 18:16 ` [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV Suresh Siddha
@ 2008-07-11  0:14   ` Andrew Morton
  2008-07-11  1:56     ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Andrew Morton @ 2008-07-11  0:14 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel

On Thu, 10 Jul 2008 11:16:55 -0700 Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> +static inline void uv_init_apic_ldr(void)
> +{
> +}
> +
>  static unsigned int uv_cpu_mask_to_apicid(cpumask_t cpumask)
>  {
>  	int cpu;
> @@ -164,6 +168,7 @@
>  	.target_cpus = uv_target_cpus,
>  	.vector_allocation_domain = uv_vector_allocation_domain,/* Fixme ZZZ */
>  	.apic_id_registered = uv_apic_id_registered,
> +	.init_apic_ldr = uv_init_apic_ldr,

There's no point in declaring it inline if it's always called indirectly.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-10 18:16 ` [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure Suresh Siddha
@ 2008-07-11  1:22   ` Eric W. Biederman
  2008-07-11  6:07     ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  1:22 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, hpa, tglx, akpm, arjan, andi, ebiederm, jbarnes, steiner,
	linux-kernel

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> MSI and MSI-X support for interrupt remapping infrastructure.
>
> MSI address register will be programmed with interrupt-remapping table
> entry(IRTE) index and the IRTE will contain information about the vector,
> cpu destination, etc.
>
> For MSI-X, all the IRTE's will be consecutively allocated in the table,
> and the address registers will contain the starting index to the block
> and the data register will contain the subindex with in that block.
>
> This also introduces a new irq_chip for cleaner irq migration (in the process
> context as opposed to the current irq migration in the context of an interrupt.
> interrupt-remapping infrastructure will help us achieve this).
>
> As MSI is edge triggered, irq migration is a simple atomic update(of vector
> and cpu destination) of IRTE and flushing the hardware cache.
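
The address/data split described in the quoted text can be sketched in plain
C.  This is only an illustration: the bit positions (format bit 4, SHV bit 3,
handle bit 15 in address bit 2, handle bits 14:0 in address bits 19:5) are my
reading of the VT-d remappable MSI format, and every sketch_* name is
hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the patch names similar macros but does not show them here. */
#define SKETCH_MSI_ADDR_BASE_LO		0xfee00000u
#define SKETCH_MSI_ADDR_IR_EXT_INT	(1u << 4)	/* remappable format */
#define SKETCH_MSI_ADDR_IR_SHV		(1u << 3)	/* subhandle valid */

struct sketch_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Encode a 16-bit IRTE index into the low MSI address word. */
static uint32_t sketch_ir_address_lo(uint16_t ir_index)
{
	uint32_t index1 = ((uint32_t)ir_index & 0x8000u) >> 13;	/* bit 15 -> bit 2 */
	uint32_t index2 = ((uint32_t)ir_index & 0x7fffu) << 5;	/* bits 14:0 -> 19:5 */

	return SKETCH_MSI_ADDR_BASE_LO | SKETCH_MSI_ADDR_IR_EXT_INT |
	       SKETCH_MSI_ADDR_IR_SHV | index1 | index2;
}

/* Recover the IRTE index from an encoded address word. */
static uint16_t sketch_ir_index_from_address(uint32_t address_lo)
{
	uint32_t low15 = (address_lo >> 5) & 0x7fffu;
	uint32_t bit15 = (address_lo & (1u << 2)) << 13;

	return (uint16_t)(bit15 | low15);
}

/*
 * MSI-X style message: the address carries the start of the consecutive
 * IRTE block, the data register carries the subindex within that block.
 */
static struct sketch_msi_msg sketch_ir_msix_msg(uint16_t block_start,
						uint16_t sub_handle)
{
	struct sketch_msi_msg msg;

	/* The effective index must still fit in the 16-bit handle space. */
	assert((uint32_t)block_start + sub_handle <= 0xffffu);

	msg.address_hi = 0;
	msg.address_lo = sketch_ir_address_lo(block_start);
	msg.data = sub_handle;
	return msg;
}
```

Note that neither the vector nor the cpu destination appears in the message;
both live in the IRTE, which is why migration only needs to touch the table.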

A couple of things.  I believe MSI edge triggered irqs can always be migrated
safely outside of irq context.

Flushing the hardware cache sounds like it will flush the irqs towards the cpu.
How do we flush the inflight irqs already on their way to the apic?  Does a register read work?

For MSI irqs going directly to the cpu it should, as long as the cpu and local
apic count as the same device from the perspective of pci ordering rules.


> Index: tree-x86/arch/x86/kernel/io_apic_64.c
> ===================================================================
> --- tree-x86.orig/arch/x86/kernel/io_apic_64.c 2008-07-10 09:52:31.000000000 -0700
> +++ tree-x86/arch/x86/kernel/io_apic_64.c 2008-07-10 09:52:34.000000000 -0700
> @@ -2289,6 +2289,9 @@
>  
>  	dynamic_irq_cleanup(irq);
>  
> +#ifdef CONFIG_INTR_REMAP
> +	free_irte(irq);
> +#endif
>  	spin_lock_irqsave(&vector_lock, flags);
>  	__clear_irq_vector(irq);
>  	spin_unlock_irqrestore(&vector_lock, flags);
> @@ -2307,11 +2310,42 @@
>  
>  	tmp = TARGET_CPUS;
>  	err = assign_irq_vector(irq, tmp);
> -	if (!err) {
> -		cpus_and(tmp, cfg->domain, tmp);
> -		dest = cpu_mask_to_apicid(tmp);
> +	if (err)
> +		return err;
> +
> +	cpus_and(tmp, cfg->domain, tmp);
> +	dest = cpu_mask_to_apicid(tmp);

Can we simplify this a little.  In particular have a function 

struct IOAPIC_ROUTE_entry x86_map_irq(irq, mask);

Where x86_map_irq would ultimately figure out the path to the cpu.
In the simple case it would just call assign_irq_vector();
When irqs are remapped it would perform the additional
map_irq_to_irte_handle();
modify_irte(irq, &irte);

And then have the generic msi code and the ioapic code map from the
struct IOAPIC_ROUTE_entry, or whatever it ends up being called, to the
appropriate bits for the hardware they control.

That should allow us a lot more flexibility going forward with less code than is in your
patches.
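
A rough sketch of that shape, for the non-remapped case only.  The struct and
function names below are hypothetical (nothing here is from the patch), and
the bit positions are the compatibility-format IOAPIC RTE and MSI layouts as
I understand them from the Intel documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "route token": the architecturally defined bits, once. */
struct sketch_irq_route {
	uint8_t vector;
	uint8_t dest_id;	/* 8-bit APIC destination (no remapping) */
	uint8_t delivery_mode;	/* 0 = fixed, 1 = lowest priority */
	uint8_t dest_logical;	/* 0 = physical, 1 = logical */
	uint8_t trigger_level;	/* 0 = edge, 1 = level */
};

struct sketch_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* IOAPIC side: place the route bits in a 64-bit redirection table entry. */
static uint64_t sketch_ioapic_rte(const struct sketch_irq_route *r)
{
	uint64_t rte = r->vector;

	rte |= (uint64_t)(r->delivery_mode & 0x7) << 8;
	rte |= (uint64_t)(r->dest_logical & 0x1) << 11;
	rte |= (uint64_t)(r->trigger_level & 0x1) << 15;
	rte |= (uint64_t)r->dest_id << 56;
	return rte;
}

/* MSI side: same route bits, different locations. */
static void sketch_msi_compose(const struct sketch_irq_route *r,
			       struct sketch_msi_msg *msg)
{
	/* MSI is edge triggered, so a level route makes no sense here. */
	assert(!r->trigger_level);

	msg->address_hi = 0;
	msg->address_lo = 0xfee00000u |
			  ((uint32_t)r->dest_id << 12) |
			  /* redirection hint only for lowest priority */
			  ((r->delivery_mode == 1) ? (1u << 3) : 0) |
			  ((uint32_t)(r->dest_logical & 0x1) << 2);
	msg->data = r->vector | ((uint32_t)(r->delivery_mode & 0x7) << 8);
}
```

With something like this, a remapping-aware mapping function could fill in a
different token and the two programming routines would stay free of ifdefs.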

> +#ifdef CONFIG_INTR_REMAP
> +	if (irq_remapped(irq)) {
> +		struct irte irte;
> +		int ir_index;
> +		u16 sub_handle;
> +
> +		ir_index = map_irq_to_irte_handle(irq, &sub_handle);
> +		BUG_ON(ir_index == -1);
> +
> +		memset (&irte, 0, sizeof(irte));
> +
> +		irte.present = 1;
> +		irte.dst_mode = INT_DEST_MODE;
> +		irte.trigger_mode = 0; /* edge */
> +		irte.dlvry_mode = INT_DELIVERY_MODE;
> +		irte.vector = cfg->vector;
> +		irte.dest_id = IRTE_DEST(dest);
> +
> +		modify_irte(irq, &irte);
>  
>  		msg->address_hi = MSI_ADDR_BASE_HI;
> +		msg->data = sub_handle;
> +		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
> +				  MSI_ADDR_IR_SHV |
> +				  MSI_ADDR_IR_INDEX1(ir_index) |
> +				  MSI_ADDR_IR_INDEX2(ir_index);
> +	} else
> +#endif
> +	{
> +		msg->address_hi = MSI_ADDR_BASE_HI;
>  		msg->address_lo =
>  			MSI_ADDR_BASE_LO |
>  			((INT_DEST_MODE == 0) ?
> @@ -2361,6 +2395,55 @@
>  	write_msi_msg(irq, &msg);
>  	irq_desc[irq].affinity = mask;
>  }
> +
> +#ifdef CONFIG_INTR_REMAP
> +/*
> + * Migrate the MSI irq to another cpumask. This migration is
> + * done in the process context using interrupt-remapping hardware.
> + */
> +static void ir_set_msi_irq_affinity(unsigned int irq, cpumask_t mask)
> +{
> +	struct irq_cfg *cfg = irq_cfg + irq;
> +	unsigned int dest;
> +	cpumask_t tmp, cleanup_mask;
> +	struct irte irte;
> +
> +	cpus_and(tmp, mask, cpu_online_map);
> +	if (cpus_empty(tmp))
> +		return;
> +
> +	if (get_irte(irq, &irte))
> +		return;
> +
> +	if (assign_irq_vector(irq, mask))
> +		return;
> +
> +	cpus_and(tmp, cfg->domain, mask);
> +	dest = cpu_mask_to_apicid(tmp);
> +
> +	irte.vector = cfg->vector;
> +	irte.dest_id = IRTE_DEST(dest);
> +
> +	/*
> +	 * atomically update the IRTE with the new destination and vector.
> +	 */
> +	modify_irte(irq, &irte);
> +
> +	/*
> +	 * After this point, all the interrupts will start arriving
> +	 * at the new destination. So, time to cleanup the previous
> +	 * vector allocation.
> +	 */
> +	if (cfg->move_in_progress) {
> +		cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
> +		cfg->move_cleanup_count = cpus_weight(cleanup_mask);
> +		send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
> +		cfg->move_in_progress = 0;
> +	}
> +
> +	irq_desc[irq].affinity = mask;
> +}
> +#endif
>  #endif /* CONFIG_SMP */
>  
>  /*
> @@ -2378,26 +2461,157 @@
>  	.retrigger	= ioapic_retrigger_irq,
>  };

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
  2008-07-10 23:37     ` Yong Wang
@ 2008-07-11  1:50       ` Suresh Siddha
  2008-07-11  1:53       ` Eric W. Biederman
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11  1:50 UTC (permalink / raw)
  To: Yong Wang
  Cc: Eric W. Biederman, Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 04:37:33PM -0700, Yong Wang wrote:
> > Is it possible to run in x2apic mode without running in interrupt remapping mode?
> > Just send the irqs directly to the cpu?  It feels like it should be if an iommu
> > doesn't get in the way.
> >
> 
> Yes, it is possible to run in x2apic mode without intr-remap. However, the extended
> cpu addressability will not be fully utilized that way.

No. The CPU can't be in x2apic mode without enabling interrupt-remapping.
Interrupts may not work (even when we have few cpu's). For example, in logical
cluster mode, logical x2apic id (initialized by HW) will be > 16 bit,
in a smaller DP configuration.

In general, enabling x2apic without enabling interrupt-remapping is not
recommended, as the platform will be using legacy interrupt format and the
CPU will be using extended ID mode.
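
To make the ID-width point concrete, here is a small sketch of how the
logical x2apic id is derived from the x2apic id in cluster mode, as I read
the x2apic specification: bits 19:4 of the id select the cluster (placed in
the upper 16 bits) and bits 3:0 select one bit in the lower 16.  The
function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Logical x2apic id: (id[19:4] << 16) | (1 << id[3:0]). */
static uint32_t sketch_x2apic_logical_id(uint32_t x2apic_id)
{
	uint32_t cluster = (x2apic_id >> 4) & 0xffffu;
	uint32_t logical = (cluster << 16) | (1u << (x2apic_id & 0xfu));

	/* Exactly one bit of the low half is always set. */
	assert((logical & 0xffffu) != 0);
	return logical;
}

/* The legacy IOAPIC/MSI formats only carry an 8-bit destination. */
static int sketch_fits_legacy_dest(uint32_t dest)
{
	return dest <= 0xffu;
}
```

Even x2apic id 8 already yields logical id 0x100, which no longer fits in an
8-bit destination field, and id 16 yields 0x10001.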

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
  2008-07-10 23:37     ` Yong Wang
  2008-07-11  1:50       ` Suresh Siddha
@ 2008-07-11  1:53       ` Eric W. Biederman
  1 sibling, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  1:53 UTC (permalink / raw)
  To: Yong Wang
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, arjan, andi, jbarnes,
	steiner, linux-kernel

Yong Wang <yong.y.wang@linux.intel.com> writes:

> Yes, it is possible to run in x2apic mode without intr-remap. However, the extended
> cpu addressability will not be fully utilized that way.

Thanks.  I can see that as there are only about 16bits for the cpuid
in the irq routing entries.

Still it looks like we should enable x2apic if we can, as it is more
optimized and should be faster.  Then if we have x2apic support
enable irq remapping if we can, although I'm not positive x2apic
support is required for the irq remapping.

Ultimately we should also enable directed ack when available instead
of broadcast ack for level triggered ioapic irqs.  That should also
reduce some rare but potentially unnecessary traffic.

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV
  2008-07-11  0:14   ` Andrew Morton
@ 2008-07-11  1:56     ` Suresh Siddha
  0 siblings, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11  1:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, arjan@linux.intel.com, andi@firstfloor.org,
	ebiederm@xmission.com, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 05:14:30PM -0700, Andrew Morton wrote:
> On Thu, 10 Jul 2008 11:16:55 -0700 Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> 
> > +static inline void uv_init_apic_ldr(void)
> > +{
> > +}
> > +
> >  static unsigned int uv_cpu_mask_to_apicid(cpumask_t cpumask)
> >  {
> >       int cpu;
> > @@ -164,6 +168,7 @@
> >       .target_cpus = uv_target_cpus,
> >       .vector_allocation_domain = uv_vector_allocation_domain,/* Fixme ZZZ */
> >       .apic_id_registered = uv_apic_id_registered,
> > +     .init_apic_ldr = uv_init_apic_ldr,
> 
> There's no point in declaring it inline if it's always called indirectly.

oops. will fix it.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines
  2008-07-10 23:34   ` Eric W. Biederman
@ 2008-07-11  2:29     ` Mike Travis
  2008-07-11  3:50       ` Eric W. Biederman
  0 siblings, 1 reply; 87+ messages in thread
From: Mike Travis @ 2008-07-11  2:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, arjan, andi, jbarnes,
	steiner, linux-kernel

Eric W. Biederman wrote:
>> Index: tree-x86/include/asm-x86/hw_irq.h
>> ===================================================================
>> --- tree-x86.orig/include/asm-x86/hw_irq.h 2008-07-10 09:51:45.000000000 -0700
>> +++ tree-x86/include/asm-x86/hw_irq.h	2008-07-10 09:52:24.000000000 -0700
>> @@ -73,7 +73,9 @@
>>  #endif
>>  
>>  /* IPI functions */
>> +#ifdef CONFIG_X86_32
>>  extern void send_IPI_self(int vector);
>> +#endif
>>  extern void send_IPI(int dest, int vector);
> 
> Cute undoing unification.
> 
> Eric

On a similar subject I would really like to change send_IPI_mask to take a
pointer to the cpumask_t arg.  The current signature is:

        void (*send_IPI_mask)(cpumask_t mask, int vector);

Passing the mask by value bloats the stack by 512 bytes, and this seemingly is
called by some fairly nested routines.  Any opinions?
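
A minimal userspace sketch of the difference, assuming 4096 possible cpus
(which is what gives a 512 byte mask).  The sketch_* types and functions are
stand-ins invented for this example, not the kernel's cpumask API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 4096 possible CPUs, i.e. 4096 bits = 512 bytes of mask. */
#define SKETCH_NR_CPUS		4096
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS \
	((SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Stand-in for cpumask_t: one bit per possible CPU. */
typedef struct {
	unsigned long bits[SKETCH_MASK_LONGS];
} sketch_cpumask_t;

static void sketch_cpu_set(unsigned int cpu, sketch_cpumask_t *mask)
{
	assert(cpu < SKETCH_NR_CPUS);
	mask->bits[cpu / SKETCH_BITS_PER_LONG] |=
		1UL << (cpu % SKETCH_BITS_PER_LONG);
}

static unsigned int sketch_cpus_weight(const sketch_cpumask_t *mask)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < SKETCH_MASK_LONGS; i++) {
		unsigned long word = mask->bits[i];

		for (; word; word &= word - 1)	/* clear lowest set bit */
			weight++;
	}
	return weight;
}

/* Current style: the whole mask is copied for every call. */
static unsigned int sketch_send_ipi_mask(sketch_cpumask_t mask, int vector)
{
	(void)vector;
	return sketch_cpus_weight(&mask);	/* cpus that would be targeted */
}

/* Proposed style: only a pointer is passed; const keeps it read-only. */
static unsigned int sketch_send_ipi_mask_ptr(const sketch_cpumask_t *mask,
					     int vector)
{
	(void)vector;
	return sketch_cpus_weight(mask);
}
```

Both variants see the same set of cpus; the pointer variant just avoids
copying sizeof(sketch_cpumask_t) bytes at each level of the call chain.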

Thanks,
Mike

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 22:52     ` Eric W. Biederman
@ 2008-07-11  2:35       ` Suresh Siddha
  2008-07-11  3:15         ` Eric W. Biederman
  0 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11  2:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 03:52:50PM -0700, Eric W. Biederman wrote:
> Suresh Siddha <suresh.b.siddha@intel.com> writes:
> 
> > Flushing the interrupt entry cache will take care of this. We modify the IRTE
> > and then flush the interrupt entry cache before cleaning up the original
> > vector allocated.
> >
> > Any new interrupts from the device will see the new entry. Old in flight
> > interrupts will be registered at the CPU before the flush of the cache is
> > complete.
> 
> That sounds nice in principle.  I saw cpu cache flushes, I saw writes.
> I did not see any reads which is necessary to get that behavior with
> the standard pci transaction rules.

qi_flush_iec() will submit an invalidation descriptor and will wait
till it finishes the invalidation of the interrupt entry cache.
qi_submit_sync() will do the job. Descriptor completion ensures that
the inflight interrupts are flushed.
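
The ordering argument can be illustrated with a toy model.  This is not the
driver code and says nothing about real hardware timing; it only models a
remapping unit that may keep using a cached copy of the IRTE until the cache
is invalidated, with all names below invented for the example.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_irte {
	uint8_t vector;
	uint32_t dest_id;
};

/* Toy remapping unit: one table entry plus a cached copy of it. */
struct sketch_remap_unit {
	struct sketch_irte table_entry;	/* what software wrote to memory */
	struct sketch_irte cached;	/* what the hardware may still use */
	int cache_valid;
};

/* Translate one interrupt: hardware prefers its cached entry. */
static struct sketch_irte sketch_deliver(struct sketch_remap_unit *u)
{
	if (!u->cache_valid) {
		u->cached = u->table_entry;
		u->cache_valid = 1;
	}
	return u->cached;
}

/* Step 1: update the entry in memory. */
static void sketch_modify_irte(struct sketch_remap_unit *u,
			       const struct sketch_irte *new_entry)
{
	u->table_entry = *new_entry;
}

/* Step 2: invalidate the cache and (in the model) return only when done. */
static void sketch_flush_iec_sync(struct sketch_remap_unit *u)
{
	u->cache_valid = 0;
}

/* Migration: only after both steps is the old vector safe to clean up. */
static uint8_t sketch_migrate(struct sketch_remap_unit *u,
			      const struct sketch_irte *new_entry)
{
	uint8_t old_vector = u->table_entry.vector;

	sketch_modify_irte(u, new_entry);
	sketch_flush_iec_sync(u);
	/* No stale translation may survive the synchronous flush. */
	assert(!u->cache_valid);
	return old_vector;	/* caller may now release this vector */
}
```

In the model, an interrupt translated between the modify and the flush still
lands on the old vector, which is why cleanup has to wait for the flush.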

> Having seen enough little races and misbehaving hardware I'm very paranoid
> about irq migration.  The current implementation is belt and suspenders
> and I still think there are races that I have missed.

Eric, this process-context irq migration is done on cutting edge hardware
which was designed with all the feedback and experiences in mind ;)

And also, I don't think we are deviating much from what we are currently doing.
We are still using cleanup vector etc, to clean up the previous vector
allocation.

> >> You are sizing an array as NR_IRQS this is something there should be sufficient
> >> existing infrastructure to avoid.  Arrays sized by NR_IRQS is a significant
> >> problem both for scaling the system up and down so ultimately we need to kill
> >> this.  For now we should not introduce any new arrays.
> >
> > Ok. Ideally dynamic_irq_init()/cleanup() can take care of this. or
> > create_irq()/destroy_irq() and embed this as a pointer somewhere inside
> > irq_desc. I need to take a look at this more closer and post a fix up patch.
> 
> Sounds good.  Ultimately we are looking at handler_data or chip_data.
> There are very specific rules that meant I could not use them for
> the msi data but otherwise I don't remember exactly what the are for.
> IOMMU are covered though.

IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
interrupt-remapping, there are some interrupt resources like ioapics and
hpet, which don't have the corresponding pci dev. Will take a look at this.

> At least for msi the code you are working on was essentially unified
> when it was written, it just happened to have two copies.  I don't
> think I'm asking for heaving lifting.  Mostly just putting code that
> is touched into something other then the growing monstrosity that is
> ioapic.c

We can create msi.c which handles MSI specific handling. I will
look into this. But I definitely welcome someone beating me in posting those
patches :) I made a note of this however.

> Further can we please see some better abstractions.  In particular can
> we generate a token for the irq destination.  And have the msi and
> ioapic setup read that token and program it into the hardware.  The
> rules for which bits go where is exactly the same both with and
> without irq_remapping so having an if statement there seems to obscure
> what is really happening.  Especially if as it appears that we may be used
> the new token format with x2apics without remapping.

Unfortunately x2apic can't be enabled without enabling interrupt-remapping.
Interrupts don't work in the majority of the configurations (as I mentioned
earlier). Programming IOAPIC RTE's and MSI address/data registers is
completely different based on the presence of interrupt-remapping.

> 
> My primary concern is that the end result be well factored irq handling code
> so it is possible to get in there and look at the code and maintain it.
> 
> A small part of that is the 32bit support.  Another part are the missing
> abstractions I described.  I don't know what else since I have barely scratched the surface patch
> review wise.

Please keep the expert comments coming.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11  2:35       ` Suresh Siddha
@ 2008-07-11  3:15         ` Eric W. Biederman
  0 siblings, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  3:15 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

Suresh Siddha <suresh.b.siddha@intel.com> writes:

>> That sounds nice in principle.  I saw cpu cache flushes, I saw writes.
>> I did not see any reads which is necessary to get that behavior with
>> the standard pci transaction rules.
>
> qi_flush_iec() will submit an invalidation descriptor and will wait
> till it finishes the invalidation of the interrupt entry cache.
> qi_submit_sync() will do the job. Descriptor completion ensures that
> the inflight interrupts are flushed.

I will have to look a second time.  It seems I did not see the wait.

>> Having seen enough little races and misbehaving hardware I'm very paranoid
>> about irq migration.  The current implementation is belt and suspenders
>> and I still think there are races that I have missed.
>
> Eric, This process irq migration is done on the cutting edge hardware
> which was designed with all the feedback and experiences in the mind ;)
>
> And also, I don't think we are deviating much from what we are currently doing.
> We are still using cleanup vector etc, to clean up the previous vector
> allocation.

Yes.  In general I think the design appears sound.  I'm just not certain
of the implementation details.  Having spent way too many hours
debugging some very subtle hardware issues, I am paranoid.


>> Sounds good.  Ultimately we are looking at handler_data or chip_data.
>> There are very specific rules that meant I could not use them for
>> the msi data but otherwise I don't remember exactly what the are for.
>> IOMMU are covered though.
>
> IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
> interrupt-remapping, there are some interrupt resources like ioapics and
> hpet, which don't have the corresponding pci dev. Will take a look at this.
>
>> At least for msi the code you are working on was essentially unified
>> when it was written, it just happened to have two copies.  I don't
>> think I'm asking for heaving lifting.  Mostly just putting code that
>> is touched into something other then the growing monstrosity that is
>> ioapic.c
>
> We can create msi.c which handles MSI specific handling. I will
> look into this. But I def welcome somone beating me in posting those
> patches :) I made a note of this however.

Thanks.  It is all of these little things.  My hope is that we can
whittle down the unshared core instead of increasing the amount of
non-shared code.

>> Further can we please see some better abstractions.  In particular can
>> we generate a token for the irq destination.  And have the msi and
>> ioapic setup read that token and program it into the hardware.  The
>> rules for which bits go where is exactly the same both with and
>> without irq_remapping so having an if statement there seems to obscure
>> what is really happening.  Especially if as it appears that we may be used
>> the new token format with x2apics without remapping.
>
> unfortunately x2apic can't be enabled with out enabling interrupt-remapping.
> Interrupts don't work in majority of the configurations (as I mentioned
> earlier). Programming IOAPIC RTE's and MSI address/data registers are
> completely different based on the presence of interrupt-remapping.

Regardless of the x2apic mode issues there is a different issue here (addressed better in
another response where I gave an example).

There are a set of architecturally defined bits that can be
programmed.  These same bits exist in the ioapic routing entry and in
the msi message.

Therefore we should have a generic mapping function that gives the architecturally
defined bits.

Then both the ioapic setup and the msi setup can call x86_irq_map(irq), get those
architectural bits, and program them in their architecturally defined location.

For MSI it looks like you may be able to take advantage of a few more bits, but the
same principle applies.

Getting these intermediate abstractions relatively clean is important so we can do
things with the hardware.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines
  2008-07-11  2:29     ` Mike Travis
@ 2008-07-11  3:50       ` Eric W. Biederman
  2008-07-11 13:55         ` Mike Travis
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  3:50 UTC (permalink / raw)
  To: Mike Travis
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, arjan, andi, jbarnes,
	steiner, linux-kernel

Mike Travis <travis@sgi.com> writes:
>
> On a similar subject I would really like to change the send_IPI_mask to pass a
> pointer to the cpumask_t arg:
>
>         void (*send_IPI_mask)(cpumask_t mask, int vector);
>
>
> This bloats the stack by 512 bytes and seemingly is called by some fairly
> nested routines.  Any opinions?

It sounds like a pain.  Especially since we would need to dereference
cpumask_t when we use it.  Does anyone remember if there was a plan for
dealing with cpumask_t when the number of cpus got large?

If we pass in a pointer to constant data semantically we should be fine.

Mostly I am wondering if there isn't a cleaner solution hidden away somewhere.

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context
  2008-07-10 23:08   ` Eric W. Biederman
@ 2008-07-11  5:41     ` Suresh Siddha
  2008-07-11  9:19       ` Eric W. Biederman
  0 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11  5:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 04:08:03PM -0700, Eric W. Biederman wrote:
> Suresh Siddha <suresh.b.siddha@intel.com> writes:
> 
> > Generic infrastructure for migrating the irq from the process context in the
> > presence of CONFIG_GENERIC_PENDING_IRQ.
> >
> > This will be used later for migrating irq in the presence of
> > interrupt-remapping.
> 
> Why the API difference between IRQ_MOVE_PCNTXT set affinity handlers and
> !CONFIG_GENERIC_PENDING_IRQ handlers?

You are referring to desc->lock portion?

Two reasons:

a. Other code under CONFIG_GENERIC_PENDING_IRQ, like fixup_irqs(), assumes that
desc->lock is held while calling set_affinity. I just wanted to be consistent
across the board.

b. For level triggered, we still touch irq_desc and set IRQ_MOVE_PENDING
when we fail to move the irq (if there is already some level triggered
interrupt happening in parallel). While we could acquire the lock inside
set_affinity, I thought this simplifies things.
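
The two paths can be modelled in userspace roughly as follows. This is only a
sketch: struct irq_desc_sim, the flag values, and the lock_held field are
invented for the example, and the lock is just a flag that the chip callback
checks:

```c
#include <assert.h>

#define IRQ_MOVE_PCNTXT		0x1u	/* illustrative values only */
#define IRQ_MOVE_PENDING	0x2u

struct irq_desc_sim {
	unsigned int status;
	int lock_held;			/* models desc->lock */
	unsigned long affinity;		/* mask currently programmed */
	unsigned long pending_mask;	/* mask to apply later from irq context */
};

/* Models desc->chip->set_affinity(); expects the caller to hold the lock. */
static void chip_set_affinity(struct irq_desc_sim *desc, unsigned long mask)
{
	assert(desc->lock_held);
	desc->affinity = mask;
}

static void irq_set_affinity_sim(struct irq_desc_sim *desc, unsigned long mask)
{
	if (desc->status & IRQ_MOVE_PCNTXT) {
		/* Process-context move: program the new target right away. */
		desc->lock_held = 1;	/* spin_lock_irqsave(&desc->lock, flags) */
		chip_set_affinity(desc, mask);
		desc->lock_held = 0;	/* spin_unlock_irqrestore(...) */
	} else {
		/* Deferred move: remember the target, as set_pending_irq() does. */
		desc->status |= IRQ_MOVE_PENDING;
		desc->pending_mask = mask;
	}
}
```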

> 
> 
> >  #ifdef CONFIG_GENERIC_PENDING_IRQ
> > -     set_pending_irq(irq, cpumask);
> > +     if (desc->status & IRQ_MOVE_PCNTXT) {
> > +             unsigned long flags;
> > +
> > +             spin_lock_irqsave(&desc->lock, flags);
> > +             desc->chip->set_affinity(irq, cpumask);
> > +             spin_unlock_irqrestore(&desc->lock, flags);
> > +     } else
> > +             set_pending_irq(irq, cpumask);
> >  #else
> >       desc->affinity = cpumask;
> >       desc->chip->set_affinity(irq, cpumask);
> >
> > --

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-11  1:22   ` Eric W. Biederman
@ 2008-07-11  6:07     ` Suresh Siddha
  2008-07-11  8:59       ` Eric W. Biederman
  0 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11  6:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Thu, Jul 10, 2008 at 06:22:57PM -0700, Eric W. Biederman wrote:
> > As MSI is edge triggered, irq migration is a simple atomic update(of vector
> > and cpu destination) of IRTE and flushing the hardware cache.
> 
> A couple of things.  I believe MSI edge triggered irqs can always be migrated
> safely outside of irq context.
> 
> Flushing the hardware cache sounds like it will flush the irqs towards the cpu.
> How do we flush the inflight irqs flushed to the apic.  Does a register read work?

As I mentioned in the other thread, we wait for the queued invalidation
descriptor to complete (in qi_submit_sync()). That will register the inflight
ones at the cpu's apic before setting the descriptor completion.
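
A toy model of that ordering, not the real qi_submit_sync(): software queues
an invalidation followed by a wait descriptor, and the simulated hardware
consumes them strictly in order, so the status write implies the invalidation
already happened. All types and names below are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define QI_LEN	8
#define QI_DONE	1u

enum qi_type { QI_INVALIDATE, QI_WAIT };

struct qi_desc {
	enum qi_type type;
	volatile uint32_t *status;	/* only used by QI_WAIT */
};

struct qi_queue {
	struct qi_desc ring[QI_LEN];
	size_t head;			/* next descriptor the "hardware" consumes */
	size_t tail;			/* next free slot for software */
	unsigned int invalidated;	/* invalidations performed so far */
};

static int qi_push(struct qi_queue *q, struct qi_desc d)
{
	size_t next = (q->tail + 1) % QI_LEN;

	if (next == q->head)
		return -1;		/* ring full */
	q->ring[q->tail] = d;
	q->tail = next;
	return 0;
}

/* Simulated hardware step: consume one descriptor, strictly in order. */
static void hw_step(struct qi_queue *q)
{
	struct qi_desc *d;

	if (q->head == q->tail)
		return;
	d = &q->ring[q->head];
	if (d->type == QI_INVALIDATE)
		q->invalidated++;
	else
		*d->status = QI_DONE;
	q->head = (q->head + 1) % QI_LEN;
}

static int qi_submit_sync_sim(struct qi_queue *q)
{
	volatile uint32_t status = 0;
	struct qi_desc inv = { QI_INVALIDATE, NULL };
	struct qi_desc wait = { QI_WAIT, &status };

	if (qi_push(q, inv) || qi_push(q, wait))
		return -1;
	/* Real hardware runs concurrently; here we drive it from the poll loop. */
	while (status != QI_DONE)
		hw_step(q);
	return 0;
}
```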

> For MSI irqs going directly to the cpu it should, as long as the cpu and local
> apic count as the same device from the perspective of pci ordering rules.
> 
> > +++ tree-x86/arch/x86/kernel/io_apic_64.c 2008-07-10 09:52:34.000000000 -0700
> > @@ -2289,6 +2289,9 @@
> >
> >       dynamic_irq_cleanup(irq);
> >
> > +#ifdef CONFIG_INTR_REMAP
> > +     free_irte(irq);
> > +#endif
> >       spin_lock_irqsave(&vector_lock, flags);
> >       __clear_irq_vector(irq);
> >       spin_unlock_irqrestore(&vector_lock, flags);
> > @@ -2307,11 +2310,42 @@
> >
> >       tmp = TARGET_CPUS;
> >       err = assign_irq_vector(irq, tmp);
> > -     if (!err) {
> > -             cpus_and(tmp, cfg->domain, tmp);
> > -             dest = cpu_mask_to_apicid(tmp);
> > +     if (err)
> > +             return err;
> > +
> > +     cpus_and(tmp, cfg->domain, tmp);
> > +     dest = cpu_mask_to_apicid(tmp);
> 
> Can we simplify this a little.  In particular have a function
> 
> struct IOAPIC_ROUTE_entry x86_map_irq(irq, mask);
> 
> Where x86_map_irq would ultimately figure out the path to the cpu.
> In the simple case it would just call assign_irq_vector();
> When irqs are remapped it would perform the additional

But we already know that the irq's are remapped, as we are using different
irq_chip's when irq's are remapped.

> map_irq_to_irte_handle();
> modify_irte(irq, &irte);
> 
> And then have the generic msi code and the ioapic code.
> Map from the struct IOAPIC_ROUTE_entry or whatever to the appropriate bits for the hardware
> they control.
> 
> That should allows us a lot more flexibility going forward with less code then is in your
> patches.

Are you talking about the setup code or the migration code? Because in migration
code, we don't even touch MSI/IO-apic devices (for edge at least) and we
already use different irq_chip's for that.

For initial setup, I agree that it can use some simplifications. It's getting
late here and I will look at all your suggestions tomorrow.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-11  6:07     ` Suresh Siddha
@ 2008-07-11  8:59       ` Eric W. Biederman
  2008-07-11 23:07         ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  8:59 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

Suresh Siddha <suresh.b.siddha@intel.com> writes:

>> 
>> Can we simplify this a little.  In particular have a function
>> 
>> struct IOAPIC_ROUTE_entry x86_map_irq(irq, mask);
>> 
>> Where x86_map_irq would ultimately figure out the path to the cpu.
>> In the simple case it would just call assign_irq_vector();
>> When irqs are remapped it would perform the additional
>
> But we already know that the irq's are remapped, as we are using different
> irq_chip's when irq's are remapped.
>
>> map_irq_to_irte_handle();
>> modify_irte(irq, &irte);
>> 
>> And then have the generic msi code and the ioapic code.
>> Map from the struct IOAPIC_ROUTE_entry or whatever to the appropriate bits for
> the hardware
>> they control.
>> 
>> That should allows us a lot more flexibility going forward with less code then
> is in your
>> patches.
>
> Are you talking about the setup code or the migration code? Because in migration
> code, we don't even touch MSI/IO-apic devices (for edge at least) and we
> already use different irq_chip's for that.

I guess I was looking at the setup code.

At any rate the way the code is currently factored does not lend itself easily
to adding another iommu, and things that could be common aren't so maintenance
is harder than it should be.  If we continue on the current path I'm scared
of what that code will look like when we add Xen, VMware, kvm, lguest, and
AMD iommu support in the coming months.

ppc64 and sparc64 seem to have a subarch model where the chipset and
cpu capabilities are pretty standard.  Unfortunately x86 (as usual)
looks like it will become much more pick and choose so I don't think
we can just reuse any of the techniques those other architectures have
done.

What I am ultimately looking for is the x86 iommu irq mapping api.
And how we handle irqs in the context of it.

So as a start I think we can create x86_map_irq, as I suggested.
Since we have the pci dev to lookup the iommu then we really shouldn't
need multiple irq_chip structures (although it may be worth it if we
can detect we can optimize irq migration).

I just don't want to have an MxN problem where we have to implement
every kind of irq chip handler with every kind of iommu if I can help
it.  Even Mx2 starts looking pretty nasty.
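
One way to picture avoiding the MxN combination is a per-iommu ops table that
a single x86_map_irq() dispatches through. The sketch below is hypothetical
throughout (struct iommu_irq_ops, struct irq_route, and the toy vector
formula are not from the patches):

```c
#include <stddef.h>
#include <stdint.h>

struct irq_route {
	uint8_t vector;
	uint32_t handle;	/* destination apic id, or a remap table index */
	int remapped;
};

/* One table per iommu flavour; irq chip code never looks inside it. */
struct iommu_irq_ops {
	const char *name;
	int (*map_irq)(unsigned int irq, struct irq_route *out);
};

static int native_map_irq(unsigned int irq, struct irq_route *out)
{
	out->vector = (uint8_t)(0x30 + irq);	/* toy vector assignment */
	out->handle = 0;
	out->remapped = 0;
	return 0;
}

static int remap_map_irq(unsigned int irq, struct irq_route *out)
{
	native_map_irq(irq, out);
	out->handle = irq;	/* pretend the irq number indexes the remap table */
	out->remapped = 1;
	return 0;
}

static const struct iommu_irq_ops native_ops = { "native", native_map_irq };
static const struct iommu_irq_ops intr_remap_ops = { "intr-remap", remap_map_irq };

/* Single entry point for msi and ioapic setup; NULL means no iommu. */
static int x86_map_irq(const struct iommu_irq_ops *iommu, unsigned int irq,
		       struct irq_route *out)
{
	if (!iommu)
		iommu = &native_ops;
	return iommu->map_irq(irq, out);
}
```

Adding another iommu then means adding one ops table, not one more variant of
every irq chip handler.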

> For initial setup, I agree that it can use some simplifications. It's getting
> late here and I will look at all your suggestions tomorrow.

Sounds good.

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context
  2008-07-11  5:41     ` Suresh Siddha
@ 2008-07-11  9:19       ` Eric W. Biederman
  0 siblings, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11  9:19 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> You are referring to desc->lock portion?

yes.

> Two reasons:
>
> a. Other code in CONFIG_GENERIC_PENDING_IRQ, assuming that desc->lock is held
> while calling set_affinity. like fixup_irqs(). Just wanted to be same
> across the board.

Almost useful, but fixup_irqs is and always has been broken, so it doesn't really count.

> b. for level triggered, we still touch irq_desc and set IRQ_MOVE_PENDING,
> when we fail to move the irq (if there is already some level triggered
> interrupt happening in parallel). while, we can acquire the lock inside
> the set_affinity, I thought this simplifies things.

Actually if you acquire the lock inside of set_affinity you can sleep,
which may simplify things more.

fixup_irqs assumes everything happens atomically with irqs disabled, so
it doesn't work unless you busy wait until the irq is handled.  The best
you can do, if you have an ioxapic, is force the issue by sending a
directed EOI.  For older apics there is nothing you can do, because IPIs
are edge triggered and only the acknowledgement of level triggered
interrupts triggers a broadcast EOI for a vector.

Unless you just plain get interested or have a business interest in
true cpu hotplug I don't expect you to fix fixup_irqs.  That is more
work than unifying io_apic.c between x86_32 and x86_64.  Adding yet
another kludge to the kludge is probably fine for now.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 21:56   ` Suresh Siddha
@ 2008-07-11 10:28     ` Ingo Molnar
  2008-07-11 20:09       ` Ingo Molnar
  0 siblings, 1 reply; 87+ messages in thread
From: Ingo Molnar @ 2008-07-11 10:28 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: hpa@zytor.com, tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy


* Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> On Thu, Jul 10, 2008 at 12:53:20PM -0700, Ingo Molnar wrote:
> > 
> > Btw., i threw it at the -tip test-cluster and got back a quick build
> > bugreport:
> > 
> > arch/x86/xen/enlighten.c: In function 'xen_patch':
> > arch/x86/xen/enlighten.c:1084: warning: label 'patch_site' defined but not used
> > arch/x86/xen/enlighten.c: At top level:
> > arch/x86/xen/enlighten.c:1272: error: expected identifier before '(' token
> > arch/x86/xen/enlighten.c:1273: error: expected '}' before '.' token
> > arch/x86/kernel/paravirt.c:376:2: error: invalid preprocessing directive #ifnded
> > arch/x86/kernel/paravirt.c:384:2: error: #endif without #if
> > 
> > with this config:
> > 
> >   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> 
> Ingo, that was my stupid typo. Please apply this patch. BTW, we
> need some more xen64 paravirt fixes in this area. I will look at it
> as soon as possible.

applied to tip/x86/x2apic - thanks Suresh.

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines
  2008-07-11  3:50       ` Eric W. Biederman
@ 2008-07-11 13:55         ` Mike Travis
  0 siblings, 0 replies; 87+ messages in thread
From: Mike Travis @ 2008-07-11 13:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, mingo, hpa, tglx, akpm, arjan, andi, jbarnes,
	steiner, linux-kernel

Eric W. Biederman wrote:
> Mike Travis <travis@sgi.com> writes:
>> On a similar subject I would really like to change the send_IPI_mask to pass a
>> pointer to the cpumask_t arg:
>>
>>         void (*send_IPI_mask)(cpumask_t mask, int vector);
>>
>>
>> This bloats the stack by 512 bytes and seemingly is called by some fairly
>> nested routines.  Any opinions?
> 
> It sounds like a pain.  Especially since we would need to dereference
> cpumask_t when we use it.  Does anyone remember if there was a plan for
> dealing with cpumask_t when the number of cpus got large?
> 
> If we pass in a pointer to constant data semantically we should be fine.
> 
> Mostly I am wondering if there isn't a cleaner solution hidden away somewhere.
> 
> Eric

This is similar to the set_cpus_allowed_ptr which is the alternative for
passing the cpumask as a pointer.  When I did this way back when, the general
consensus was that the extra dereference was just a bit of noise in low use
functions.  

This case is different in that there is an API (via genapic).  I could always
add a new entry for send_IPI_mask_ptr, or I could change the existing interface
to be like the current cpumask operators, passing a pointer to the cpumask args
"silently".  This has an advantage in that we could "not" pass the pointer if
say, NR_CPUS <= BITS_PER_LONG.  But I'd have to change references of the form
(genapic->send_IPI_mask).

And there are cases where the cpumask arg is cpumask_of_cpu() meaning there's
only one bit of interest which really wastes stack space.  And with the (real
soon now) next iteration to 16k cpus, pushing 2k on the stack, we will feel
the pain... ;-)
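
A rough sketch of the "silent" pointer variant, assuming NR_CPUS=16384. The
cpumask_t below is a userspace stand-in, and send_IPI_mask_ptr() just counts
the target cpus so the example has something observable:

```c
#include <limits.h>

#define NR_CPUS		16384
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's cpumask_t. */
typedef struct {
	unsigned long bits[NR_CPUS / (sizeof(unsigned long) * CHAR_BIT)];
} cpumask_t;

/* Pointer-taking variant; returns how many cpus would receive the IPI. */
static unsigned int send_IPI_mask_ptr(const cpumask_t *mask, int vector)
{
	unsigned int cpu, n = 0;

	(void)vector;	/* a real implementation would program the APIC here */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += (mask->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
	return n;
}

/*
 * Callers keep writing send_IPI_mask(mask, vector); the macro takes the
 * address, in the same style as the existing cpumask operator macros.
 */
#define send_IPI_mask(mask, vector)	send_IPI_mask_ptr(&(mask), (vector))
```

At 16384 cpus the mask itself is 2048 bytes, so a by-value argument costs 2k
of stack per call while the macro form passes a single pointer. The macro
needs an lvalue, so an rvalue such as a cpumask_of_cpu() result would have to
be stored in a variable first.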

Thanks,
Mike

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 10:28     ` Ingo Molnar
@ 2008-07-11 20:09       ` Ingo Molnar
  2008-07-11 20:31         ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Ingo Molnar @ 2008-07-11 20:09 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: hpa@zytor.com, tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy


* Ingo Molnar <mingo@elte.hu> wrote:

> > >   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> > 
> > Ingo, that was my stupid typo. Please apply this patch. BTW, we need 
> > some more xen64 paravirt fixes in this area. I will look at it as 
> > soon as possible.
> 
> applied to tip/x86/x2apic - thanks Suresh.

another problem is the redefinition of apic_read(), causing:

 arch/x86/xen/enlighten.c: In function ‘xen_patch’:
 arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site’ defined but not used
 arch/x86/xen/enlighten.c: At top level:
 arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(’ token
 arch/x86/xen/enlighten.c:1273: error: expected ‘}’ before ‘.’ token

with this config:

  http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad

the continued spaghetti in all the APIC variants is quite ugly. This 
should all be handled via a single apic_ops template that should cover 
the paravirt and native variants as well.

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 20:09       ` Ingo Molnar
@ 2008-07-11 20:31         ` Suresh Siddha
  2008-07-11 20:42           ` Yinghai Lu
  2008-07-11 20:49           ` Ingo Molnar
  0 siblings, 2 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11 20:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 01:09:57PM -0700, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > >   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> > >
> > > Ingo, that was my stupid typo. Please apply this patch. BTW, we need
> > > some more xen64 paravirt fixes in this area. I will look at it as
> > > soon as possible.
> >
> > applied to tip/x86/x2apic - thanks Suresh.
> 
> another problem is the redefinition of apic_read(), causing:
> 
>  arch/x86/xen/enlighten.c: In function ‘xen_patch’:
>  arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site’ defined but not used
>  arch/x86/xen/enlighten.c: At top level:
>  arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(’ token
>  arch/x86/xen/enlighten.c:1273: error: expected ‘}’ before ‘.’ token
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad
> 
> the continued spaghetti in all the APIC variants is quite ugly. This
> should all be handled via a single apic_ops template that should cover
> the paravirt and native variants as well.

Ingo, I just posted the fix for this.

To clean up the code:

struct pv_apic_ops {
#ifdef CONFIG_X86_LOCAL_APIC
	/*
	* Direct APIC operations, principally for VMI.  Ideally
	* these shouldn't be in this interface.
	*/
	void (*apic_write)(unsigned long reg, u32 v);
	void (*apic_write_atomic)(unsigned long reg, u32 v);
	u32 (*apic_read)(unsigned long reg);

Probably we should move the three routines above to a basic apic_ops, which just
deals with the apic HW accesses, and retain the ones below in pv_apic_ops, which
covers more than the basic reg accesses. This will be true for both 32-bit and 64-bit.

        void (*setup_boot_clock)(void);
        void (*setup_secondary_clock)(void);

        void (*startup_ipi_hook)(int phys_apicid,
                                 unsigned long start_eip,
                                 unsigned long start_esp);
#endif
};
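
As an illustration of what a basic apic_ops could look like, here is a
userspace sketch with two interchangeable backends. The arrays stand in for
the uncached MMIO page and the MSR space, and the struct layout and function
names are assumptions, not the ones from the patchset:

```c
#include <stdint.h>

#define APIC_ID		0x20
#define APIC_TASKPRI	0x80
#define X2APIC_MSR_BASE	0x800u

static uint32_t fake_mmio[0x400 / 4];	/* stands in for the xAPIC MMIO page */
static uint32_t fake_msr[0x40];		/* stands in for MSRs 0x800..0x83f */

/* x2APIC maps each 16-byte MMIO register slot to one MSR. */
static uint32_t x2apic_msr(uint32_t reg)
{
	return X2APIC_MSR_BASE + (reg >> 4);
}

struct apic_ops {
	uint32_t (*read)(uint32_t reg);
	void (*write)(uint32_t reg, uint32_t v);
};

static uint32_t mmio_apic_read(uint32_t reg)
{
	return fake_mmio[reg / 4];
}

static void mmio_apic_write(uint32_t reg, uint32_t v)
{
	fake_mmio[reg / 4] = v;
}

static uint32_t msr_apic_read(uint32_t reg)
{
	return fake_msr[x2apic_msr(reg) - X2APIC_MSR_BASE];
}

static void msr_apic_write(uint32_t reg, uint32_t v)
{
	fake_msr[x2apic_msr(reg) - X2APIC_MSR_BASE] = v;
}

static const struct apic_ops xapic_ops = { mmio_apic_read, mmio_apic_write };
static const struct apic_ops x2apic_ops = { msr_apic_read, msr_apic_write };
```

Callers would go through a single pointer, e.g. ops->read(APIC_ID), and a
paravirt backend could supply its own table in the same shape.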


Unless there is an objection, I will post the fix.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 20:31         ` Suresh Siddha
@ 2008-07-11 20:42           ` Yinghai Lu
  2008-07-11 20:45             ` Ingo Molnar
  2008-07-11 20:49           ` Ingo Molnar
  1 sibling, 1 reply; 87+ messages in thread
From: Yinghai Lu @ 2008-07-11 20:42 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

got:

Linux version 2.6.26-rc9-tip-01763-g74f94b1-dirty (yhlu@linux-zpir)
(gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision
135036] (SUSE Linux) ) #320 SMP Fri Jul 11 13:35:20 PDT 2008
Command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 nopat initcall_debug apic=verbose pci=routeirq ip=dhcp
load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.26_k8.h
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
 BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000087fe0000 (usable)
 BIOS-e820: 0000000087fe0000 - 0000000087fee000 (ACPI data)
 BIOS-e820: 0000000087fee000 - 0000000087fff2e0 (ACPI NVS)
 BIOS-e820: 0000000087fff2e0 - 0000000088000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000478000000 (usable)
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
PAT support disabled.
last_pfn = 0x478000 max_arch_pfn = 0x3ffffffff
last_pfn = 0x87fe0 max_arch_pfn = 0x3ffffffff
init_memory_mapping
Using GB pages for direct mapping
 0000000000 - 0080000000 page 1G
 0080000000 - 0087e00000 page 2M
 0087e00000 - 0087fe0000 page 4k
kernel direct mapping tables up to 87fe0000 @ 8000-b000
last_map_addr: 87fe0000 end: 87fe0000
init_memory_mapping
Using GB pages for direct mapping
 0100000000 - 0440000000 page 1G
 0440000000 - 0478000000 page 2M
kernel direct mapping tables up to 478000000 @ a000-c000
last_map_addr: 478000000 end: 478000000
RAMDISK: 7e6d7000 - 7ffffe0b
DMI 2.3 present.
ACPI: RSDP 000FA740, 0024 (r2 SUN   )
ACPI: XSDT 87FE0100, 006C (r1 SUN    X6420          14 MSFT       97)
ACPI: FACP 87FE0290, 00F4 (r3 SUN    X6420          14 MSFT       97)
ACPI: DSDT 87FE0500, 876F (r1 SUN    X6420          14 INTL 20060512)
ACPI: FACS 87FEE000, 0040
ACPI: APIC 87FE0390, 00D8 (r1 SUN    X6420          14 MSFT       97)
ACPI: SPCR 87FE0470, 0050 (r1 SUN    X6420          14 MSFT       97)
ACPI: SLIT 87FE04C0, 003C (r1 SUN    X6420          14 MSFT       97)
ACPI: OEMB 87FEE040, 0063 (r1 SUN    X6420          14 MSFT       97)
ACPI: SRAT 87FE8C70, 0220 (r1 AMD    HAMMER          1 AMD         1)
ACPI: HPET 87FE8E90, 0038 (r1 SUN    X6420          14 MSFT       97)
ACPI: IPET 87FE8ED0, 0038 (r1 SUN     X6420         14 MSFT       97)
ACPI: SSDT 87FE8F10, 2854 (r1 A M I  POWERNOW        1 AMD         1)
SRAT: PXM 0 -> APIC 4 -> Node 0
SRAT: PXM 0 -> APIC 5 -> Node 0
SRAT: PXM 0 -> APIC 6 -> Node 0
SRAT: PXM 0 -> APIC 7 -> Node 0
SRAT: PXM 1 -> APIC 8 -> Node 1
SRAT: PXM 1 -> APIC 9 -> Node 1
SRAT: PXM 1 -> APIC 10 -> Node 1
SRAT: PXM 1 -> APIC 11 -> Node 1
SRAT: PXM 2 -> APIC 12 -> Node 2
SRAT: PXM 2 -> APIC 13 -> Node 2
SRAT: PXM 2 -> APIC 14 -> Node 2
SRAT: PXM 2 -> APIC 15 -> Node 2
SRAT: PXM 3 -> APIC 16 -> Node 3
SRAT: PXM 3 -> APIC 17 -> Node 3
SRAT: PXM 3 -> APIC 18 -> Node 3
SRAT: PXM 3 -> APIC 19 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
Entering add_active_range(0, 0x0, 0x9c) 0 entries of 3200 used
SRAT: Node 0 PXM 0 100000-88000000
Entering add_active_range(0, 0x100, 0x87fe0) 1 entries of 3200 used
SRAT: Node 0 PXM 0 100000000-178000000
Entering add_active_range(0, 0x100000, 0x178000) 2 entries of 3200 used
SRAT: Node 1 PXM 1 178000000-278000000
Entering add_active_range(1, 0x178000, 0x278000) 3 entries of 3200 used
SRAT: Node 2 PXM 2 278000000-378000000
Entering add_active_range(2, 0x278000, 0x378000) 4 entries of 3200 used
SRAT: Node 3 PXM 3 378000000-478000000
Entering add_active_range(3, 0x378000, 0x478000) 5 entries of 3200 used
ACPI: SLIT: nodes = 4
 10 13 13 16
 13 10 13 13
 13 13 10 13
 16 13 13 10
NUMA: Allocated memnodemap from b000 - 13f80
NUMA: Using 20 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000178000000
  NODE_DATA [0000000000013f80 - 0000000000018f7f]
  bootmap [0000000000019000 -  0000000000047fff] pages 2f
(9 early reservations) ==> bootmem
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0000200000 - 00010ba6d4]    TEXT DATA BSS ==> [0000200000 - 00010ba6d4]
  #3 [007e6d7000 - 007ffffe0b]          RAMDISK ==> [007e6d7000 - 007ffffe0b]
  #4 [000009c800 - 0000100000]    BIOS reserved ==> [000009c800 - 0000100000]
  #5 [0000008000 - 000000a000]          PGTABLE ==> [0000008000 - 000000a000]
  #6 [000000a000 - 000000b000]          PGTABLE ==> [000000a000 - 000000b000]
  #7 [0000001000 - 000000103c]        ACPI SLIT ==> [0000001000 - 000000103c]
  #8 [000000b000 - 0000013f80]       MEMNODEMAP ==> [000000b000 - 0000013f80]
Bootmem setup node 1 0000000178000000-0000000278000000
  NODE_DATA [0000000178000000 - 0000000178004fff]
  bootmap [0000000178005000 -  0000000178024fff] pages 20
(9 early reservations) ==> bootmem
  #0 [0000000000 - 0000001000]   BIOS data page
  #1 [0000006000 - 0000008000]       TRAMPOLINE
  #2 [0000200000 - 00010ba6d4]    TEXT DATA BSS
  #3 [007e6d7000 - 007ffffe0b]          RAMDISK
  #4 [000009c800 - 0000100000]    BIOS reserved
  #5 [0000008000 - 000000a000]          PGTABLE
  #6 [000000a000 - 000000b000]          PGTABLE
  #7 [0000001000 - 000000103c]        ACPI SLIT
  #8 [000000b000 - 0000013f80]       MEMNODEMAP
Bootmem setup node 2 0000000278000000-0000000378000000
  NODE_DATA [0000000278000000 - 0000000278004fff]
  bootmap [0000000278005000 -  0000000278024fff] pages 20
(9 early reservations) ==> bootmem
  #0 [0000000000 - 0000001000]   BIOS data page
  #1 [0000006000 - 0000008000]       TRAMPOLINE
  #2 [0000200000 - 00010ba6d4]    TEXT DATA BSS
  #3 [007e6d7000 - 007ffffe0b]          RAMDISK
  #4 [000009c800 - 0000100000]    BIOS reserved
  #5 [0000008000 - 000000a000]          PGTABLE
  #6 [000000a000 - 000000b000]          PGTABLE
  #7 [0000001000 - 000000103c]        ACPI SLIT
  #8 [000000b000 - 0000013f80]       MEMNODEMAP
Bootmem setup node 3 0000000378000000-0000000478000000
  NODE_DATA [0000000378000000 - 0000000378004fff]
  bootmap [0000000378005000 -  0000000378024fff] pages 20
(9 early reservations) ==> bootmem
  #0 [0000000000 - 0000001000]   BIOS data page
  #1 [0000006000 - 0000008000]       TRAMPOLINE
  #2 [0000200000 - 00010ba6d4]    TEXT DATA BSS
  #3 [007e6d7000 - 007ffffe0b]          RAMDISK
  #4 [000009c800 - 0000100000]    BIOS reserved
  #5 [0000008000 - 000000a000]          PGTABLE
  #6 [000000a000 - 000000b000]          PGTABLE
  #7 [0000001000 - 000000103c]        ACPI SLIT
  #8 [000000b000 - 0000013f80]       MEMNODEMAP
Scan SMP from ffff880000000000 for 1024 bytes.
Scan SMP from ffff88000009fc00 for 1024 bytes.
Scan SMP from ffff8800000f0000 for 65536 bytes.
found SMP MP-table at [ffff8800000ff780] 000ff780
 [ffffe20000000000-ffffe27fffffffff] PGD ->ffff8800011bd000 on node 0
 [ffffe20000000000-ffffe2003fffffff] PUD ->ffff8800011be000 on node 0
[ffffe20005240000-ffffe200053fffff] potential offnode page_structs
 [ffffe20000000000-ffffe200053fffff] PMD ->
[ffff880001200000-ffff880004bfffff] on node 0
[ffffe20008a40000-ffffe20008bfffff] potential offnode page_structs
 [ffffe20005400000-ffffe20008bfffff] PMD ->
[ffff880178200000-ffff88017b9fffff] on node 1
[ffffe2000c240000-ffffe2000c3fffff] potential offnode page_structs
 [ffffe20008c00000-ffffe2000c3fffff] PMD ->
[ffff880278200000-ffff88027b9fffff] on node 2
 [ffffe2000c400000-ffffe2000fbfffff] PMD ->
[ffff880378200000-ffff88037b9fffff] on node 3
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00478000
Movable zone start PFN for each node
early_node_map[6] active PFN ranges
    0: 0x00000000 -> 0x0000009c
    0: 0x00000100 -> 0x00087fe0
    0: 0x00100000 -> 0x00178000
    1: 0x00178000 -> 0x00278000
    2: 0x00278000 -> 0x00378000
    3: 0x00378000 -> 0x00478000
On node 0 totalpages: 1048444
  DMA zone: 56 pages used for memmap
  DMA zone: 115 pages reserved
  DMA zone: 3825 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 538648 pages, LIFO batch:31
  Normal zone: 6720 pages used for memmap
  Normal zone: 484800 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
On node 1 totalpages: 1048576
  DMA zone: 0 pages used for memmap
  DMA32 zone: 0 pages used for memmap
  Normal zone: 14336 pages used for memmap
  Normal zone: 1034240 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
On node 2 totalpages: 1048576
  DMA zone: 0 pages used for memmap
  DMA32 zone: 0 pages used for memmap
  Normal zone: 14336 pages used for memmap
  Normal zone: 1034240 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
On node 3 totalpages: 1048576
  DMA zone: 0 pages used for memmap
  DMA32 zone: 0 pages used for memmap
  Normal zone: 14336 pages used for memmap
  Normal zone: 1034240 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x4008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x08] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x09] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x0a] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x0b] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x0c] enabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0d] enabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0e] enabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0f] enabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x10] enabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x12] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x13] enabled)
ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 0, version 0, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x01] address[0xbdeff000] gsi_base[24])
IOAPIC[1]: apic_id 1, version 0, address 0xbdeff000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: HPET id: 0x0 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 16 CPUs, 0 hotplug CPUs
init_cpu_to_node:
cpu 0 -> apicid 4 -> node 0
cpu 1 -> apicid 5 -> node 0
cpu 2 -> apicid 6 -> node 0
cpu 3 -> apicid 7 -> node 0
cpu 4 -> apicid 8 -> node 1
cpu 5 -> apicid 9 -> node 1
cpu 6 -> apicid 10 -> node 1
cpu 7 -> apicid 11 -> node 1
cpu 8 -> apicid 12 -> node 2
cpu 9 -> apicid 13 -> node 2
cpu 10 -> apicid 14 -> node 2
cpu 11 -> apicid 15 -> node 2
cpu 12 -> apicid 16 -> node 3
cpu 13 -> apicid 17 -> node 3
cpu 14 -> apicid 18 -> node 3
cpu 15 -> apicid 19 -> node 3
mapped APIC to ffffffffff5fb000 (        fee00000)
mapped IOAPIC to ffffffffff5fa000 (00000000fec00000)
mapped IOAPIC to ffffffffff5f9000 (00000000bdeff000)
Allocating PCI resources starting at 90000000 (gap: 88000000:76c00000)
PERCPU: Allocating 53312 bytes of per cpu data
per cpu data for cpu0 on node0 at 00000000010e4000
per cpu data for cpu1 on node0 at 00000000010f2000
per cpu data for cpu2 on node0 at 0000000001100000
per cpu data for cpu3 on node0 at 000000000110e000
per cpu data for cpu4 on node1 at 000000017ba18000
per cpu data for cpu5 on node1 at 000000017ba26000
per cpu data for cpu6 on node1 at 000000017ba34000
per cpu data for cpu7 on node1 at 000000017ba42000
per cpu data for cpu8 on node2 at 000000027ba18000
per cpu data for cpu9 on node2 at 000000027ba26000
per cpu data for cpu10 on node2 at 000000027ba34000
per cpu data for cpu11 on node2 at 000000027ba42000
per cpu data for cpu12 on node3 at 000000037ba18000
per cpu data for cpu13 on node3 at 000000037ba26000
per cpu data for cpu14 on node3 at 000000037ba34000
per cpu data for cpu15 on node3 at 000000037ba42000
NR_CPUS: 128, nr_cpu_ids: 16, nr_node_ids 4
Built 4 zonelists in Zone order, mobility grouping on.  Total pages: 4129993
Policy zone: Normal
Kernel command line: console=uart8250,io,0x3f8,115200n8
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug
show_msr=1 nopat initcall_debug apic=verbose pci=routeirq ip=dhcp
load_ramdisk=1 ramdisk_size=131072
BOOT_IMAGE=kernel.org/bzImage_2.6.26_k8.h
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC calibrated against PM_TIMER
Detected 2293.903 MHz processor.
spurious 8259A interrupt: IRQ7.
Console: colour VGA+ 80x25
console handover: boot [uart0] -> real [ttyS0]
Checking aperture...
No AGP bridge found
Node 0: aperture @ a21c000000 size 32 MB
Aperture beyond 4GB. Ignoring.
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 20000000
numa_free_all_bootmem node 0 done
numa_free_all_bootmem node 1 done
numa_free_all_bootmem node 2 done
numa_free_all_bootmem node 3 done
Memory: 16437296k/18743296k available (8383k kernel code, 339392k
reserved, 4020k data, 988k init)
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=4
hpet clockevent registered
Calibrating delay loop (skipped), value calculated using timer
frequency.. <6>4587.80 BogoMIPS (lpj=9175600)
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0/4 -> Node 0
Enable MMCONFIG on AMD Family 10h
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
using C1E aware idle routine
ACPI: Core revision 20080321
Parsing all Control Methods:
Table [DSDT](id 0001) - 1289 Objects with 114 Devices 462 Methods 26 Regions
Parsing all Control Methods:
Table [SSDT](id 0002) - 80 Objects with 0 Devices 0 Methods 0 Regions
 tbxface-0598 [00] tb_load_namespace     : ACPI Tables successfully acquired
evxfevnt-0091 [00] enable                : Transition to ACPI mode successful
Setting APIC routing to physical flat
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320

Call Trace:
 [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
 [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
 [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
 [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
 [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
 [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
 [<ffffffff8022a439>] ? child_rip+0xa/0x11
 [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
 [<ffffffff8022a42f>] ? child_rip+0x0/0x11


guess the read_apic_id change causes some problem...

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 20:42           ` Yinghai Lu
@ 2008-07-11 20:45             ` Ingo Molnar
  2008-07-11 21:24               ` Suresh Siddha
  0 siblings, 1 reply; 87+ messages in thread
From: Ingo Molnar @ 2008-07-11 20:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org


* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> Setting APIC routing to physical flat
> Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
> Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
> 
> Call Trace:
>  [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
>  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>  [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
>  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>  [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
>  [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
>  [<ffffffff8022a439>] ? child_rip+0xa/0x11
>  [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
>  [<ffffffff8022a42f>] ? child_rip+0x0/0x11
> 
> 
> guess the read_apic_id change causes some problem...

i got a build error sooner so i've taken it out of tip/master again.

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 20:31         ` Suresh Siddha
  2008-07-11 20:42           ` Yinghai Lu
@ 2008-07-11 20:49           ` Ingo Molnar
  1 sibling, 0 replies; 87+ messages in thread
From: Ingo Molnar @ 2008-07-11 20:49 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: hpa@zytor.com, tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org


* Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> On Fri, Jul 11, 2008 at 01:09:57PM -0700, Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > > >   http://redhat.com/~mingo/misc/config-Thu_Jul_10_21_43_28_CEST_2008.bad
> > > >
> > > > Ingo, that was my stupid typo. Please apply this patch. BTW, we need
> > > > some more xen64 paravirt fixes in this area. I will look at it as
> > > > soon as possible.
> > >
> > > applied to tip/x86/x2apic - thanks Suresh.
> > 
> > another problem is the redefinition of apic_read(), causing:
> > 
> >  arch/x86/xen/enlighten.c: In function ‘xen_patch’:
> >  arch/x86/xen/enlighten.c:1084: warning: label ‘patch_site’ defined but not used
> >  arch/x86/xen/enlighten.c: At top level:
> >  arch/x86/xen/enlighten.c:1272: error: expected identifier before ‘(’ token
> >  arch/x86/xen/enlighten.c:1273: error: expected ‘}’ before ‘.’ token
> > 
> > with this config:
> > 
> >   http://redhat.com/~mingo/misc/config-Fri_Jul_11_21_51_18_CEST_2008.bad
> > 
> > the continued spaghetti in all the APIC variants is quite ugly. This
> > should all be handled via a single apic_ops template that should cover
> > the paravirt and native variants as well.
> 
> Ingo, I just posted the fix for this.

applied to tip/x86/x2apic:

  Suresh Siddha (3):
        x2apic: uninline uv_init_apic_ldr()
        x2apic: xen64 paravirt basic apic ops
        x2apic: kernel-parameter documentation for "x2apic_phys"

thanks Suresh.

> To cleanup the code:
> 
> struct pv_apic_ops {
> #ifdef CONFIG_X86_LOCAL_APIC
> 	/*
> 	* Direct APIC operations, principally for VMI.  Ideally
> 	* these shouldn't be in this interface.
> 	*/
> 	void (*apic_write)(unsigned long reg, u32 v);
> 	void (*apic_write_atomic)(unsigned long reg, u32 v);
> 	u32 (*apic_read)(unsigned long reg);
> 
> Probably we should move the three above routines to basic apic_ops, 
> which just deal with the apic HW accesses and retain the below for 
> pv_apic_ops, which care more than the basic reg accesses. This will be 
> true for both 32/64bits..
> 
>         void (*setup_boot_clock)(void);
>         void (*setup_secondary_clock)(void);
> 
>         void (*startup_ipi_hook)(int phys_apicid,
>                                  unsigned long start_eip,
>                                  unsigned long start_esp);
> #endif
> };
> 
> Unless there is an objection, I will post the fix.

ok. Jeremy, agreed?

	Ingo
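
For illustration only, here is a rough userspace sketch of the split
discussed above: a "basic" ops table that covers nothing but APIC register
access, with one backend for memory-mapped access and one for MSR access.
The struct name, the fake_* arrays and the helper names are all
hypothetical; the actual apic_ops from the series is not shown in this
mail. The only hardware facts relied on are that the task priority register
sits at xAPIC offset 0x80 and that x2apic mode maps the register at offset
reg to MSR 0x800 + (reg >> 4).

```c
#include <assert.h>
#include <stdint.h>

#define APIC_TASKPRI 0x80	/* xAPIC offset of the task priority register */

/* Hypothetical "basic" ops: register access and nothing else. */
struct apic_ops_sketch {
	uint32_t (*read)(uint32_t reg);
	void (*write)(uint32_t reg, uint32_t v);
};

/* Stand-in for the uncached 4K xAPIC MMIO page. */
static uint32_t fake_mmio[4096 / sizeof(uint32_t)];

static uint32_t mem_read(uint32_t reg)
{
	assert(reg < 4096 && (reg & 0xf) == 0);
	return fake_mmio[reg / sizeof(uint32_t)];
}

static void mem_write(uint32_t reg, uint32_t v)
{
	assert(reg < 4096 && (reg & 0xf) == 0);
	fake_mmio[reg / sizeof(uint32_t)] = v;
}

/* x2apic mode: the register at xAPIC offset reg is MSR 0x800 + (reg >> 4). */
static uint32_t x2apic_msr(uint32_t reg)
{
	return 0x800 + (reg >> 4);
}

/* Stand-in for rdmsr/wrmsr on the 0x800..0x8ff MSR range. */
static uint64_t fake_msr[0x100];

static uint32_t msr_read(uint32_t reg)
{
	assert(reg < 4096 && (reg & 0xf) == 0);
	return (uint32_t)fake_msr[x2apic_msr(reg) - 0x800];
}

static void msr_write(uint32_t reg, uint32_t v)
{
	assert(reg < 4096 && (reg & 0xf) == 0);
	fake_msr[x2apic_msr(reg) - 0x800] = v;
}

static const struct apic_ops_sketch mem_apic_ops = { mem_read, mem_write };
static const struct apic_ops_sketch msr_apic_ops = { msr_read, msr_write };

/* Callers go through one pointer, selected once during boot. */
static const struct apic_ops_sketch *apic_ops = &mem_apic_ops;

static uint32_t apic_read(uint32_t reg)
{
	return apic_ops->read(reg);
}

static void apic_write(uint32_t reg, uint32_t v)
{
	apic_ops->write(reg, v);
}
```

With a table like this, the clock setup and startup_ipi_hook style hooks
could stay in pv_apic_ops, while a paravirt backend would only need to
supply its own read/write pair.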

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 20:45             ` Ingo Molnar
@ 2008-07-11 21:24               ` Suresh Siddha
  2008-07-11 22:02                 ` Yinghai Lu
  0 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11 21:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Siddha, Suresh B, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 01:45:21PM -0700, Ingo Molnar wrote:
> 
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> 
> > Setting APIC routing to physical flat
> > Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
> > Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
> >
> > Call Trace:
> >  [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
> >  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> >  [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
> >  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
> >  [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
> >  [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
> >  [<ffffffff8022a439>] ? child_rip+0xa/0x11
> >  [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
> >  [<ffffffff8022a42f>] ? child_rip+0x0/0x11
> >
> >
> > guess the read_apic_id change causes some problem...

Yinghai, Can you please try the appended patch to see if it fixes your problem?

I guess you need to try the tip/x86/x2apic tree now please :(

BTW, I don't know why we even do verify_local_APIC() in 64bit. Especially
when we don't care for the return value, it should be a no-op.

---
genapic's read_apic_id() returns the actual apic id extracted from
the APIC_ID register. And in some cases like UV, read_apic_id()
returns completely different values from APIC ID register.

Use the native apic register read, rather than genapic read_apic_id()
in verify_local_APIC()

And also, lapic_suspend() should also use native apic register read.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 9a319d7..dd05010 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -698,10 +698,10 @@ int __init verify_local_APIC(void)
 	/*
 	 * The ID register is read/write in a real APIC.
 	 */
-	reg0 = read_apic_id();
+	reg0 = apic_read(APIC_ID);
 	apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0);
 	apic_write(APIC_ID, reg0 ^ APIC_ID_MASK);
-	reg1 = read_apic_id();
+	reg1 = apic_read(APIC_ID);
 	apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1);
 	apic_write(APIC_ID, reg0);
 	if (reg1 != (reg0 ^ APIC_ID_MASK))
@@ -1336,7 +1336,7 @@ static int lapic_suspend(struct sys_device *dev, pm_message_t state)
 
 	maxlvt = lapic_get_maxlvt();
 
-	apic_pm_state.apic_id = read_apic_id();
+	apic_pm_state.apic_id = apic_read(APIC_ID);
 	apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
 	apic_pm_state.apic_ldr = apic_read(APIC_LDR);
 	apic_pm_state.apic_dfr = apic_read(APIC_DFR);
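
For illustration only (this is not part of the patch): a toy userspace model
of why the toggle test in verify_local_APIC() needs the raw register value.
It assumes the xAPIC layout, with the ID in bits 31:24 of the APIC_ID
register, and it assumes that writes to the low 24 bits are dropped. The
fake_* and read_* names are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ID_SHIFT 24
#define APIC_ID_MASK  (0xFFu << APIC_ID_SHIFT)

/* Toy xAPIC ID register: only bits 31:24 hold state, the rest read as 0. */
static uint32_t fake_id_reg;

static void fake_write_id(uint32_t v)
{
	fake_id_reg = v & APIC_ID_MASK;
}

/* Like apic_read(APIC_ID): returns the raw register value. */
static uint32_t read_raw_reg(void)
{
	return fake_id_reg;
}

/* Like a genapic read_apic_id(): returns the ID already extracted. */
static uint32_t read_extracted_id(void)
{
	return (fake_id_reg & APIC_ID_MASK) >> APIC_ID_SHIFT;
}

/*
 * The toggle sequence from verify_local_APIC(), parameterized on how the
 * ID is read. Returns the APIC ID left in the register afterwards.
 */
static uint32_t toggle_test(uint32_t (*reader)(void), uint32_t boot_id)
{
	uint32_t reg0;

	assert(boot_id <= 0xFF);
	fake_write_id(boot_id << APIC_ID_SHIFT);

	reg0 = reader();
	fake_write_id(reg0 ^ APIC_ID_MASK);	/* flip the ID bits */
	(void)reader();				/* reg1 in the kernel code */
	fake_write_id(reg0);			/* "restore" the original value */

	return read_extracted_id();
}
```

In this model toggle_test(read_raw_reg, 4) leaves ID 4 in the register,
while toggle_test(read_extracted_id, 4) leaves ID 0, because the "restore"
step writes the bare value 4 into bits that do not hold the ID. That would
be consistent with the "(0 vs 4)" in the panic, under the assumptions above.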

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 21:24               ` Suresh Siddha
@ 2008-07-11 22:02                 ` Yinghai Lu
  2008-07-12  3:16                   ` Yinghai Lu
  2008-07-12  5:37                   ` Ingo Molnar
  0 siblings, 2 replies; 87+ messages in thread
From: Yinghai Lu @ 2008-07-11 22:02 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 2:24 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
> On Fri, Jul 11, 2008 at 01:45:21PM -0700, Ingo Molnar wrote:
>>
>> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>>
>> > Setting APIC routing to physical flat
>> > Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (0 vs 4)
>> > Pid: 1, comm: swapper Not tainted 2.6.26-rc9-tip-01763-g74f94b1-dirty #320
>> >
>> > Call Trace:
>> >  [<ffffffff80a21505>] ? set_cpu_sibling_map+0x38c/0x3bd
>> >  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>> >  [<ffffffff80e5a2c3>] ? verify_local_APIC+0x139/0x1b9
>> >  [<ffffffff80245215>] ? read_xapic_id+0x25/0x3e
>> >  [<ffffffff80e589af>] ? native_smp_prepare_cpus+0x224/0x2e9
>> >  [<ffffffff80e4881a>] ? kernel_init+0x64/0x341
>> >  [<ffffffff8022a439>] ? child_rip+0xa/0x11
>> >  [<ffffffff80e487b6>] ? kernel_init+0x0/0x341
>> >  [<ffffffff8022a42f>] ? child_rip+0x0/0x11
>> >
>> >
>> > guess the read_apic_id change causes some problem...
>
> Yinghai, Can you please try the appended patch to see if it fixes your problem?
>
> I guess you need to try the tip/x86/x2apic tree now please :(
>
> BTW, I don't know why we even do verify_local_APIC() in 64bit. Especially
> when we don't care for the return value, it should be a no-op.
>
> ---
> genapic's read_apic_id() returns the actual apic id extracted from
> the APIC_ID register. And in some cases like UV, read_apic_id()
> returns completely different values from APIC ID register.
>
> Use the native apic register read, rather than genapic read_apic_id()
> in verify_local_APIC()
>
> And also, lapic_suspend() should also use native apic register read.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> ---
>
> diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
> index 9a319d7..dd05010 100644
> --- a/arch/x86/kernel/apic_64.c
> +++ b/arch/x86/kernel/apic_64.c
> @@ -698,10 +698,10 @@ int __init verify_local_APIC(void)
>        /*
>         * The ID register is read/write in a real APIC.
>         */
> -       reg0 = read_apic_id();
> +       reg0 = apic_read(APIC_ID);
>        apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0);
>        apic_write(APIC_ID, reg0 ^ APIC_ID_MASK);
> -       reg1 = read_apic_id();
> +       reg1 = apic_read(APIC_ID);
>        apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1);
>        apic_write(APIC_ID, reg0);
>        if (reg1 != (reg0 ^ APIC_ID_MASK))
> @@ -1336,7 +1336,7 @@ static int lapic_suspend(struct sys_device *dev, pm_message_t state)
>
>        maxlvt = lapic_get_maxlvt();
>
> -       apic_pm_state.apic_id = read_apic_id();
> +       apic_pm_state.apic_id = apic_read(APIC_ID);
>        apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
>        apic_pm_state.apic_ldr = apic_read(APIC_LDR);
>        apic_pm_state.apic_dfr = apic_read(APIC_DFR);
>

works. it should be merged into the patch that introduces the new read_apic_id

really should unify read_apic_id, GET_APIC_ID, GET_XAPIC_ID...

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-11  8:59       ` Eric W. Biederman
@ 2008-07-11 23:07         ` Suresh Siddha
  2008-07-11 23:50           ` Eric W. Biederman
  0 siblings, 1 reply; 87+ messages in thread
From: Suresh Siddha @ 2008-07-11 23:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Siddha, Suresh B, mingo@elte.hu, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

On Fri, Jul 11, 2008 at 01:59:24AM -0700, Eric W. Biederman wrote:
> What I am ultimately looking for is the x86 iommu irq mapping api.
> And how we handle irqs in the context of it.
> 
> So as a start I think we can create x86_map_irq, as I suggested.

Sure. Will probably introduce irq_mapping_ops (which may be as simple as
ops containing specific msi_compose_msg, ioapic_compose_rte, etc). This
should simplify the setup code. I will look into this and post these patches
next week.

> Since we have the pci dev to lookup the iommu then we really shouldn't

Not all irq's in the platform will be remapped; for example, interrupts
generated by the IOMMU itself are not remapped. And not all irq's have a
corresponding pci dev, like IO-APIC, MSI etc.

> need multiple irq_chip structures (although it may be worth it if we
> can detect we can optimize irq migration).

Depending on the IOMMU/hardware, they may define different irq_chip's
if they add/simplify functionality, or may  use existing irq_chip's.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
  2008-07-11 23:07         ` Suresh Siddha
@ 2008-07-11 23:50           ` Eric W. Biederman
  0 siblings, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-11 23:50 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org

Suresh Siddha <suresh.b.siddha@intel.com> writes:

> On Fri, Jul 11, 2008 at 01:59:24AM -0700, Eric W. Biederman wrote:

> Sure. Will probably introduce irq_mapping_ops (which may be as simple as
> ops containing specific msi_compose_msg, ioapic_compose_rte, etc). This
> should simplify the setup code. I will look into this and post these patches
> next week.

Thanks.  I am a little leery of separate hooks for different message
types.  As they are just different ways of encoding the information
for an interrupt message.  The index parameter seems to have placed
the same bits in the same relative location for both msi and the
ioapic_rte.  Which suggests to me that we have an architecturally
defined correspondence between the bits.  If that is indeed the case
and if we can continue to do that we should be able to keep the
complexity of the arch code down with sharing code, and letting the
iommu be oblivious to what part of the architecture will be sending
interrupt messages.

Although looking at it.  In theory if not in practice there are 64 bits
in an ioapic_rte message and 36 bits in an msi message (as long as we 
maintain the architecturally standard window).  So we may actually
be able to encode more information in one message than in the other.

>> need multiple irq_chip structures (although it may be worth it if we
>> can detect we can optimize irq migration).
>
> Depending on the IOMMU/hardware, they may define different irq_chip's
> if they add/simplify functionality, or may  use existing irq_chip's.

There are a lot of variables; that is fine as long as the end result is
relatively clean and maintainable.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 22:02                 ` Yinghai Lu
@ 2008-07-12  3:16                   ` Yinghai Lu
  2008-07-12  3:52                     ` Eric W. Biederman
  2008-07-13  0:55                     ` Suresh Siddha
  2008-07-12  5:37                   ` Ingo Molnar
  1 sibling, 2 replies; 87+ messages in thread
From: Yinghai Lu @ 2008-07-12  3:16 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

1. wonder if x2apic can be used with uniprocessor.

in APIC_init_uniprocessor, it will try to enable x2apic, but later it does

apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));

but SET_APIC_ID is still the xapic version, so we need GET_APIC_ID and
SET_APIC_ID per genapic, like 32bit.

2. check_x2apic is called in setup_arch, but it only sets apic_ops,
and genapic is still not changed, aka apic_flat...
wonder if you need to call setup_apic_routing to set genapic.

otherwise read_apic_id could end up using the one from apic_flat....need
to shift......

3. or move read_apic_id to apic_ops instead...together with GET_APIC_ID too.
but the 32bit version seems to put GET_APIC_ID with genapic...

which one is better? 2 or 3

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  3:16                   ` Yinghai Lu
@ 2008-07-12  3:52                     ` Eric W. Biederman
  2008-07-12  6:17                       ` Yinghai Lu
  2008-07-13  0:55                     ` Suresh Siddha
  1 sibling, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-12  3:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> 1. wonder if x2apic can be used with uniprocessor.
>
> in APIC_init_uniprocessor, it will try to enable x2apic, but later it does
>
> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>
> but SET_APIC_ID is still the xapic version, so we need GET_APIC_ID and
> SET_APIC_ID per genapic, like 32bit.
>
> 2. check_x2apic is called in setup_arch, but it only sets apic_ops,
> and genapic is still not changed, aka apic_flat...
> wonder if you need to call setup_apic_routing to set genapic.
>
> otherwise read_apic_id could end up using the one from apic_flat....need
> to shift......
>
> 3. or move read_apic_id to apic_ops instead...together with GET_APIC_ID too.
> but the 32bit version seems to put GET_APIC_ID with genapic...
>
> which one is better? 2 or 3

Z. finish untangling SMP support from apic initialization and move the apic
initialization up into init_IRQ.

That is better but is likely the wrong short term approach.

Eric



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-11 22:02                 ` Yinghai Lu
  2008-07-12  3:16                   ` Yinghai Lu
@ 2008-07-12  5:37                   ` Ingo Molnar
  2008-07-12  6:06                     ` Yinghai Lu
  1 sibling, 1 reply; 87+ messages in thread
From: Ingo Molnar @ 2008-07-12  5:37 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org


* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> > Yinghai, Can you please try the appended patch to see if it fixes 
> > your problem?

> works. it should be merged into the patch that introduces the new
> read_apic_id

ok, that came from the x86/uv branch - but in that form it was not 
affected, only tip/x2apic exposed the problem, right?

So i've rebased tip/x86/x2apic and moved Suresh's fix in front of the 
other patches, to make it all bisectable.

i have also applied your apic unification patches. This is what the topic
layout looks like now:

Ingo Molnar (1):
      Merge branch 'x86/core' into x86/x2apic

Suresh Siddha (31):
      x64, x2apic/intr-remap: Interrupt-remapping and x2apic support, fix
      x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization
      x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus
      x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping
      x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code
      x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection
      x64, x2apic/intr-remap: parse ioapic scope under vt-d structures
      x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h
      x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d)
      x64, x2apic/intr-remap: Interrupt remapping infrastructure
      x64, x2apic/intr-remap: routines managing Interrupt remapping table entries.
      x64, x2apic/intr-remap: generic irq migration support from process context
      x64, x2apic/intr-remap: 8259 specific mask/unmask routines
      x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup
      x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines
      x64, x2apic/intr-remap: basic apic ops support
      x64, x2apic/intr-remap: cpuid bits for x2apic feature
      x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk)
      x64, x2apic/intr-remap: x2apic ops for x2apic mode support
      x64, x2apic/intr-remap: introcude self IPI to genapic routines
      x64, x2apic/intr-remap: x2apic cluster mode support
      x64, x2apic/intr-remap: setup init_apic_ldr for UV
      x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping
      x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
      x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping
      x64, x2apic/intr-remap: support for x2apic physical mode support
      x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP
      x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
      x2apic: uninline uv_init_apic_ldr()
      x2apic: xen64 paravirt basic apic ops
      x2apic: kernel-parameter documentation for "x2apic_phys"

Yinghai Lu (3):
      x86: let 32bit use apic_ops too
      x86: mach_apicdef.h need to include before smp.h
      x86: make read_apic_id return final apicid

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  5:37                   ` Ingo Molnar
@ 2008-07-12  6:06                     ` Yinghai Lu
  2008-07-12  6:45                       ` Ingo Molnar
  0 siblings, 1 reply; 87+ messages in thread
From: Yinghai Lu @ 2008-07-12  6:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Suresh Siddha, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 10:37 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
>> > Yinghai, Can you please try the appended patch to see if it fixes
>> > your problem?
>
>> works. it should be merged into the patch that introduces the new
>> read_apic_id
>
> ok, that came from the x86/uv branch - but in that form it was not
> affected, only tip/x2apic exposed the problem, right?

because of new read_apic_id()

>
> So i've rebased tip/x86/x2apic and moved Suresh's fix in front of the
> other patches, to make it all bisectable.

should incorporate that fix into the patch that introduces the new read_apic_id()

commit df8cc50cc9357ba5a5d6a07744fa36b16a81121c
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date:   Thu Jul 10 11:16:48 2008 -0700

    x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines

    Move the read_apic_id()  to genapic routines.

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  3:52                     ` Eric W. Biederman
@ 2008-07-12  6:17                       ` Yinghai Lu
  2008-07-12  7:02                         ` Eric W. Biederman
  2008-07-13  1:00                         ` Suresh Siddha
  0 siblings, 2 replies; 87+ messages in thread
From: Yinghai Lu @ 2008-07-12  6:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
>> 1. wonder if x2apic can be used with uniprocessor.
>>
>> in APIC_init_uniprocessor, it will try to enable x2apic, but later it does
>>
>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>
>> but SET_APIC_ID is still the xapic version, so we need GET_APIC_ID and
>> SET_APIC_ID per genapic, like 32bit.
>>
>> 2. check_x2apic is called in setup_arch, but it only sets apic_ops,
>> and genapic is still not changed, aka apic_flat...
>> wonder if you need to call setup_apic_routing to set genapic.
>>
>> otherwise read_apic_id could end up using the one from apic_flat....need
>> to shift......
>>
>> 3. or move read_apic_id to apic_ops instead...together with GET_APIC_ID too.
>> but the 32bit version seems to put GET_APIC_ID with genapic...
>>
>> which one is better? 2 or 3
>
> Z. finish untangling SMP support from apic initialization and move the apic
> initialization up into init_IRQ.
>
> That is better but is likely the wrong short term approach.

plan to add get_apic_id(x) into 64bit genapic, and will use
#define GET_APIC_ID(x) genapic->get_apic_id(x)
#define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))

so it is identical to 32bit, and we smooth the merging of 32/64 apic code

also read the x2APIC spec pdf, it doesn't say anything about interrupt
remapping needing to be used with x2apic...

YH
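
A minimal userspace sketch of the plan above, for illustration only. The
two macros are the ones from this mail; everything else (the struct layout,
the fake register, the two genapic instances) is a hypothetical stand-in
and not the kernel's definitions. It assumes the xAPIC ID sits in bits
31:24 of the APIC_ID register and that in x2apic mode the whole 32-bit
value is the ID.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ID 0x20	/* register offset of the local APIC ID */

/* Hypothetical cut-down genapic: only the callback discussed here. */
struct genapic_sketch {
	const char *name;
	uint32_t (*get_apic_id)(uint32_t reg_value);
};

/* xAPIC: the ID lives in bits 31:24 of the register value. */
static uint32_t xapic_get_apic_id(uint32_t reg_value)
{
	return (reg_value >> 24) & 0xFFu;
}

/* x2apic: the full 32-bit register value is the ID. */
static uint32_t x2apic_get_apic_id(uint32_t reg_value)
{
	return reg_value;
}

static const struct genapic_sketch sketch_flat = {
	"flat", xapic_get_apic_id
};
static const struct genapic_sketch sketch_x2apic = {
	"x2apic", x2apic_get_apic_id
};

/* Selected once during boot in the real code; swapped by hand here. */
static const struct genapic_sketch *genapic = &sketch_flat;

/* Stand-in for the hardware register behind apic_read(APIC_ID). */
static uint32_t fake_apic_id_reg;

static uint32_t apic_read(uint32_t reg)
{
	assert(reg == APIC_ID);	/* the sketch models only this register */
	return fake_apic_id_reg;
}

/* The two macros proposed in the mail. */
#define GET_APIC_ID(x) (genapic->get_apic_id(x))
#define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
```

With genapic pointing at sketch_flat and the fake register holding
0x04000000, read_apic_id() gives 4; with sketch_x2apic and a register value
of 0x104 it gives 0x104, which would not fit in the 8-bit xAPIC field.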

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  6:06                     ` Yinghai Lu
@ 2008-07-12  6:45                       ` Ingo Molnar
  0 siblings, 0 replies; 87+ messages in thread
From: Ingo Molnar @ 2008-07-12  6:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org


* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> > other patches, to make it all bisectable.
> 
> should incorporate that fix into the patch that introduces the new
> read_apic_id()
> 
> commit df8cc50cc9357ba5a5d6a07744fa36b16a81121c
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
> Date:   Thu Jul 10 11:16:48 2008 -0700
> 
>     x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines
> 
>     Move the read_apic_id()  to genapic routines.

ok, i've moved it straight after that commit. (bisecting hitting exactly 
that window is not an issue and it's better if we see in the history the 
types of breakages certain changes can cause - that helps people who 
research yet-unfixed crashes, etc.)

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  6:17                       ` Yinghai Lu
@ 2008-07-12  7:02                         ` Eric W. Biederman
  2008-07-12  7:49                           ` Yinghai Lu
  2008-07-13  1:32                           ` Suresh Siddha
  2008-07-13  1:00                         ` Suresh Siddha
  1 sibling, 2 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-12  7:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Suresh Siddha, Ingo Molnar, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>>
>>> 1. wonder if x2apic can be used with uniprocessor.
>>>
>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later it does
>>>
>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>>
>>> but SET_APIC_ID is still the xapic version, so we need GET_APIC_ID and
>>> SET_APIC_ID per genapic, like 32bit.
>>>
>>> 2. check_x2apic is called in setup_arch, but it only sets apic_ops,
>>> and genapic is still not changed, aka apic_flat...
>>> wonder if you need to call setup_apic_routing to set genapic.
>>>
>>> otherwise read_apic_id could end up using the one from apic_flat....need
>>> to shift......
>>>
>>> 3. or move read_apic_id to apic_ops instead...together with GET_APIC_ID too.
>>> but the 32bit version seems to put GET_APIC_ID with genapic...
>>>
>>> which one is better? 2 or 3
>>
>> Z. finish untangling SMP support from apic initialization and move the apic
>> initialization up into init_IRQ.
>>
>> That is better but is likely the wrong short term approach.
>
> plan to add get_apic_id(x) into 64bit genapic, and will use
> #define GET_APIC_ID(x) genapic->get_apic_id(x)
> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
>
> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
>
> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> remapping needing to be used with x2apic...

Clustered logical mode won't work as it requires > 16 bits of apicid.
So only flat physical mode will work.

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  7:02                         ` Eric W. Biederman
@ 2008-07-12  7:49                           ` Yinghai Lu
  2008-07-12  8:11                             ` Eric W. Biederman
  2008-07-13  1:01                             ` Suresh Siddha
  2008-07-13  1:32                           ` Suresh Siddha
  1 sibling, 2 replies; 87+ messages in thread
From: Yinghai Lu @ 2008-07-12  7:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Sat, Jul 12, 2008 at 12:02 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
>> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>>>
>>>> 1. wonder if x2apic can be use with uniprocessor.
>>>>
>>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
>>>>
>>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
>>>>
>>>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
>>>> SET_APIC_ID for different
>>>> genapic like 32bit.
>>>>
>>>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
>>>> and genapic still not changed, aka apic_flat...
>>>> wonder if you need to call setup_apic_routing to set genapic.
>>>>
>>>> otherwise read_apic_id could have use the one from apic_flat....need
>>>> to shift......
>>>>
>>>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
>>>> but 32bit version seems like to put GET_APIC_ID with genapic...
>>>>
>>>> which one is better? 2 or 3
>>>
>>> Z finish untangle SMP support from apic initialization and move the apic
>>> initialization up into init_IRQ.
>>>
>>> That is better but is likely the wrong short term approach.
>>
>> plan to add get_apic_id(x) into 64bit genapic, and will use
>> #define GET_APIC_ID(x) genapic->get_apic_id(x)
>> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
>>
>> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
>>
>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>> remapping...need to be used with x2apic...
>
> Clustered logical mode won't work as it requires > 16 bits of apicid.
> So only flat physical mode will work.

current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  7:49                           ` Yinghai Lu
@ 2008-07-12  8:11                             ` Eric W. Biederman
  2008-07-12  8:37                               ` Yinghai Lu
  2008-07-13  1:01                             ` Suresh Siddha
  1 sibling, 1 reply; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-12  8:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

>>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>>> remapping...need to be used with x2apic...
>>
>> Clustered logical mode won't work as it requires > 16 bits of apicid.
>> So only flat physical mode will work.
>
> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

There is a fixed, defined mapping between logical & physical apicids,
so that may not be an issue.

A logical cluster apicid is encoded with the high 16bits being the
cluster number, and the low 16bits being a bitmap of which core
in the cluster to send the irq to.  It sounded like a single
cluster can not span multiple sockets.

So in practice if you have 2 sockets you have a cluster id of 1.
Which means physical apic ids over 16 and logical apicids over 65536.
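
As a rough illustration of the encoding described above, here is a minimal
C sketch. It assumes the derivation given in the x2APIC specification
(cluster number taken from bits 19:4 of the physical x2APIC ID, bit position
within the cluster from bits 3:0). The function names are hypothetical and
are not taken from the patchset.

```c
#include <stdint.h>

/*
 * Sketch only: derive the logical x2APIC ID from the physical one.
 * Assumption: cluster = physical ID bits 19:4, and the low 16 bits of
 * the logical ID are a one-hot bitmap selected by physical ID bits 3:0.
 * Both helper names are made up for this example.
 */
static inline uint32_t x2apic_cluster_of(uint32_t phys_id)
{
	return (phys_id >> 4) & 0xffff;
}

static inline uint32_t x2apic_logical_id(uint32_t phys_id)
{
	/* High 16 bits: cluster number. Low 16 bits: one bit per CPU. */
	return (x2apic_cluster_of(phys_id) << 16) | (1u << (phys_id & 0xf));
}
```

Under these assumptions, physical ID 16 is the first ID in cluster 1, and
its logical ID is 0x00010001, which matches the "over 16" and "over 65536"
figures above.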

Eric

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  8:11                             ` Eric W. Biederman
@ 2008-07-12  8:37                               ` Yinghai Lu
  2008-07-12  9:46                                 ` Eric W. Biederman
  2008-07-13  1:02                                 ` Suresh Siddha
  0 siblings, 2 replies; 87+ messages in thread
From: Yinghai Lu @ 2008-07-12  8:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Sat, Jul 12, 2008 at 1:11 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
>>>> also read the x2APIC spec pdf, it doesn't say anything about interrupt
>>>> remapping...need to be used with x2apic...
>>>
>>> Clustered logical mode won't work as it requires > 16 bits of apicid.
>>> So only flat physical mode will work.
>>
>> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...
>
> There is a fixed defined mapping between logical & physical mappings,
> so that may not be an issue.
>
> A logical cluster apicid is encoded with the high 16bits being the
> cluster number, and the low 16bits being a bitmap of which core
> in the cluster to send the irq to.  It sounded like a single
> cluster can not span multiple sockets.
>
> So in practice if you have 2 sockets you have a cluster id of 1.
> Which means physical apic ids over 16 and logical apicids over 65536.

is it like
0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
cluster only has 16?

YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  8:37                               ` Yinghai Lu
@ 2008-07-12  9:46                                 ` Eric W. Biederman
  2008-07-13  1:02                                 ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Eric W. Biederman @ 2008-07-12  9:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Suresh Siddha, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

>> So in practice if you have 2 sockets you have a cluster id of 1.
>> Which means physical apic ids over 16 and logical apicids over 65536.
> is it like
> 0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
> cluster only have 16

Yes.  At least that is how I read the x2apic documentation.  

Eric


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  3:16                   ` Yinghai Lu
  2008-07-12  3:52                     ` Eric W. Biederman
@ 2008-07-13  0:55                     ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-13  0:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Siddha, Suresh B, Ingo Molnar, hpa@zytor.com, tglx@linutronix.de,
	akpm@linux-foundation.org, arjan@linux.intel.com,
	andi@firstfloor.org, ebiederm@xmission.com,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 08:16:03PM -0700, Yinghai Lu wrote:
> 1. wonder if x2apic can be use with uniprocessor.

In theory, yes.

> in APIC_init_uniprocessor, it will try to enable x2apic, but later
> 
> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
> 
> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
> SET_APIC_ID for different
> genapic like 32bit.

apic_write for APIC_ID in x2apic is ignored, as it is a RO register.

> 2 check_x2apic is called in setup_arch, but it only set apic_ops,

BIOS can hand over control to the OS in x2apic mode, so we need
to check the mode of the APIC very early in order to use the proper
accesses (memory-mapped vs. MSR). BIOS can hand over in x2apic mode if the
platform has > 8 bit apic id's.
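
To illustrate the two access methods, here is a small C sketch of how a
memory-mapped xAPIC register offset corresponds to an x2APIC MSR index.
It assumes the layout in the x2APIC specification (MSRs start at 0x800, and
each 16-byte MMIO register slot maps to one MSR). The macro and function
names are hypothetical.

```c
#include <stdint.h>

/* Assumption: x2APIC registers occupy the MSR range starting at 0x800. */
#define X2APIC_MSR_BASE	0x800u

/* MMIO offset of the APIC ID register in xAPIC mode. */
#define XAPIC_REG_ID	0x20u

/*
 * Sketch only: translate an xAPIC MMIO register offset into the
 * x2APIC MSR index that holds the same register.
 */
static inline uint32_t x2apic_msr_index(uint32_t mmio_offset)
{
	return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```

With this mapping the APIC ID register (offset 0x20) is read through MSR
0x802 once the CPU is in x2apic mode, which is why the access method has to
be chosen before the first APIC register access.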

> and genapic still not changed, aka apic_flat...
> wonder if you need to call setup_apic_routing to set genapic.
> 
> otherwise read_apic_id could have use the one from apic_flat....need
> to shift......

for setup_apic_routing(), we really need to know the platform capabilities,
like intr-remapping etc. So we can't do the genapic setup so early.

Ideally, read_apic_id() can be part of native apic_ops, but UV has
a different implementation.

And also, typically for the boot processor, xapic id and x2apic id will be
the same. xapic id needs shifting but not x2apic id.
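
The shifting difference can be sketched in C as follows. This assumes the
usual register layouts: in xAPIC mode the 8-bit ID sits in bits 31:24 of the
APIC_ID register, while in x2apic mode the register value is the full 32-bit
ID. The helper names are hypothetical and only mirror the idea of a per-mode
get_apic_id().

```c
#include <stdint.h>

/* xAPIC mode: the 8-bit APIC ID lives in bits 31:24 of APIC_ID. */
static inline uint32_t xapic_get_apic_id(uint32_t reg)
{
	return (reg >> 24) & 0xff;
}

/* x2apic mode: the APIC_ID register value is the 32-bit ID itself. */
static inline uint32_t x2apic_get_apic_id(uint32_t reg)
{
	return reg;
}
```

Using the xAPIC helper on a value read in x2apic mode would return 0 for any
ID below 2^24, which is the mismatch raised in the question above.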

And also, we can re-set the boot_cpu_physical_apicid, after enabling x2apic.
I was planning to look at this in the next patchset.

> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
> but 32bit version seems like to put GET_APIC_ID with genapic...

yeah, UV has a different version. so need to think more about the clean solution

thanks,
suresh

> 
> which one is better? 2 or 3
> 
> YH

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  6:17                       ` Yinghai Lu
  2008-07-12  7:02                         ` Eric W. Biederman
@ 2008-07-13  1:00                         ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-13  1:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Siddha, Suresh B, Ingo Molnar, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Fri, Jul 11, 2008 at 11:17:02PM -0700, Yinghai Lu wrote:
> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> remapping...need to be used with x2apic...

We are updating the spec with more clarifications. But in short,
intr-remapping needs to be enabled prior to enabling x2apic in the CPU.

physical mode might work for < 255 id's for some. But it is HW
implementation specific and might not work from generation to generation.
chipsets or cpu's may drop the interrupts if cpu and chipset are
in different modes (one in legacy mode and another in extended).

So Intel is recommending to enable Intr-remapping before enabling x2apic.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  7:49                           ` Yinghai Lu
  2008-07-12  8:11                             ` Eric W. Biederman
@ 2008-07-13  1:01                             ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-13  1:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Siddha, Suresh B, Ingo Molnar, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Sat, Jul 12, 2008 at 12:49:53AM -0700, Yinghai Lu wrote:
> On Sat, Jul 12, 2008 at 12:02 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> >
> >> On Fri, Jul 11, 2008 at 8:52 PM, Eric W. Biederman
> >> <ebiederm@xmission.com> wrote:
> >>> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> >>>
> >>>> 1. wonder if x2apic can be use with uniprocessor.
> >>>>
> >>>> in APIC_init_uniprocessor, it will try to enable x2apic, but later
> >>>>
> >>>> apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));
> >>>>
> >>>> but SET_APIC_ID is still for xapic version. so need to GET_APIC_ID,
> >>>> SET_APIC_ID for different
> >>>> genapic like 32bit.
> >>>>
> >>>> 2 check_x2apic is called in setup_arch, but it only set apic_ops,
> >>>> and genapic still not changed, aka apic_flat...
> >>>> wonder if you need to call setup_apic_routing to set genapic.
> >>>>
> >>>> otherwise read_apic_id could have use the one from apic_flat....need
> >>>> to shift......
> >>>>
> >>>> 3.or move read_apic_id to apic_ops intead...together with GET_APIC_ID too.
> >>>> but 32bit version seems like to put GET_APIC_ID with genapic...
> >>>>
> >>>> which one is better? 2 or 3
> >>>
> >>> Z finish untangle SMP support from apic initialization and move the apic
> >>> initialization up into init_IRQ.
> >>>
> >>> That is better but is likely the wrong short term approach.
> >>
> >> plan to add get_apic_id(x) into 64bit genapic, and will use
> >> #define GET_APIC_ID(x) genapic->get_apic_id(x)
> >> #define read_apic_id() GET_APIC_ID(apic_read(APIC_ID))
> >>
> >> so it is identical to 32bit, and we smooth the merging of 32/64 apic code
> >>
> >> also read the x2APIC spec pdf, it doesn't say anything about interrupt
> >> remapping...need to be used with x2apic...
> >
> > Clustered logical mode won't work as it requires > 16 bits of apicid.
> > So only flat physical mode will work.
> 
> current read_apic_id in genx2apic_cluster and genx2apic_phys is the same...

read_apic_id() corresponds to the physical apic id, and it is the same
irrespective of whether we use logical cluster mode or physical mode.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  8:37                               ` Yinghai Lu
  2008-07-12  9:46                                 ` Eric W. Biederman
@ 2008-07-13  1:02                                 ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-13  1:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Siddha, Suresh B, Ingo Molnar, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Sat, Jul 12, 2008 at 01:37:01AM -0700, Yinghai Lu wrote:
> is it like
> 0x0001, 0x0002, 0x0004, 0x0008, ...., 0x8000 in one cluster? so every
> cluster only have 16

for logical cluster id's, yes.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-12  7:02                         ` Eric W. Biederman
  2008-07-12  7:49                           ` Yinghai Lu
@ 2008-07-13  1:32                           ` Suresh Siddha
  1 sibling, 0 replies; 87+ messages in thread
From: Suresh Siddha @ 2008-07-13  1:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Siddha, Suresh B, Ingo Molnar, hpa@zytor.com,
	tglx@linutronix.de, akpm@linux-foundation.org,
	arjan@linux.intel.com, andi@firstfloor.org,
	jbarnes@virtuousgeek.org, steiner@sgi.com,
	linux-kernel@vger.kernel.org, jeremy@goop.org

On Sat, Jul 12, 2008 at 12:02:54AM -0700, Eric W. Biederman wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> > also read the x2APIC spec pdf, it doesn't say anything about interrupt
> > remapping...need to be used with x2apic...
> 
> Clustered logical mode won't work as it requires > 16 bits of apicid.
> So only flat physical mode will work.

Eric, As I mentioned just now in another thread, even flat physical mode
might not work, if cpu or chipset thinks that they are in different modes.

For example if CPU is in extended mode, chipset may block non-remapped
intr-messages and not fwd these to the cpu. So while in theory, physical mode
for < 255 apic id's may work, Intel is not validating and not recommending to
enable x2apic mode in the CPU without enabling intr-remapping in the chipset.

thanks,
suresh

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
  2008-07-10 20:22   ` Suresh Siddha
  2008-07-10 21:56   ` Suresh Siddha
@ 2008-07-16 14:37   ` Yong Wang
  2008-07-16 14:53     ` Ingo Molnar
  2008-07-22 20:49   ` Andrew Morton
  3 siblings, 1 reply; 87+ messages in thread
From: Yong Wang @ 2008-07-16 14:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Suresh Siddha, yhlu.kernel, linux-kernel

On Thu, Jul 10, 2008 at 09:53:20PM +0200, Ingo Molnar wrote:
> 
> quite some stuff!
> 
> For review and testing purposes i've created a new topic branch for 
> this: tip/x86/x2apic and have picked up your patches into it.
> 
> I've pushed it out, but it's not merged into tip/master yet (obviously, 
> you sent this just a few minutes ago :)
> 
> It integrates fine with tip/master. If you do this:
> 
>   git-checkout tip/master
>   git-merge tip/x86/x2apic
> 
> you'll get a clean merge.
> 

I'm seeing the following build error using default config.

  AS      arch/x86/lib/csum-copy_64.o
arch/x86/lib/csum-copy_64.S: Assembler messages:
arch/x86/lib/csum-copy_64.S:48: Error: Macro `ignore' was already defined
make[1]: *** [arch/x86/lib/csum-copy_64.o] Error 1
make: *** [arch/x86/lib] Error 2
make: *** Waiting for unfinished jobs....

Anyone encountered the same problem or any idea of what's wrong?

Thanks
-Yong

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-16 14:37   ` Yong Wang
@ 2008-07-16 14:53     ` Ingo Molnar
  0 siblings, 0 replies; 87+ messages in thread
From: Ingo Molnar @ 2008-07-16 14:53 UTC (permalink / raw)
  To: Yong Wang; +Cc: Suresh Siddha, yhlu.kernel, linux-kernel


* Yong Wang <yong.y.wang@linux.intel.com> wrote:

> On Thu, Jul 10, 2008 at 09:53:20PM +0200, Ingo Molnar wrote:
> > 
> > quite some stuff!
> > 
> > For review and testing purposes i've created a new topic branch for 
> > this: tip/x86/x2apic and have picked up your patches into it.
> > 
> > I've pushed it out, but it's not merged into tip/master yet (obviously, 
> > you sent this just a few minutes ago :)
> > 
> > It integrates fine with tip/master. If you do this:
> > 
> >   git-checkout tip/master
> >   git-merge tip/x86/x2apic
> > 
> > you'll get a clean merge.
> > 
> 
> I'm seeing the following build error using default config.
> 
>   AS      arch/x86/lib/csum-copy_64.o
> arch/x86/lib/csum-copy_64.S: Assembler messages:
> arch/x86/lib/csum-copy_64.S:48: Error: Macro `ignore' was already defined
> make[1]: *** [arch/x86/lib/csum-copy_64.o] Error 1
> make: *** [arch/x86/lib] Error 2
> make: *** Waiting for unfinished jobs....
> 
> Anyone encountered the same problem or any idea of what's wrong?

that should be fixed already.

What is the output of:

  git-log tip/master | head

? The latest should be:

  commit a328874646a654eabb67da88b3d0a606e552ffe7
  Merge: 760e48e... 9d3b08d...
  Author: Ingo Molnar <mingo@elte.hu>
  Date:   Tue Jul 15 17:16:40 2008 +0200

and there this error should not occur.

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
                     ` (2 preceding siblings ...)
  2008-07-16 14:37   ` Yong Wang
@ 2008-07-22 20:49   ` Andrew Morton
  2008-07-22 21:00     ` Mike Travis
  3 siblings, 1 reply; 87+ messages in thread
From: Andrew Morton @ 2008-07-22 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: suresh.b.siddha, hpa, tglx, arjan, andi, ebiederm, jbarnes,
	steiner, linux-kernel

On Thu, 10 Jul 2008 21:53:20 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> For review and testing purposes i've created a new topic branch for 
> this: tip/x86/x2apic and have picked up your patches into it.

This has today turned up in linux-next and I'm having to rework 2.6.27
patches to compensate for it.

Which means that I now need to wait until this work goes into mainline
before I can merge those patches or I need to undo those fixes, route
around Suresh's changes and then force you to fix the resulting damage.

What's the score here?  Is this stuff going into 2.6.27?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-22 20:49   ` Andrew Morton
@ 2008-07-22 21:00     ` Mike Travis
  2008-07-22 21:14       ` Andrew Morton
  2008-07-24  5:03       ` Ingo Molnar
  0 siblings, 2 replies; 87+ messages in thread
From: Mike Travis @ 2008-07-22 21:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, suresh.b.siddha, hpa, tglx, arjan, andi, ebiederm,
	jbarnes, steiner, linux-kernel

Andrew Morton wrote:
> On Thu, 10 Jul 2008 21:53:20 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
>> For review and testing purposes i've created a new topic branch for 
>> this: tip/x86/x2apic and have picked up your patches into it.
> 
> This has today turned up in linux-next and I'm having to rework 2.6.27
> patches to compensate for it.
> 
> Which means that I now need to wait until this work goes into mainline
> before I can merge those patches or I need to undo those fixes, route
> around Suresh's changes and then force you to fix the resulting damage.
> 
> What's the score here?  Is this stuff going into 2.6.27?

Hi Andrew,

Jack is out at OLS this week, but yes I believe we're trying to push this
into 2.6.27 as that is what the distros will be basing their distributions
on when the new system sees the light of day.

Is there something I can do to help?  Perhaps work on conflict resolutions?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-22 21:00     ` Mike Travis
@ 2008-07-22 21:14       ` Andrew Morton
  2008-07-24  5:03       ` Ingo Molnar
  1 sibling, 0 replies; 87+ messages in thread
From: Andrew Morton @ 2008-07-22 21:14 UTC (permalink / raw)
  To: Mike Travis
  Cc: mingo, suresh.b.siddha, hpa, tglx, arjan, andi, ebiederm, jbarnes,
	steiner, linux-kernel

On Tue, 22 Jul 2008 14:00:47 -0700
Mike Travis <travis@sgi.com> wrote:

> Andrew Morton wrote:
> > On Thu, 10 Jul 2008 21:53:20 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> >> For review and testing purposes i've created a new topic branch for 
> >> this: tip/x86/x2apic and have picked up your patches into it.
> > 
> > This has today turned up in linux-next and I'm having to rework 2.6.27
> > patches to compensate for it.
> > 
> > Which means that I now need to wait until this work goes into mainline
> > before I can merge those patches or I need to undo those fixes, route
> > around Suresh's changes and then force you to fix the resulting damage.
> > 
> > What's the score here?  Is this stuff going into 2.6.27?
> 
> Hi Andrew,
> 
> Jack is out at OLS this week, but yes I believe we're trying to push this
> into 2.6.27 as that is what the distros will be basing their distributions
> on when the new system sees the light of day.

2.6.27 material should not be turning up in linux-next halfway through
the merge window.

> Is there something I can do to help?  Perhaps work on conflict resolutions?

It was a simple fix.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support
  2008-07-22 21:00     ` Mike Travis
  2008-07-22 21:14       ` Andrew Morton
@ 2008-07-24  5:03       ` Ingo Molnar
  1 sibling, 0 replies; 87+ messages in thread
From: Ingo Molnar @ 2008-07-24  5:03 UTC (permalink / raw)
  To: Mike Travis
  Cc: Andrew Morton, suresh.b.siddha, hpa, tglx, arjan, andi, ebiederm,
	jbarnes, steiner, linux-kernel


* Mike Travis <travis@sgi.com> wrote:

> Andrew Morton wrote:
> > On Thu, 10 Jul 2008 21:53:20 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> >> For review and testing purposes i've created a new topic branch for 
> >> this: tip/x86/x2apic and have picked up your patches into it.
> > 
> > This has today turned up in linux-next and I'm having to rework 2.6.27
> > patches to compensate for it.
> > 
> > Which means that I now need to wait until this work goes into mainline
> > before I can merge those patches or I need to undo those fixes, route
> > around Suresh's changes and then force you to fix the resulting damage.
> > 
> > What's the score here?  Is this stuff going into 2.6.27?
> 
> Hi Andrew,
> 
> Jack is out at OLS this week, but yes I believe we're trying to push 
> this into 2.6.27 as that is what the distros will be basing their 
> distributions on when the new system sees the light of day.

Note that this is the generic x2apic work from Intel, not the SGI/UV 
specific x2apic stuff. The SGI-specific code is upstream already. (and 
as i understand it was based on an earlier version of the Intel code)

The Intel x2apic code was submitted before the merge window but was 
indeed pushed to linux-next during the window because that's when its 
integration became fully ready. We'd rather not hold the whole merge 
window and linux-next hostage with not fully ready stuff :-)

	Ingo

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2008-07-24  5:06 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-10 18:16 [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Suresh Siddha
2008-07-10 18:16 ` [patch 01/26] x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization Suresh Siddha
2008-07-10 18:16 ` [patch 02/26] x64, x2apic/intr-remap: fix the need for sequential array allocation of iommus Suresh Siddha
2008-07-10 18:16 ` [patch 03/26] x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping Suresh Siddha
2008-07-10 18:16 ` [patch 04/26] x64, x2apic/intr-remap: use CONFIG_DMAR for DMA-remapping specific code Suresh Siddha
2008-07-10 18:16 ` [patch 05/26] x64, x2apic/intr-remap: Fix the need for RMRR in the DMA-remapping detection Suresh Siddha
2008-07-10 18:16 ` [patch 06/26] x64, x2apic/intr-remap: parse ioapic scope under vt-d structures Suresh Siddha
2008-07-10 18:16 ` [patch 07/26] x64, x2apic/intr-remap: move IOMMU_WAIT_OP() macro to intel-iommu.h Suresh Siddha
2008-07-10 18:16 ` [patch 08/26] x64, x2apic/intr-remap: Queued invalidation infrastructure (part of VT-d) Suresh Siddha
2008-07-10 18:16 ` [patch 09/26] x64, x2apic/intr-remap: Interrupt remapping infrastructure Suresh Siddha
2008-07-10 18:16 ` [patch 10/26] x64, x2apic/intr-remap: routines managing Interrupt remapping table entries Suresh Siddha
2008-07-10 18:16 ` [patch 11/26] x64, x2apic/intr-remap: generic irq migration support from process context Suresh Siddha
2008-07-10 23:08   ` Eric W. Biederman
2008-07-11  5:41     ` Suresh Siddha
2008-07-11  9:19       ` Eric W. Biederman
2008-07-10 18:16 ` [patch 12/26] x64, x2apic/intr-remap: 8259 specific mask/unmask routines Suresh Siddha
2008-07-10 18:16 ` [patch 13/26] x64, x2apic/intr-remap: ioapic routines which deal with initial io-apic RTE setup Suresh Siddha
2008-07-10 18:16 ` [patch 14/26] x64, x2apic/intr-remap: introduce read_apic_id() to genapic routines Suresh Siddha
2008-07-10 18:16 ` [patch 15/26] x64, x2apic/intr-remap: basic apic ops support Suresh Siddha
2008-07-10 18:16 ` [patch 16/26] x64, x2apic/intr-remap: cpuid bits for x2apic feature Suresh Siddha
2008-07-10 18:16 ` [patch 17/26] x64, x2apic/intr-remap: disable DMA-remapping if Interrupt-remapping is detected (temporary quirk) Suresh Siddha
2008-07-10 18:16 ` [patch 18/26] x64, x2apic/intr-remap: x2apic ops for x2apic mode support Suresh Siddha
2008-07-10 18:16 ` [patch 19/26] x64, x2apic/intr-remap: introcude self IPI to genapic routines Suresh Siddha
2008-07-10 23:34   ` Eric W. Biederman
2008-07-11  2:29     ` Mike Travis
2008-07-11  3:50       ` Eric W. Biederman
2008-07-11 13:55         ` Mike Travis
2008-07-10 18:16 ` [patch 20/26] x64, x2apic/intr-remap: x2apic cluster mode support Suresh Siddha
2008-07-10 18:16 ` [patch 21/26] x64, x2apic/intr-remap: setup init_apic_ldr for UV Suresh Siddha
2008-07-11  0:14   ` Andrew Morton
2008-07-11  1:56     ` Suresh Siddha
2008-07-10 18:16 ` [patch 22/26] x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping Suresh Siddha
2008-07-10 18:16 ` [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure Suresh Siddha
2008-07-11  1:22   ` Eric W. Biederman
2008-07-11  6:07     ` Suresh Siddha
2008-07-11  8:59       ` Eric W. Biederman
2008-07-11 23:07         ` Suresh Siddha
2008-07-11 23:50           ` Eric W. Biederman
2008-07-10 18:16 ` [patch 24/26] x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping Suresh Siddha
2008-07-10 18:16 ` [patch 25/26] x64, x2apic/intr-remap: support for x2apic physical mode support Suresh Siddha
2008-07-10 18:17 ` [patch 26/26] x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP Suresh Siddha
2008-07-10 23:29   ` Eric W. Biederman
2008-07-10 23:37     ` Yong Wang
2008-07-11  1:50       ` Suresh Siddha
2008-07-11  1:53       ` Eric W. Biederman
2008-07-10 19:53 ` [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support Ingo Molnar
2008-07-10 20:22   ` Suresh Siddha
2008-07-10 21:56   ` Suresh Siddha
2008-07-11 10:28     ` Ingo Molnar
2008-07-11 20:09       ` Ingo Molnar
2008-07-11 20:31         ` Suresh Siddha
2008-07-11 20:42           ` Yinghai Lu
2008-07-11 20:45             ` Ingo Molnar
2008-07-11 21:24               ` Suresh Siddha
2008-07-11 22:02                 ` Yinghai Lu
2008-07-12  3:16                   ` Yinghai Lu
2008-07-12  3:52                     ` Eric W. Biederman
2008-07-12  6:17                       ` Yinghai Lu
2008-07-12  7:02                         ` Eric W. Biederman
2008-07-12  7:49                           ` Yinghai Lu
2008-07-12  8:11                             ` Eric W. Biederman
2008-07-12  8:37                               ` Yinghai Lu
2008-07-12  9:46                                 ` Eric W. Biederman
2008-07-13  1:02                                 ` Suresh Siddha
2008-07-13  1:01                             ` Suresh Siddha
2008-07-13  1:32                           ` Suresh Siddha
2008-07-13  1:00                         ` Suresh Siddha
2008-07-13  0:55                     ` Suresh Siddha
2008-07-12  5:37                   ` Ingo Molnar
2008-07-12  6:06                     ` Yinghai Lu
2008-07-12  6:45                       ` Ingo Molnar
2008-07-11 20:49           ` Ingo Molnar
2008-07-16 14:37   ` Yong Wang
2008-07-16 14:53     ` Ingo Molnar
2008-07-22 20:49   ` Andrew Morton
2008-07-22 21:00     ` Mike Travis
2008-07-22 21:14       ` Andrew Morton
2008-07-24  5:03       ` Ingo Molnar
2008-07-10 20:05 ` Eric W. Biederman
2008-07-10 20:18   ` Ingo Molnar
2008-07-10 21:07     ` Eric W. Biederman
2008-07-10 21:15   ` Suresh Siddha
2008-07-10 22:52     ` Eric W. Biederman
2008-07-11  2:35       ` Suresh Siddha
2008-07-11  3:15         ` Eric W. Biederman
2008-07-10 22:09   ` Arjan van de Ven
2008-07-10 22:54     ` Eric W. Biederman
