LinuxPPC-Dev Archive on lore.kernel.org

* [RFC PATCH 07/23] x86/hpet: Expose more functions to read and write registers
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Philippe Ombredanne, Kate Stewart,
	Rafael J. Wysocki, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

Some of the registers in the HPET hardware have a width of 64 bits. 64-bit
access functions are needed mostly to read the counter and write the
comparator in a single read or write. Also, 64-bit accesses can be used to
read parameters located in the higher bits of some registers (such as
the timer period and the IO APIC pins that can be asserted by the timer)
without masking and shifting the register values.

64-bit read and write functions are added. These functions, along with the
existing hpet_writel(), are exposed via the HPET header to be used by other
kernel subsystems.

So far, the only consumer of these functions will be the HPET-based
hardlockup detector, which will only be available in 64-bit builds. Hence,
the 64-bit access functions are wrapped in CONFIG_X86_64.
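
For illustration only (this is not part of the patch), here is a minimal
user-space sketch of why a single 64-bit access is convenient. The fake_*
helpers and the byte array standing in for the memory-mapped registers are
hypothetical; only the 0xf0 counter offset mirrors HPET_COUNTER. It assumes
a little-endian host, as on x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the memory-mapped HPET register block. */
static uint8_t fake_hpet_regs[0x400];

/* Offset of the main counter, mirroring HPET_COUNTER in the kernel. */
#define FAKE_HPET_COUNTER	0xf0

/* 32-bit read, playing the role of hpet_readl(). */
static uint32_t fake_readl(unsigned int a)
{
	uint32_t v;

	memcpy(&v, fake_hpet_regs + a, sizeof(v));
	return v;
}

/* 64-bit read, playing the role of hpet_readq(). */
static uint64_t fake_readq(unsigned int a)
{
	uint64_t v;

	memcpy(&v, fake_hpet_regs + a, sizeof(v));
	return v;
}

/* 64-bit write, playing the role of hpet_writeq(). */
static void fake_writeq(uint64_t d, unsigned int a)
{
	memcpy(fake_hpet_regs + a, &d, sizeof(d));
}

/*
 * Two 32-bit reads: the high half has to be shifted and merged by hand.
 * On real hardware the counter may also advance between the two reads.
 */
static uint64_t counter_from_two_readl(void)
{
	uint64_t lo = fake_readl(FAKE_HPET_COUNTER);
	uint64_t hi = fake_readl(FAKE_HPET_COUNTER + 4);

	return (hi << 32) | lo;
}

/* One 64-bit read: no shifting, and a single access to the register. */
static uint64_t counter_from_readq(void)
{
	return fake_readq(FAKE_HPET_COUNTER);
}
```

In this static model both paths return the same value; the difference only
shows on live hardware, where the counter keeps running between accesses.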

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 10 ++++++++++
 arch/x86/kernel/hpet.c      | 12 +++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d5..9e0afde 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,11 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
+#ifdef CONFIG_X86_64
+extern unsigned long hpet_readq(unsigned int a);
+extern void hpet_writeq(unsigned long d, unsigned int a);
+#endif
 extern void force_hpet_resume(void);
 
 struct irq_data;
@@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 static inline int hpet_enable(void) { return 0; }
 static inline int is_hpet_enabled(void) { return 0; }
 #define hpet_readl(a) 0
+#define hpet_writel(d, a)
+#ifdef CONFIG_X86_64
+#define hpet_readq(a) 0
+#define hpet_writeq(d, a)
+#endif
 #define default_setup_hpet_msi	NULL
 
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 8ce4212..3fa1d3f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -64,12 +64,22 @@ inline unsigned int hpet_readl(unsigned int a)
 	return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
 	writel(d, hpet_virt_address + a);
 }
 
 #ifdef CONFIG_X86_64
+inline unsigned long hpet_readq(unsigned int a)
+{
+	return readq(hpet_virt_address + a);
+}
+
+inline void hpet_writeq(unsigned long d, unsigned int a)
+{
+	writeq(d, hpet_virt_address + a);
+}
+
 #include <asm/pgtable.h>
 #endif
 
-- 
2.7.4

* [RFC PATCH 06/23] x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt remapping
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Juergen Gross, Baoquan He,
	Eric W. Biederman, Dou Liyang, Jan Kiszka, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

Even though there is a delivery mode field in the entries of an IO APIC's
redirection table, the documentation of the majority of the IO APICs
explicitly states that interrupt delivery as non-maskable is not supported.

However, when using an IO APIC in combination with the Intel VT-d interrupt
remapping functionality, the delivery of the interrupt to the CPU is
handled by the remapping hardware. In such a case, the interrupt can be
delivered as non-maskable.

Thus, add the IRQCHIP_CAN_DELIVER_AS_NMI flag only when used in combination
with interrupt remapping.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 10a20f8..39de91b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1911,7 +1911,8 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
 	.irq_eoi		= ioapic_ir_ack_level,
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+	.flags			= IRQCHIP_SKIP_SET_WAKE |
+				  IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static inline void init_IO_APIC_traps(void)
-- 
2.7.4

* [RFC PATCH 05/23] x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Dou Liyang, Juergen Gross, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

As per the Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 Section 10.11.2, the delivery mode field of the interrupt message
can be set to deliver the interrupt as non-maskable. Declare support for
non-maskable delivery by adding IRQCHIP_CAN_DELIVER_AS_NMI.

When composing the interrupt message, the delivery mode is obtained from
the configuration of the interrupt data.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 12202ac..68b6a04 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -29,6 +29,9 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
 
+	if (irqd_deliver_as_nmi(data))
+		cfg->delivery_mode = dest_NMI;
+
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -297,7 +300,7 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
 	.irq_retrigger = irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg = irq_msi_compose_msg,
 	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
+	.flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
-- 
2.7.4

* [RFC PATCH 04/23] iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Joerg Roedel, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

The Intel IOMMU is capable of delivering remapped interrupts as non-
maskable. Add the IRQCHIP_CAN_DELIVER_AS_NMI flag to its irq_chip
structure to declare this capability. The delivery mode of each interrupt
can be set separately.

By default, the delivery mode is taken from the configuration field of the
interrupt data. If non-maskable delivery is requested in the interrupt
state flags, the respective entry in the remapping table is updated.

When remapping an interrupt from an IO APIC, modify the delivery
field in the interrupt remapping table entry. When remapping an MSI
interrupt, simply update the delivery mode when composing the message.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9f3a04d..b6cf7c4 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1128,10 +1128,14 @@ static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
 	struct irte *irte = &ir_data->irte_entry;
 	struct irq_cfg *cfg = irqd_cfg(irqd);
 
+	if (irqd_deliver_as_nmi(irqd))
+		cfg->delivery_mode = dest_NMI;
+
 	/*
 	 * Atomically updates the IRTE with the new destination, vector
 	 * and flushes the interrupt entry cache.
 	 */
+	irte->dlvry_mode = cfg->delivery_mode;
 	irte->vector = cfg->vector;
 	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
 
@@ -1182,6 +1186,9 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
 {
 	struct intel_ir_data *ir_data = irq_data->chip_data;
 
+	if (irqd_deliver_as_nmi(irq_data))
+		ir_data->irte_entry.dlvry_mode = dest_NMI;
+
 	*msg = ir_data->msi_entry;
 }
 
@@ -1227,6 +1234,7 @@ static struct irq_chip intel_ir_chip = {
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,
+	.flags			= IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
2.7.4

* [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Daniel Lezcano, Andrew Morton,
	Levin, Alexander (Sasha Levin), Randy Dunlap, Masami Hiramatsu,
	Marc Zyngier, Bartosz Golaszewski, Doug Berger, Palmer Dabbelt,
	iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

Certain interrupt controllers (such as APIC) are capable of delivering
interrupts as non-maskable. Likewise, drivers or subsystems (e.g., the
hardlockup detector) might be interested in requesting a non-maskable
interrupt. The new flag IRQF_DELIVER_AS_NMI serves this purpose.

When setting up an interrupt, non-maskable delivery will be set in the
interrupt state data only if supported by the underlying interrupt
controller chips.

Interrupt controller chips can declare that they support non-maskable
delivery by using the new flag IRQCHIP_CAN_DELIVER_AS_NMI.
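
For illustration, the gating described above can be modelled in user space
as follows. This is a sketch, not the kernel code: struct mock_irq and
mock_request_nmi_delivery() are hypothetical, while the three flag values
are the ones introduced by this series.

```c
#include <errno.h>

/* Flag values as introduced by this series. */
#define IRQF_DELIVER_AS_NMI		0x00080000
#define IRQCHIP_CAN_DELIVER_AS_NMI	(1 << 7)
#define IRQD_DELIVER_AS_NMI		(1 << 27)

/* Hypothetical, flattened stand-in for an irq descriptor and its chip. */
struct mock_irq {
	unsigned long chip_flags;	/* capabilities declared by the chip */
	unsigned int state;		/* per-interrupt state flags */
};

/*
 * Mirror of the check done when setting up an interrupt: non-maskable
 * delivery is recorded in the interrupt state only if the chip declares
 * the capability; otherwise the request is rejected.
 */
static int mock_request_nmi_delivery(struct mock_irq *irq,
				     unsigned long irqflags)
{
	if (!(irqflags & IRQF_DELIVER_AS_NMI))
		return 0;	/* nothing requested, nothing to do */

	if (!(irq->chip_flags & IRQCHIP_CAN_DELIVER_AS_NMI))
		return -EINVAL;	/* chip cannot deliver as NMI */

	irq->state |= IRQD_DELIVER_AS_NMI;
	return 0;
}
```

Rejecting the request, rather than silently falling back to a maskable
interrupt, lets a caller such as a watchdog notice that it did not get the
delivery mode it depends on.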

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/interrupt.h |  3 +++
 include/linux/irq.h       |  3 +++
 kernel/irq/manage.c       | 22 +++++++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5426627..dbc5e02 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,8 @@
  *                interrupt handler after suspending interrupts. For system
  *                wakeup devices users need to implement wakeup detection in
  *                their interrupt handlers.
+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
+ *                supported by the chip.
  */
 #define IRQF_SHARED		0x00000080
 #define IRQF_PROBE_SHARED	0x00000100
@@ -74,6 +76,7 @@
 #define IRQF_NO_THREAD		0x00010000
 #define IRQF_EARLY_RESUME	0x00020000
 #define IRQF_COND_SUSPEND	0x00040000
+#define IRQF_DELIVER_AS_NMI	0x00080000
 
 #define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7271a2c..d2520ae 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -515,6 +515,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE:	Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE:	One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED:	Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_CAN_DELIVER_AS_NMI	Chip can deliver interrupts it receives as non-
+ *				maskable.
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED		= (1 <<  0),
@@ -524,6 +526,7 @@ enum {
 	IRQCHIP_SKIP_SET_WAKE		= (1 <<  4),
 	IRQCHIP_ONESHOT_SAFE		= (1 <<  5),
 	IRQCHIP_EOI_THREADED		= (1 <<  6),
+	IRQCHIP_CAN_DELIVER_AS_NMI	= (1 <<  7),
 };
 
 #include <linux/irqdesc.h>
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e3336d9..d058aa8 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1137,7 +1137,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 {
 	struct irqaction *old, **old_ptr;
 	unsigned long flags, thread_mask = 0;
-	int ret, nested, shared = 0;
+	int ret, nested, shared = 0, deliver_as_nmi = 0;
 
 	if (!desc)
 		return -EINVAL;
@@ -1156,6 +1156,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	if (!(new->flags & IRQF_TRIGGER_MASK))
 		new->flags |= irqd_get_trigger_type(&desc->irq_data);
 
+	/* Only deliver as non-maskable interrupt if supported by chip. */
+	if (new->flags & IRQF_DELIVER_AS_NMI) {
+		if (desc->irq_data.chip->flags & IRQCHIP_CAN_DELIVER_AS_NMI) {
+			irqd_set_deliver_as_nmi(&desc->irq_data);
+			deliver_as_nmi = 1;
+		} else {
+			return -EINVAL;
+		}
+	}
+
 	/*
 	 * Check whether the interrupt nests into another interrupt
 	 * thread.
@@ -1166,6 +1176,13 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 			ret = -EINVAL;
 			goto out_mput;
 		}
+
+		/* Don't allow nesting if interrupt will be delivered as NMI. */
+		if (deliver_as_nmi) {
+			ret = -EINVAL;
+			goto out_mput;
+		}
+
 		/*
 		 * Replace the primary handler which was provided from
 		 * the driver for non nested interrupt handling by the
@@ -1186,6 +1203,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	 * thread.
 	 */
 	if (new->thread_fn && !nested) {
+		if (deliver_as_nmi)
+			goto out_mput;
+
 		ret = setup_irq_thread(new, irq, false);
 		if (ret)
 			goto out_mput;
-- 
2.7.4

* [RFC PATCH 02/23] genirq: Introduce IRQD_DELIVER_AS_NMI
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Marc Zyngier, Bartosz Golaszewski,
	Doug Berger, Palmer Dabbelt, Randy Dunlap, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

Certain interrupt controllers (e.g., APIC) are capable of delivering
interrupts to the CPU as non-maskable. Add the new IRQD_DELIVER_AS_NMI
interrupt state flag. The purpose of this flag is to communicate to the
underlying irqchip whether the interrupt must be delivered in this manner.
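
As a rough sketch of how an irqchip might consume this flag (user-space
model, hypothetical sketch_* names; the value 4 is the NMI encoding of the
APIC delivery mode field, and the chip default is assumed to be passed in
by the caller):

```c
#include <stdbool.h>

/* Same bit position as IRQD_DELIVER_AS_NMI in this patch. */
#define SKETCH_IRQD_DELIVER_AS_NMI	(1u << 27)

/* APIC delivery mode encodings: 000b is fixed, 100b is NMI. */
enum sketch_delivery_mode {
	SKETCH_DEST_FIXED = 0,
	SKETCH_DEST_NMI = 4,
};

/* Hypothetical stand-in for the state kept in struct irq_data. */
struct sketch_irq_data {
	unsigned int state;
};

static void sketch_irqd_set_deliver_as_nmi(struct sketch_irq_data *d)
{
	d->state |= SKETCH_IRQD_DELIVER_AS_NMI;
}

static bool sketch_irqd_deliver_as_nmi(const struct sketch_irq_data *d)
{
	return d->state & SKETCH_IRQD_DELIVER_AS_NMI;
}

/*
 * What a chip callback could do when programming its hardware: use the
 * NMI delivery mode if the state flag is set, else keep the default.
 */
static enum sketch_delivery_mode
sketch_pick_delivery_mode(const struct sketch_irq_data *d,
			  enum sketch_delivery_mode chip_default)
{
	return sketch_irqd_deliver_as_nmi(d) ? SKETCH_DEST_NMI : chip_default;
}
```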

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/irq.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 65916a3..7271a2c 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -208,6 +208,7 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_DELIVER_AS_NMI		- Deliver this interrupt as non-maskable
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -230,6 +231,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_DELIVER_AS_NMI		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -389,6 +391,16 @@ static inline bool irqd_can_reserve(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_deliver_as_nmi(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_DELIVER_AS_NMI;
+}
+
+static inline bool irqd_deliver_as_nmi(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_DELIVER_AS_NMI;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
-- 
2.7.4

* [RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Joerg Roedel, Juergen Gross,
	Bjorn Helgaas, Wincy Van, Kate Stewart, Philippe Ombredanne,
	Eric W. Biederman, Baoquan He, Dou Liyang, Jan Kiszka, iommu
In-Reply-To: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com>

Until now, the delivery mode of APIC interrupts has been set to the default
mode of the APIC driver. However, the hardware does not prevent configuring
each interrupt with a different delivery mode. Specifying the delivery mode
per interrupt is useful when the delivery mode of a particular interrupt
needs to change. For instance, this can be used to deliver an interrupt as
non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. Also, update the
configuration of the delivery mode in the IO APIC, the MSI APIC and the
Intel interrupt remapping driver to use this new per-interrupt member to
configure their respective interrupt tables.

In order to keep the current behavior, initialize the delivery mode of
each interrupt with the delivery mode of the APIC driver in use when the
interrupt data is allocated.
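
To illustrate the effect on the MSI data word, here is a small user-space
sketch. The MSI_DATA_* macros are copied from msidef.h as changed by this
patch; struct sketch_irq_cfg, sketch_msi_data() and the SKETCH_DEST_*
constants are hypothetical, and the trigger and level bits are left out for
brevity.

```c
#include <stdint.h>

/* Field layout of the MSI data word, as in msidef.h after this patch. */
#define MSI_DATA_VECTOR_SHIFT		0
#define MSI_DATA_VECTOR_MASK		0x000000ff
#define MSI_DATA_VECTOR(v)		(((v) << MSI_DATA_VECTOR_SHIFT) & \
					 MSI_DATA_VECTOR_MASK)

#define MSI_DATA_DELIVERY_MODE_SHIFT	8
#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
					 MSI_DATA_DELIVERY_MODE_MASK)

/* APIC delivery mode encodings: 000b is fixed, 100b is NMI. */
enum {
	SKETCH_DEST_FIXED = 0,
	SKETCH_DEST_NMI = 4,
};

/* Hypothetical mirror of struct irq_cfg with the new member. */
struct sketch_irq_cfg {
	unsigned int dest_apicid;
	unsigned int vector;
	unsigned int delivery_mode;
};

/*
 * Compose the delivery mode and vector fields from the per-interrupt
 * configuration instead of a hard-coded MSI_DATA_DELIVERY_FIXED.
 */
static uint32_t sketch_msi_data(const struct sketch_irq_cfg *cfg)
{
	return MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
	       MSI_DATA_VECTOR(cfg->vector);
}
```

With the default fixed mode the delivery mode bits stay zero, so the word
composed here matches what MSI_DATA_DELIVERY_FIXED produced before.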

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h       |  5 +++--
 arch/x86/include/asm/msidef.h       |  3 +++
 arch/x86/kernel/apic/io_apic.c      |  2 +-
 arch/x86/kernel/apic/msi.c          |  2 +-
 arch/x86/kernel/apic/vector.c       |  8 ++++++++
 arch/x86/platform/uv/uv_irq.c       |  2 +-
 drivers/iommu/intel_irq_remapping.c | 10 +++++-----
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e..c024e59 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-	unsigned int		dest_apicid;
-	unsigned int		vector;
+	unsigned int				dest_apicid;
+	unsigned int				vector;
+	enum ioapic_irq_destination_types	delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8cc..6aef434 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 					 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
+#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
+#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+					 MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7553819..10a20f8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2887,8 +2887,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
+	entry->delivery_mode = cfg->delivery_mode;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index ce503c9..12202ac 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 	msg->data =
 		MSI_DATA_TRIGGER_EDGE |
 		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2..dfe0a2a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -547,6 +547,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
+
+		/*
+		 * Initialize the delivery mode of this irq to match
+		 * the default delivery mode of the APIC. This could be
+		 * changed later when the interrupt is activated.
+		 */
+		 apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
 		/*
 		 * Legacy vectors are already assigned when the IOAPIC
 		 * takes them over. They stay on the same vector. This is
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index e4cb9f4..c88508b 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -35,7 +35,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->delivery_mode	= cfg->delivery_mode;
 	entry->dest_mode	= apic->irq_dest_mode;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 3062a15..9f3a04d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1045,7 +1045,7 @@ static int reenable_irq_remapping(int eim)
 	return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
 	memset(irte, 0, sizeof(*irte));
 
@@ -1059,9 +1059,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->irq_delivery_mode;
-	irte->vector = vector;
-	irte->dest_id = IRTE_DEST(dest);
+	irte->dlvry_mode = irq_cfg->delivery_mode;
+	irte->vector = irq_cfg->vector;
+	irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
 	irte->redir_hint = 1;
 }
 
@@ -1238,7 +1238,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
-	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+	prepare_irte(irte, irq_cfg);
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
-- 
2.7.4

* [RFC PATCH 00/23] Implement an HPET-based hardlockup detector
From: Ricardo Neri @ 2018-06-13  0:57 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri

Hi,

This patchset demonstrates the implementation of a hardlockup detector
driven by the High-Precision Event Timer.

== Introduction ==

In CPU architectures that do not have an NMI watchdog, one can be
constructed using a counter of the Performance Monitoring Unit (PMU).
Counters in the PMU have high granularity and high visibility of the CPU.
These capabilities and their limited number make these counters precious
resources. Unfortunately, the perf-based hardlockup detector permanently
consumes one of these counters per CPU.

These counters could be freed for profiling purposes if the hardlockup
detector were driven by another timer.

The hardlockup detector runs relatively infrequently and does not require
visibility of the CPU activity (other than to detect locked-up CPUs). A
timer that is external to the CPU (e.g., in the chipset) can be used to
drive the detector.

A key requirement is that the timer needs to be capable of issuing a
non-maskable interrupt to the CPU. In most cases, this can be achieved
by tweaking the delivery mode of the interrupt in the interrupt controller
chip (the exception is the IO APIC).

== Parts of this series ==

Several parts of Linux need to be updated to operate the aforementioned
detector.

   1) Update the interrupt subsystem to accept requests of interrupts as
      non-maskable. Likewise, handle irqchips that have this capability.
      Patches 1-5

   2) Rework the x86 HPET platform code to reserve, configure a timer
      and its interrupt, and expose the needed interfaces and definitions.
      Patches 6-11

   3) Rework the hardlockup detector to decouple its generic part from
      perf. This adds definitions to be implemented using other sources
      of non-maskable interrupts. Patches 12-14

   4) Add an HPET-based hardlockup detector. This includes probing the
      hardware resources, configuring the interrupt and rotating the
      destination of the interrupts among all monitored CPUs.

== Details on the HPET-based hardlockup detector ==

Unlike the perf-based hardlockup detector, this implementation is
driven by a single timer. The timer targets one CPU at a time in a round-
robin manner. This means that if a CPU must be monitored every watch_thresh
seconds, in a system with N monitored CPUs the timer must expire every
watch_thresh/N seconds. A per-CPU timer expiration attribute is maintained.

The timer expiration time per CPU is updated every time CPUs are put
online or offline (a CPU hotplug thread enables and disables the watchdog
in these events).

Also, given that a single timer drives the detector, a cpumask is needed
to keep track of which online CPUs are allowed to be monitored. This mask
is updated every time a CPU is put online or offline as well as when the
user modifies the mask in /proc/sys/kernel/watchdog_cpumask. This mask
is needed to keep the current behavior of the lockup detector.
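
As a rough illustration of the arithmetic and the rotation described above
(a user-space sketch, not the code in this series; the sketch_* names, the
64-CPU limit and the plain 64-bit mask standing in for a cpumask are all
assumptions):

```c
#include <stdint.h>

#define SKETCH_NR_CPUS	64

/*
 * Timer ticks between two expirations: each of the monitored CPUs must
 * be checked once every watch_thresh seconds, so the single timer has to
 * fire every watch_thresh / nr_monitored seconds.
 */
static uint64_t sketch_ticks_per_expiry(uint64_t ticks_per_second,
					unsigned int watch_thresh,
					unsigned int nr_monitored)
{
	if (!nr_monitored)
		return 0;	/* nothing to monitor */

	return ticks_per_second * watch_thresh / nr_monitored;
}

/*
 * Round-robin selection: return the next monitored CPU after 'cur'
 * (0 <= cur < SKETCH_NR_CPUS), wrapping around, or -1 if the mask is
 * empty. Bit n of 'monitored_mask' set means CPU n is monitored.
 */
static int sketch_next_cpu(uint64_t monitored_mask, int cur)
{
	for (int i = 1; i <= SKETCH_NR_CPUS; i++) {
		int cpu = (cur + i) % SKETCH_NR_CPUS;

		if (monitored_mask & (1ULL << cpu))
			return cpu;
	}

	return -1;
}
```

For example, with a 14.31818 MHz timer, watch_thresh = 10 and 4 monitored
CPUs, the timer would expire every 35795450 ticks, that is, every 2.5
seconds, and each CPU would still be checked once every 10 seconds.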

Thanks and BR,
Ricardo

Ricardo Neri (23):
  x86/apic: Add a parameter for the APIC delivery mode
  genirq: Introduce IRQD_DELIVER_AS_NMI
  genirq: Introduce IRQF_DELIVER_AS_NMI
  iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt
    remapping
  x86/hpet: Expose more functions to read and write registers
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Relocate flag definitions to a header file
  x86/hpet: Configure the timer used by the hardlockup detector
  kernel/watchdog: Introduce a struct for NMI watchdog operations
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  watchdog/hardlockup: Add an HPET-based hardlockup detector
  watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  watchdog/hardlockup/hpet: Add the NMI watchdog operations
  watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based
    implementation
  watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  watchdog/hardlockup/hpet: Adjust timer expiration on the number of
    monitored CPUs
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
    parameter
  watchdog/hardlockup: Activate the HPET-based lockup detector

 Documentation/admin-guide/kernel-parameters.txt |   5 +-
 arch/x86/include/asm/hpet.h                     |  38 ++
 arch/x86/include/asm/hw_irq.h                   |   5 +-
 arch/x86/include/asm/msidef.h                   |   3 +
 arch/x86/kernel/apic/io_apic.c                  |   5 +-
 arch/x86/kernel/apic/msi.c                      |   7 +-
 arch/x86/kernel/apic/vector.c                   |   8 +
 arch/x86/kernel/hpet.c                          | 149 ++++++-
 arch/x86/platform/uv/uv_irq.c                   |   2 +-
 drivers/char/hpet.c                             |  31 +-
 drivers/iommu/intel_irq_remapping.c             |  18 +-
 include/linux/hpet.h                            |   1 +
 include/linux/interrupt.h                       |   3 +
 include/linux/irq.h                             |  15 +
 include/linux/nmi.h                             |  56 ++-
 kernel/Makefile                                 |   3 +-
 kernel/irq/manage.c                             |  22 +-
 kernel/watchdog.c                               |  78 +++-
 kernel/watchdog_hld.c                           | 152 +------
 kernel/watchdog_hld_hpet.c                      | 557 ++++++++++++++++++++++++
 kernel/watchdog_hld_perf.c                      | 182 ++++++++
 lib/Kconfig.debug                               |  10 +
 22 files changed, 1145 insertions(+), 205 deletions(-)
 create mode 100644 kernel/watchdog_hld_hpet.c
 create mode 100644 kernel/watchdog_hld_perf.c

-- 
2.7.4

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-13  0:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFzbYBXUDcAGaP_HoCxjTvOgkixc0+7nJqMea0yKjLSnhw@mail.gmail.com>

On Tue, 12 Jun 2018 16:39:55 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Right.  Intel depends on the current thing, ie if a page table
> > *itself* is freed, we will need to do a flush, but it's the exact
> > same flush as if there had been a regular page there.
> >
> > That's already handled by (for example) pud_free_tlb() doing the
> > __tlb_adjust_range().  
> 
> Side note: I guess we _could_ make the "page directory" flush be
> special on x86 too.
> 
> Right now a page directory flush just counts as a range, and then a
> range that is more than a few entries just means "flush everything".
> 
> End result: in practice, every time you free a page directory, you
> flush the whole TLB because it looks identical to flushing a large
> range of pages.
> 
> And in _theory_, maybe you could have just used "invlpg" with a
> targeted address instead. In fact, I think a single invlpg invalidates
> _all_ caches for the associated MM, but don't quote me on that.

Yeah I was thinking that, you could treat it separately (similar to
powerpc maybe) despite using the same instructions to invalidate it.

> That said, I don't think this is a common case. But I think that *if*
> you extend this to be aware of the page directory caches, and _if_ you
> extend it to cover both ppc and x86, at that point all my "this isn't
> generic" arguments go away.
> 
> Because once x86 does it, it's "common enough" that it counts as
> generic. It may be only a single other architecture, but it's the bulk
> of all the development machines, so..

I'll do the small step first (basically just this patch as an opt-in
for architectures that don't need page tables in their tlb range). But
after that it would be interesting to see if x86 could do anything
with explicit page table cache management.

Thanks,
Nick

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-12 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFxd97-29qi-JMxyPPoZMxw=eObQHB5XXGiLj7SNV8B-oQ@mail.gmail.com>

On Tue, 12 Jun 2018 16:26:33 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Sorry I mean Intel needs the existing behaviour of range flush expanded
> > to cover page table pages.... right?  
> 
> Right.  Intel depends on the current thing, ie if a page table
> *itself* is freed, we will need to do a flush, but it's the exact
> same flush as if there had been a regular page there.
> 
> That's already handled by (for example) pud_free_tlb() doing the
> __tlb_adjust_range().

Agreed.

> 
> Again, I may be missing entirely what you're talking about, because it
> feels like we're talking across each other.
> 
> My argument is that your new patches in (2-3 in the series - patch #1
> looks ok) seem to be fundamentally specific to things that have a
> *different* tlb invalidation for the directory entries than for the
> leaf entries.

Yes I think I confused myself a bit. You're right these patches are
only useful if there is no page structure cache, or if it's managed
separately from TLB invalidation.

> 
> But that's not what at least x86 has, and not what the generic code has done.
> 
> I think it might be fine to introduce a few new helpers that end up
> being no-ops for the traditional cases.
> 
> I just don't think it makes sense to maintain a set of range values
> that then aren't actually used in the general case.

Sure, I'll make it optional. That would probably give a better result
for powerpc too because it doesn't need to maintain two ranges either.

Thanks,
Nick

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Linus Torvalds @ 2018-06-12 23:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFxd97-29qi-JMxyPPoZMxw=eObQHB5XXGiLj7SNV8B-oQ@mail.gmail.com>

On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Right.  Intel depends on the current thing, ie if a page table
> *itself* is freed, we will need to do a flush, but it's the exact
> same flush as if there had been a regular page there.
>
> That's already handled by (for example) pud_free_tlb() doing the
> __tlb_adjust_range().

Side note: I guess we _could_ make the "page directory" flush be
special on x86 too.

Right now a page directory flush just counts as a range, and then a
range that is more than a few entries just means "flush everything".

End result: in practice, every time you free a page directory, you
flush the whole TLB because it looks identical to flushing a large
range of pages.

And in _theory_, maybe you could have just used "invlpg" with a
targeted address instead. In fact, I think a single invlpg invalidates
_all_ caches for the associated MM, but don't quote me on that.

That said, I don't think this is a common case. But I think that *if*
you extend this to be aware of the page directory caches, and _if_ you
extend it to cover both ppc and x86, at that point all my "this isn't
generic" arguments go away.

Because once x86 does it, it's "common enough" that it counts as
generic. It may be only a single other architecture, but it's the bulk
of all the development machines, so..

                 Linus
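
The "a large range just means flush everything" behaviour described in
this message can be sketched in userspace C. This is a minimal
illustration, not kernel code: choose_flush(), enum flush_kind and the
FLUSH_CEILING_PAGES value are hypothetical names and numbers chosen for
the example, and the bounds are assumed to be page aligned.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL
#define PAGE_SIZE (1UL << PAGE_SHIFT)
/* Hypothetical ceiling; a real kernel tunes this per architecture. */
#define FLUSH_CEILING_PAGES 33UL

enum flush_kind { FLUSH_NONE, FLUSH_PAGES, FLUSH_ALL };

/* Decide how to flush [start, end): page by page if small, else everything. */
static enum flush_kind choose_flush(unsigned long start, unsigned long end)
{
	unsigned long pages;

	/* Assumption of this sketch: both bounds are page aligned. */
	assert((start & (PAGE_SIZE - 1)) == 0);
	assert((end & (PAGE_SIZE - 1)) == 0);

	if (end <= start)
		return FLUSH_NONE;

	pages = (end - start) >> PAGE_SHIFT;
	return pages > FLUSH_CEILING_PAGES ? FLUSH_ALL : FLUSH_PAGES;
}
```

Under these assumptions, freeing one page table that maps 2 MiB of
address space widens the range by 512 pages, so choose_flush(0, 1UL << 21)
lands on FLUSH_ALL even though no leaf entry was unmapped.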

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Linus Torvalds @ 2018-06-12 23:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <20180613090950.50566245@roar.ozlabs.ibm.com>

On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Sorry I mean Intel needs the existing behaviour of range flush expanded
> to cover page table pages.... right?

Right.  Intel depends on the current thing, ie if a page table
*itself* is freed, we will need to do a flush, but it's the exact
same flush as if there had been a regular page there.

That's already handled by (for example) pud_free_tlb() doing the
__tlb_adjust_range().

Again, I may be missing entirely what you're talking about, because it
feels like we're talking across each other.

My argument is that your new patches in (2-3 in the series - patch #1
looks ok) seem to be fundamentally specific to things that have a
*different* tlb invalidation for the directory entries than for the
leaf entries.

But that's not what at least x86 has, and not what the generic code has done.

I think it might be fine to introduce a few new helpers that end up
being no-ops for the traditional cases.

I just don't think it makes sense to maintain a set of range values
that then aren't actually used in the general case.

              Linus

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-12 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFyk9VBLUk8VYhfEUR55x0TXY9_QX1dE4wE0A_ias9tMNQ@mail.gmail.com>

On Tue, 12 Jun 2018 15:42:34 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Okay sure, and this is the reason for the wide cc list. Intel does
> > need it of course, from 4.10.3.1 of the dev manual:
> >
> >   - The processor may create a PML4-cache entry even if there are no
> >     translations for any linear address that might use that entry
> >     (e.g., because the P flags are 0 in all entries in the referenced
> >     page-directory-pointer table).
>
> But does intel need it?
>
> Because I don't see it. We already do the __tlb_adjust_range(), and we
> never tear down the highest-level page tables afaik.
>
> Am I missing something?


Sorry I mean Intel needs the existing behaviour of range flush expanded
to cover page table pages.... right? The manual has similar wording for
lower levels of page tables too. So it does need to send an invalidate
*somewhere* that a freed page table page covers, even if no valid pte
was torn down.

Thanks,
Nick

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Linus Torvalds @ 2018-06-12 22:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <20180613083131.139a3c34@roar.ozlabs.ibm.com>

On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Okay sure, and this is the reason for the wide cc list. Intel does
> need it of course, from 4.10.3.1 of the dev manual:
>
>   - The processor may create a PML4-cache entry even if there are no
>     translations for any linear address that might use that entry
>     (e.g., because the P flags are 0 in all entries in the referenced
>     page-directory-pointer table).

But does intel need it?

Because I don't see it. We already do the __tlb_adjust_range(), and we
never tear down the highest-level page tables afaik.

Am I missing something?

               Linus

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-12 22:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <CA+55aFzKBieD0Y3sgFQzt+x5esqb9vT6SEQ28xyCz5UWegfFVg@mail.gmail.com>

On Tue, 12 Jun 2018 11:18:27 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > This brings the number of tlbiel instructions required by a kernel
> > compile from 33M to 25M, most avoided from exec->shift_arg_pages.
>
> And this shows that "page_start/end" is purely for powerpc and used
> nowhere else.
>
> The previous patch should have been to purely powerpc page table
> walking and not touch asm-generic/tlb.h
>
> I think you should make those changes to
> arch/powerpc/include/asm/tlb.h. If that means you can't use the
> generic header, then so be it.

I can make it ppc specific if nobody else would use it. But at least
mmu notifiers AFAIKS would rather use a precise range.

> Or maybe you can embed the generic case in some ppc-specific
> structures, and use 90% of the generic code just with your added
> wrappers for that radix invalidation on top.

Would you mind another arch specific ifdefs in there?

>
> But don't make other architectures do pointless work that doesn't
> matter - or make sense - for them.

Okay sure, and this is the reason for the wide cc list. Intel does
need it of course, from 4.10.3.1 of the dev manual:

  - The processor may create a PML4-cache entry even if there are no
    translations for any linear address that might use that entry
    (e.g., because the P flags are 0 in all entries in the referenced
    page-directory-pointer table).

But I'm sure others would not have paging structure caches at all
(some don't even walk the page tables in hardware right?). Maybe
they're all doing their own thing though.

Thanks,
Nick

* Re: [PATCH v2 08/12] macintosh/via-pmu68k: Don't load driver on unsupported hardware
From: Michael Schmitz @ 2018-06-12 20:12 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Finn Thain, Benjamin Herrenschmidt, Andreas Schwab, linuxppc-dev,
	Linux/m68k, Linux Kernel Development, Geert Uytterhoeven
In-Reply-To: <6c1205cb-7d65-8b30-c4df-59a0ed041313@redhat.com>

Hi,

On Tue, Jun 12, 2018 at 6:53 PM, Laurent Vivier <lvivier@redhat.com> wrote:
> On 12/06/2018 01:47, Finn Thain wrote:
>> On Sun, 10 Jun 2018, Benjamin Herrenschmidt wrote:
> ...
>> I don't know what the bootloader situation is, but it looks messy...
>> http://nubus-pmac.sourceforge.net/#booters
>>
>> Laurent, does Emile work on these machines?
>>
>
> No, Emile doesn't work on pmac-nubus, I tried to implement the switch
> from m68k to ppc, but it has never worked.

On the PowerMac 6100 that I installed Linux on many years ago, I'm
pretty sure I used Apple's mkLinux boot loader.

Not sure how that would work with modern kernels and initrds though.
Haven't got that hardware anymore so no way to try.

Cheers,

  Michael

>
> Laurent

* Re: [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
From: Nadav Amit @ 2018-06-12 18:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: open list:MEMORY MANAGEMENT, linuxppc-dev, linux-arch,
	Aneesh Kumar K . V, Minchan Kim, Mel Gorman, Andrew Morton,
	Linus Torvalds
In-Reply-To: <20180612071621.26775-2-npiggin@gmail.com>

at 12:16 AM, Nicholas Piggin <npiggin@gmail.com> wrote:

> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
>
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
>
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> mm/memory.c | 14 +-------------
> 1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> 	tlb_gather_mmu(&tlb, mm, start, end);
> 	update_hiwater_rss(mm);
> 	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
> 		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
> 	mmu_notifier_invalidate_range_end(mm, start, end);
> 	tlb_finish_mmu(&tlb, start, end);
> }

Yes, this was in my “to check when I have time” todo list, especially since
the flush was from start to end, not even vma->vm_start to vma->vm_end.

The revert seems correct.

Reviewed-by: Nadav Amit <namit@vmware.com>

* Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Linus Torvalds @ 2018-06-12 18:18 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <20180612071621.26775-4-npiggin@gmail.com>

On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> This brings the number of tlbiel instructions required by a kernel
> compile from 33M to 25M, most avoided from exec->shift_arg_pages.

And this shows that "page_start/end" is purely for powerpc and used
nowhere else.

The previous patch should have been to purely powerpc page table
walking and not touch asm-generic/tlb.h

I think you should make those changes to
arch/powerpc/include/asm/tlb.h. If that means you can't use the
generic header, then so be it.

Or maybe you can embed the generic case in some ppc-specific
structures, and use 90% of the generic code just with your added
wrappers for that radix invalidation on top.

But don't make other architectures do pointless work that doesn't
matter - or make sense - for them.

               Linus

* Re: [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing
From: Linus Torvalds @ 2018-06-12 18:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-mm, ppc-dev, linux-arch, Aneesh Kumar K. V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton
In-Reply-To: <20180612071621.26775-3-npiggin@gmail.com>

On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> +static inline void __tlb_adjust_page_range(struct mmu_gather *tlb,
> +                                     unsigned long address,
> +                                     unsigned int range_size)
> +{
> +       tlb->page_start = min(tlb->page_start, address);
> +       tlb->page_end = max(tlb->page_end, address + range_size);
> +}

Why add this unnecessary complexity for architectures where it doesn't matter?

This is not "generic". This is some crazy powerpc special case. Why
add it to generic code, and why make everybody else take the cost?

                    Linus
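
For readers following the thread, the min/max bookkeeping in the quoted
helper can be tried out in userspace C. This is a minimal sketch: struct
gather_range and the range_*() helpers are hypothetical stand-ins for the
two mmu_gather fields, and the "start above end means empty" reset
convention is an assumption of the example rather than something stated
in the quoted hunk.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the page_start/page_end fields. */
struct gather_range {
	unsigned long page_start;
	unsigned long page_end;
};

/* Empty range: start above end, so the first adjust sets both bounds. */
static void range_reset(struct gather_range *r)
{
	r->page_start = ULONG_MAX;
	r->page_end = 0;
}

/* Widen the range so that it covers [address, address + size). */
static void range_adjust(struct gather_range *r, unsigned long address,
			 unsigned long size)
{
	assert(size > 0 && address <= ULONG_MAX - size);

	if (address < r->page_start)
		r->page_start = address;
	if (address + size > r->page_end)
		r->page_end = address + size;
}

/* Nothing was recorded since the last reset. */
static int range_is_empty(const struct gather_range *r)
{
	return r->page_start >= r->page_end;
}
```

The cost being debated is visible here: every caller pays for two
compares and up to two stores per adjust, whether or not the
architecture ever reads the resulting range.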

* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
From: Christophe LEROY @ 2018-06-12 17:01 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, linux-kernel, linuxppc-dev
In-Reply-To: <20180612145315.GJ27520@gate.crashing.org>



Le 12/06/2018 à 16:53, Segher Boessenkool a écrit :
> On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
>> ---
>> Not tested on PPC64.
> 
> It won't be acceptable until that happens.  It also is likely quite bad
> performance on all 64-bit CPUs from the last fifteen years or so.  Or you
> did nothing to prove otherwise, at least.

Will it be as bad as the generic implementation which does it byte per 
byte ?

I don't have any 64 bits target, can someone test it using the test app 
I have added in selftests ?

Or should I just leave it as is for 64 bits and just do the 
implementation for 32 bits until someone wants to try and do it for PPC64 ?

Christophe

> 
>> + * Algorigthm:
> 
> Typo.
> 
> 
> Segher
> 
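
As background for the "byte per byte" comparison above, here is a
minimal userspace sketch of a common word-at-a-time technique for
finding the terminating NUL. It is not taken from the patch under
review and says nothing about PPC64 performance; has_zero_byte() and
strnlen_words() are hypothetical names, and the sketch uses memcpy()
and an explicit capacity so that it never reads outside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in w is 0x00. */
static int has_zero_byte(uint32_t w)
{
	return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Length of the string in buf, scanning at most cap bytes. */
static size_t strnlen_words(const char *buf, size_t cap)
{
	size_t i = 0;
	uint32_t w;

	assert(buf != NULL);

	/* Skip four bytes at a time while no byte in the word is NUL. */
	while (cap - i >= sizeof(w)) {
		memcpy(&w, buf + i, sizeof(w));
		if (has_zero_byte(w))
			break;
		i += sizeof(w);
	}

	/* Finish byte by byte: find the NUL in the last word, or the tail. */
	while (i < cap && buf[i] != '\0')
		i++;

	return i;
}
```

Whether a loop of this shape beats a byte-wise loop on a given CPU is
exactly the kind of claim that needs measuring on the target, which is
the point raised in the review.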

* Re: [PATCH v5 2/4] resource: Use list_head to link sibling resource
From: Julia Lawall @ 2018-06-12 15:10 UTC (permalink / raw)
  To: Baoquan He
  Cc: brijesh.singh, devicetree, airlied, linux-pci, richard.weiyang,
	keith.busch, jcmvbkbc, baiyaowei, frowand.list, lorenzo.pieralisi,
	sthemmin, Baoquan He, linux-nvdimm, patrik.r.jakobsson,
	linux-input, gustavo, dyoung, vgoyal, thomas.lendacky, haiyangz,
	maarten.lankhorst, jglisse, seanpaul, bhelgaas, tglx, yinghai,
	jonathan.derrick, chris, monstr, linux-parisc, gregkh,
	dmitry.torokhov, kexec, ebiederm, devel, linuxppc-dev, davem,
	kbuild-all

This looks wrong.  After a list iterator, the index variable points to a
dummy structure.

julia

url:    https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago

>> kernel/resource.c:265:17-20: ERROR: invalid reference to the index variable of the iterator on line 253

# https://github.com/0day-ci/linux/commit/e906f15906750a86913ba2b1f08bad99129d3dfc
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout e906f15906750a86913ba2b1f08bad99129d3dfc
vim +265 kernel/resource.c

^1da177e4 Linus Torvalds 2005-04-16  247
5eeec0ec9 Yinghai Lu     2009-12-22  248  static void __release_child_resources(struct resource *r)
5eeec0ec9 Yinghai Lu     2009-12-22  249  {
e906f1590 Baoquan He     2018-06-12  250  	struct resource *tmp, *next;
5eeec0ec9 Yinghai Lu     2009-12-22  251  	resource_size_t size;
5eeec0ec9 Yinghai Lu     2009-12-22  252
e906f1590 Baoquan He     2018-06-12 @253  	list_for_each_entry_safe(tmp, next, &r->child, sibling) {
5eeec0ec9 Yinghai Lu     2009-12-22  254  		tmp->parent = NULL;
e906f1590 Baoquan He     2018-06-12  255  		INIT_LIST_HEAD(&tmp->sibling);
5eeec0ec9 Yinghai Lu     2009-12-22  256  		__release_child_resources(tmp);
5eeec0ec9 Yinghai Lu     2009-12-22  257
5eeec0ec9 Yinghai Lu     2009-12-22  258  		printk(KERN_DEBUG "release child resource %pR\n", tmp);
5eeec0ec9 Yinghai Lu     2009-12-22  259  		/* need to restore size, and keep flags */
5eeec0ec9 Yinghai Lu     2009-12-22  260  		size = resource_size(tmp);
5eeec0ec9 Yinghai Lu     2009-12-22  261  		tmp->start = 0;
5eeec0ec9 Yinghai Lu     2009-12-22  262  		tmp->end = size - 1;
5eeec0ec9 Yinghai Lu     2009-12-22  263  	}
e906f1590 Baoquan He     2018-06-12  264
e906f1590 Baoquan He     2018-06-12 @265  	INIT_LIST_HEAD(&tmp->child);
5eeec0ec9 Yinghai Lu     2009-12-22  266  }
5eeec0ec9 Yinghai Lu     2009-12-22  267
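
The warning can be reproduced in userspace with a cut-down list. This
is a minimal sketch, not the kernel's list.h: struct res, list_init(),
list_add_tail() and cursor_after_walk() are hypothetical names for the
example, and the loop is written out by hand in the same shape as a
list_for_each_entry() style iterator.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct res {
	int id;
	struct list_head sibling;	/* link in the parent's child list */
	struct list_head child;		/* head of this node's children */
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Walk parent's children; return the cursor exactly as the loop leaves it. */
static struct res *cursor_after_walk(struct res *parent, int *visited)
{
	struct res *pos;

	assert(parent != NULL && visited != NULL);

	*visited = 0;
	for (pos = container_of(parent->child.next, struct res, sibling);
	     &pos->sibling != &parent->child;
	     pos = container_of(pos->sibling.next, struct res, sibling))
		(*visited)++;

	/* Here &pos->sibling == &parent->child: pos is not a real child. */
	return pos;
}
```

After the loop the cursor is computed from the list head itself, so it
never names an element of the list. In the flagged function that would
make &tmp->child on line 265 point at memory that is not a child's
list head; presumably the parent's own list was meant, but that is for
the patch author to confirm.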

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
From: Segher Boessenkool @ 2018-06-12 14:53 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, linux-kernel, linuxppc-dev
In-Reply-To: <8b89f2e21f7e3a865105eeeeda509243db393454.1528791416.git.christophe.leroy@c-s.fr>

On Tue, Jun 12, 2018 at 09:14:53AM +0000, Christophe Leroy wrote:
> ---
> Not tested on PPC64.

It won't be acceptable until that happens.  It also is likely quite bad
performance on all 64-bit CPUs from the last fifteen years or so.  Or you
did nothing to prove otherwise, at least.

> + * Algorigthm:

Typo.


Segher

* RE: [v3, 00/10] Support DPAA PTP clock and timestamping
From: Madalin-cristian Bucur @ 2018-06-12 14:27 UTC (permalink / raw)
  To: Y.b. Lu, netdev@vger.kernel.org, Richard Cochran, Rob Herring,
	Shawn Guo, David S . Miller
  Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Y.b. Lu
In-Reply-To: <20180607092050.46128-1-yangbo.lu@nxp.com>

> -----Original Message-----
> From: Yangbo Lu [mailto:yangbo.lu@nxp.com]
> Sent: Thursday, June 7, 2018 12:21 PM
> To: netdev@vger.kernel.org; Madalin-cristian Bucur
> <madalin.bucur@nxp.com>; Richard Cochran <richardcochran@gmail.com>;
> Rob Herring <robh+dt@kernel.org>; Shawn Guo <shawnguo@kernel.org>;
> David S . Miller <davem@davemloft.net>
> Cc: devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Y.b. Lu
> <yangbo.lu@nxp.com>
> Subject: [v3, 00/10] Support DPAA PTP clock and timestamping
>
> This patchset is to support DPAA FMAN PTP clock and HW timestamping.
> It had been verified on both ARM platform and PPC platform.
> - The patch #1 to patch #5 are to support DPAA FMAN 1588 timer in
>   ptp_qoriq driver.
> - The patch #6 to patch #10 are to add HW timestamping support in
>   DPAA ethernet driver.
>
> Yangbo Lu (10):
>   fsl/fman: share the event interrupt
>   ptp: support DPAA FMan 1588 timer in ptp_qoriq
>   dt-binding: ptp_qoriq: add DPAA FMan support
>   powerpc/mpc85xx: move ptp timer out of fman in dts
>   arm64: dts: fsl: move ptp timer out of fman
>   fsl/fman: add set_tstamp interface
>   fsl/fman_port: support getting timestamp
>   fsl/fman: define frame description command UPD
>   dpaa_eth: add support for hardware timestamping
>   dpaa_eth: add the get_ts_info interface for ethtool
>
>  Documentation/devicetree/bindings/net/fsl-fman.txt |   25 +-----
>  .../devicetree/bindings/ptp/ptp-qoriq.txt          |   15 +++-
>  arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi   |   14 ++-
>  arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi        |   14 ++-
>  arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi        |   14 ++-
>  arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi       |   14 ++-
>  arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi       |   14 ++-
>  arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi      |   14 ++-
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     |   88 ++++++++++++++++-
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |    3 +
>  drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |   39 ++++++++
>  drivers/net/ethernet/freescale/fman/fman.c         |    3 +-
>  drivers/net/ethernet/freescale/fman/fman.h         |    1 +
>  drivers/net/ethernet/freescale/fman/fman_dtsec.c   |   27 +++++
>  drivers/net/ethernet/freescale/fman/fman_dtsec.h   |    1 +
>  drivers/net/ethernet/freescale/fman/fman_memac.c   |    5 +
>  drivers/net/ethernet/freescale/fman/fman_memac.h   |    1 +
>  drivers/net/ethernet/freescale/fman/fman_port.c    |   12 +++
>  drivers/net/ethernet/freescale/fman/fman_port.h    |    2 +
>  drivers/net/ethernet/freescale/fman/fman_tgec.c    |   21 ++++
>  drivers/net/ethernet/freescale/fman/fman_tgec.h    |    1 +
>  drivers/net/ethernet/freescale/fman/mac.c          |    3 +
>  drivers/net/ethernet/freescale/fman/mac.h          |    1 +
>  drivers/ptp/Kconfig                                |    2 +-
>  drivers/ptp/ptp_qoriq.c                            |  104 ++++++++++++-------
>  include/linux/fsl/ptp_qoriq.h                      |   38 ++++++--
>  26 files changed, 361 insertions(+), 115 deletions(-)

Acked-by: Madalin Bucur <madalin.bucur@nxp.com>

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-06-12 14:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <CAHp75Vf_kBLkE6v=JyOdfNoWktWEfKs7JzRP1XEc4TeuT5xqfw@mail.gmail.com>

On Tue, Jun 12, 2018 at 5:20 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Tue, Jun 12, 2018 at 12:38 PM, Baoquan He <bhe@redhat.com> wrote:
>> On 06/12/18 at 11:29am, Andy Shevchenko wrote:
>>> On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:
>
>>> > +{
>>>
>>> > +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
>>> > +               if (p->end < res->start)
>>> > +                       continue;
>>> > +               if (res->end < p->start)
>>> > +                       break;
>>>
>>> > +               if (p->start < res->start || p->end > res->end)
>>> > +                       return -1;      /* not completely contained */
>>>
>>> Usually we are expecting real error codes.
>>
>> Hmm, I just copied it from arch/powerpc/kernel/pci-common.c. The
>> function interface expects an integer returned value, not sure what
>> real error codes look like, could you give more hints? Will change
>> accordingly.
>
> I briefly looked at the code and error codes we have, so, my proposal
> is one of the following

>  - use -ECANCELED (not the best choice for first occurrence here,
> though I can't find better)

Actually -ENOTSUPP might suit the first case (although the actual
would be something like -EOVERLAP, which we don't have)

>  - use positive integers (or enum), like
>   #define RES_REPARENTED 0
>   #define RES_OVERLAPPED 1
>   #define RES_NOCONFLICT 2
>
>
>>> > +               if (firstpp == NULL)
>>> > +                       firstpp = pp;
>>> > +       }
>>>
>>> > +       if (firstpp == NULL)
>>> > +               return -1;      /* didn't find any conflicting entries? */
>>>
>>> Ditto.
>
> Ditto.
>
>>>
>>> > +}
>>> > +EXPORT_SYMBOL(reparent_resources);
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
With Best Regards,
Andy Shevchenko

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-06-12 14:20 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <20180612093812.GC1820@MiWiFi-R3L-srv>

On Tue, Jun 12, 2018 at 12:38 PM, Baoquan He <bhe@redhat.com> wrote:
> On 06/12/18 at 11:29am, Andy Shevchenko wrote:
>> On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:

>> > +{
>>
>> > +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
>> > +               if (p->end < res->start)
>> > +                       continue;
>> > +               if (res->end < p->start)
>> > +                       break;
>>
>> > +               if (p->start < res->start || p->end > res->end)
>> > +                       return -1;      /* not completely contained */
>>
>> Usually we are expecting real error codes.
>
> Hmm, I just copied it from arch/powerpc/kernel/pci-common.c. The
> function interface expects an integer returned value, not sure what
> real error codes look like, could you give more hints? Will change
> accordingly.

I briefly looked at the code and error codes we have, so, my proposal
is one of the following
 - use -ECANCELED (not the best choice for first occurrence here,
though I can't find better)
 - use positive integers (or enum), like
  #define RES_REPARENTED 0
  #define RES_OVERLAPPED 1
  #define RES_NOCONFLICT 2
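
To make the second option concrete, here is a minimal userspace sketch
of how the three outcomes could be told apart. It only classifies; it
does not reparent anything. The RES_* names come from the proposal
above, while struct span, classify() and the use of a sorted array in
place of the sibling list are hypothetical simplifications for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Names taken from the proposal above; the values are illustrative. */
enum reparent_result {
	RES_REPARENTED = 0,
	RES_OVERLAPPED = 1,
	RES_NOCONFLICT = 2,
};

struct span { unsigned long start, end; };	/* inclusive bounds */

/* children must be sorted by start and must not overlap each other. */
static enum reparent_result classify(const struct span *children, size_t n,
				     const struct span *res)
{
	int found = 0;
	size_t i;

	assert(res != NULL && (children != NULL || n == 0));

	for (i = 0; i < n; i++) {
		const struct span *p = &children[i];

		if (p->end < res->start)
			continue;		/* entirely before res */
		if (res->end < p->start)
			break;			/* entirely after res */
		if (p->start < res->start || p->end > res->end)
			return RES_OVERLAPPED;	/* not completely contained */
		found = 1;
	}

	return found ? RES_REPARENTED : RES_NOCONFLICT;
}
```

With distinct values a caller can tell "nothing to do" apart from "the
request straddles an existing child", which both collapse into -1 in
the quoted code.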


>> > +               if (firstpp == NULL)
>> > +                       firstpp = pp;
>> > +       }
>>
>> > +       if (firstpp == NULL)
>> > +               return -1;      /* didn't find any conflicting entries? */
>>
>> Ditto.

Ditto.

>>
>> > +}
>> > +EXPORT_SYMBOL(reparent_resources);

-- 
With Best Regards,
Andy Shevchenko
