linux-snps-arc.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] ARC perf updates
@ 2017-11-07 22:13 Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc

Hi,

Found these when cleaning up some old branches. The only controversial one
could be the last one.

Thx,
-Vineet

Vineet Gupta (4):
  ARCv2: perf: tweak overflow interrupt
  ARCv2: perf: optimize given that num counters <= 32
  ARC: perf: avoid vmalloc backed mmap
  ARCv2: entry: Reduce perf intr return path

 arch/arc/Kconfig                   |  2 +-
 arch/arc/include/asm/entry-arcv2.h |  2 ++
 arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
 arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
 4 files changed, 46 insertions(+), 21 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] ARCv2: perf: tweak overflow interrupt
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc

Current perf ISR loops thru all 32 counters, checking for each if it
caused the interrupt. Instead only loop thru counters which actually
interrupted (typically 1).

Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
 arch/arc/kernel/perf_event.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 2ce24e74f879..0eaa132a2c90 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -377,21 +377,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 	struct perf_sample_data data;
 	struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
 	struct pt_regs *regs;
-	int active_ints;
+	unsigned int active_ints;
 	int idx;
 
 	arc_pmu_disable(&arc_pmu->pmu);
 
 	active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
+	if (!active_ints)
+		goto done;
 
 	regs = get_irq_regs();
 
-	for (idx = 0; idx < arc_pmu->n_counters; idx++) {
-		struct perf_event *event = pmu_cpu->act_counter[idx];
+	do {
+		struct perf_event *event;
 		struct hw_perf_event *hwc;
 
-		if (!(active_ints & (1 << idx)))
-			continue;
+		idx = __ffs(active_ints);
 
 		/* Reset interrupt flag by writing of 1 */
 		write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
@@ -404,19 +405,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
 		write_aux_reg(ARC_REG_PCT_INT_CTRL,
 			read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
 
+		event = pmu_cpu->act_counter[idx];
 		hwc = &event->hw;
 
 		WARN_ON_ONCE(hwc->idx != idx);
 
 		arc_perf_event_update(event, &event->hw, event->hw.idx);
 		perf_sample_data_init(&data, 0, hwc->last_period);
-		if (!arc_pmu_event_set_period(event))
-			continue;
+		if (arc_pmu_event_set_period(event)) {
+			if (perf_event_overflow(event, &data, regs))
+				arc_pmu_stop(event, 0);
+		}
 
-		if (perf_event_overflow(event, &data, regs))
-			arc_pmu_stop(event, 0);
-	}
+		active_ints &= ~(1U << idx);
+	} while (active_ints);
 
+done:
 	arc_pmu_enable(&arc_pmu->pmu);
 
 	return IRQ_HANDLED;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc

use ffz primitive which maps to ARCv2 instruction, vs. non atomic
__test_and_set_bit

It is unlikely if we will even have more than 32 counters, but still add
a BUILD_BUG to catch that

Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
 arch/arc/kernel/perf_event.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 0eaa132a2c90..8aec462d90fb 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -336,15 +336,12 @@ static int arc_pmu_add(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (__test_and_set_bit(idx, pmu_cpu->used_mask)) {
-		idx = find_first_zero_bit(pmu_cpu->used_mask,
-					  arc_pmu->n_counters);
-		if (idx == arc_pmu->n_counters)
-			return -EAGAIN;
-
-		__set_bit(idx, pmu_cpu->used_mask);
-		hwc->idx = idx;
-	}
+	idx = ffz(pmu_cpu->used_mask[0]);
+	if (idx == arc_pmu->n_counters)
+		return -EAGAIN;
+
+	__set_bit(idx, pmu_cpu->used_mask);
+	hwc->idx = idx;
 
 	write_aux_reg(ARC_REG_PCT_INDEX, idx);
 
@@ -465,6 +462,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 		pr_err("This core does not have performance counters!\n");
 		return -ENODEV;
 	}
+	BUILD_BUG_ON(ARC_PERF_MAX_COUNTERS > 32);
 	BUG_ON(pct_bcr.c > ARC_PERF_MAX_COUNTERS);
 
 	READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
  2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc

For non-alising Dcache, vmalloc is not needed.

vmalloc triggers additonal D-TLB Misses in the perf interrupt code path
making it slightly inefficient as evident from hackbench runs below.

| [ARCLinux]# perf stat -e dTLB-load-misses --repeat 5 hackbench
| Running with 10*40 (== 400) tasks.
| Time: 35.060
| ...
| Performance counter stats for 'hackbench' (5 runs):

Before:      399235      dTLB-load-misses ( +-  2.08% )
After :      397676      dTLB-load-misses ( +-  2.27% )

Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
 arch/arc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index c84e67fdea09..f3cad98eeb8f 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -39,7 +39,7 @@ config ARC
 	select OF
 	select OF_EARLY_FLATTREE
 	select OF_RESERVED_MEM
-	select PERF_USE_VMALLOC
+	select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_KERNEL_GZIP
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (2 preceding siblings ...)
  2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
  2017-11-14 10:28   ` Peter Zijlstra
  2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
  2017-11-21 22:09 ` Vineet Gupta
  5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
  To: linux-snps-arc

In the more likely case of returning to kernel from perf interrupt, do a
fast path returning w/o bothering about CONFIG_PREEMPT etc

However, if returning to user space, need do go thru the usual gyrations,
as check for pending signals is an absolute must.

Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
 arch/arc/include/asm/entry-arcv2.h |  2 ++
 arch/arc/kernel/entry-arcv2.S      | 23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arc/include/asm/entry-arcv2.h b/arch/arc/include/asm/entry-arcv2.h
index 257a68f3c2fe..8b49b327b1f9 100644
--- a/arch/arc/include/asm/entry-arcv2.h
+++ b/arch/arc/include/asm/entry-arcv2.h
@@ -58,6 +58,8 @@
 /*------------------------------------------------------------------------*/
 .macro INTERRUPT_EPILOGUE	called_from
 
+	; Assumes STATUS32.Z bit set if return to K
+
 .ifnc \called_from, exception
 	add	sp, sp, 12	; skip BTA/ECR/orig_r0 placeholderss
 .endif
diff --git a/arch/arc/kernel/entry-arcv2.S b/arch/arc/kernel/entry-arcv2.S
index cc558a25b8fa..9ca1d146426b 100644
--- a/arch/arc/kernel/entry-arcv2.S
+++ b/arch/arc/kernel/entry-arcv2.S
@@ -51,7 +51,7 @@ VECTOR	handle_interrupt	; (16) Timer0
 VECTOR	handle_interrupt	; unused (Timer1)
 VECTOR	handle_interrupt	; unused (WDT)
 VECTOR	handle_interrupt	; (19) Inter core Interrupt (IPI)
-VECTOR	handle_interrupt	; (20) perf Interrupt
+VECTOR	handle_interrupt_pct	; (20) perf Interrupt
 VECTOR	handle_interrupt	; (21) Software Triggered Intr (Self IPI)
 VECTOR	handle_interrupt	; unused
 VECTOR	handle_interrupt	; (23) unused
@@ -97,6 +97,26 @@ ENTRY(handle_interrupt)
 
 END(handle_interrupt)
 
+ENTRY(handle_interrupt_pct)
+
+	INTERRUPT_PROLOGUE  irq
+
+	IRQ_DISABLE
+
+	lr	r0, [ICAUSE]
+
+	bl.d	arch_do_IRQ
+	mov	r1, sp
+
+	ld	r0, [sp, PT_status32]   ; returning to User/Kernel Mode
+	btst	r0, STATUS_U_BIT
+	bnz	resume_user_mode_begin
+
+	clri
+	b	.Lisr_ret_fast_path_to_k
+
+END(handle_interrupt_pct)
+
 ;################### Non TLB Exception Handling #############################
 
 ENTRY(EV_SWI)
@@ -224,6 +244,7 @@ debug_marker_l1:
 	bset.nz	r11, r11, AUX_IRQ_ACT_BIT_U	; NZ means U
 	sr	r11, [AUX_IRQ_ACT]
 
+.Lisr_ret_fast_path_to_k:
 	INTERRUPT_EPILOGUE  irq
 	rtie
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 0/4] ARC perf updates
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (3 preceding siblings ...)
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-13 21:30 ` Vineet Gupta
  2017-11-21 22:09 ` Vineet Gupta
  5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-13 21:30 UTC (permalink / raw)
  To: linux-snps-arc

On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
> 
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
> 
> Thx,
> -Vineet
> 
> Vineet Gupta (4):
>    ARCv2: perf: tweak overflow interrupt
>    ARCv2: perf: optimize given that num counters <= 32
>    ARC: perf: avoid vmalloc backed mmap
>    ARCv2: entry: Reduce perf intr return path
> 
>   arch/arc/Kconfig                   |  2 +-
>   arch/arc/include/asm/entry-arcv2.h |  2 ++
>   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
>   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
>   4 files changed, 46 insertions(+), 21 deletions(-)
> 

Peter, any specific likes/dislikes for this series !

-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-14 10:28   ` Peter Zijlstra
  2017-11-14 23:01     ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-14 10:28 UTC (permalink / raw)
  To: linux-snps-arc

On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
> In the more likely case of returning to kernel from perf interrupt, do a
> fast path returning w/o bothering about CONFIG_PREEMPT etc

I think this needs more explaining and certainly also deserves a code
comment.

Is the argument something along these lines?

  Assumes the interrupt will never set TIF_NEED_RESCHED;
  therefore no preemption is ever required on return from
  the interrupt.

What do you (on ARC) do about irq_work ?

> +ENTRY(handle_interrupt_pct)
> +
> +	INTERRUPT_PROLOGUE  irq
> +
> +	IRQ_DISABLE
> +
> +	lr	r0, [ICAUSE]
> +
> +	bl.d	arch_do_IRQ
> +	mov	r1, sp
> +
> +	ld	r0, [sp, PT_status32]   ; returning to User/Kernel Mode
> +	btst	r0, STATUS_U_BIT
> +	bnz	resume_user_mode_begin
> +
> +	clri
> +	b	.Lisr_ret_fast_path_to_k
> +
> +END(handle_interrupt_pct)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-14 10:28   ` Peter Zijlstra
@ 2017-11-14 23:01     ` Vineet Gupta
  2017-11-15 10:18       ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-14 23:01 UTC (permalink / raw)
  To: linux-snps-arc

On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
>> In the more likely case of returning to kernel from perf interrupt, do a
>> fast path returning w/o bothering about CONFIG_PREEMPT etc
> 
> I think this needs more explaining and certainly also deserves a code
> comment.

Sure ! It was a quick hack mainly to solicit feedback.


> Is the argument something along these lines?
> 
>    Assumes the interrupt will never set TIF_NEED_RESCHED;
>    therefore no preemption is ever required on return from
>    the interrupt.

No. I don't think we can assume that. But I was choosing to ignore it mainly to 
reduce the overhead of a perf intr in general. A subsequent real interrupt could 
go thru thru the gyrations of preemption etc.


> What do you (on ARC) do about irq_work ?

Nothing ATM. At the time of NPS platform upstreaming, Noam had proposed a patch to 
enable work as part of NO_HZ_FULL support on ARC for NPS. That part of patchseries 
didn't get merged and then life took over...

I'm attaching it here, can be added to ARC I suppose after a bit of deuglification 
for #ifdef'ery.

Although I'm sure it is, can you please explain how irq_work is relevant in the 
context of this patch. Even w/o the TIF_NEED_RESCHED, the generic intr return 
could still invoke the irq_work for perf.

I remember from my investigations at the time that it was originally needed for 
perf_event_do_pending() to do stuff in IRQ context. It was later factored out as 
generic irq_work facility for no_hz and printk !

------------>
 From c120294c80d61bcad223c96bd49303357f95afdc Mon Sep 17 00:00:00 2001
From: Noam Camus <noamc@ezchip.com>
Date: Sun, 21 Jun 2015 01:41:22 +0300
Subject: [PATCH] ARC: Support arch_irq_work_raise() via self IPIs

This will allow the scheduler tick to be restarted.
It is used if CONFIG_NO_HZ_FULL.

Signed-off-by: Gil Fruchter <gilf at ezchip.com>
Signed-off-by: Noam Camus <noamc at ezchip.com>
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
  arch/arc/include/asm/Kbuild     |  1 -
  arch/arc/include/asm/irq_work.h |  8 ++++++++
  arch/arc/kernel/smp.c           | 16 ++++++++++++++++
  3 files changed, 24 insertions(+), 1 deletion(-)
  create mode 100644 arch/arc/include/asm/irq_work.h

diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 0b10ef2a4372..f83ec8c04feb 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += ioctl.h
  generic-y += ioctls.h
  generic-y += ipcbuf.h
  generic-y += irq_regs.h
-generic-y += irq_work.h
  generic-y += kmap_types.h
  generic-y += kvm_para.h
  generic-y += local.h
diff --git a/arch/arc/include/asm/irq_work.h b/arch/arc/include/asm/irq_work.h
new file mode 100644
index 000000000000..40d015199ced
--- /dev/null
+++ b/arch/arc/include/asm/irq_work.h
@@ -0,0 +1,8 @@
+#ifndef _ASM_ARC_IRQ_WORK_H
+#define _ASM_ARC_IRQ_WORK_H
+static inline bool arch_irq_work_has_interrupt(void)
+{
+	return true;
+}
+
+#endif /* _ASM_ARC_IRQ_WORK_H */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index f2cf58b771b2..c2f49db57265 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -22,6 +22,7 @@
  #include <linux/atomic.h>
  #include <linux/cpumask.h>
  #include <linux/reboot.h>
+#include <linux/irq_work.h>
  #include <asm/processor.h>
  #include <asm/setup.h>
  #include <asm/mach_desc.h>
@@ -202,6 +203,7 @@ enum ipi_msg_type {
  	IPI_RESCHEDULE = 1,
  	IPI_CALL_FUNC,
  	IPI_CPU_STOP,
+	IPI_IRQ_WORK,
  };

  /*
@@ -276,6 +278,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
  	ipi_send_msg(mask, IPI_CALL_FUNC);
  }

+#ifdef CONFIG_IRQ_WORK
+void arch_irq_work_raise(void)
+{
+	if (arch_irq_work_has_interrupt())
+		ipi_send_msg_one(smp_processor_id(), IPI_IRQ_WORK);
+}
+#endif
+
  /*
   * ipi_cpu_stop - handle IPI from smp_send_stop()
   */
@@ -301,6 +311,12 @@ static inline int __do_IPI(unsigned long msg)
  		ipi_cpu_stop();
  		break;

+#ifdef CONFIG_IRQ_WORK
+	case IPI_IRQ_WORK:
+		irq_work_run();
+		break;
+#endif
+
  	default:
  		rc = 1;
  	}
-- 
2.7.4
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARC-Support-arch_irq_work_raise-via-self-IPIs.patch
Type: text/x-patch
Size: 2504 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-snps-arc/attachments/20171114/a9f92dad/attachment.bin>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-14 23:01     ` Vineet Gupta
@ 2017-11-15 10:18       ` Peter Zijlstra
  2017-11-17 23:42         ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-15 10:18 UTC (permalink / raw)
  To: linux-snps-arc

On Tue, Nov 14, 2017@03:01:26PM -0800, Vineet Gupta wrote:
> On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> > On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
> > > In the more likely case of returning to kernel from perf interrupt, do a
> > > fast path returning w/o bothering about CONFIG_PREEMPT etc
> > 
> > I think this needs more explaining and certainly also deserves a code
> > comment.
> 
> Sure ! It was a quick hack mainly to solicit feedback.
> 
> 
> > Is the argument something along these lines?
> > 
> >    Assumes the interrupt will never set TIF_NEED_RESCHED;
> >    therefore no preemption is ever required on return from
> >    the interrupt.
> 
> No. I don't think we can assume that.

Well, given we run that code from NMI context on a number of platforms
(x86 being one of them) it can not in fact do things like wakeups.

So the pure perf-interrupt part should never set TIF_NEED_RESCHED.

I think we can actually make that assumption.

> But I was choosing to ignore it mainly to reduce the overhead of a
> perf intr in general. A subsequent real interrupt could go thru thru
> the gyrations of preemption etc.

So that's dangerous thinking... People that run a PREEMPT kernel
generally tend to care about latency (esp. when combined with
PREEMPT_RT).

And ignoring a preemption point gets these people upset (and missed
preemptions are a royal friggin pain to debug).

> > What do you (on ARC) do about irq_work ?
> 
> Nothing ATM.

So the reason I'm asking is that some architectures that don't have NMIs
call irq_work_run() at the very end of their perf-interrupt handler (ARM
does this for instance).

And the thing is, _that_ can and does do things like wakeups and will
thus require doing the PREEMPT thing.

> Although I'm sure it is, can you please explain how irq_work is relevant in
> the context of this patch.

Since the perf interrupt (in general) cannot call a whole lot of things
for it needs to assume running from NMI context, it needs to defer
things to a more regular context. It does this with irq_work.

So for instance, when the output buffer reaches its watermark, we'll
raise the irq_work to issue the wakeup of tasks that poll() on that.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-15 10:18       ` Peter Zijlstra
@ 2017-11-17 23:42         ` Vineet Gupta
  2017-11-21 23:26           ` Vineet Gupta
  0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-17 23:42 UTC (permalink / raw)
  To: linux-snps-arc


>> But I was choosing to ignore it mainly to reduce the overhead of a
>> perf intr in general. A subsequent real interrupt could go thru thru
>> the gyrations of preemption etc.
> 
> So that's dangerous thinking... People that run a PREEMPT kernel
> generally tend to care about latency (esp. when combined with
> PREEMPT_RT).
> 
> And ignoring a preemption point gets these people upset (and missed
> preemptions are a royal friggin pain to debug).

Which implies that this patch goes to trash ! Unless we think that running 
instrumentation (perf) on production systems will not yield the same behavior in 
general.

>>> What do you (on ARC) do about irq_work ?
>>
>> Nothing ATM.

What I meant was lack of support for arch_irq_work_raise(). But given that we 
don't have NMIs (yet), this need *not* be a must as things could actually be 
scheduled in the regular intr return path ? At any rate arch_irq_work_raise() is 
not relevant for this discussion since NMIs are not involved.

> So the reason I'm asking is that some architectures that don't have NMIs
> call irq_work_run() at the very end of their perf-interrupt handler (ARM
> does this for instance).

But on ARC, we don't call irq_work_run() in perf intr return path and that seem to 
imply it is broken - as in latency to service a perf induced preemption.

> And the thing is, _that_ can and does do things like wakeups and will
> thus require doing the PREEMPT thing.

Reassures that this patch has to go to trash anyways, but I'm more worried about 
perf intr return for ARC in general.

>> Although I'm sure it is, can you please explain how irq_work is relevant in
>> the context of this patch.
> 
> Since the perf interrupt (in general) cannot call a whole lot of things
> for it needs to assume running from NMI context, it needs to defer
> things to a more regular context. It does this with irq_work.

And so do places such as flush_smp_call_function_queue() where the cross-core IPI 
could be an NMI.

> So for instance, when the output buffer reaches its watermark, we'll
> raise the irq_work to issue the wakeup of tasks that poll() on that.

Cool, thx for the explanation.
Perhaps I should put it in a Documentation/irq_work.txt pr some such !

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/4] ARC perf updates
  2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
                   ` (4 preceding siblings ...)
  2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-21 22:09 ` Vineet Gupta
  2017-11-21 23:13   ` Peter Zijlstra
  5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 22:09 UTC (permalink / raw)
  To: linux-snps-arc

On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
> 
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
> 
> Thx,
> -Vineet
> 
> Vineet Gupta (4):
>    ARCv2: perf: tweak overflow interrupt
>    ARCv2: perf: optimize given that num counters <= 32
>    ARC: perf: avoid vmalloc backed mmap
>    ARCv2: entry: Reduce perf intr return path
> 
>   arch/arc/Kconfig                   |  2 +-
>   arch/arc/include/asm/entry-arcv2.h |  2 ++
>   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
>   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
>   4 files changed, 46 insertions(+), 21 deletions(-)

Peter, do I need an ACK from you for the initial 3 patches to able to add them to 
ARC tree ?

-Vineet

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/4] ARC perf updates
  2017-11-21 22:09 ` Vineet Gupta
@ 2017-11-21 23:13   ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-21 23:13 UTC (permalink / raw)
  To: linux-snps-arc

On Tue, Nov 21, 2017@02:09:38PM -0800, Vineet Gupta wrote:
> On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> > Hi,
> > 
> > Found these when cleaning up some old branches. The only controversial one
> > could be the last one.
> > 
> > Thx,
> > -Vineet
> > 
> > Vineet Gupta (4):
> >    ARCv2: perf: tweak overflow interrupt
> >    ARCv2: perf: optimize given that num counters <= 32
> >    ARC: perf: avoid vmalloc backed mmap
> >    ARCv2: entry: Reduce perf intr return path
> > 
> >   arch/arc/Kconfig                   |  2 +-
> >   arch/arc/include/asm/entry-arcv2.h |  2 ++
> >   arch/arc/kernel/entry-arcv2.S      | 23 +++++++++++++++++++++-
> >   arch/arc/kernel/perf_event.c       | 40 ++++++++++++++++++++------------------
> >   4 files changed, 46 insertions(+), 21 deletions(-)
> 
> Peter, do I need an ACK from you for the initial 3 patches to able to add
> them to ARC tree ?

Sure, for 1-3

Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
  2017-11-17 23:42         ` Vineet Gupta
@ 2017-11-21 23:26           ` Vineet Gupta
  0 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 23:26 UTC (permalink / raw)
  To: linux-snps-arc

On 11/17/2017 03:42 PM, Vineet Gupta wrote:

>>>> What do you (on ARC) do about irq_work ?
>>>
> 
>> So the reason I'm asking is that some architectures that don't have NMIs
>> call irq_work_run() at the very end of their perf-interrupt handler (ARM
>> does this for instance).
> 
> But on ARC, we don't call irq_work_run() in perf intr return path and that seem to 
> imply it is broken - as in latency to service a perf induced preemption.

[snip...]

>>> Although I'm sure it is, can you please explain how irq_work is relevant in
>>> the context of this patch.
>>
>> Since the perf interrupt (in general) cannot call a whole lot of things
>> for it needs to assume running from NMI context, it needs to defer
>> things to a more regular context. It does this with irq_work.

So given my understanding of this topic, ARC (or any non NMI based perf intr 
system) is potentially broken without irq_work_run() ?

I can follow up with a patch for ARC, or does this need to addressed for others 
too - say irq_exit_perf() or some such ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-11-21 23:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
2017-11-14 10:28   ` Peter Zijlstra
2017-11-14 23:01     ` Vineet Gupta
2017-11-15 10:18       ` Peter Zijlstra
2017-11-17 23:42         ` Vineet Gupta
2017-11-21 23:26           ` Vineet Gupta
2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-21 22:09 ` Vineet Gupta
2017-11-21 23:13   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).