* [PATCH 0/4] ARC perf updates
@ 2017-11-07 22:13 Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
To: linux-snps-arc
Hi,
Found these when cleaning up some old branches. The only controversial one
could be the last one.
Thx,
-Vineet
Vineet Gupta (4):
ARCv2: perf: tweak overflow interrupt
ARCv2: perf: optimize given that num counters <= 32
ARC: perf: avoid vmalloc backed mmap
ARCv2: entry: Reduce perf intr return path
arch/arc/Kconfig | 2 +-
arch/arc/include/asm/entry-arcv2.h | 2 ++
arch/arc/kernel/entry-arcv2.S | 23 +++++++++++++++++++++-
arch/arc/kernel/perf_event.c | 40 ++++++++++++++++++++------------------
4 files changed, 46 insertions(+), 21 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] ARCv2: perf: tweak overflow interrupt
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
To: linux-snps-arc
Current perf ISR loops thru all 32 counters, checking for each if it
caused the interrupt. Instead only loop thru counters which actually
interrupted (typically 1).
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
arch/arc/kernel/perf_event.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 2ce24e74f879..0eaa132a2c90 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -377,21 +377,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
struct perf_sample_data data;
struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
struct pt_regs *regs;
- int active_ints;
+ unsigned int active_ints;
int idx;
arc_pmu_disable(&arc_pmu->pmu);
active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
+ if (!active_ints)
+ goto done;
regs = get_irq_regs();
- for (idx = 0; idx < arc_pmu->n_counters; idx++) {
- struct perf_event *event = pmu_cpu->act_counter[idx];
+ do {
+ struct perf_event *event;
struct hw_perf_event *hwc;
- if (!(active_ints & (1 << idx)))
- continue;
+ idx = __ffs(active_ints);
/* Reset interrupt flag by writing of 1 */
write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
@@ -404,19 +405,22 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
write_aux_reg(ARC_REG_PCT_INT_CTRL,
read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
+ event = pmu_cpu->act_counter[idx];
hwc = &event->hw;
WARN_ON_ONCE(hwc->idx != idx);
arc_perf_event_update(event, &event->hw, event->hw.idx);
perf_sample_data_init(&data, 0, hwc->last_period);
- if (!arc_pmu_event_set_period(event))
- continue;
+ if (arc_pmu_event_set_period(event)) {
+ if (perf_event_overflow(event, &data, regs))
+ arc_pmu_stop(event, 0);
+ }
- if (perf_event_overflow(event, &data, regs))
- arc_pmu_stop(event, 0);
- }
+ active_ints &= ~(1U << idx);
+ } while (active_ints);
+done:
arc_pmu_enable(&arc_pmu->pmu);
return IRQ_HANDLED;
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
To: linux-snps-arc
use ffz primitive which maps to ARCv2 instruction, vs. non atomic
__test_and_set_bit
It is unlikely if we will even have more than 32 counters, but still add
a BUILD_BUG to catch that
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
arch/arc/kernel/perf_event.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 0eaa132a2c90..8aec462d90fb 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -336,15 +336,12 @@ static int arc_pmu_add(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- if (__test_and_set_bit(idx, pmu_cpu->used_mask)) {
- idx = find_first_zero_bit(pmu_cpu->used_mask,
- arc_pmu->n_counters);
- if (idx == arc_pmu->n_counters)
- return -EAGAIN;
-
- __set_bit(idx, pmu_cpu->used_mask);
- hwc->idx = idx;
- }
+ idx = ffz(pmu_cpu->used_mask[0]);
+ if (idx == arc_pmu->n_counters)
+ return -EAGAIN;
+
+ __set_bit(idx, pmu_cpu->used_mask);
+ hwc->idx = idx;
write_aux_reg(ARC_REG_PCT_INDEX, idx);
@@ -465,6 +462,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
pr_err("This core does not have performance counters!\n");
return -ENODEV;
}
+ BUILD_BUG_ON(ARC_PERF_MAX_COUNTERS > 32);
BUG_ON(pct_bcr.c > ARC_PERF_MAX_COUNTERS);
READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
To: linux-snps-arc
For non-alising Dcache, vmalloc is not needed.
vmalloc triggers additonal D-TLB Misses in the perf interrupt code path
making it slightly inefficient as evident from hackbench runs below.
| [ARCLinux]# perf stat -e dTLB-load-misses --repeat 5 hackbench
| Running with 10*40 (== 400) tasks.
| Time: 35.060
| ...
| Performance counter stats for 'hackbench' (5 runs):
Before: 399235 dTLB-load-misses ( +- 2.08% )
After : 397676 dTLB-load-misses ( +- 2.27% )
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
arch/arc/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index c84e67fdea09..f3cad98eeb8f 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -39,7 +39,7 @@ config ARC
select OF
select OF_EARLY_FLATTREE
select OF_RESERVED_MEM
- select PERF_USE_VMALLOC
+ select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_GENERIC_DMA_COHERENT
select HAVE_KERNEL_GZIP
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
` (2 preceding siblings ...)
2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
@ 2017-11-07 22:13 ` Vineet Gupta
2017-11-14 10:28 ` Peter Zijlstra
2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-21 22:09 ` Vineet Gupta
5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-07 22:13 UTC (permalink / raw)
To: linux-snps-arc
In the more likely case of returning to kernel from perf interrupt, do a
fast path returning w/o bothering about CONFIG_PREEMPT etc
However, if returning to user space, need do go thru the usual gyrations,
as check for pending signals is an absolute must.
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
arch/arc/include/asm/entry-arcv2.h | 2 ++
arch/arc/kernel/entry-arcv2.S | 23 ++++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/arc/include/asm/entry-arcv2.h b/arch/arc/include/asm/entry-arcv2.h
index 257a68f3c2fe..8b49b327b1f9 100644
--- a/arch/arc/include/asm/entry-arcv2.h
+++ b/arch/arc/include/asm/entry-arcv2.h
@@ -58,6 +58,8 @@
/*------------------------------------------------------------------------*/
.macro INTERRUPT_EPILOGUE called_from
+ ; Assumes STATUS32.Z bit set if return to K
+
.ifnc \called_from, exception
add sp, sp, 12 ; skip BTA/ECR/orig_r0 placeholderss
.endif
diff --git a/arch/arc/kernel/entry-arcv2.S b/arch/arc/kernel/entry-arcv2.S
index cc558a25b8fa..9ca1d146426b 100644
--- a/arch/arc/kernel/entry-arcv2.S
+++ b/arch/arc/kernel/entry-arcv2.S
@@ -51,7 +51,7 @@ VECTOR handle_interrupt ; (16) Timer0
VECTOR handle_interrupt ; unused (Timer1)
VECTOR handle_interrupt ; unused (WDT)
VECTOR handle_interrupt ; (19) Inter core Interrupt (IPI)
-VECTOR handle_interrupt ; (20) perf Interrupt
+VECTOR handle_interrupt_pct ; (20) perf Interrupt
VECTOR handle_interrupt ; (21) Software Triggered Intr (Self IPI)
VECTOR handle_interrupt ; unused
VECTOR handle_interrupt ; (23) unused
@@ -97,6 +97,26 @@ ENTRY(handle_interrupt)
END(handle_interrupt)
+ENTRY(handle_interrupt_pct)
+
+ INTERRUPT_PROLOGUE irq
+
+ IRQ_DISABLE
+
+ lr r0, [ICAUSE]
+
+ bl.d arch_do_IRQ
+ mov r1, sp
+
+ ld r0, [sp, PT_status32] ; returning to User/Kernel Mode
+ btst r0, STATUS_U_BIT
+ bnz resume_user_mode_begin
+
+ clri
+ b .Lisr_ret_fast_path_to_k
+
+END(handle_interrupt_pct)
+
;################### Non TLB Exception Handling #############################
ENTRY(EV_SWI)
@@ -224,6 +244,7 @@ debug_marker_l1:
bset.nz r11, r11, AUX_IRQ_ACT_BIT_U ; NZ means U
sr r11, [AUX_IRQ_ACT]
+.Lisr_ret_fast_path_to_k:
INTERRUPT_EPILOGUE irq
rtie
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 0/4] ARC perf updates
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
` (3 preceding siblings ...)
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-13 21:30 ` Vineet Gupta
2017-11-21 22:09 ` Vineet Gupta
5 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-13 21:30 UTC (permalink / raw)
To: linux-snps-arc
On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
>
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
>
> Thx,
> -Vineet
>
> Vineet Gupta (4):
> ARCv2: perf: tweak overflow interrupt
> ARCv2: perf: optimize given that num counters <= 32
> ARC: perf: avoid vmalloc backed mmap
> ARCv2: entry: Reduce perf intr return path
>
> arch/arc/Kconfig | 2 +-
> arch/arc/include/asm/entry-arcv2.h | 2 ++
> arch/arc/kernel/entry-arcv2.S | 23 +++++++++++++++++++++-
> arch/arc/kernel/perf_event.c | 40 ++++++++++++++++++++------------------
> 4 files changed, 46 insertions(+), 21 deletions(-)
>
Peter, any specific likes/dislikes for this series !
-Vineet
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
@ 2017-11-14 10:28 ` Peter Zijlstra
2017-11-14 23:01 ` Vineet Gupta
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-14 10:28 UTC (permalink / raw)
To: linux-snps-arc
On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
> In the more likely case of returning to kernel from perf interrupt, do a
> fast path returning w/o bothering about CONFIG_PREEMPT etc
I think this needs more explaining and certainly also deserves a code
comment.
Is the argument something along these lines?
Assumes the interrupt will never set TIF_NEED_RESCHED;
therefore no preemption is ever required on return from
the interrupt.
What do you (on ARC) do about irq_work ?
> +ENTRY(handle_interrupt_pct)
> +
> + INTERRUPT_PROLOGUE irq
> +
> + IRQ_DISABLE
> +
> + lr r0, [ICAUSE]
> +
> + bl.d arch_do_IRQ
> + mov r1, sp
> +
> + ld r0, [sp, PT_status32] ; returning to User/Kernel Mode
> + btst r0, STATUS_U_BIT
> + bnz resume_user_mode_begin
> +
> + clri
> + b .Lisr_ret_fast_path_to_k
> +
> +END(handle_interrupt_pct)
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-14 10:28 ` Peter Zijlstra
@ 2017-11-14 23:01 ` Vineet Gupta
2017-11-15 10:18 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-14 23:01 UTC (permalink / raw)
To: linux-snps-arc
On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
>> In the more likely case of returning to kernel from perf interrupt, do a
>> fast path returning w/o bothering about CONFIG_PREEMPT etc
>
> I think this needs more explaining and certainly also deserves a code
> comment.
Sure ! It was a quick hack mainly to solicit feedback.
> Is the argument something along these lines?
>
> Assumes the interrupt will never set TIF_NEED_RESCHED;
> therefore no preemption is ever required on return from
> the interrupt.
No. I don't think we can assume that. But I was choosing to ignore it mainly to
reduce the overhead of a perf intr in general. A subsequent real interrupt could
go thru thru the gyrations of preemption etc.
> What do you (on ARC) do about irq_work ?
Nothing ATM. At the time of NPS platform upstreaming, Noam had proposed a patch to
enable work as part of NO_HZ_FULL support on ARC for NPS. That part of patchseries
didn't get merged and then life took over...
I'm attaching it here, can be added to ARC I suppose after a bit of deuglification
for #ifdef'ery.
Although I'm sure it is, can you please explain how irq_work is relevant in the
context of this patch. Even w/o the TIF_NEED_RESCHED, the generic intr return
could still invoke the irq_work for perf.
I remember from my investigations at the time that it was originally needed for
perf_event_do_pending() to do stuff in IRQ context. It was later factored out as
generic irq_work facility for no_hz and printk !
------------>
From c120294c80d61bcad223c96bd49303357f95afdc Mon Sep 17 00:00:00 2001
From: Noam Camus <noamc@ezchip.com>
Date: Sun, 21 Jun 2015 01:41:22 +0300
Subject: [PATCH] ARC: Support arch_irq_work_raise() via self IPIs
This will allow the scheduler tick to be restarted.
It is used if CONFIG_NO_HZ_FULL.
Signed-off-by: Gil Fruchter <gilf at ezchip.com>
Signed-off-by: Noam Camus <noamc at ezchip.com>
Signed-off-by: Vineet Gupta <vgupta at synopsys.com>
---
arch/arc/include/asm/Kbuild | 1 -
arch/arc/include/asm/irq_work.h | 8 ++++++++
arch/arc/kernel/smp.c | 16 ++++++++++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
create mode 100644 arch/arc/include/asm/irq_work.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 0b10ef2a4372..f83ec8c04feb 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += ioctl.h
generic-y += ioctls.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
-generic-y += irq_work.h
generic-y += kmap_types.h
generic-y += kvm_para.h
generic-y += local.h
diff --git a/arch/arc/include/asm/irq_work.h b/arch/arc/include/asm/irq_work.h
new file mode 100644
index 000000000000..40d015199ced
--- /dev/null
+++ b/arch/arc/include/asm/irq_work.h
@@ -0,0 +1,8 @@
+#ifndef _ASM_ARC_IRQ_WORK_H
+#define _ASM_ARC_IRQ_WORK_H
+static inline bool arch_irq_work_has_interrupt(void)
+{
+ return true;
+}
+
+#endif /* _ASM_ARC_IRQ_WORK_H */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index f2cf58b771b2..c2f49db57265 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -22,6 +22,7 @@
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/reboot.h>
+#include <linux/irq_work.h>
#include <asm/processor.h>
#include <asm/setup.h>
#include <asm/mach_desc.h>
@@ -202,6 +203,7 @@ enum ipi_msg_type {
IPI_RESCHEDULE = 1,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_IRQ_WORK,
};
/*
@@ -276,6 +278,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
ipi_send_msg(mask, IPI_CALL_FUNC);
}
+#ifdef CONFIG_IRQ_WORK
+void arch_irq_work_raise(void)
+{
+ if (arch_irq_work_has_interrupt())
+ ipi_send_msg_one(smp_processor_id(), IPI_IRQ_WORK);
+}
+#endif
+
/*
* ipi_cpu_stop - handle IPI from smp_send_stop()
*/
@@ -301,6 +311,12 @@ static inline int __do_IPI(unsigned long msg)
ipi_cpu_stop();
break;
+#ifdef CONFIG_IRQ_WORK
+ case IPI_IRQ_WORK:
+ irq_work_run();
+ break;
+#endif
+
default:
rc = 1;
}
--
2.7.4
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARC-Support-arch_irq_work_raise-via-self-IPIs.patch
Type: text/x-patch
Size: 2504 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-snps-arc/attachments/20171114/a9f92dad/attachment.bin>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-14 23:01 ` Vineet Gupta
@ 2017-11-15 10:18 ` Peter Zijlstra
2017-11-17 23:42 ` Vineet Gupta
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-15 10:18 UTC (permalink / raw)
To: linux-snps-arc
On Tue, Nov 14, 2017@03:01:26PM -0800, Vineet Gupta wrote:
> On 11/14/2017 02:28 AM, Peter Zijlstra wrote:
> > On Tue, Nov 07, 2017@02:13:04PM -0800, Vineet Gupta wrote:
> > > In the more likely case of returning to kernel from perf interrupt, do a
> > > fast path returning w/o bothering about CONFIG_PREEMPT etc
> >
> > I think this needs more explaining and certainly also deserves a code
> > comment.
>
> Sure ! It was a quick hack mainly to solicit feedback.
>
>
> > Is the argument something along these lines?
> >
> > Assumes the interrupt will never set TIF_NEED_RESCHED;
> > therefore no preemption is ever required on return from
> > the interrupt.
>
> No. I don't think we can assume that.
Well, given we run that code from NMI context on a number of platforms
(x86 being one of them) it can not in fact do things like wakeups.
So the pure perf-interrupt part should never set TIF_NEED_RESCHED.
I think we can actually make that assumption.
> But I was choosing to ignore it mainly to reduce the overhead of a
> perf intr in general. A subsequent real interrupt could go thru thru
> the gyrations of preemption etc.
So that's dangerous thinking... People that run a PREEMPT kernel
generally tend to care about latency (esp. when combined with
PREEMPT_RT).
And ignoring a preemption point gets these people upset (and missed
preemptions are a royal friggin pain to debug).
> > What do you (on ARC) do about irq_work ?
>
> Nothing ATM.
So the reason I'm asking is that some architectures that don't have NMIs
call irq_work_run() at the very end of their perf-interrupt handler (ARM
does this for instance).
And the thing is, _that_ can and does do things like wakeups and will
thus require doing the PREEMPT thing.
> Although I'm sure it is, can you please explain how irq_work is relevant in
> the context of this patch.
Since the perf interrupt (in general) cannot call a whole lot of things
for it needs to assume running from NMI context, it needs to defer
things to a more regular context. It does this with irq_work.
So for instance, when the output buffer reaches its watermark, we'll
raise the irq_work to issue the wakeup of tasks that poll() on that.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-15 10:18 ` Peter Zijlstra
@ 2017-11-17 23:42 ` Vineet Gupta
2017-11-21 23:26 ` Vineet Gupta
0 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-17 23:42 UTC (permalink / raw)
To: linux-snps-arc
>> But I was choosing to ignore it mainly to reduce the overhead of a
>> perf intr in general. A subsequent real interrupt could go thru thru
>> the gyrations of preemption etc.
>
> So that's dangerous thinking... People that run a PREEMPT kernel
> generally tend to care about latency (esp. when combined with
> PREEMPT_RT).
>
> And ignoring a preemption point gets these people upset (and missed
> preemptions are a royal friggin pain to debug).
Which implies that this patch goes to trash ! Unless we think that running
instrumentation (perf) on production systems will not yield the same behavior in
general.
>>> What do you (on ARC) do about irq_work ?
>>
>> Nothing ATM.
What I meant was lack of support for arch_irq_work_raise(). But given that we
don't have NMIs (yet), this need *not* be a must as things could actually be
scheduled in the regular intr return path ? At any rate arch_irq_work_raise() is
not relevant for this discussion since NMIs are not involved.
> So the reason I'm asking is that some architectures that don't have NMIs
> call irq_work_run() at the very end of their perf-interrupt handler (ARM
> does this for instance).
But on ARC, we don't call irq_work_run() in perf intr return path and that seem to
imply it is broken - as in latency to service a perf induced preemption.
> And the thing is, _that_ can and does do things like wakeups and will
> thus require doing the PREEMPT thing.
Reassures that this patch has to go to trash anyways, but I'm more worried about
perf intr return for ARC in general.
>> Although I'm sure it is, can you please explain how irq_work is relevant in
>> the context of this patch.
>
> Since the perf interrupt (in general) cannot call a whole lot of things
> for it needs to assume running from NMI context, it needs to defer
> things to a more regular context. It does this with irq_work.
And so do places such as flush_smp_call_function_queue() where the cross-core IPI
could be an NMI.
> So for instance, when the output buffer reaches its watermark, we'll
> raise the irq_work to issue the wakeup of tasks that poll() on that.
Cool, thx for the explanation.
Perhaps I should put it in a Documentation/irq_work.txt pr some such !
Thx,
-Vineet
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/4] ARC perf updates
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
` (4 preceding siblings ...)
2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
@ 2017-11-21 22:09 ` Vineet Gupta
2017-11-21 23:13 ` Peter Zijlstra
5 siblings, 1 reply; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 22:09 UTC (permalink / raw)
To: linux-snps-arc
On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> Hi,
>
> Found these when cleaning up some old branches. The only controversial one
> could be the last one.
>
> Thx,
> -Vineet
>
> Vineet Gupta (4):
> ARCv2: perf: tweak overflow interrupt
> ARCv2: perf: optimize given that num counters <= 32
> ARC: perf: avoid vmalloc backed mmap
> ARCv2: entry: Reduce perf intr return path
>
> arch/arc/Kconfig | 2 +-
> arch/arc/include/asm/entry-arcv2.h | 2 ++
> arch/arc/kernel/entry-arcv2.S | 23 +++++++++++++++++++++-
> arch/arc/kernel/perf_event.c | 40 ++++++++++++++++++++------------------
> 4 files changed, 46 insertions(+), 21 deletions(-)
Peter, do I need an ACK from you for the initial 3 patches to able to add them to
ARC tree ?
-Vineet
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/4] ARC perf updates
2017-11-21 22:09 ` Vineet Gupta
@ 2017-11-21 23:13 ` Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-11-21 23:13 UTC (permalink / raw)
To: linux-snps-arc
On Tue, Nov 21, 2017@02:09:38PM -0800, Vineet Gupta wrote:
> On 11/07/2017 02:13 PM, Vineet Gupta wrote:
> > Hi,
> >
> > Found these when cleaning up some old branches. The only controversial one
> > could be the last one.
> >
> > Thx,
> > -Vineet
> >
> > Vineet Gupta (4):
> > ARCv2: perf: tweak overflow interrupt
> > ARCv2: perf: optimize given that num counters <= 32
> > ARC: perf: avoid vmalloc backed mmap
> > ARCv2: entry: Reduce perf intr return path
> >
> > arch/arc/Kconfig | 2 +-
> > arch/arc/include/asm/entry-arcv2.h | 2 ++
> > arch/arc/kernel/entry-arcv2.S | 23 +++++++++++++++++++++-
> > arch/arc/kernel/perf_event.c | 40 ++++++++++++++++++++------------------
> > 4 files changed, 46 insertions(+), 21 deletions(-)
>
> Peter, do I need an ACK from you for the initial 3 patches to able to add
> them to ARC tree ?
Sure, for 1-3
Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] ARCv2: entry: Reduce perf intr return path
2017-11-17 23:42 ` Vineet Gupta
@ 2017-11-21 23:26 ` Vineet Gupta
0 siblings, 0 replies; 13+ messages in thread
From: Vineet Gupta @ 2017-11-21 23:26 UTC (permalink / raw)
To: linux-snps-arc
On 11/17/2017 03:42 PM, Vineet Gupta wrote:
>>>> What do you (on ARC) do about irq_work ?
>>>
>
>> So the reason I'm asking is that some architectures that don't have NMIs
>> call irq_work_run() at the very end of their perf-interrupt handler (ARM
>> does this for instance).
>
> But on ARC, we don't call irq_work_run() in perf intr return path and that seem to
> imply it is broken - as in latency to service a perf induced preemption.
[snip...]
>>> Although I'm sure it is, can you please explain how irq_work is relevant in
>>> the context of this patch.
>>
>> Since the perf interrupt (in general) cannot call a whole lot of things
>> for it needs to assume running from NMI context, it needs to defer
>> things to a more regular context. It does this with irq_work.
So given my understanding of this topic, ARC (or any non NMI based perf intr
system) is potentially broken without irq_work_run() ?
I can follow up with a patch for ARC, or does this need to addressed for others
too - say irq_exit_perf() or some such ?
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-11-21 23:26 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-07 22:13 [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-07 22:13 ` [PATCH 1/4] ARCv2: perf: tweak overflow interrupt Vineet Gupta
2017-11-07 22:13 ` [PATCH 2/4] ARCv2: perf: optimize given that num counters <= 32 Vineet Gupta
2017-11-07 22:13 ` [PATCH 3/4] ARC: perf: avoid vmalloc backed mmap Vineet Gupta
2017-11-07 22:13 ` [PATCH 4/4] ARCv2: entry: Reduce perf intr return path Vineet Gupta
2017-11-14 10:28 ` Peter Zijlstra
2017-11-14 23:01 ` Vineet Gupta
2017-11-15 10:18 ` Peter Zijlstra
2017-11-17 23:42 ` Vineet Gupta
2017-11-21 23:26 ` Vineet Gupta
2017-11-13 21:30 ` [PATCH 0/4] ARC perf updates Vineet Gupta
2017-11-21 22:09 ` Vineet Gupta
2017-11-21 23:13 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).