* get_irq_regs() from soft IRQ
@ 2009-06-29 14:31 Jean Pihet
2009-06-29 15:19 ` Russell King - ARM Linux
2009-06-29 16:36 ` Siarhei Siamashka
0 siblings, 2 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 14:31 UTC (permalink / raw)
To: linux-omap, linux-arm-kernel, oprofile-list
Hi,
I am trying to get the latest IRQ registers from a timer or a work queue but I
am running into problems:
- get_irq_regs() returns NULL in some cases, so it is unusable and even causes a
crash when trying to read the register values through the returned pointer
- I never get user space registers, only kernel
The use case is that the performance unit (PMNC) of the Cortex A8 has a
serious bug; in short, the performance counter overflow IRQ is to be avoided.
The solution I am implementing is to read and reset the counters from a work
queue that is triggered by a timer.
Some questions:
- is there a way to get the last 'real' IRQ registers from a timer or work
queue handler?
- is there some other way to do it?
Any thoughts?
Thanks & regards,
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 14:31 get_irq_regs() from soft IRQ Jean Pihet
@ 2009-06-29 15:19 ` Russell King - ARM Linux
2009-06-29 15:35 ` Jean Pihet
2009-06-29 16:36 ` Siarhei Siamashka
1 sibling, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2009-06-29 15:19 UTC (permalink / raw)
To: Jean Pihet; +Cc: linux-omap, linux-arm-kernel, oprofile-list
On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
> I am trying to get the latest IRQ registers from a timer or a work queue
> but I am running into problems:
> - get_irq_regs() returns NULL in some cases,
It will always return NULL outside of IRQ context - and only returns valid
pointers when used inside IRQ context.
It's one of these things that nests itself - when you have several IRQs
being processed on one CPU, there are several register contexts saved,
and get_irq_regs() returns the most recent one.
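The nesting described above can be sketched in userspace C. This is only a model, under stated assumptions: the one-field struct pt_regs and the inner_irq()/outer_irq() handlers are hypothetical, and in the kernel the saved pointer is per-CPU rather than a plain static. What it shows is the save/restore pattern: each IRQ entry installs its own frame and puts the previous one back on exit, so the pointer is NULL once no IRQ is being processed.

```c
#include <stddef.h>

/* Userspace model only: the real pt_regs is architecture-specific and the
 * pointer below is a per-CPU variable in the kernel. */
struct pt_regs { unsigned long pc; };

static struct pt_regs *irq_regs_ptr;    /* NULL outside IRQ context */

static struct pt_regs *get_irq_regs(void)
{
    return irq_regs_ptr;
}

/* Install a new frame and return the previous one so it can be restored. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old = irq_regs_ptr;

    irq_regs_ptr = new_regs;
    return old;
}

/* Hypothetical handlers: each records the PC it sees, then unwinds. */
static unsigned long seen_inner, seen_outer;

static void inner_irq(struct pt_regs *regs)
{
    struct pt_regs *old = set_irq_regs(regs);

    seen_inner = get_irq_regs()->pc;    /* most recent (nested) frame */
    set_irq_regs(old);
}

static void outer_irq(struct pt_regs *regs, struct pt_regs *nested)
{
    struct pt_regs *old = set_irq_regs(regs);

    inner_irq(nested);                  /* a second IRQ arrives meanwhile */
    seen_outer = get_irq_regs()->pc;    /* outer frame is visible again */
    set_irq_regs(old);
}
```

Calling get_irq_regs() before or after outer_irq() returns NULL in this model, which corresponds to the timer and work queue case in the question.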
> The use case is that the performance unit (PMNC) of the Cortex A8 has some
> serious bug, in short the performance counters overflow IRQ is to be avoided.
I don't follow. None of the PMNC support code in the mainline kernel
uses get_irq_regs() outside of IRQ context.
> Some questions:
> - is there a way to get the last 'real' IRQ registers from a timer or work
> queue handler?
No. Outside of IRQ events, the saved IRQ context does not exist.
-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 15:19 ` Russell King - ARM Linux
@ 2009-06-29 15:35 ` Jean Pihet
2009-06-29 16:07 ` Russell King - ARM Linux
0 siblings, 1 reply; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 15:35 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-omap, linux-arm-kernel, oprofile-list
On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
> > I am trying to get the latest IRQ registers from a timer or a work queue
> > but I am running into problems:
> > - get_irq_regs() returns NULL in some cases,
>
> It will always return NULL outside of IRQ context - and only returns valid
> pointers when used inside IRQ context.
Ok got it.
> It's one of these things that nests itself - when you have several IRQs
> being processed on one CPU, there are several register contexts saved,
> and get_irq_regs() returns the most recent one.
>
> > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > some serious bug, in short the performance counters overflow IRQ is to be
> > avoided.
>
> I don't follow. None of the PMNC support code in the mainline kernel
> uses get_irq_regs() outside of IRQ context.
That is correct. The Cortex A8 needs some special treatment.
The erratum says that if the counters overflow at the same time as a
coprocessor access is performed, the perf unit gets reset and/or locks up. In
short, counter overflow is to be avoided, and with it the PMNC IRQ.
> > Some questions:
> > - is there a way to get the last 'real' IRQ registers from a timer or
> > work queue handler?
>
> No. Outside of IRQ events, the saved IRQ context does not exist.
Ok. I wonder how to implement it correctly from here.
The ultimate goal is to feed the registers to oprofile for statistics
gathering (mostly the PC). I do not see much benefit from oprofile without
the PC statistics.
Thanks,
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 15:35 ` Jean Pihet
@ 2009-06-29 16:07 ` Russell King - ARM Linux
2009-06-29 16:12 ` Jean Pihet
0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2009-06-29 16:07 UTC (permalink / raw)
To: Jean Pihet; +Cc: linux-omap, linux-arm-kernel, oprofile-list
On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
> On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> > It's one of these things that nests itself - when you have several IRQs
> > being processed on one CPU, there are several register contexts saved,
> > and get_irq_regs() returns the most recent one.
> >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > some serious bug, in short the performance counters overflow IRQ is to be
> > > avoided.
> >
> > I don't follow. None of the PMNC support code in the mainline kernel
> > uses get_irq_regs() outside of IRQ context.
>
> That is correct. The Cortex A8 needs some special treatment.
> The errata says that if the counters are overflowing at the same time as a
> coprocessor access is performed, the perf unit gets reset and/or locks up. In
> short the counters overflow is to be avoided and so the PMNC IRQ.
Are you talking about 628216?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 16:07 ` Russell King - ARM Linux
@ 2009-06-29 16:12 ` Jean Pihet
0 siblings, 0 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 16:12 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-omap, linux-arm-kernel, oprofile-list
On Monday 29 June 2009 18:07:44 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
> > On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> > > It's one of these things that nests itself - when you have several IRQs
> > > being processed on one CPU, there are several register contexts saved,
> > > and get_irq_regs() returns the most recent one.
> > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > > some serious bug, in short the performance counters overflow IRQ is
> > > > to be avoided.
> > >
> > > I don't follow. None of the PMNC support code in the mainline kernel
> > > uses get_irq_regs() outside of IRQ context.
> >
> > That is correct. The Cortex A8 needs some special treatment.
> > The errata says that if the counters are overflowing at the same time as
> > a coprocessor access is performed, the perf unit gets reset and/or locks
> > up. In short the counters overflow is to be avoided and so the PMNC IRQ.
>
> Are you talking about 628216?
Yes, that is the one. Sorry for not mentioning it sooner.
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 14:31 get_irq_regs() from soft IRQ Jean Pihet
2009-06-29 15:19 ` Russell King - ARM Linux
@ 2009-06-29 16:36 ` Siarhei Siamashka
2009-06-29 16:58 ` Jean Pihet
2009-06-29 17:37 ` Russell King - ARM Linux
1 sibling, 2 replies; 16+ messages in thread
From: Siarhei Siamashka @ 2009-06-29 16:36 UTC (permalink / raw)
To: ext Jean Pihet
Cc: linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> Hi,
>
> I am trying to get the latest IRQ registers from a timer or a work queue
> but I am running into problems:
> - get_irq_regs() returns NULL in some cases, so it is unusable and even
> causes crash when trying to get the registers values from the returned ptr
> - I never get user space registers, only kernel
>
> The use case is that the performance unit (PMNC) of the Cortex A8 has some
> serious bug, in short the performance counters overflow IRQ is to be
> avoided. The solution I am implementing is to read and reset the counters
> from a work queue that is triggered by a timer.
Regarding the oprofile-related part: I wonder how you can get oprofile
working properly (providing non-bogus results) without performance
counter overflow IRQ generation?
Are you trying to implement (in a clean way) something similar to
http://marc.info/?l=oprofile-list&m=123688347009580&w=2
Or is it going to be a different workaround?
--
Best regards,
Siarhei Siamashka
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 16:36 ` Siarhei Siamashka
@ 2009-06-29 16:58 ` Jean Pihet
2009-06-29 17:46 ` Russell King - ARM Linux
2009-06-29 17:54 ` Siarhei Siamashka
2009-06-29 17:37 ` Russell King - ARM Linux
1 sibling, 2 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 16:58 UTC (permalink / raw)
To: Siarhei Siamashka
Cc: linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
Hi Siarhei Siamashka,
On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
> On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > Hi,
> >
> > I am trying to get the latest IRQ registers from a timer or a work queue
> > but I am running into problems:
> > - get_irq_regs() returns NULL in some cases, so it is unusable and even
> > causes crash when trying to get the registers values from the returned
> > ptr - I never get user space registers, only kernel
> >
> > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > some serious bug, in short the performance counters overflow IRQ is to be
> > avoided. The solution I am implementing is to read and reset the counters
> > from a work queue that is triggered by a timer.
>
> Regarding this oprofile related part. I wonder how you can get oprofile
> working properly (providing non-bogus results) without performance
> counters overflow IRQ generation?
>
> Are you trying to implement (in a clean way) something similar to
> http://marc.info/?l=oprofile-list&m=123688347009580&w=2
>
> Or is it going to be a different workaround?
I am trying a different approach, starting from the erratum description.
The idea is to prevent the counters from overflowing, which could cause a PMNC
unit reset or lock-up (or both).
Here are the implementation details:
- use a timer to read and reset the counters, then fire a work queue
- in the work queue, the counter values are converted to oprofile samples
- proper locking is used to avoid races between the various tasks
I am nearly done with it but I am now running into problems with PM
(suspend/resume) and get_irq_regs().
What do you think?
How far are you on your side? Did you stress test the solution? Is the PMNC
recovery always successful?
Regards,
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 16:36 ` Siarhei Siamashka
2009-06-29 16:58 ` Jean Pihet
@ 2009-06-29 17:37 ` Russell King - ARM Linux
2009-06-29 17:52 ` Jean Pihet
2009-06-29 18:38 ` Siarhei Siamashka
1 sibling, 2 replies; 16+ messages in thread
From: Russell King - ARM Linux @ 2009-06-29 17:37 UTC (permalink / raw)
To: Siarhei Siamashka
Cc: ext Jean Pihet, linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > I am trying to get the latest IRQ registers from a timer or a work queue
> > but I am running into problems:
> > - get_irq_regs() returns NULL in some cases, so it is unusable and even
> > causes crash when trying to get the registers values from the returned ptr
> > - I never get user space registers, only kernel
> >
> > The use case is that the performance unit (PMNC) of the Cortex A8 has some
> > serious bug, in short the performance counters overflow IRQ is to be
> > avoided. The solution I am implementing is to read and reset the counters
> > from a work queue that is triggered by a timer.
>
> Regarding this oprofile related part. I wonder how you can get oprofile
> working properly (providing non-bogus results) without performance
> counters overflow IRQ generation?
I don't think you can - triggering capture on overflow is precisely how
oprofile works.
The erratum talks about polling for overflow. By doing this, you are in
a well defined part of the kernel, which is obviously going to be shown
as a hot path for every counter, thus making oprofile useless for kernel
work.
Deferring the interrupt to a workqueue doesn't resolve the problem either.
The problem has nothing to do with what happens after the interrupt
occurs - it's about interrupts themselves being lost.
I think just accepting that this erratum breaks oprofile is the only
realistic solution. ;(
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 16:58 ` Jean Pihet
@ 2009-06-29 17:46 ` Russell King - ARM Linux
2009-06-29 17:57 ` Jean Pihet
2009-06-29 17:54 ` Siarhei Siamashka
1 sibling, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2009-06-29 17:46 UTC (permalink / raw)
To: Jean Pihet
Cc: Siarhei Siamashka, linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
> I am trying to get a different approach, starting from the errata
> description. The idea is to avoid the counters from overflowing,
> which could cause a PMNC unit reset or lock-up (or both).
But this can't work.
Oprofile essentially works as follows:
You set the number (N) of events you wish to occur between each sample.
When N events have occurred, you record the stacktrace and reset the
counter so it fires after another N events.
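As a sketch of the arithmetic behind "fires after another N events", assuming a 32-bit up-counter that raises its interrupt when it wraps to zero: the counter is loaded with 2^32 - N. The function names below are hypothetical and the wrap is modelled by a plain loop; this is not the oprofile driver code.

```c
#include <stdint.h>

/* Value to load into a 32-bit up-counter so that it wraps after n events. */
static uint32_t preload_for(uint32_t n)
{
    return (uint32_t)0 - n;     /* 2^32 - n, computed modulo 2^32 */
}

/* Model: count events until the counter wraps around to zero. */
static uint32_t events_until_overflow(uint32_t counter)
{
    uint32_t events = 0;

    do {
        counter++;
        events++;
    } while (counter != 0);

    return events;
}
```

With N = 100000 the preload is 4294867296, and the modelled counter wraps after exactly 100000 increments.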
Now, you could start the counters at zero every time, and then poll them
via a timer. When the counter value is larger than N, you could log a
stacktrace and zero the counter.
However, this suffers one very serious problem - if you're wanting to
measure something at an interval which occurs faster than your timer,
you're going to get misleading results.
You could set the timer to fire at a high rate, but then that's going
to upset things like cache miss, cache hit, etc measurements.
> Here are the implementation details:
> - use a timer to read and reset the counters, then fire a work queue
> - in the work queue the counters values are converted to oprofile samples
> - the proper locking is used to avoid some races between the various tasks
This sounds overcomplicated. I see no reason for a workqueue to be
involved anywhere near the oprofile sample code.
> I am nearly done with it but I am now running into problems with PM
> (suspend/resume) and get_irq_regs().
You really really really can't use get_irq_regs() outside of IRQ context.
The stored registers just do not exist anymore - they've been overwritten
by whatever exception or system call you're currently in.
You can't create a copy of them - copies will be overwritten on the very
next (nested) interrupt. You don't know which interrupt is the first
interrupt to occur.
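A minimal userspace model of why a copy does not help, assuming a single hypothetical "last IRQ registers" slot (last_irq_copy) that every IRQ entry overwrites. The struct layout, the slot and the helper names are illustrative only, not kernel API.

```c
#include <stdbool.h>

/* Userspace model only; the real pt_regs layout is architecture-specific. */
struct pt_regs { unsigned long pc; bool user_mode; };

/* Hypothetical single slot that every IRQ entry overwrites. */
static struct pt_regs last_irq_copy;

static void irq_entry(const struct pt_regs *regs)
{
    last_irq_copy = *regs;
}

/* What a timer or work queue running later would read back. */
static struct pt_regs read_copy_later(void)
{
    return last_irq_copy;
}
```

If a user task is interrupted and a second IRQ then interrupts the first handler, the slot ends up holding the nested (kernel-mode) frame. That would be consistent with seeing only kernel registers, although the thread does not confirm this is the cause.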
I really think that the only option here is to just accept that oprofile
is crucified by this erratum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 17:37 ` Russell King - ARM Linux
@ 2009-06-29 17:52 ` Jean Pihet
2009-06-29 18:38 ` Siarhei Siamashka
1 sibling, 0 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 17:52 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Siarhei Siamashka, linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 19:37:57 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > I am trying to get the latest IRQ registers from a timer or a work
> > > queue but I am running into problems:
> > > - get_irq_regs() returns NULL in some cases, so it is unusable and even
> > > causes crash when trying to get the registers values from the returned
> > > ptr - I never get user space registers, only kernel
> > >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > some serious bug, in short the performance counters overflow IRQ is to
> > > be avoided. The solution I am implementing is to read and reset the
> > > counters from a work queue that is triggered by a timer.
> >
> > Regarding this oprofile related part. I wonder how you can get oprofile
> > working properly (providing non-bogus results) without performance
> > counters overflow IRQ generation?
>
> I don't think you can - triggering capture on overflow is precisely how
> oprofile works.
>
> The erratum talks about polling for overflow. By doing this, you are in
> a well defined part of the kernel, which is obviously going to be shown
> as a hot path for every counter, thus making oprofile useless for kernel
> work.
I think it is possible, if you set aside the get_irq_regs() problem.
The idea is to read and reset the counters before the overflow, instead of
loading them with a small negative value and waiting for the overflow to
happen.
> Deferring the interrupt to a workqueue doesn't resolve the problem either.
> The problem has nothing to do with what happens after the interrupt
> occurs - it's about interrupts themselves being lost.
The erratum is about a lost event and/or a lock-up of the PMNC unit at the time
of overflow.
> I think just accepting that this erratum breaks oprofile is the only
> realistic solution. ;(
Completely agree. Still, it would be nice to have a workaround, however
inelegant ;(
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 16:58 ` Jean Pihet
2009-06-29 17:46 ` Russell King - ARM Linux
@ 2009-06-29 17:54 ` Siarhei Siamashka
2009-06-29 18:08 ` Jean Pihet
1 sibling, 1 reply; 16+ messages in thread
From: Siarhei Siamashka @ 2009-06-29 17:54 UTC (permalink / raw)
To: ext Jean Pihet
Cc: linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 19:58:41 ext Jean Pihet wrote:
> Hi Siarhei Siamashka,
>
> On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
> > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > Hi,
> > >
> > > I am trying to get the latest IRQ registers from a timer or a work
> > > queue but I am running into problems:
> > > - get_irq_regs() returns NULL in some cases, so it is unusable and even
> > > causes crash when trying to get the registers values from the returned
> > > ptr - I never get user space registers, only kernel
> > >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > some serious bug, in short the performance counters overflow IRQ is to
> > > be avoided. The solution I am implementing is to read and reset the
> > > counters from a work queue that is triggered by a timer.
> >
> > Regarding this oprofile related part. I wonder how you can get oprofile
> > working properly (providing non-bogus results) without performance
> > counters overflow IRQ generation?
> >
> > Are you trying to implement (in a clean way) something similar to
> > http://marc.info/?l=oprofile-list&m=123688347009580&w=2
> >
> > Or is it going to be a different workaround?
>
> I am trying to get a different approach, starting from the errata
> description. The idea is to avoid the counters from overflowing, which
> could cause a PMNC unit reset or lock-up (or both).
>
> Here are the implementation details:
> - use a timer to read and reset the counters, then fire a work queue
> - in the work queue the counters values are converted to oprofile samples
> - the proper locking is used to avoid some races between the various tasks
>
> I am nearly done with it but I am now running into problems with PM
> (suspend/resume) and get_irq_regs().
>
> What do you think?
Russell was the first to reply :)
But we also discussed this "hybrid model" some time ago, and there is a clear
counterexample where it fails:
http://www.nabble.com/Re%3A--PATCH-0-1--OMAP-gptimer-based-event-monitor-driver-for-oprofile-p21374285.html
> How far are you on your side? Did you stress test the solution? Is the PMNC
> recovery always successful?
I ended up just using a timer with a high sampling frequency. It
works without hassle and is sufficient for the majority of cases.
--
Best regards,
Siarhei Siamashka
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 17:46 ` Russell King - ARM Linux
@ 2009-06-29 17:57 ` Jean Pihet
0 siblings, 0 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 17:57 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Siarhei Siamashka, linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 19:46:33 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
> > I am trying to get a different approach, starting from the errata
> > description. The idea is to avoid the counters from overflowing,
> > which could cause a PMNC unit reset or lock-up (or both).
>
> But this can't work.
>
> Oprofile essentially works as follows:
>
> You set the number (N) of events you wish to occur between each sample.
> When N events have occurred, you record the stacktrace and reset the
> counter so it fires after another N events.
>
> Now, you could start the counters at zero every time, and then poll them
> via a timer. When the counter value is larger than N, you could log a
> stacktrace and zero the counter.
>
> However, this suffers one very serious problem - if you're wanting to
> measure something at an interval which occurs faster than your timer,
> you're going to get misleading results.
The counters are 32 bits wide and the maximum counting frequency is 2 events
per cycle (cf. the erratum). That means you get plenty of time before the
counters overflow.
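To put a number on "plenty of time", here is a small sketch of the arithmetic. The 600 MHz core clock used in the usage note is an assumption for illustration; the thread does not state the clock rate.

```c
/* Seconds until a 32-bit counter wraps, starting from zero, when it is
 * incremented events_per_cycle times per CPU cycle. */
static double seconds_to_overflow(double cpu_hz, double events_per_cycle)
{
    const double counter_span = 4294967296.0;   /* 2^32 */

    return counter_span / (cpu_hz * events_per_cycle);
}
```

Under that assumption, seconds_to_overflow(600e6, 2.0) comes to roughly 3.58 seconds, so a timer period in the millisecond range leaves a wide margin before a counter started at zero can wrap.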
> You could set the timer to fire at a high rate, but then that's going
> to upset things like cache miss, cache hit, etc measurements.
Correct.
You need a tradeoff for the timer period.
> > Here are the implementation details:
> > - use a timer to read and reset the counters, then fire a work queue
> > - in the work queue the counters values are converted to oprofile samples
> > - the proper locking is used to avoid some races between the various
> > tasks
>
> This sounds over complicated.
It is ;p
> I see no reason for a workqueue to be
> involved anywhere near the oprofile sample code.
Got it.
> > I am nearly done with it but I am now running into problems with PM
> > (suspend/resume) and get_irq_regs().
>
> You really really really can't use get_irq_regs() outside of IRQ context.
> The stored registers just do not exist anymore - they've been overwritten
> by whatever exception or system call you're currently in.
>
> You can't create a copy of them - copies will be overwritten on the very
> next (nested) interrupt. You don't know which interrupt is the first
> interrupt to occur.
Doh!
> I really think that the only option here is to just accept that oprofile
> is crucified by this errata.
Amen!
Thanks,
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 17:54 ` Siarhei Siamashka
@ 2009-06-29 18:08 ` Jean Pihet
0 siblings, 0 replies; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 18:08 UTC (permalink / raw)
To: Siarhei Siamashka
Cc: linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 19:54:23 Siarhei Siamashka wrote:
> On Monday 29 June 2009 19:58:41 ext Jean Pihet wrote:
> > Hi Siarhei Siamashka,
> >
> > On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
> > > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > > Hi,
> > > >
> > > > I am trying to get the latest IRQ registers from a timer or a work
> > > > queue but I am running into problems:
> > > > - get_irq_regs() returns NULL in some cases, so it is unusable and
> > > > even causes crash when trying to get the registers values from the
> > > > returned ptr - I never get user space registers, only kernel
> > > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > > some serious bug, in short the performance counters overflow IRQ is
> > > > to be avoided. The solution I am implementing is to read and reset
> > > > the counters from a work queue that is triggered by a timer.
> > >
> > > Regarding this oprofile related part. I wonder how you can get oprofile
> > > working properly (providing non-bogus results) without performance
> > > counters overflow IRQ generation?
> > >
> > > Are you trying to implement (in a clean way) something similar to
> > > http://marc.info/?l=oprofile-list&m=123688347009580&w=2
> > >
> > > Or is it going to be a different workaround?
> >
> > I am trying to get a different approach, starting from the errata
> > description. The idea is to avoid the counters from overflowing, which
> > could cause a PMNC unit reset or lock-up (or both).
> >
> > Here are the implementation details:
> > - use a timer to read and reset the counters, then fire a work queue
> > - in the work queue the counters values are converted to oprofile samples
> > - the proper locking is used to avoid some races between the various
> > tasks
> >
> > I am nearly done with it but I am now running into problems with PM
> > (suspend/resume) and get_irq_regs().
> >
> > What do you think?
>
> Russell was the first to reply :)
>
> But we also discussed this "hybrid model" some time ago, and there is a
> clear counterexample where it fails:
> http://www.nabble.com/Re%3A--PATCH-0-1--OMAP-gptimer-based-event-monitor-driver-for-oprofile-p21374285.html
All right, sorry I was not aware of that discussion. So the PMNC unit is
broken beyond repair. BTW good description and test results!
> > How far are you on your side? Did you stress test the solution? Is the
> > PMNC recovery always successful?
>
> I ended up just using a timer with a high sampling frequency. It
> works without hassle and is sufficient for the majority of cases.
Ok. It looks like it is the best we can do.
Thanks,
Jean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: get_irq_regs() from soft IRQ
2009-06-29 17:37 ` Russell King - ARM Linux
2009-06-29 17:52 ` Jean Pihet
@ 2009-06-29 18:38 ` Siarhei Siamashka
2009-06-29 18:49 ` Jean Pihet
1 sibling, 1 reply; 16+ messages in thread
From: Siarhei Siamashka @ 2009-06-29 18:38 UTC (permalink / raw)
To: ext Russell King - ARM Linux
Cc: linux-omap@vger.kernel.org, oprofile-list@lists.sourceforge.net,
linux-arm-kernel@lists.arm.linux.org.uk
On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > I am trying to get the latest IRQ registers from a timer or a work
> > > queue but I am running into problems:
> > > - get_irq_regs() returns NULL in some cases, so it is unusable and even
> > > causes crash when trying to get the registers values from the returned
> > > ptr - I never get user space registers, only kernel
> > >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > some serious bug, in short the performance counters overflow IRQ is to
> > > be avoided. The solution I am implementing is to read and reset the
> > > counters from a work queue that is triggered by a timer.
> >
> > Regarding this oprofile related part. I wonder how you can get oprofile
> > working properly (providing non-bogus results) without performance
> > counters overflow IRQ generation?
>
> I don't think you can - triggering capture on overflow is precisely how
> oprofile works.
>
> The erratum talks about polling for overflow. By doing this, you are in
> a well defined part of the kernel, which is obviously going to be shown
> as a hot path for every counter, thus making oprofile useless for kernel
> work.
>
> Deferring the interrupt to a workqueue doesn't resolve the problem either.
> The problem has nothing to do with what happens after the interrupt
> occurs - it's about interrupts themselves being lost.
>
> I think just accepting that this erratum breaks oprofile is the only
> realistic solution. ;(
I thought the same initially. But the problem still looks like it
can be worked around, admittedly in quite a dirty way.
We just need to use not a periodic timer, but a kind of watchdog (this can be
implemented with the OMAP GPTIMER).
As long as PMU interrupts are coming fast, the watchdog is frequently reset and
never shows up anywhere. Everything works nicely.
Now if the PMU gets broken, the watchdog eventually triggers and recovers the
PMU state. As the PMU could get broken something like 10 times per second in
the worst case in my experiments, ~10 ms for the watchdog trigger period seemed
to be a reasonable empirical value. So under these conditions, the PMU will be
in a non-working state for at most roughly 10% of the time in the worst
practical case. Not very nice, but not completely ugly either.
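The 10% figure follows from simple arithmetic, sketched here as an upper bound: each breakage costs at most one watchdog period before the PMU is recovered. The function name is hypothetical.

```c
/* Upper bound on the fraction of time the PMU is non-working: each
 * breakage lasts at most one watchdog period before recovery. */
static double worst_case_downtime(double breaks_per_sec,
                                  double watchdog_period_s)
{
    return breaks_per_sec * watchdog_period_s;
}
```

With 10 breakages per second and a 10 ms period, worst_case_downtime(10.0, 0.010) gives 0.10, i.e. at most 100 ms lost out of every second.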
Another problematic condition is when the PMU is fine, but is not generating
events naturally (for example, we have configured it for cache misses, but are
burning CPU in a loop which does not access memory at all). In this case the
watchdog will be triggered periodically for no reason, generating "noise"
in the profiling statistics. This noise needs to be filtered out, and it seems
possible to do so. The trick is, in the watchdog handler, to reset the watchdog
counter to a lower value than the one it is normally reset to in the PMU IRQ
handler. This way, whenever a PMU interrupt is generated, we check whether the
watchdog counter is below the normal threshold. If it is, then we know that the
watchdog interrupt was triggered recently and this sample can be ignored. The
difference between the normal watchdog counter reset value and the value which
gets set on watchdog interrupts should provide sufficient time to get out of
the watchdog interrupt handler and its related code, so that it does not show
up in the statistics that much.
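The filtering trick can be sketched as a small userspace model. It assumes an up-counting timer that is reloaded by both handlers; the reload values, wd_tick() and the handler names are hypothetical and are not taken from the proof-of-concept patch. PMU recovery and the timer wrap itself are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reload values for an up-counting watchdog timer. */
#define WD_RELOAD_NORMAL      1000u  /* loaded by the PMU IRQ handler */
#define WD_RELOAD_AFTER_FIRE   900u  /* loaded by the watchdog handler */

static uint32_t wd_counter = WD_RELOAD_NORMAL;

/* Advance the modelled timer by a number of ticks. */
static void wd_tick(uint32_t ticks)
{
    wd_counter += ticks;
}

/* Watchdog fired: reload with the lower value so that the next PMU IRQ
 * can tell that the watchdog ran recently. */
static void watchdog_irq(void)
{
    wd_counter = WD_RELOAD_AFTER_FIRE;
}

/* PMU overflow IRQ: returns true if the sample should be kept. */
static bool pmu_irq(void)
{
    /* Below the normal reload value means the watchdog reloaded last
     * and fewer than (NORMAL - AFTER_FIRE) ticks have passed since. */
    bool keep = wd_counter >= WD_RELOAD_NORMAL;

    wd_counter = WD_RELOAD_NORMAL;
    return keep;
}
```

In this model the guard window is WD_RELOAD_NORMAL - WD_RELOAD_AFTER_FIRE = 100 ticks: a PMU interrupt arriving within that window after a watchdog interrupt is discarded, and later ones are kept.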
A working proof-of-concept patch was submitted here:
http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
Sorry for not posting it to one of the kernel mailing lists, but I thought
that beagleboard mailing list was a good place to find users who may
want to try it and evaluate if it has any practical value. Maybe it was not a
very wise decision.
Unfortunately I'm not a kernel hacker and cleaning up the patch may take
too much time and effort, given my current knowledge. I would
be happy if somebody else with more hands-on kernel experience could make a
clean and usable Cortex-A8 PMU workaround. I don't care whether I get
credit for it or not; the end result is more important :)
One of the obvious problems with the patch (other than race conditions) is
that it uses the OMAP-specific GPTIMER. Is there something more portable in
the kernel to provide similar functionality? Or are there any Cortex-A8 r1
cores other than OMAP3 in the wild?
--
Best regards,
Siarhei Siamashka
* Re: get_irq_regs() from soft IRQ
2009-06-29 18:38 ` Siarhei Siamashka
@ 2009-06-29 18:49 ` Jean Pihet
2009-06-29 19:45 ` Siarhei Siamashka
0 siblings, 1 reply; 16+ messages in thread
From: Jean Pihet @ 2009-06-29 18:49 UTC (permalink / raw)
To: Siarhei Siamashka
Cc: ext Russell King - ARM Linux, linux-omap@vger.kernel.org,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 20:38:59 Siarhei Siamashka wrote:
> On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
> > On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > > I am trying to get the latest IRQ registers from a timer or a work
> > > > queue but I am running into problems:
> > > > - get_irq_regs() returns NULL in some cases, so it is unsuable and
> > > > even causes crash when trying to get the registers values from the
> > > > returned ptr - I never get user space registers, only kernel
> > > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has
> > > > some serious bug, in short the performance counters overflow IRQ is
> > > > to be avoided. The solution I am implementing is to read and reset
> > > > the counters from a work queue that is triggered by a timer.
> > >
> > > Regarding this oprofile related part. I wonder how you can get oprofile
> > > working properly (providing non-bogus results) without performance
> > > counters overflow IRQ generation?
> >
> > I don't think you can - triggering capture on overflow is precisely how
> > oprofile works.
> >
> > The erratum talks about polling for overflow. By doing this, you are in
> > a well defined part of the kernel, which is obviously going to be shown
> > as a hot path for every counter, thus making oprofile useless for kernel
> > work.
> >
> > Deferring the interrupt to a workqueue doesn't resolve the problem
> > either. The problem has nothing to do with what happens after the
> > interrupt occurs - it's about interrupts themselves being lost.
> >
> > I think just accepting that this erratum breaks oprofile is the only
> > realistic solution. ;(
>
> I also thought about the same initially. But the problem still looks like
> it can be workarounded, admittedly in quite a dirty way.
>
> We just need to use not a periodic timer, but kind of a watchdog (this can
> be implemented with OMAP GPTIMER).
>
> As long as PMU interrupts are coming fast, watchdog is frequently reset and
> never shows up anywhere. Everything is working nice.
>
> Now if PMU gets broken, watchdog gets triggered eventually and recovers PMU
> state. As PMU could get broken something like 10 times per second in the
> worst case in my experiments, having ~10 ms for a watchdog trigger period
> seemed to be a reasonable empirical value. So in this conditions, PMU
> will be in a nonworking state approximately less than 10% of the time in
> the worst practical case. Not very nice, but not completely ugly either.
The accuracy is not very good.
> Another problematic condition is when PMU is fine, but is not generating
> events naturally (for example we have configured it for cache misses, but
> are burning cpu in a loop which is not accessing memory at all). In this
> case a watchdog will be triggered periodically for no reason, generating
> the "noise" in profiling statistics. This noise needs to be filtered out,
> and seems like it is possible to do it. The trick is to reset watchdog
> counter to a lower value than it is typically reset in PMU IRQ handler.
> This way, whenever PMU interrupt is generated, we check if watchdog counter
> is below the normal threshold. If it is lower, then we know that watchdog
> interrupt was triggered recently and this sample can be ignored. The
> difference between normal watchdog counter reset value and the value which
> gets set on watchdog interrupts should provide sufficient time to get out
> of the watchdog interrupt handler and its related code, so that it does not
> show up in statistics that much.
>
> A working proof of concept patch was submitted there:
> http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
> Sorry for not posting it to one of the kernel mailing lists, but I thought
> that beagleboard mailing list was a good place to find users who may
> want to try it and evaluate if it has any practical value. Maybe it was not
> a very wise decision.
>
> Unfortunately I'm not a kernel hacker and cleaning up the patch may take
> too much time and efforts, taking into account my current knowledge. I
> would be happy if somebody else with more hands-on kernel experience could
> make a clean and usable Cortex-A8 PMU workaround. I don't care about
> getting some part of credit for it or not, the end result is more important
> :)
I am happy to help.
> One of the obvious problems with the patch (other than race conditions) is
> that it is using OMAP-specific GPTIMER. Is there something more portable in
> the kernel to provide similar functionality? Or are there any Cortex-A8 r1
> cores other than OMAP3 in the wild?
You can use a 'struct timer_list' together with setup_timer(), mod_timer()
and del_timer_sync(). Another API is the high resolution timers (hrtimers),
but I do not think we need such a high precision timer here.
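As a rough illustration of how a one-shot timer can serve as the watchdog,
here is a small userspace model. It is not kernel code: the model_* names are
hypothetical stand-ins that only mimic the arm, re-arm and cancel behaviour
(and the return values) of setup_timer(), mod_timer() and del_timer_sync().

```c
#include <stdbool.h>

/* Userspace stand-in for a one-shot kernel timer (hypothetical names). */
struct model_timer {
    unsigned long expires;              /* absolute expiry time, in ticks */
    bool pending;
    void (*function)(unsigned long);
    unsigned long data;
};

static void model_setup_timer(struct model_timer *t,
                              void (*fn)(unsigned long), unsigned long data)
{
    t->expires = 0;
    t->pending = false;
    t->function = fn;
    t->data = data;
}

/* Like mod_timer(): (re)arm the timer, return 1 if it was already pending. */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = true;
    return was_pending;
}

/* Like del_timer_sync(): disarm the timer, return 1 if it was pending. */
static int model_del_timer(struct model_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Run the callback once if the timer has expired at time 'now'. */
static bool model_run_timer(struct model_timer *t, unsigned long now)
{
    if (!t->pending || (long)(now - t->expires) < 0)
        return false;
    t->pending = false;                 /* one-shot: must be re-armed */
    t->function(t->data);
    return true;
}

static unsigned long wdt_expirations;

/* Watchdog callback: real code would recover the PMU state here. */
static void pmu_watchdog_expired(unsigned long data)
{
    (void)data;
    wdt_expirations++;
}
```

The PMU interrupt handler would push the expiry forward with the re-arm call
on every interrupt, so the callback only runs when PMU interrupts stop coming.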
Jean
* Re: get_irq_regs() from soft IRQ
2009-06-29 18:49 ` Jean Pihet
@ 2009-06-29 19:45 ` Siarhei Siamashka
0 siblings, 0 replies; 16+ messages in thread
From: Siarhei Siamashka @ 2009-06-29 19:45 UTC (permalink / raw)
To: ext Jean Pihet
Cc: linux-omap@vger.kernel.org, ext Russell King - ARM Linux,
linux-arm-kernel@lists.arm.linux.org.uk,
oprofile-list@lists.sourceforge.net
On Monday 29 June 2009 21:49:59 ext Jean Pihet wrote:
[...]
> > We just need to use not a periodic timer, but kind of a watchdog (this
> > can be implemented with OMAP GPTIMER).
> >
> > As long as PMU interrupts are coming fast, watchdog is frequently reset
> > and never shows up anywhere. Everything is working nice.
> >
> > Now if PMU gets broken, watchdog gets triggered eventually and recovers
> > PMU state. As PMU could get broken something like 10 times per second in
> > the worst case in my experiments, having ~10 ms for a watchdog trigger
> > period seemed to be a reasonable empirical value. So in this
> > conditions, PMU will be in a nonworking state approximately less than 10%
> > of the time in the worst practical case. Not very nice, but not
> > completely ugly either.
>
> The accuracy is not very good.
Yes, but that is the worst case. In the "normal" case, when the PMU is not
broken or only very rarely broken, the statistics would be quite good. One of
the reasons I stopped working on this patch was also the fact that in some
cases the Cortex-A8 PMU even works reliably enough :) Adding some suspicious,
weird extra logic may not be welcomed by people who are quite satisfied even
with the current oprofile state on Cortex-A8 chips (number-crunching
applications with a relatively low number of syscalls, which hence rarely
touch any coprocessor registers, are mostly unaffected).
Some adaptive watchdog trigger period may be better (try to predict when the
next PMU interrupt would normally happen and tune the watchdog timeout at
runtime), but it may also be more complex and may theoretically still
misbehave in some cases.
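One possible shape for such an adaptive timeout, purely as a sketch: keep a
smoothed average of the interval between PMU interrupts and set the watchdog
timeout to a multiple of it, clamped to fixed bounds. The smoothing factor
(1/4), the 4x safety margin and all names below are assumptions, not something
that has been tried on real hardware.

```c
#include <stdint.h>

struct adaptive_wdt {
    uint32_t avg_interval_us;   /* smoothed interval between PMU IRQs */
    uint32_t min_timeout_us;    /* lower clamp for the watchdog timeout */
    uint32_t max_timeout_us;    /* upper clamp for the watchdog timeout */
};

/* Feed one observed interval: avg += (sample - avg) / 4. */
static void adaptive_wdt_observe(struct adaptive_wdt *a, uint32_t interval_us)
{
    int64_t diff = (int64_t)interval_us - (int64_t)a->avg_interval_us;

    a->avg_interval_us = (uint32_t)((int64_t)a->avg_interval_us + diff / 4);
}

/* Timeout to program next: 4x the smoothed interval, clamped. */
static uint32_t adaptive_wdt_timeout(const struct adaptive_wdt *a)
{
    uint64_t t = (uint64_t)a->avg_interval_us * 4;

    if (t < a->min_timeout_us)
        return a->min_timeout_us;
    if (t > a->max_timeout_us)
        return a->max_timeout_us;
    return (uint32_t)t;
}
```

The clamps keep a burst of very fast or very slow events from driving the
timeout to an extreme, but a sudden change in event rate can still fool the
prediction, which is the misbehaviour mentioned above.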
> > Another problematic condition is when PMU is fine, but is not generating
> > events naturally (for example we have configured it for cache misses, but
> > are burning cpu in a loop which is not accessing memory at all). In this
> > case a watchdog will be triggered periodically for no reason, generating
> > the "noise" in profiling statistics. This noise needs to be filtered out,
> > and seems like it is possible to do it. The trick is to reset watchdog
> > counter to a lower value than it is typically reset in PMU IRQ handler.
> > This way, whenever PMU interrupt is generated, we check if watchdog
> > counter is below the normal threshold. If it is lower, then we know that
> > watchdog interrupt was triggered recently and this sample can be ignored.
> > The difference between normal watchdog counter reset value and the value
> > which gets set on watchdog interrupts should provide sufficient time to
> > get out of the watchdog interrupt handler and its related code, so that
> > it does not show up in statistics that much.
I also forgot to mention here that very low frequency events (with a
frequency lower than that of the watchdog) may be quite problematic and still
distort the statistics, because they will be filtered out. Tuning all the
magic values may turn out to be hell.
But at the very least, all the watchdog interrupts (both false alarms and real
cases of PMU breakage) can be counted and taken into account. These statistics
could be reported to the user somehow, so that (s)he can decide whether the
final profiling statistics can be trusted and see for how much time the PMU
was actually broken.
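A sketch of what such bookkeeping could look like. The names are hypothetical,
and how the watchdog handler decides whether the PMU was really stuck is left
out, so the caller passes that in as a flag. The estimate assumes the PMU was
dead for at most one watchdog period per recovery, which gives an upper bound.

```c
#include <stdbool.h>
#include <stdint.h>

struct wdt_stats {
    uint64_t false_alarms;  /* watchdog fired although the PMU was fine */
    uint64_t recoveries;    /* watchdog fired and had to recover the PMU */
};

/* Called from the watchdog handler with its verdict on the PMU state. */
static void wdt_stats_record(struct wdt_stats *s, bool pmu_was_stuck)
{
    if (pmu_was_stuck)
        s->recoveries++;
    else
        s->false_alarms++;
}

/* Upper bound, in per mille of the run, on the time the PMU was broken. */
static uint32_t wdt_stats_broken_permille(const struct wdt_stats *s,
                                          uint32_t timeout_us, uint64_t run_us)
{
    uint64_t broken_us;

    if (run_us == 0)
        return 0;
    broken_us = s->recoveries * timeout_us;
    if (broken_us > run_us)
        broken_us = run_us;
    return (uint32_t)(broken_us * 1000 / run_us);
}
```

For example, 10 recoveries in a one second run with a 10 ms watchdog period
give 100 per mille, which is the 10% worst case discussed earlier.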
> > A working proof of concept patch was submitted there:
> > http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
> > Sorry for not posting it to one of the kernel mailing lists, but I
> > thought that beagleboard mailing list was a good place to find users who
> > may want to try it and evaluate if it has any practical value. Maybe it
> > was not a very wise decision.
> >
> > Unfortunately I'm not a kernel hacker and cleaning up the patch may take
> > too much time and efforts, taking into account my current knowledge. I
> > would be happy if somebody else with more hands-on kernel experience
> > could make a clean and usable Cortex-A8 PMU workaround. I don't care
> > about getting some part of credit for it or not, the end result is more
> > important
> >
> > :)
>
> I am ok to help
>
> > One of the obvious problems with the patch (other than race conditions)
> > is that it is using OMAP-specific GPTIMER. Is there something more
> > portable in the kernel to provide similar functionality? Or are there any
> > Cortex-A8 r1 cores other than OMAP3 in the wild?
>
> You can use a 'struct timer_list' and the setup_timer, mod_timer,
> del_timer_sync. Another API is the hight resolution timers (HRT) but I do
> not think we need such a high precision timer here.
Thanks
--
Best regards,
Siarhei Siamashka
end of thread, other threads:[~2009-06-29 19:45 UTC | newest]
Thread overview: 16+ messages
2009-06-29 14:31 get_irq_regs() from soft IRQ Jean Pihet
2009-06-29 15:19 ` Russell King - ARM Linux
2009-06-29 15:35 ` Jean Pihet
2009-06-29 16:07 ` Russell King - ARM Linux
2009-06-29 16:12 ` Jean Pihet
2009-06-29 16:36 ` Siarhei Siamashka
2009-06-29 16:58 ` Jean Pihet
2009-06-29 17:46 ` Russell King - ARM Linux
2009-06-29 17:57 ` Jean Pihet
2009-06-29 17:54 ` Siarhei Siamashka
2009-06-29 18:08 ` Jean Pihet
2009-06-29 17:37 ` Russell King - ARM Linux
2009-06-29 17:52 ` Jean Pihet
2009-06-29 18:38 ` Siarhei Siamashka
2009-06-29 18:49 ` Jean Pihet
2009-06-29 19:45 ` Siarhei Siamashka