* interrupt latency while resuming.
@ 2011-01-08 16:58 Abhijeet Dharmapurikar
2011-01-08 17:13 ` Russell King - ARM Linux
0 siblings, 1 reply; 4+ messages in thread
From: Abhijeet Dharmapurikar @ 2011-01-08 16:58 UTC (permalink / raw)
To: tglx
Cc: Rafael J. Wysocki, Andrew Morton, Russell King,
Greg Kroah-Hartman, Alan Stern, linux-kernel, linux-arm-msm
I am trying to address an issue of not handling a wakeup interrupt quick
enough while resuming. It is an edge triggered interrupt with the
IRQ_WAKEUP flag set. The interrupt controller implements lazy disabling
of interrupts, IOW does not have a disable callback in the irq_chip.
So while going in to supend that interrupt is marked IRQ_DISABLED in
dpm_suspend_noirq().
On resume handle_edge_trigger is run right after
arch_suspend_enable_irqs(). It finds the interrupt marked IRQ_DISABLED
and it sets the IRQ_PENDING flag and does not call the handler.
As the resume path unrolls, non boot cpus are enabled,
dpm_resume_noirq() is run. At that time it finds the IRQ_PENDING flag is
set on this interrupt and the interrupt handler is run.
The problem is, this is very late for the interrupt to be run. Possibly
because enable_nonboot_cpu takes a while or the resume_noirq callbacks
take a long time etc.
I tried using IRQF_NO_SUSPEND for that interrupt and it seems the
interrupt is handled as soon as arch_suspend_enable_irqs() is run.
However I suspect that with this the system will fail to abort suspend -
The interrupt could trigger between dpm_suspend_noirq() and
arch_suspend_disable_irqs() and since it wont be marked IRQF_PENDING the
system goes to suspend never to be woken up again. (the check to abort
suspend because of a pending interrupt is done in check_wakeup_irqs() in
sysdev_suspend). I dont think IRQF_NO_SUSPEND was designed for wakeup
interrupts. Please correct me if I am missing something here.
A solution that comes to mind is enabling such interrupts right before
arch_suspend_enable_irqs() is run. In some more detail, mark these
interrupts as IRQF_LOW_SUSPEND_LATENCY in their irq_desc->status and
enable such interrupts before doing arch_suspend_enable_irqs(). That way
when arch_suspend_enable_irqs() happens, the handler is run immediately.
We skip enabling such interrupts in resume_device_irqs() to avoid
enabling them twice.
Will appreciate any other suggestions towards fixing the delay.
Abhijeet
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: interrupt latency while resuming.
2011-01-08 16:58 interrupt latency while resuming Abhijeet Dharmapurikar
@ 2011-01-08 17:13 ` Russell King - ARM Linux
2011-01-08 19:09 ` Abhijeet Dharmapurikar
2011-01-11 5:11 ` Stephen Boyd
0 siblings, 2 replies; 4+ messages in thread
From: Russell King - ARM Linux @ 2011-01-08 17:13 UTC (permalink / raw)
To: Abhijeet Dharmapurikar
Cc: tglx, Rafael J. Wysocki, Andrew Morton, Greg Kroah-Hartman,
Alan Stern, linux-kernel, linux-arm-msm
On Sat, Jan 08, 2011 at 08:58:57AM -0800, Abhijeet Dharmapurikar wrote:
>
> I am trying to address an issue of not handling a wakeup interrupt quick
> enough while resuming. It is an edge triggered interrupt with the
> IRQ_WAKEUP flag set. The interrupt controller implements lazy disabling
> of interrupts, IOW does not have a disable callback in the irq_chip.
>
> So while going in to supend that interrupt is marked IRQ_DISABLED in
> dpm_suspend_noirq().
>
> On resume handle_edge_trigger is run right after
> arch_suspend_enable_irqs(). It finds the interrupt marked IRQ_DISABLED
> and it sets the IRQ_PENDING flag and does not call the handler.
>
> As the resume path unrolls, non boot cpus are enabled,
> dpm_resume_noirq() is run. At that time it finds the IRQ_PENDING flag is
> set on this interrupt and the interrupt handler is run.
The reason it's after non-boot CPUs are enabled is that the affinity
settings may require it to be run on one of these CPUs, so we need to
ensure that those CPUs are up.
We know it takes something like 200ms to run the lpg calibration, and
this is responsible for the majority of the time when bringing a CPU
online.
One solution is to sort out whether we need to re-run the lpj calibration
each time we resume a CPU. At the moment, we're just using the boot CPU
loops_per_jiffy for every udelay(), assuming that all CPUs are running
at the same clock rate - so we _could_ skip the calibration as its
never used.
Or we could fix things so that it is used (which also sorts udelay for
asymetrically clocked secondary CPUs) but then we always have to re-run
the calibration, or we have to stipulate that secondary CPUs come up at
the same clock rate as they went down. I suspect there's systems out
there where the latter is not true at present, so it's not something we
can impose - which means we're stuck having to do 200ms of recalibration
per CPU.
Another solution is to check whether we can run the handler before other
CPUs are online - it may mean we're breaking IRQ affinity, but any
non-boot affinity has already been broken by taking the other CPUs
offline.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: interrupt latency while resuming.
2011-01-08 17:13 ` Russell King - ARM Linux
@ 2011-01-08 19:09 ` Abhijeet Dharmapurikar
2011-01-11 5:11 ` Stephen Boyd
1 sibling, 0 replies; 4+ messages in thread
From: Abhijeet Dharmapurikar @ 2011-01-08 19:09 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: tglx, Rafael J. Wysocki, Andrew Morton, Greg Kroah-Hartman,
Alan Stern, linux-kernel, linux-arm-msm
On 01/08/2011 09:13 AM, Russell King - ARM Linux wrote:
> On Sat, Jan 08, 2011 at 08:58:57AM -0800, Abhijeet Dharmapurikar wrote:
>>
>> I am trying to address an issue of not handling a wakeup interrupt quick
>> enough while resuming. It is an edge triggered interrupt with the
>> IRQ_WAKEUP flag set. The interrupt controller implements lazy disabling
>> of interrupts, IOW does not have a disable callback in the irq_chip.
>>
>> So while going in to supend that interrupt is marked IRQ_DISABLED in
>> dpm_suspend_noirq().
>>
>> On resume handle_edge_trigger is run right after
>> arch_suspend_enable_irqs(). It finds the interrupt marked IRQ_DISABLED
>> and it sets the IRQ_PENDING flag and does not call the handler.
>>
>> As the resume path unrolls, non boot cpus are enabled,
>> dpm_resume_noirq() is run. At that time it finds the IRQ_PENDING flag is
>> set on this interrupt and the interrupt handler is run.
> Another solution is to check whether we can run the handler before other
> CPUs are online - it may mean we're breaking IRQ affinity, but any
> non-boot affinity has already been broken by taking the other CPUs
> offline.
Agree. I think we are breaking affinity anyways by not setting the
interrupt targets to pre disable_nonboot_cpus().
We can say that for interrupts marked IRQF_LOW_SUSPEND_LATENCY we may
not be honoring the affinity. They *will* run on the boot cpu, even if
their affinity is set to others upon resume. Driver authors should
excercise caution wrt to affinity for such interrupts.
For all the other interrupts we can restore the targets to pre
disable_nonboot_cpu state. But this is a different problem.
Do you agree?
I will test the proposal of IRQF_LOW_SUSPEND_LATENCY and push out a
patch soon.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: interrupt latency while resuming.
2011-01-08 17:13 ` Russell King - ARM Linux
2011-01-08 19:09 ` Abhijeet Dharmapurikar
@ 2011-01-11 5:11 ` Stephen Boyd
1 sibling, 0 replies; 4+ messages in thread
From: Stephen Boyd @ 2011-01-11 5:11 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Abhijeet Dharmapurikar, tglx, Rafael J. Wysocki, Andrew Morton,
Greg Kroah-Hartman, Alan Stern, linux-kernel, linux-arm-msm
On 01/08/2011 09:13 AM, Russell King - ARM Linux wrote:
>
> We know it takes something like 200ms to run the lpg calibration, and
> this is responsible for the majority of the time when bringing a CPU
> online.
>
> One solution is to sort out whether we need to re-run the lpj calibration
> each time we resume a CPU. At the moment, we're just using the boot CPU
> loops_per_jiffy for every udelay(), assuming that all CPUs are running
> at the same clock rate - so we _could_ skip the calibration as its
> never used.
>
> Or we could fix things so that it is used (which also sorts udelay for
> asymetrically clocked secondary CPUs) but then we always have to re-run
> the calibration, or we have to stipulate that secondary CPUs come up at
> the same clock rate as they went down. I suspect there's systems out
> there where the latter is not true at present, so it's not something we
> can impose - which means we're stuck having to do 200ms of recalibration
> per CPU.
We're actually using a timer based udelay() (which uses
calibrate_delay_direct() instead of the more expensive delay loop) and
thus lpj is calculated in about 50ms. You raise an interesting point
though since we don't need to recalibrate as our timesource is constant.
It would be great to skip the recalibration in such a situation and
speed up onlining CPUs.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-01-11 5:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-08 16:58 interrupt latency while resuming Abhijeet Dharmapurikar
2011-01-08 17:13 ` Russell King - ARM Linux
2011-01-08 19:09 ` Abhijeet Dharmapurikar
2011-01-11 5:11 ` Stephen Boyd
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).