linux-arm-kernel.lists.infradead.org archive mirror
* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
@ 2014-03-20 19:54 Jonathan Bell
  2014-03-20 20:02 ` Russell King - ARM Linux
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Bell @ 2014-03-20 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi.

I believe I've found an edge case where the FIQ enable bit in the CPSR for  
armv6 can get trampled through the use of a combination of the standard  
kernel interfaces and the arch-specific local_fiq_enable/local_fiq_disable  
macros.

We have an out-of-tree set of drivers on our mach-bcm2708 port that result  
in a bad interaction which disables FIQs for extended periods of time.

The summary below is condensed for readability; the full set of relevant
code can be found at:

https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/misc/vc04_services/interface/vchiq_arm/
and
https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/usb/host/dwc_otg/dwc_otg_hcd_linux.c

In driver #1 (BRCM vchiq messagebox/GPU message passing handler) we do  
this:

vchiq_doorbell_irq:
	- Has the doorbell been poked from the GPU?
	- If yes
		up(vchiq_semaphore)
		return IRQ_HANDLED


in the corresponding "bottom half" we do something a bit silly but  
basically:


vchiq_worker_kthread:
	while (1)
		down_interruptible(vchiq_semaphore)
		process_gpu_messages()
		do_some_other_stuff()


In driver #2 (our dwc_otg/dwc2 implementation) we do the following on
init:

hcd_init:
	setup the FIQ state
	install the fiq handler
	enable_fiq(USB)
	local_fiq_enable()
	usb_add_hcd()

Because the vchiq_worker_kthread is waiting on a mailbox interrupt from the
GPU, it initialises and goes to sleep almost immediately. In
down_interruptible, there is a sequence of events (if the sem->count is
<=0) that basically does:

local_irq_save(flags)
local_irq_enable()
schedule();
local_irq_disable()
local_irq_restore(flags)

While the worker sleeps in schedule(), the rest of the boot process
continues and probes USB, which enables the FIQ.

The problem we get is that on boot, the vchiq services are probed and  
initialised before the USB driver is. A stale flags variable (with F bit  
set) borks things up as far as the FIQ is concerned.

On the first GPU interrupt, vchiq_doorbell_irq increments the
semaphore and wakes up the vchiq_worker_kthread, which then promptly
overwrites the FIQ bit in the CPSR because its flags were saved before the
FIQ was enabled.
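
As an illustration only, the ordering above can be replayed in a small
userspace model. This is a sketch, not kernel code: every sim_* name is
hypothetical, the CPSR is reduced to a plain variable, and the starting
state (FIQs still masked when the worker saves its flags) is assumed from
the description above.

```c
#include <stdint.h>

/* ARM CPSR mask bits: a set bit means the interrupt class is masked. */
#define PSR_F_BIT 0x40u	/* FIQ mask */
#define PSR_I_BIT 0x80u	/* IRQ mask */

/* Toy stand-in for the one CPSR of a uniprocessor (assumption). */
static uint32_t sim_cpsr = PSR_F_BIT;

/* Model of local_irq_save(): remember the CPSR, then mask IRQs. */
static uint32_t sim_local_irq_save(void)
{
	uint32_t flags = sim_cpsr;

	sim_cpsr |= PSR_I_BIT;
	return flags;
}

/* Model of local_irq_restore(): write the saved bits back, F included. */
static void sim_local_irq_restore(uint32_t flags)
{
	sim_cpsr = flags;
}

static void sim_local_irq_enable(void)  { sim_cpsr &= ~PSR_I_BIT; }
static void sim_local_irq_disable(void) { sim_cpsr |= PSR_I_BIT; }
static void sim_local_fiq_enable(void)  { sim_cpsr &= ~PSR_F_BIT; }

/* Replays the boot ordering; returns 1 if FIQs end up masked again. */
static int sim_replay_boot(void)
{
	uint32_t flags;

	sim_cpsr = PSR_F_BIT;		/* assumed state when vchiq starts */
	flags = sim_local_irq_save();	/* worker, about to sleep */
	sim_local_irq_enable();
	sim_local_fiq_enable();		/* "schedule()": USB probe runs */
	sim_local_irq_disable();	/* doorbell IRQ has woken the worker */
	sim_local_irq_restore(flags);	/* stale F bit is written back */

	return (sim_cpsr & PSR_F_BIT) != 0;
}
```

In this model sim_replay_boot() reports the F bit as set again after the
restore, even though sim_local_fiq_enable() ran in between.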

The stale F bit then has the potential to propagate through other threads  
using similar mechanisms and causes no end of trouble to a FIQ handler  
that expects never to be disabled.

In our out-of-tree arch, I've "fixed" this by making local_irq_restore  
never touch the FIQ bit in CPSR.

See
https://github.com/raspberrypi/linux/commit/a0f47344e286768e3ce96268eed1ad0f6cfd9f2c
for the modification to arm/irqflags.h.
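
The idea behind that change can be sketched as a pure function. This is
not the code from the commit above, only a model of "restore the I bit and
leave F alone"; restore_irq_only() is a hypothetical name and the CPSR is
passed in as a plain integer.

```c
#include <stdint.h>

/* ARM CPSR mask bits: a set bit means the interrupt class is masked. */
#define PSR_F_BIT 0x40u	/* FIQ mask */
#define PSR_I_BIT 0x80u	/* IRQ mask */

/*
 * Hypothetical helper: compute the CPSR value to write back so that only
 * the I bit is taken from the saved flags. Every other bit, including F,
 * keeps its current value.
 */
static uint32_t restore_irq_only(uint32_t cur_cpsr, uint32_t flags)
{
	return (cur_cpsr & ~PSR_I_BIT) | (flags & PSR_I_BIT);
}
```

With a current CPSR of PSR_I_BIT (FIQs enabled) and stale flags of
PSR_F_BIT, the result has both bits clear, so the FIQ stays enabled.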

Should this be classed as a bug? The root cause is that the same register  
is used to enable priority FIQs as well as normal IRQs - thus the  
save(flags) and restore(flags) mechanisms trample on something that they  
don't know about.


* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
  2014-03-20 19:54 [arm arch] local_irq_restore() does not play well with local_fiq_enable() Jonathan Bell
@ 2014-03-20 20:02 ` Russell King - ARM Linux
  2014-03-20 20:49   ` Jonathan Bell
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 20:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 07:54:02PM +0000, Jonathan Bell wrote:
> hcd_init:
> 	setup the FIQ state
> 	install the fiq handler
> 	enable_fiq(USB)
> 	local_fiq_enable()
> 	usb_add_hcd()

You're not supposed to use local_fiq_enable() to enable FIQs in a device
driver - they should already be enabled by this point.

There has been a hole in that where FIQs haven't been enabled in the
idle thread, but that's a bug which needs fixing.  Otherwise, you should
assume that FIQs are always unmasked everywhere except for any short
code sequences that are contained within a short local_fiq_disable()..
local_fiq_enable() block.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.


* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
  2014-03-20 20:02 ` Russell King - ARM Linux
@ 2014-03-20 20:49   ` Jonathan Bell
  2014-03-20 21:09     ` Russell King - ARM Linux
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Bell @ 2014-03-20 20:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 20 Mar 2014 20:02:34 -0000, Russell King - ARM Linux  
<linux@arm.linux.org.uk> wrote:

> On Thu, Mar 20, 2014 at 07:54:02PM +0000, Jonathan Bell wrote:
>> hcd_init:
>> 	setup the FIQ state
>> 	install the fiq handler
>> 	enable_fiq(USB)
>> 	local_fiq_enable()
>> 	usb_add_hcd()
>
> You're not supposed to use local_fiq_enable() to enable FIQs in a device
> driver - they should already be enabled by this point.
>
> There has been a hole in that where FIQs haven't been enabled in the
> idle thread, but that's a bug which needs fixing.  Otherwise, you should
> assume that FIQs are always unmasked everywhere except for any short
> code sequences that are contained within a short local_fiq_disable()..
> local_fiq_enable() block.
>

Relying on FIQs being enabled elsewhere is in fact what we used to do  
until ~3.10, whereupon some change (most likely the bug you describe)  
broke this behaviour.

Our use of local_fiq_disable() and local_fiq_enable() is indeed
constrained to minimally small critical sections (mostly reading/writing a
single hardware register or state variable) within the dwc_otg driver that
"handles" the results from the FIQ.
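
For illustration, the shape of such a critical section in a userspace
model. This is a sketch under assumptions: the toy_* names are
hypothetical, the CPSR is a plain variable, and toy_fiq_state stands in
for whatever state the real driver shares with its FIQ handler.

```c
#include <stdint.h>

#define PSR_F_BIT 0x40u	/* ARM CPSR FIQ mask bit */

static uint32_t toy_cpsr;	/* toy CPSR, FIQs unmasked by default */
static uint32_t toy_fiq_state;	/* hypothetical state shared with the FIQ */

static void toy_local_fiq_disable(void) { toy_cpsr |= PSR_F_BIT; }
static void toy_local_fiq_enable(void)  { toy_cpsr &= ~PSR_F_BIT; }

/* Mask FIQs only for the single read, then unmask again straight away. */
static uint32_t toy_read_fiq_state(void)
{
	uint32_t snapshot;

	toy_local_fiq_disable();
	snapshot = toy_fiq_state;
	toy_local_fiq_enable();

	return snapshot;
}
```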

But even if FIQs are enabled before entering the idle thread, any kernel  
thread that sleeps (and yields the CPU) with an irqflags variable that was  
saved before the call to local_fiq_enable has the potential to corrupt the  
F bit when the thread subsequently wakes up.


* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
  2014-03-20 20:49   ` Jonathan Bell
@ 2014-03-20 21:09     ` Russell King - ARM Linux
  2014-03-23 18:38       ` Jonathan Bell
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 21:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 08:49:24PM +0000, Jonathan Bell wrote:
> On Thu, 20 Mar 2014 20:02:34 -0000, Russell King - ARM Linux  
> <linux@arm.linux.org.uk> wrote:
>
>> On Thu, Mar 20, 2014 at 07:54:02PM +0000, Jonathan Bell wrote:
>>> hcd_init:
>>> 	setup the FIQ state
>>> 	install the fiq handler
>>> 	enable_fiq(USB)
>>> 	local_fiq_enable()
>>> 	usb_add_hcd()
>>
>> You're not supposed to use local_fiq_enable() to enable FIQs in a device
>> driver - they should already be enabled by this point.
>>
>> There has been a hole in that where FIQs haven't been enabled in the
>> idle thread, but that's a bug which needs fixing.  Otherwise, you should
>> assume that FIQs are always unmasked everywhere except for any short
>> code sequences that are contained within a short local_fiq_disable()..
>> local_fiq_enable() block.
>>
>
> Relying on FIQs being enabled elsewhere is in fact what we used to do  
> until ~3.10, whereupon some change (most likely the bug you describe)  
> broke this behaviour.
>
> Our use of local_fiq_disable() and local_fiq_enable() is indeed
> constrained to minimally small critical sections (mostly reading/writing
> a single hardware register or state variable) within the dwc_otg driver
> that "handles" the results from the FIQ.

Realise that the state of the CPSR is per-process.  So if you're enabling
it in PID 1 when your driver initialises and it isn't enabled in PID 0,
then it will still remain not enabled in PID 0 - and that's a big problem
because that means whenever the system is idle, FIQs will be masked.

Now, what I'm reading in 3.14-rc7 is that:

- there is a local_fiq_enable() in arch_cpu_idle_prepare() which ensures
  that FIQs will be enabled for the idle loop for PID0.

- secondary CPUs have a local_fiq_enable() in secondary_start_kernel()
  which ensures that they have FIQs enabled too.

- when a kernel thread is created, the initial register set is created
  by copy_thread() which sets the CPSR to just 'SVC_MODE', thus clearing
  the IRQ and FIQ mask bits in any spawned thread.

So that all appears to be correct.
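
The third point can be checked with plain bit arithmetic. A minimal
sketch, assuming the usual ARM values for SVC_MODE and the mask bits; the
two helper functions are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Values as used for the ARM CPSR; a set mask bit means "masked". */
#define SVC_MODE  0x13u	/* supervisor mode field */
#define PSR_F_BIT 0x40u	/* FIQ mask */
#define PSR_I_BIT 0x80u	/* IRQ mask */

/* Hypothetical helpers: report whether a CPSR value leaves IRQs/FIQs on. */
static int irqs_unmasked(uint32_t cpsr) { return (cpsr & PSR_I_BIT) == 0; }
static int fiqs_unmasked(uint32_t cpsr) { return (cpsr & PSR_F_BIT) == 0; }
```

A CPSR of just SVC_MODE has neither mask bit set, so both helpers return
true for it.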

> But even if FIQs are enabled before entering the idle thread, any kernel  
> thread that sleeps (and yields the CPU) with an irqflags variable that 
> was saved before the call to local_fiq_enable has the potential to 
> corrupt the F bit when the thread subsequently wakes up.

No, because local_fiq_enable() only ever affects the thread on the CPU
which executed it.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.


* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
  2014-03-20 21:09     ` Russell King - ARM Linux
@ 2014-03-23 18:38       ` Jonathan Bell
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Bell @ 2014-03-23 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 20 Mar 2014 21:09:41 -0000, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> On Thu, Mar 20, 2014 at 08:49:24PM +0000, Jonathan Bell wrote:
>> On Thu, 20 Mar 2014 20:02:34 -0000, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>
>>> On Thu, Mar 20, 2014 at 07:54:02PM +0000, Jonathan Bell wrote:
>>>> hcd_init:
>>>> 	setup the FIQ state
>>>> 	install the fiq handler
>>>> 	enable_fiq(USB)
>>>> 	local_fiq_enable()
>>>> 	usb_add_hcd()
>>>
>>> You're not supposed to use local_fiq_enable() to enable FIQs in a device
>>> driver - they should already be enabled by this point.
>>>
>>> There has been a hole in that where FIQs haven't been enabled in the
>>> idle thread, but that's a bug which needs fixing.  Otherwise, you should
>>> assume that FIQs are always unmasked everywhere except for any short
>>> code sequences that are contained within a short local_fiq_disable()..
>>> local_fiq_enable() block.
>>>

Ok, I went away and did a bit of research.

> Realise that the state of the CPSR is per-process.  So if you're enabling
> it in PID 1 when your driver initialises and it isn't enabled in PID 0,
> then it will still remain not enabled in PID 0 - and that's a big problem
> because that means whenever the system is idle, FIQs will be masked.
>
> Now, what I'm reading in 3.14-rc7 is that:
>
> - there is a local_fiq_enable() in arch_cpu_idle_prepare() which ensures
>   that FIQs will be enabled for the idle loop for PID0.
>
> - secondary CPUs have a local_fiq_enable() in secondary_start_kernel()
>   which ensures that they have FIQs enabled too.

The issue I describe happens on a uniprocessor platform. I have no data
for SMP.

> - when a kernel thread is created, the initial register set is created
>   by copy_thread() which sets the CPSR to just 'SVC_MODE', thus clearing
>   the IRQ and FIQ mask bits in any spawned thread.

At the end of start_kernel there's the call to rest_init, which creates
two kernel threads (kernel_init and kthreadd) and subsequently does some
scheduling that ensures all built-in drivers are probed, etc.

The arch call to enable FIQs on the boot CPU is done at the end of this
function right before the entry into the idle loop, therefore after the
built-in drivers are probed and various threads are created.

> So that all appears to be correct.

Appears to be. However the issue stands. I can see where copy_thread saves
a set of registers including CPSR (set to SVC_MODE in the case of kernel
threads as you describe), but I cannot find anywhere in the actual call
for a context switch (__switch_to / finish_task_switch) the point where  
the CPSR is updated.

As far as I can see, the CPSR F/I bits are unaltered during a context  
switch.
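
To make that reading concrete, here is a toy model of it. It encodes my
understanding as an assumption rather than verified kernel behaviour: the
toy_* names are hypothetical, and the "switch" deliberately copies nothing
into the live CPSR.

```c
#include <stddef.h>
#include <stdint.h>

#define SVC_MODE  0x13u	/* supervisor mode field */
#define PSR_F_BIT 0x40u	/* FIQ mask */

/* What copy_thread() recorded for the new thread (toy model). */
struct toy_thread {
	uint32_t initial_cpsr;
};

/* The live CPU state of a uniprocessor (toy model). */
struct toy_cpu {
	uint32_t cpsr;
	const struct toy_thread *current;
};

/* Switch as I read it: the task changes, the live CPSR is left alone. */
static void toy_switch_to(struct toy_cpu *cpu, const struct toy_thread *next)
{
	cpu->current = next;
}

/* Returns 1 if FIQs are still masked after switching to the new thread. */
static int toy_fiq_masked_after_switch(void)
{
	struct toy_thread kthread = { .initial_cpsr = SVC_MODE };
	struct toy_cpu cpu = { .cpsr = SVC_MODE | PSR_F_BIT, .current = NULL };

	toy_switch_to(&cpu, &kthread);

	return (cpu.cpsr & PSR_F_BIT) != 0;
}
```

Under that assumption the new thread runs with whatever F bit was live at
the time of the switch, regardless of the value copy_thread() stored.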

