* [arm arch] local_irq_restore() does not play well with local_fiq_enable()
From: Jonathan Bell @ 2014-03-20 19:54 UTC
To: linux-arm-kernel

Hi.

I believe I've found an edge case where the FIQ enable bit in the CPSR on
ARMv6 can get trampled through a combination of the standard kernel
interfaces and the arch-specific local_fiq_enable()/local_fiq_disable()
macros.

We have an out-of-tree set of drivers on our mach-bcm2708 port that
interact badly and disable FIQs for extended periods of time. The
abbreviated version below is for readability; the full set of relevant
code can be found at:

https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/misc/vc04_services/interface/vchiq_arm/
and
https://github.com/raspberrypi/linux/blob/rpi-3.10.y-next/drivers/usb/host/dwc_otg/dwc_otg_hcd_linux.c

In driver #1 (BRCM vchiq messagebox/GPU message passing handler) we do
this:

vchiq_doorbell_irq:
    - Has the doorbell been poked from the GPU?
    - If yes, up(vchiq_semaphore)
    return IRQ_HANDLED

In the corresponding "bottom half" we do something a bit silly, but
basically:

vchiq_worker_kthread:
    while (1)
        down_interruptible(vchiq_semaphore)
        process_gpu_messages()
        do_some_other_stuff()

In driver #2 (our dwc_otg/dwc2 implementation) we do the following on
init:

hcd_init:
    setup the FIQ state
    install the fiq handler
    enable_fiq(USB)
    local_fiq_enable()
    usb_add_hcd()

Because vchiq_worker_kthread is waiting on a mailbox interrupt from the
GPU, it initialises and goes to sleep almost immediately. In
down_interruptible(), if sem->count is <= 0, there is a sequence of
events that basically does:

    local_irq_save(flags)
    local_irq_enable()
    schedule()
    local_irq_disable()
    local_irq_restore(flags)

While the worker is parked in schedule(), the rest of the boot process
continues and probes USB, which enables the FIQ.
The problem is that on boot, the vchiq services are probed and
initialised before the USB driver is, so the worker thread is left
holding a stale flags variable with the F bit set. On the first GPU
interrupt, vchiq_doorbell_irq increments the semaphore and wakes up
vchiq_worker_kthread, which then promptly overwrites the FIQ bit in the
CPSR, because its flags were saved before the FIQ was enabled.

The stale F bit can then propagate through other threads using similar
mechanisms, and causes no end of trouble for a FIQ handler that expects
never to be disabled.

In our out-of-tree arch, I've "fixed" this by making local_irq_restore()
never touch the FIQ bit in the CPSR. See
https://github.com/raspberrypi/linux/commit/a0f47344e286768e3ce96268eed1ad0f6cfd9f2c
for the modification to arm/irqflags.h.

Should this be classed as a bug? The root cause is that the same register
holds the mask bits for high-priority FIQs as well as for normal IRQs, so
the save(flags) and restore(flags) mechanisms trample on something they
don't know about.
From: Russell King - ARM Linux @ 2014-03-20 20:02 UTC
To: linux-arm-kernel

On Thu, Mar 20, 2014 at 07:54:02PM +0000, Jonathan Bell wrote:
> hcd_init:
>     setup the FIQ state
>     install the fiq handler
>     enable_fiq(USB)
>     local_fiq_enable()
>     usb_add_hcd()

You're not supposed to use local_fiq_enable() to enable FIQs in a device
driver - they should already be enabled by this point.

There has been a hole in that where FIQs haven't been enabled in the
idle thread, but that's a bug which needs fixing. Otherwise, you should
assume that FIQs are always unmasked everywhere except for any short
code sequences that are contained within a short local_fiq_disable()..
local_fiq_enable() block.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
From: Jonathan Bell @ 2014-03-20 20:49 UTC
To: linux-arm-kernel

On Thu, 20 Mar 2014 20:02:34 -0000, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> You're not supposed to use local_fiq_enable() to enable FIQs in a device
> driver - they should already be enabled by this point.
>
> There has been a hole in that where FIQs haven't been enabled in the
> idle thread, but that's a bug which needs fixing. Otherwise, you should
> assume that FIQs are always unmasked everywhere except for any short
> code sequences that are contained within a short local_fiq_disable()..
> local_fiq_enable() block.

Relying on FIQs being enabled elsewhere is in fact what we used to do
until ~3.10, whereupon some change (most likely the bug you describe)
broke this behaviour.

Our use of local_fiq_disable() and local_fiq_enable() is indeed
constrained to minimally small critical sections (mostly reading/writing
a single hardware register or state variable) within the dwc_otg driver
that "handles" the results from the FIQ.

But even if FIQs are enabled before entering the idle thread, any kernel
thread that sleeps (and yields the CPU) with an irqflags variable that
was saved before the call to local_fiq_enable has the potential to
corrupt the F bit when the thread subsequently wakes up.
From: Russell King - ARM Linux @ 2014-03-20 21:09 UTC
To: linux-arm-kernel

On Thu, Mar 20, 2014 at 08:49:24PM +0000, Jonathan Bell wrote:
> Relying on FIQs being enabled elsewhere is in fact what we used to do
> until ~3.10, whereupon some change (most likely the bug you describe)
> broke this behaviour.
>
> Our use of local_fiq_disable() and local_fiq_enable() is indeed
> constrained to minimally small critical sections (mostly reading/writing
> a single hardware register or state variable) within the dwc_otg driver
> that "handles" the results from the FIQ.

Realise that the state of the CPSR is per-process. So if you're enabling
it in PID 1 when your driver initialises and it isn't enabled in PID 0,
then it will still remain not enabled in PID 0 - and that's a big problem
because that means whenever the system is idle, FIQs will be masked.
Now, what I'm reading in 3.14-rc7 is that:

- there is a local_fiq_enable() in arch_cpu_idle_prepare() which ensures
  that FIQs will be enabled for the idle loop for PID 0.

- secondary CPUs have a local_fiq_enable() in secondary_start_kernel()
  which ensures that they have FIQs enabled too.

- when a kernel thread is created, the initial register set is created
  by copy_thread() which sets the CPSR to just 'SVC_MODE', thus clearing
  the IRQ and FIQ mask bits in any spawned thread.

So that all appears to be correct.

> But even if FIQs are enabled before entering the idle thread, any kernel
> thread that sleeps (and yields the CPU) with an irqflags variable that
> was saved before the call to local_fiq_enable has the potential to
> corrupt the F bit when the thread subsequently wakes up.

No, because local_fiq_enable() only ever affects the thread on the CPU
which executed it.
From: Jonathan Bell @ 2014-03-23 18:38 UTC
To: linux-arm-kernel

Ok, I went away and did a bit of research.

On Thu, 20 Mar 2014 21:09:41 -0000, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> Realise that the state of the CPSR is per-process. So if you're enabling
> it in PID 1 when your driver initialises and it isn't enabled in PID 0,
> then it will still remain not enabled in PID 0 - and that's a big problem
> because that means whenever the system is idle, FIQs will be masked.
>
> Now, what I'm reading in 3.14-rc7 is that:
>
> - there is a local_fiq_enable() in arch_cpu_idle_prepare() which ensures
>   that FIQs will be enabled for the idle loop for PID 0.
>
> - secondary CPUs have a local_fiq_enable() in secondary_start_kernel()
>   which ensures that they have FIQs enabled too.

The issue I describe happens on a uniprocessor platform. I have no data
for SMP.
> - when a kernel thread is created, the initial register set is created
>   by copy_thread() which sets the CPSR to just 'SVC_MODE', thus clearing
>   the IRQ and FIQ mask bits in any spawned thread.

At the end of start_kernel() there's the call to rest_init(), which
creates two kernel threads (kernel_init and kthreadd) and subsequently
does some scheduling that ensures all built-in drivers are probed, etc.
The arch call to enable FIQs on the boot CPU is made at the end of this
function, right before entry into the idle loop, and therefore after the
built-in drivers are probed and various threads are created.

> So that all appears to be correct.

Appears to be. However, the issue stands. I can see where copy_thread()
saves a set of registers including the CPSR (set to SVC_MODE in the case
of kernel threads, as you describe), but I cannot find anywhere in the
actual context switch path (__switch_to / finish_task_switch) where the
CPSR is updated. As far as I can see, the CPSR F/I bits are unaltered
during a context switch.