From mboxrd@z Thu Jan 1 00:00:00 1970 From: kevin@bracey.fi (Kevin Bracey) Date: Thu, 06 Feb 2014 20:10:44 +0200 Subject: [PATCH] ARM: enable IRQs in user undefined instruction vector In-Reply-To: <20140206112039.GA31054@arm.com> References: <1391545146-8320-1-git-send-email-kevin@bracey.fi> <20140206112039.GA31054@arm.com> Message-ID: <52F3D024.5020804@bracey.fi> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 06/02/2014 13:20, Catalin Marinas wrote: > On Tue, Feb 04, 2014 at 10:19:06PM +0200, Kevin Bracey wrote: >> See http://comments.gmane.org/gmane.linux.ports.arm.omap/59256 for >> an earlier report of the observed might_sleep() warning. > There was a follow-up on this: > > http://thread.gmane.org/gmane.linux.ports.arm.kernel/263765 Thanks for that pointer - we failed to find that thread. And from it I see I totally missed the iwmmxt path. :( > __und_usr: > usr_entry > + enable_irq > The problem is that you can be preempted here and parts of the kernel > may not cope with this. A very close analogue to this code is vector_swi. As far as I can see, having interrupts+preemption enabled in this __und_usr section should be as safe as it is there, so I'm pretty confident there shouldn't be a general kernel issue. But... > The reason I haven't pushed my patches to mainline is that I was worried > about such preemption cases. We enter iwmmxt_task_enable for example > with interrupts and preemption enabled, we disable preemption there but > is it too late? I don't have a way to test these and even for VFP I'm > not sure testing would guarantee it in all scenarios. The only concern I can see - and it's a big one - is that of how a HW coprocessor responds. If the coprocessor bounces an instruction, and we switch context before reaching the support code, will the hardware and its support code cope - both with switching to another context in that state, and handling the original bounce in this context when we get back? I know that because they're asynchronous, the FPA+VFP have always had to cope with possible pre-emption in the window between them deciding they need support and their next opportunity to bounce an instruction; but what if they get pre-empted between a bounce being generated and the support code processing it? The VFP code did take pains to increment the pre-emption count before enabling interrupts, but it's not obvious whether it requires no pre-emption between the bounce and handler, or just no pre-emption during the handler. For the FPA, we only have FPA emulation, not FPA support code, so nothing to worry about there. What about the iwmmxt and the crunchbits? They are only lazy-enable routines, to activate an inactive coprocessor. Which I think makes them safe. If we schedule before reaching the handler, when this context is returned to, the coprocessor must still be disabled in our context - the handler can proceed to enable it. And there can't be any other context state to worry about, assuming the lazy enable scheme works. Personally, I've always wondered why ARM never implemented a CP15 register you could read to find out what instruction generated a SWI or undefined instruction exception, to save all this hassle. It's always seemed to me like an obvious addition, ever since the I+D caches were split; why do I have to load my code into the data cache? > So it needs more thinking. Please have a look at my patches, if we get > to the conclusion there is no issue, I'll push them upstream. > I think if there's a problem, it will be in the VFP code - the others seem straightforward. The interrupt enable has to be made safe - as long as another core can age a page before the handler is reached, the VFP support code will /have/ to be able to cope with pre-emption between the fault and handler, if it doesn't already. We need a VFP expert to figure it out properly. We will stress test your posted patch for our failure case, which is Android skipping VFP instructions thanks to the extra irqs_disabled() patch in do_page_fault(). I think we should also include the fixup handler modification from my patch - I think __und_usr having a fixup handler that skips the instruction if do_page_fault fails is just asking for trouble, so it should be aligned with vector_swi so it retries. I share Alexey's Ignatov's concern that your patch ends up running the support code with interrupts either on or off depending on whether you came from user or supervisor mode, which feels a little odd. But then, always enabling interrupts like the current code does, when they might have been off in the SVC mode context, also seems wrong. I did ponder for my version whether the __und_svc path should restore original IRQ status before going to call_fpe, to make the handler entry environment more uniform - IRQ status would always be as in original context. Maybe it's a fine detail, as we don't really expect to be handling any of these instructions from the kernel anyway, but if we are going to the trouble to send __und_svc to handlers rather than going straight to __und_svc_fault... Kevin