From: ben.dooks@codethink.co.uk (Ben Dooks)
Date: Mon, 10 Feb 2014 17:25:15 +0000
Subject: [PATCH] ARM: mm: add imprecise abort non-deadly handler
In-Reply-To: <20140210143708.GB2794@e103592.cambridge.arm.com>
References: <1391797214-17142-1-git-send-email-ben.dooks@codethink.co.uk> <1391797214-17142-2-git-send-email-ben.dooks@codethink.co.uk> <20140210143708.GB2794@e103592.cambridge.arm.com>
Message-ID: <52F90B7B.50807@codethink.co.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 10/02/14 14:37, Dave Martin wrote:
> On Fri, Feb 07, 2014 at 06:20:14PM +0000, Ben Dooks wrote:
>> Given that imprecise aborts may be delivered after the action that
>> caused them (or even for non-CPU-related activities such as bridge
>> faults from a bus-master), it is possible that the wrong process is
>> terminated as a result.
>>
>> It is not known at this time which cores in an SMP system get notified
>> of an imprecise external abort; we have yet to find the right details
>> in the architecture reference manuals. This also means that killing
>> the process is probably the wrong thing to do on reception of these aborts.
>>
>> Add a handler to take and print imprecise aborts and allow the process
>> to continue. This should ensure that the abort is shown without killing
>> the process that was running on the CPU core at the time.
>
> Not treating these as thread-specific faults seems correct, since we
> never have a way to map these aborts back to the culprit ... except that
> there is a likelihood the culprit is still running when the abort fires.
>
> "Spurious" imprecise aborts pretty much always indicate a hardware error
> or a nasty bug somewhere.

I need to find out where the one we are catching is coming from in our
system.

> Another cause is badly implemented, buggy or malicious userspace software
> being given more exotic mmaps than it is qualified to deal with
> responsibly. That's a nasty bug in the distro maintainer / system
> administrator / vendor.
>
> So, I think this should be at least KERN_ERR; maybe KERN_CRIT or above.
> We must not encourage people to think that these aborts are somehow
> benign.

OK, KERN_ERR or KERN_CRIT sounds reasonable.

> If we really want people to fix their bugs, it may be worth considering
> panic(), or doing this when some threshold is reached. This may be a
> bit harsh though, at least without some threshold.

I was also considering firing a WARN_ON(abort_count++ > 10) or something
similar.

-- 
Ben Dooks                       http://www.codethink.co.uk/
Senior Engineer                 Codethink - Providing Genius
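A minimal userspace sketch of the policy discussed in this thread may help: report every imprecise abort at a high log level, never kill the running task, and escalate once a count passes a threshold. Everything in it is hypothetical and not taken from the patch. The names handle_imprecise_abort, struct imprecise_abort_state and IMPRECISE_ABORT_WARN_THRESHOLD are invented for illustration, fprintf() stands in for printk(KERN_CRIT ...), and the threshold of 10 is borrowed from the WARN_ON idea in the reply. It is a one-shot variant (closer to WARN_ON_ONCE) rather than a literal WARN_ON(abort_count++ > 10), which would fire on every abort from the twelfth onward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical threshold, borrowed from the "abort_count++ > 10" idea. */
#define IMPRECISE_ABORT_WARN_THRESHOLD 10

/* Hypothetical global state; a real handler would keep this in the kernel. */
struct imprecise_abort_state {
	unsigned int count; /* imprecise aborts seen so far */
	bool warned;        /* escalated warning already emitted? */
};

/*
 * Model of the proposed policy: log the abort loudly, never kill the
 * current task, and escalate once when the count exceeds the threshold.
 * Returns true only on the call that emitted the escalated warning.
 */
bool handle_imprecise_abort(struct imprecise_abort_state *st,
			    unsigned long fsr)
{
	assert(st != NULL);
	st->count++;

	/* Stands in for printk(KERN_CRIT ...) in a real kernel handler. */
	fprintf(stderr, "crit: imprecise external abort (FSR 0x%03lx), count %u\n",
		fsr, st->count);

	if (!st->warned && st->count > IMPRECISE_ABORT_WARN_THRESHOLD) {
		st->warned = true;
		fprintf(stderr, "warning: more than %d imprecise aborts seen\n",
			IMPRECISE_ABORT_WARN_THRESHOLD);
		return true;
	}

	/* The running task is left alone either way. */
	return false;
}
```

Calling it eleven times emits the escalated warning exactly once, on the eleventh call; later calls only log. Warning once keeps the log readable, while the per-abort message still makes each event visible, which addresses the concern that these aborts must not look benign.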