From mboxrd@z Thu Jan 1 00:00:00 1970
From: aryabinin@virtuozzo.com (Andrey Ryabinin)
Date: Wed, 17 Feb 2016 19:31:43 +0300
Subject: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
In-Reply-To: <20160217143950.GC32647@leverpostej>
References: <20160212145844.GI31665@e104818-lin.cambridge.arm.com>
 <20160212151006.GJ31665@e104818-lin.cambridge.arm.com>
 <20160212152641.GK31665@e104818-lin.cambridge.arm.com>
 <56BDFC86.5010705@arm.com>
 <20160212160652.GL31665@e104818-lin.cambridge.arm.com>
 <56C1E072.2090909@virtuozzo.com>
 <20160215185957.GB19413@e104818-lin.cambridge.arm.com>
 <56C31D1D.50708@virtuozzo.com>
 <20160217143950.GC32647@leverpostej>
Message-ID: <56C4A06F.3040702@virtuozzo.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 02/17/2016 05:39 PM, Mark Rutland wrote:
> On Tue, Feb 16, 2016 at 03:59:09PM +0300, Andrey Ryabinin wrote:
>> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
>>
>> ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
>>> ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>> ^
>> F1 - left redzone, it indicates start of stack frame
>> F3 - right redzone, it should be the end of stack frame.
>>
>> But here we have the second set of F1s without F3s which should close the first set of F1s.
>> Also those two F3s in the middle cannot be right.
>>
>> So shadow is corrupted.
>> Some hypotheses:
>
>> 2) Shadow memory wasn't cleared. GCC poisons memory on function entrance and unpoisons it before return.
>>    If we use some tricky way to exit from function this could cause false-positives like that.
>>    E.g. some hand-written assembly return code.
>
> I think this is what's happening, at least for the idle case.
>
> A second attempt at bisecting led me to commit e679660dbb8347f2 ("ARM:
> 8481/2: drivers: psci: replace psci firmware calls"). Reverting that
> makes v4.5-rc1 boot without KASAN splats.
>
> That patch turned __invoke_psci_fn_{smc,hvc} into (ASAN-instrumented) C
> functions. Prior to that commit, __invoke_psci_fn_{smc,hvc} were
> pure assembly functions which used no stack.
>
> When we go down for idle, in __cpu_suspend_enter we stash some context
> to the stack (in assembly). The CPU may return from a cold state via
> cpu_resume, where we restore context from the stack.
>
> However, after storing the context we call psci_suspend_finisher, which
> calls psci_cpu_suspend, which calls invoke_psci_fn_*. As
> psci_cpu_suspend and invoke_psci_fn_* are instrumented, they poison
> memory on function entrance, but we never perform the unpoisoning.
>
> That was always the case for psci_suspend_finisher, so there was a
> latent issue that we were somehow avoiding. Perhaps we got lucky with
> stack layout and never hit the poison.
>
> I'm not sure how we fix that, as invoke_psci_fn_* may or may not return
> for arbitrary reasons (e.g. a CPU_SUSPEND_CALL may or may not return
> depending on whether an interrupt comes in at the right time).
>
> Perhaps the simplest option is to not instrument invoke_psci_fn_* and
> psci_suspend_finisher. Do we have a per-function annotation to avoid
> KASAN instrumentation, like notrace? I need to investigate, but we may
> also need notrace for similar reasons.

include/linux/compiler-gcc.h:
/*
 * Tell the compiler that address safety instrumentation (KASAN)
 * should not be applied to that function.
 * Conflicts with inlining: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
 */
#define __no_sanitize_address __attribute__((no_sanitize_address))

>
> Andrey, on a tangential note, what do we do around hotplug? I assume
> that we must unpoison the shadow region for the stack of a dead CPU,
> but I wasn't able to figure out where we do that. Hopefully we're not
> just getting lucky?
>

We do nothing about it. AFAIU we need to clear swapper's stack, somewhere in secondary_start_kernel() perhaps.

> Thanks,
> Mark.
>
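
For reference, the F1/F3 reasoning about the shadow dump can be written down as a small userspace sketch. This is not kernel code: count_shadow_anomalies() and reported_shadow are hypothetical names made up for illustration, and the pairing rule (a frame opens with a run of F1 and closes with a run of F3) is a simplification of what GCC really emits. The 32 bytes are the two shadow rows from the report above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack redzone markers as described in the report above. */
#define KASAN_STACK_LEFT  0xF1  /* left redzone: start of a stack frame */
#define KASAN_STACK_RIGHT 0xF3  /* right redzone: end of a stack frame */

enum frame_state { OUTSIDE, IN_FRAME, IN_RIGHT };

/* The two shadow rows quoted in the report, concatenated. */
static const uint8_t reported_shadow[32] = {
    0xf1, 0xf1, 0xf1, 0xf1, 0x00, 0xf4, 0xf4, 0xf4,
    0xf2, 0xf2, 0xf2, 0xf2, 0x00, 0x00, 0xf1, 0xf1,
    0xf1, 0xf1, 0x00, 0x00, 0x00, 0x00, 0xf3, 0xf3,
    0x00, 0xf4, 0xf4, 0xf4, 0xf3, 0xf3, 0xf3, 0xf3,
};

/*
 * Hypothetical helper (assumption, not a kernel API): walk a window of
 * shadow bytes and count two kinds of inconsistency:
 *   - a new F1 run that begins while the previous frame is still open,
 *   - an F3 run that begins while no frame is open.
 * A frame left open at the end of the window is not counted, since the
 * window may simply end before the frame does.
 */
static size_t count_shadow_anomalies(const uint8_t *shadow, size_t len)
{
    enum frame_state state = OUTSIDE;
    size_t anomalies = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = shadow[i];

        if (b == KASAN_STACK_LEFT) {
            /* state == IN_FRAME implies i > 0, so shadow[i - 1] is valid */
            if (state == IN_FRAME && shadow[i - 1] != KASAN_STACK_LEFT)
                anomalies++;    /* frame opened before the last one closed */
            state = IN_FRAME;
        } else if (b == KASAN_STACK_RIGHT) {
            if (state == OUTSIDE)
                anomalies++;    /* closing redzone with nothing to close */
            state = IN_RIGHT;
        } else if (state == IN_RIGHT) {
            state = OUTSIDE;    /* the F3 run has ended */
        }
    }
    return anomalies;
}
```

On the dumped bytes, count_shadow_anomalies(reported_shadow, sizeof(reported_shadow)) returns 2: the second run of F1s starts while the first frame is still open, and once the two F3s in the middle have closed it, the trailing run of F3s has nothing left to close. A well-formed frame such as f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 yields 0.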