Date: Fri, 13 Nov 2020 11:22:23 +0000
From: Mark Rutland
To: Peter Collingbourne
Cc: Catalin Marinas, Mark Brown, Will Deacon, Linux ARM, Andrey Konovalov
Subject: Re: [PATCH] kasan: arm64: support specialized outlined tag mismatch checks
Message-ID: <20201113112223.GA43992@C02TD0UTHF1T.local>
References: <20201029205944.2769-1-pcc@google.com> <20201103152822.GF40454@C02TD0UTHF1T.local>

Hi Peter,

Thanks for this; the additional detail was very helpful.
The high-level idea looks fine to me, but I have some qualms with the
ABI, and I think we should rejig the compiler side rather than taking
this as-is. It also looks like there may be an issue with PLTs. I've
tried to explain those in detail below.

On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland wrote:
> >
> > Hi Peter,
> >
> > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > Outlined checks require a new runtime function with a custom calling
> > > convention. Add this function to arch/arm64/lib.
> >
> > [...]
> >
> > > +/*
> > > + * Report a tag mismatch detected by tag-based KASAN.
> > > + *
> > > + * This function has a custom calling convention in order to minimize the sizes
> > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > + * bytes of stack space allocated by the caller must be deallocated on return.
> >
> > This is a bit difficult to follow.
> >
> > Am I right in understanding that the caller has pre-allocated 256 bytes
> > on the stack by decrementing the SP, yet it's up to the callee to undo
> > that? That seems bizarre to me, and it would be much easier (and more
> > generally useful) for the caller to leave it to the trampoline to
> > allocate/free its own space.
>
> Yes, that's how it works. The goal was to reduce code size costs,
> since otherwise you would need setup and teardown code in each
> trampoline together with unwind info which will not necessarily be
> deduplicated by the linker. It's probably easiest to think of the
> trampoline as the "head" of the tag mismatch handling function, while
> the runtime part is the "tail". From that perspective everything is
> balanced.

I appreciate the general idea, but there are some details that I think
aren't quite right in the ABI, and I'd rather we rejig those before
settling on it, especially as IIUC this was added specifically to speed
up kernel usage, and I'd like to make sure it works well in-context.

Firstly, reserving 256 bytes in the compiler-generated trampoline is
arbitrary, and I'd rather that trampoline only reserved the space it
needs (currently 32 bytes for a stack record and x0/x1), even if that
means the runtime trampoline has to shuffle x0/x1.

Secondly, I don't believe that the compiler-generated trampoline needs
to stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
generally suggests it *shouldn't* create a frame record. It doesn't
clobber the original x29/x30, so a frame record isn't necessary (even
for reliable stacktrace), and it doesn't plumb the new record into x29,
so it could leave all that work to the runtime trampoline and save an
instruction.

I think that the stacking in the compiler-generated trampoline should
be limited to:

	stp	x0, x1, [sp, #-16]!

... which is much like how mfentry works on most architectures, and
similar to how patchable-function-entry works on arm64.

> For reference here is an example of a trampoline function for the kernel:
>
> __hwasan_check_x1_67043328:
>         sbfx    x16, x1, #4, #52
>         ldrb    w16, [x9, x16]
>         cmp     x16, x1, lsr #56
>         b.ne    .Ltmp5
> .Ltmp6:
>         ret
> .Ltmp5:
>         lsr     x16, x1, #56
>         cmp     x16, #255
>         b.eq    .Ltmp6
>         stp     x0, x1, [sp, #-256]!
>         stp     x29, x30, [sp, #232]
>         mov     x0, x1
>         mov     x1, #0
>         b       __hwasan_tag_mismatch

Thanks for this; it was helpful in clarifying my understanding.

I note this clobbers x16 unconditionally. Is x17 free for use too?

> > > + *
> > > + * This function takes care of transitioning to the standard AAPCS calling
> > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > + *
> > > + * Parameters:
> > > + * x0 - the fault address
> > > + * x1 - an encoded description of the faulting access
> > > + */
> > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> >
> > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > and place any BTI instructions necessary manually.
>
> Okay. Since the trampolines may appear in modules I guess it's
> possible for the function to be called via a module PLT so a BTI
> instruction is required.

I think so, but I am not completely certain (there are special rules
for branches from PLTs using x16/x17).

> If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> it looks like the module PLTs may only clobber x16, correct? That
> works for me.

PLTs can clobber both x16 and x17. Do callers of the compiler-generated
trampolines account for those being clobbered? I see from the example
above that the trampolines do not.

[...]

> > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > +	str	x18, [sp, #128]
> > > +#endif
> >
> > This doesn't look right to me. The shadow call stack should remain
> > balanced across a call, so why do you need to save x18? Any
> > manipulation within kasan_tag_mismatch should be balanced.
> >
> > Did you mean to save the return address to the shadow stack, rather
> > than saving the shadow stack pointer?
>
> This is to handle the case where SCS is disabled. In this case, x18
> becomes a temporary register for AAPCS functions, but this calling
> convention requires it to be saved.

My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
confused as to why you'd save x18 when SCS was enabled; that is not
what's going on.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel