From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5BE3C433DB for ; Tue, 23 Mar 2021 17:28:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 75982619AE for ; Tue, 23 Mar 2021 17:28:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230190AbhCWR1r (ORCPT ); Tue, 23 Mar 2021 13:27:47 -0400 Received: from linux.microsoft.com ([13.77.154.182]:60124 "EHLO linux.microsoft.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230198AbhCWR1i (ORCPT ); Tue, 23 Mar 2021 13:27:38 -0400 Received: from [192.168.254.32] (unknown [47.187.194.202]) by linux.microsoft.com (Postfix) with ESMTPSA id 13A1020B5680; Tue, 23 Mar 2021 10:27:37 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 13A1020B5680 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1616520457; bh=p1pmPljb8ZS9xoSxAxg/MECHfcb5ywuGv+WYp/09E78=; h=Subject:From:To:Cc:References:Date:In-Reply-To:From; b=HuwZjruLEWv0TT34ZZaPXAug9Yq/DKVVeUtQBlYLfF/NvKNPQ5mz+LQkx4iKi0VPE EID4btZg8bDcRtZplySFtYi1Zz2mRVKl4H6SYXEu0EV7MvaUtl2p0owNxQoB6o307s AUh1ZS65DNEowBBOR06bNR/NvPjTXVZ006VB/Gcs= Subject: Re: [RFC PATCH v2 5/8] arm64: Detect an FTRACE frame and mark a stack trace unreliable From: "Madhavan T. Venkataraman" To: Mark Rutland Cc: broonie@kernel.org, jpoimboe@redhat.com, jthierry@redhat.com, catalin.marinas@arm.com, will@kernel.org, linux-arm-kernel@lists.infradead.org, live-patching@vger.kernel.org, linux-kernel@vger.kernel.org References: <20210315165800.5948-1-madvenka@linux.microsoft.com> <20210315165800.5948-6-madvenka@linux.microsoft.com> <20210323105118.GE95840@C02TD0UTHF1T.local> <2167f3c5-e7d0-40c8-99e3-ae89ceb2d60e@linux.microsoft.com> <20210323133611.GB98545@C02TD0UTHF1T.local> <20210323145734.GD98545@C02TD0UTHF1T.local> <20210323170236.GF98545@C02TD0UTHF1T.local> Message-ID: <2a390ffb-4931-9b7d-e203-5d0189052744@linux.microsoft.com> Date: Tue, 23 Mar 2021 12:27:36 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: live-patching@vger.kernel.org On 3/23/21 12:23 PM, Madhavan T. Venkataraman wrote: > > > On 3/23/21 12:02 PM, Mark Rutland wrote: >> On Tue, Mar 23, 2021 at 11:20:44AM -0500, Madhavan T. Venkataraman wrote: >>> On 3/23/21 10:26 AM, Madhavan T. Venkataraman wrote: >>>> On 3/23/21 9:57 AM, Mark Rutland wrote: >>>>> On Tue, Mar 23, 2021 at 09:15:36AM -0500, Madhavan T. Venkataraman wrote: >>>> So, my next question is - can we define a practical limit for the >>>> nesting so that any nesting beyond that is fatal? The reason I ask >>>> is - if there is a max, then we can allocate an array of stack >>>> frames out of band for the special frames so they are not part of >>>> the stack and will not likely get corrupted. >>>> >>>> Also, we don't have to do any special detection. If the number of >>>> out of band frames used is one or more then we have exceptions and >>>> the stack trace is unreliable. >>> >>> Alternatively, if we can just increment a counter in the task >>> structure when an exception is entered and decrement it when an >>> exception returns, that counter will tell us that the stack trace is >>> unreliable. >> >> As I noted earlier, we must treat *any* EL1 exception boundary needs to >> be treated as unreliable for unwinding, and per my other comments w.r.t. >> corrupting the call chain I don't think we need additional protection on >> exception boundaries specifically. >> >>> Is this feasible? >>> >>> I think I have enough for v3 at this point. If you think that the >>> counter idea is OK, I can implement it in v3. Once you confirm, I will >>> start working on v3. >> >> Currently, I don't see a compelling reason to need this, and would >> prefer to avoid it. >> > > I think that I did a bad job of explaining what I wanted to do. It is not > for any additional protection at all. > > So, let us say we create a field in the task structure: > > u64 unreliable_stack; > > Whenever an EL1 exception is entered or FTRACE is entered and pt_regs get > set up and pt_regs->stackframe gets chained, increment unreliable_stack. > On exiting the above, decrement unreliable_stack. > > In arch_stack_walk_reliable(), simply do this check upfront: > > if (task->unreliable_stack) > return -EINVAL; > > This way, the function does not even bother unwinding the stack to find > exception frames or checking for different return addresses or anything. > We also don't have to worry about code being reorganized, functions > being renamed, etc. It also may help in debugging to know if a task is > experiencing an exception and the level of nesting, etc. > >> More generally, could we please break this work into smaller steps? I >> reckon we can break this down into the following chunks: >> >> 1. Add the explicit final frame and associated handling. I suspect that >> this is complicated enough on its own to be an independent series, >> and it's something that we can merge without all the bits and pieces >> necessary for truly reliable stacktracing. >> > > OK. I can do that. > >> 2. Figure out how we must handle kprobes and ftrace. That probably means >> rejecting unwinds from specific places, but we might also want to >> adjust the trampolines if that makes this easier. >> > > I think I am already doing all the checks except the one you mentioned > earlier. Yes, I can do this separately. > >> 3. Figure out exception boundary handling. I'm currently working to >> simplify the entry assembly down to a uniform set of stubs, and I'd >> prefer to get that sorted before we teach the unwinder about >> exception boundaries, as it'll be significantly simpler to reason >> about and won't end up clashing with the rework. >> > > So, here is where I still have a question. Is it necessary for the unwinder > to know the exception boundaries? Is it not enough if it knows if there are > exceptions present? For instance, using something like num_special_frames Typo - num_special_frames should be unreliable_stack. That is the name of the counter I used above. Sorry about that. Madhavan