From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B98BB17C77 for ; Wed, 3 Dec 2025 19:11:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764789122; cv=none; b=lvx/FFPxtWdRMprTuMcC6cRVYZqQZ8SB8yK6kqP9/B6U3duNJPk+PVek5MYYEpgjT3OO/evdU3kIAZlWl5ErcKIT2ce+7CNyWz9oEn8i0toH6+Vt9vaG84UPNE6oSFGDcSYjKLtwKy342Sd6zUBkWxeZwhMtOjXGyTofekZEZFw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764789122; c=relaxed/simple; bh=9jzE9iBjMeBWHBR4I47cKMqfpUUyI5V+RtF8pH1d0KM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Z8073WNEverkrFmejBsV6WQHB0M/I0v09+nWJa8WDNol/VGytvjQ10gCaoIE3C8kCgOcGQQ5aaCSbXkl5zHUAxv0oKDJt9TICpgdpnemdsaXQxRqHMvKHmkru7lqRTspN3m9TGSa6YQbjhknxI4Qr7F/HcDKHBT1sC4SPD3md0k= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=N1tPQkmQ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="N1tPQkmQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 404D8C113D0; Wed, 3 Dec 2025 19:11:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1764789119; bh=9jzE9iBjMeBWHBR4I47cKMqfpUUyI5V+RtF8pH1d0KM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=N1tPQkmQ6E4klXVYcsci6hpBR3o4y0tig+BD8vfDULIMOFukxQCZt89dbwSStuMR0 96jQqa5nHgIjwvuPxC2dwre/IlGFJizqD3mS2W8Gi8X2iKU0uZkcFyG/fWKprPYbZr r6sGTfY6Ll71GaExOE0x/dbLOlfA1N3D2BHGPnjUwQTW2HbezzEeisyTfVGV/4JS/m ZsSrFpV4hBLA+e+hMP/5dxcbh3ZRXw2WGJVQ4+B/zuE6dmc4rw/j9fnTvB4VzcI83X 4fBJQq3fvNo46YbF2vkMp22oTbD1ac4Sysdc+hpNgJ1EVztr16jYrvq3+QVdNEpGeW qQOhTL0j6Oneg== Date: Wed, 3 Dec 2025 20:11:54 +0100 From: Ingo Molnar To: Josh Poimboeuf Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Nathan Chancellor , Peter Zijlstra , Alexandre Chartre , David Laight , Linus Torvalds Subject: Re: [PATCH] objtool: Fix stack overflow in validate_branch() Message-ID: References: <21bb161c23ca0d8c942a960505c0d327ca2dc7dc.1764691895.git.jpoimboe@kernel.org> <7i2v6lkl7pd2jzk57omos6pqkgwooewrrztsvi5weibvod2f5b@3mkwqwzslyl4> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7i2v6lkl7pd2jzk57omos6pqkgwooewrrztsvi5weibvod2f5b@3mkwqwzslyl4> * Josh Poimboeuf wrote: > On Wed, Dec 03, 2025 at 10:25:34AM +0100, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Tue, Dec 02, 2025 at 05:20:22PM +0100, Ingo Molnar wrote: > > > > * Josh Poimboeuf wrote: > > > > > On an allmodconfig kernel compiled with Clang, objtool is > > > > > segfaulting in drivers/scsi/qla2xxx/qla2xxx.o due to a stack > > > > > overflow in validate_branch(). > > > > > > > > > > Due in part to KASAN being enabled, the qla2xxx code has a large > > > > > number of conditional jumps, causing objtool to go quite deep in > > > > > its recursion. > > > > > > > > > > By far the biggest offender of stack usage is the recently added > > > > > 'prev_state' stack variable in validate_insn(), coming in at 328 > > > > > bytes. > > > > > > > > That's weird - how can a user-space tool run into stack limits, are > > > > they set particularly conservatively? > > > > > > On my Fedora system, "ulimit -s" is 8MB. You'd think that would be > > > enough :-) > > > > > > In this case, objtool had over 20,000 stack frames caused by > > > recursively following over 7,000(!) conditional jumps in a single > > > function. > > > > BTW., I just instrumented it, and it's even worse: on current upstream, > > the allmodconfig qla2xxx.o code built with clang-20.1.8 has a worst-case > > recursion depth of 50,944 (!), for the qla83xx_fw_dump() function. > > Is that number of loops or total stack frames? So I tracked the depth of validate_insn() recursion directly: ret = validate_insn(file, func, insn, &state, prev_insn, next_insn, - &dead_end); + &dead_end, depth++); if (!insn->trace) { if (ret) And in validate_insn(): if (depth > max_depth) { max_depth = depth; printf("# objtool new max depth: %ld for %s()\n", max_depth, func->name); } Actual function recursion depth may be deeper, if any of the helper functions get uninlined. > kernel and clang 20.1.8 I'm getting a max recursion depth of 7,165 loops > (not frames). See the below patch for how I measured that. Your patch seems to be similar, except that I passed in 'depth' directly, because as a kernel developer I don't trust globals :-) But it should measure the same thing AFAICS, right? > You may be underestimating the amount of memory usage objtool needs. > Running objtool on that binary with "/usr/bin/time -v" shows the maximum > resident set size is 140M. So the stack usage of 5.5MB is only about > 4.4% of the total memory usage. Still, the stack is some of the cache-hottest pieces of memory in that workload - and the biggest negative impact from the current recursion pattern comes from the sparse parsing, which suffers an even worse negative effect with a 140MB working set. > > One relatively simple method to 'straighten out' the parsing flow would > > be to add an internal 'branch queue' with a limited size of say 16 or 32 > > entries, and defer the parsing of these branch targets and continue with > > the next instruction, until one of these conditions is true: > > > > - 'branch queue' is full > > > > - JMP, CALL, RET or any other branching/trapping instruction is found > > > > - already validated instruction is found > > > > - end of symbol/section/file/etc. > > > > At which point the current 'branch queue' is flushed. (It might even be > > implemented as a branch-target stack, which may have a bit better > > locality.) > > Objtool tracks a considerable amount of state across branches. The > recursion works well for keeping that state at hand. So there is a > certain level of dependency there which I have a feeling might be > difficult to extricate. I haven't really looked at it though. I'm not against recursion for branches at all, I just suggest to change the order of how the recursion is fed: instead of parsing the two instruction streams of a branch point in this order: verify target recursively verify next instruction (Which is arguably the simplest.) I suggest the following recursion pattern: verify a batch of serial sequence of instruction(s) and save conditional branch targets (if any) verify saved branch targets, recursively This change to the recursion pattern should make a very large impact on max recursion depth, in addition to substantially better cache locality. Thanks, Ingo