From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFBE82686B4 for ; Fri, 14 Feb 2025 18:12:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739556728; cv=none; b=CsYNeFQtah25N4MUYCUNAfOCtsMOu1CxokNMxstDK6Uh9CXl0sWur6PzF5WPliS2312txCsXB6YLWVgyD4TRrbzfN8LpdD1bqHtr/B8OwnLXqw1t905VLKp9Mbrr0D8nC3rFxwIfv2hY/w+99ZNhMmoGXMC8gwedmPnB0WOLVRE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739556728; c=relaxed/simple; bh=IE6AlM52FuUyZ9EJGSQYDKiCFQ3JI2Ju5oSerDTtN2A=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RLr8f8h6tbr36Tp+EM4gw2R8grNlDnpxVBZcVGJtEY2fKoYj2UNhzX8CYEMn1q+r+FVNqO2vZ2Ca/73duuBnqmZntrOxs8pFyuYVTuW+Pal2I0wbwu64rywM4d4+AIftmNddErW1DOcFKFiSvWNrZ3NcV7MN0S4E8ZDjtZN4Kxw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=uDhVC6Ri; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="uDhVC6Ri" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD6DBC4CED1; Fri, 14 Feb 2025 18:12:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739556728; bh=IE6AlM52FuUyZ9EJGSQYDKiCFQ3JI2Ju5oSerDTtN2A=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=uDhVC6RiO/WbZiHqcPn56S9E6YjvJG/E8mtoGeLaTqD4RMnyF+8NBjDvQLUhU0Xii jIiiwq/1h2kdStXwoIdkhlbXojrxpnay/AXnOKOUBoH4p72oQ46yiNUcM0s6e35HCk n630r4v69ZMFHgrm/R3TqNZXADZujbuU9m3QX3FLUiua/qM7ZQI5s7pSUP9wThR/1L n7NzI/UcNnKdjLvH5JrQweTkay2BhLJwKmyUv+UXfN82M5W9r7mwHZRyp9ZrR3lEHY vJmxOzBIzJ71nxBgGBEE6Kjgpx7cxVUYWENRHgtF5FUoCUIn+NQgv9sOe1h+E684nc iLjCvHq1ps/Vw== Date: Fri, 14 Feb 2025 10:12:06 -0800 From: Josh Poimboeuf To: Petr Mladek Cc: Yafang Shao , jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com, live-patching@vger.kernel.org Subject: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Message-ID: <20250214181206.xkvxohoc4ft26uhf@jpoimboe> References: <20250211062437.46811-1-laoar.shao@gmail.com> <20250211062437.46811-3-laoar.shao@gmail.com> <20250212004009.ijs4bdbn6h55p7xd@jpoimboe> <20250213013603.i6uxtjvc3qxlsqwc@jpoimboe> <20250213173253.ovivhuq2c5rmvkhj@jpoimboe> Precedence: bulk X-Mailing-List: live-patching@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: On Fri, Feb 14, 2025 at 03:44:10PM +0100, Petr Mladek wrote: > I guess that we really could consider the new task as migrated > and clear TIF_PATCH_PENDING. > > But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't > work when the transition gets reverted. [*] Hm, why not? I don't see any difference between patching or unpatching? klp_init_transition() has a barrier to enforce the order of the klp_target_state and func->transition writes, as read by the klp_ftrace_handler(). So in the ftrace handler, if func->transition is set and the task is KLP_TRANSITION_IDLE, it can use klp_target_state to decide which function to use. Its patch state would effectively be the same as any other already-transitioned task, whether it's patching or unpatching. Then in klp_complete_transition(), after func->transition gets set to false, klp_synchronize_transition() flushes out any running ftrace handlers. From that point on, func->transition is false, so the ftrace handler would no longer read klp_target_state. > [*] I gave this few brain cycles but I did not find any elegant > way how to set this a safe way and allow using rcu_read_lock() > in klp_try_complete_transition(). > > It might be because it is Friday evening and I am leaving for > a trip tomorrow. Also I not motivated enough to think about it > because Yafang saw the RCU stall even with that rcu_read_lock(). > So I send this just for record. Even if it doesn't fix the RCU stalls, I think we should still try to avoid holding the tasklist_lock. It's a global lock which can be contended, and we want the livepatch transition to be as unobtrusive as possible. If the system is doing a lot of forking across many CPUs, holding the lock could block all the forking tasks and trigger system-wide scheduling latencies. And that could be compounded by the unnecessary transitioning of new tasks every time the delayed work runs. -- Josh