From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3r1WYJ1QjSzDq5f for ; Fri, 6 May 2016 22:39:00 +1000 (AEST) Date: Fri, 6 May 2016 07:38:55 -0500 From: Josh Poimboeuf To: Petr Mladek Cc: Jessica Yu , Jiri Kosina , Miroslav Benes , Ingo Molnar , Peter Zijlstra , Michael Ellerman , Heiko Carstens , live-patching@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, Vojtech Pavlik , Jiri Slaby , Chris J Arges , Andy Lutomirski Subject: Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model Message-ID: <20160506123855.oabdeehqu3aj6nqy@treble> References: <20160504144854.GT2749@pathway.suse.cz> <20160504175700.nizuvflyisfd3lks@treble> <20160505115701.GY2749@pathway.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 In-Reply-To: <20160505115701.GY2749@pathway.suse.cz> List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Thu, May 05, 2016 at 01:57:01PM +0200, Petr Mladek wrote: > I have missed that the two commands are called with preemption > disabled. So, I had the following crazy scenario in mind: > > > CPU0 CPU1 > > klp_enable_patch() > > klp_target_state = KLP_PATCHED; > > for_each_task() > set TIF_PENDING_PATCH > > # task 123 > > if (klp_patch_pending(current) > klp_patch_task(current) > > clear TIF_PENDING_PATCH > > smp_rmb(); > > # switch to assembly of > # klp_patch_task() > > mov klp_target_state, %r12 > > # interrupt and schedule > # another task > > > klp_reverse_transition(); > > klp_target_state = KLP_UNPATCHED; > > klt_try_to_complete_transition() > > task = 123; > if (task->patch_state == klp_target_state; > return 0; > > => task 123 is in target state and does > not block conversion > > klp_complete_transition() > > > # disable previous patch on the stack > klp_disable_patch(); > > klp_target_state = KLP_UNPATCHED; > > > # task 123 gets scheduled again > lea %r12, task->patch_state > > => it happily stores an outdated > state > Thanks for the clear explanation, this helps a lot. > This is why the two functions should get called with preemption > disabled. We should document it at least. I imagine that we will > use them later also in another context and nobody will remember > this crazy scenario. > > Well, even disabled preemption does not help. The process on > CPU1 might be also interrupted by an NMI and do some long > printk in it. > > IMHO, the only safe approach is to call klp_patch_task() > only for "current" on a safe place. Then this race is harmless. > The switch happen on a safe place, so that it does not matter > into which state the process is switched. I'm not sure about this solution. When klp_complete_transition() is called, we need all tasks to be patched, for good. We don't want any of them to randomly switch to the wrong state at some later time in the middle of a future patch operation. How would changing klp_patch_task() to only use "current" prevent that? > By other words, the task state might be updated only > > + by the task itself on a safe place > + by other task when the updated on is sleeping on a safe place > > This should be well documented and the API should help to avoid > a misuse. I think we could fix it to be safe for future callers who might not have preemption disabled with a couple of changes to klp_patch_task(): disabling preemption and testing/clearing the TIF_PATCH_PENDING flag before changing the patch state: void klp_patch_task(struct task_struct *task) { preempt_disable(); if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING)) task->patch_state = READ_ONCE(klp_target_state); preempt_enable(); } We would also need a synchronize_sched() after the patching is complete, either at the end of klp_try_complete_transition() or in klp_complete_transition(). That would make sure that all existing calls to klp_patch_task() are done. -- Josh