From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AFA6B27180B for ; Wed, 12 Feb 2025 15:42:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.45 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739374966; cv=none; b=TF1MAa7RBhEKW/U38N6hTiCFGiXzEG6Crz5vGxHrjd/KbQXNzyK4e1O8h2oFbIM0odL+us5cEXq/bZnbG9Af4C9QdxV5pobLYRbX1AK3YfkAp3qiSuWcBTbuRAg9tmddGi+F1S2OfJSid94FCshZcrA9OzYolvyw4L9STvQEAv0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739374966; c=relaxed/simple; bh=YY0ssl+CXBQNBh9eBWulItzJa3bkJYhVBr692DvN83M=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ZnXFSFOaDRJ2abCoFnPiTlWFk6IrxNuMywDwlPWMs4iTn7bY2dBOlKWeyHhkSrF5BbGJVddIunllVsYVFPxpR7NU2n4/DKa60y2awbOcAvWamtUnMQo6NSiOuSyvDJk14pJT3VpoJGjzHfL9VmBQXbqXkpYL5n06l2V/1sGjojs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b=OA3+kim0; arc=none smtp.client-ip=209.85.208.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b="OA3+kim0" Received: by mail-ed1-f45.google.com with SMTP id 4fb4d7f45d1cf-5de64873d18so7773633a12.2 for ; Wed, 12 Feb 2025 07:42:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1739374962; x=1739979762; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=i6ZpRGsTvZbDM3XhNofIhls2/mOZKVHm57EHg3R7sbY=; b=OA3+kim0WIWzgx/4FmG2wbahawepKkdQONR2B3ubhXLBJvcDb+X/G7aiCv5hcsn3NP Fp3og1BaF45RUy7OdGR+L+LVofpxqGetE5QlQcBROKflFJs0tliLVBBHSy/KXVKeiE8y T/chWHB1XPg9c/D8bLGZfDpQ+Fu/MnXTE26ok5/yY+vf03AXkKUudC5zztgl8OsuMsZP SSMwqEJFD4wEOtOFf5+2p48P50/Lb6amLkxWSdNu2Hg72sY3XMvftbJwwrSM2Og+1r9v FoZDhNjfZU+rSDssQ4Z74FZZOTwq1hB9oJOZ/sBAkl8UjvXztiQj9UTggKvEi/eWdad+ TBaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739374962; x=1739979762; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=i6ZpRGsTvZbDM3XhNofIhls2/mOZKVHm57EHg3R7sbY=; b=KKjtgccpJEXMyHDvu2nxrZ/ckEUap8FWXYKEZFxn+sNFYos5RaVrbijl8IxbE5xcT0 +qfCF4Wmm5vcz9btT09CW06smDE6WLNwuJs8/lc9UQyLtphjKuFrZzpAKJrofRljeZR3 TbOMnDo2ZKI8xdcMWJhFvd7sisX3RYG2p9+KvY3MFhiZKl4+c44qGEvJ3+kFf6SLSv04 omc5fm/T45rMegR7E8577nJx69u2rt/pdMPDBskh5K9xU7YexrCwBHsCj9HITykbcNWI qbLdgsZqnmlcvG7TGkUWH+DDRBmiKQ2u9+2KchUM28bZHcDXeKt6CtwXMTX1JnDfEXGk OqIA== X-Forwarded-Encrypted: i=1; AJvYcCUNkGJq41U4Bn/1uShx5JyO2m7XcCNbG3liUTCAf83SYDJOPXdJsgj7+2QsO7H5uEF0IPYBKC2xUktVyI8+@vger.kernel.org X-Gm-Message-State: AOJu0YxURbOjQSOydfUbwz+F6CmdF1d+hUZYcTOe37yrhqHVYj+fSuDa it1zGGojWg6o3GCa4QX+eHf4kl+C/Pd/6yxHvmnfD8PF+gA0fTFq5KcvKolmLmOK5Es2Ug1DjT8 E X-Gm-Gg: ASbGncsNcLGv8ZfTOWu1OjS2zZ3q8UIsevhjfcJRo7kDFtHBqbQW1hqiTKBGF2wAcvE BiE1PkaSPCf+vLNVjifI+UaJvE/rxA7mvdDvvM9dG0P9FGw4L2VHqOzkvg410g/MauZkEHdlQXm BN77Xi2T66wv8NGX54CI3iilp1FzN/S5QJsnvaLNaHImb6VS1oalynwCF9UzpRgz+HUTYv7sMTS QqD36Jo2aqg5VuV/TQscHCMktqBiCNtnqsZKF2Xw7U8/s50BZjksGEChRJBeQH6FfZkkX8mYEKG wn8KgibTwNj7Dt+/cg== X-Google-Smtp-Source: AGHT+IFoDEHl/MhCZatevg2AOUnkQFlW9ilUT9oD51jiSUebRg+vSr6edY3XziasIWcpmEqKQXyCpw== X-Received: by 2002:a05:6402:26cf:b0:5de:42f5:817b with SMTP id 4fb4d7f45d1cf-5deade15403mr3094764a12.31.1739374961839; Wed, 12 Feb 2025 07:42:41 -0800 (PST) Received: from pathway.suse.cz ([176.114.240.130]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5deb89289d9sm805622a12.62.2025.02.12.07.42.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Feb 2025 07:42:41 -0800 (PST) Date: Wed, 12 Feb 2025 16:42:39 +0100 From: Petr Mladek To: Yafang Shao Cc: Josh Poimboeuf , jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com, live-patching@vger.kernel.org Subject: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Message-ID: References: <20250211062437.46811-1-laoar.shao@gmail.com> <20250211062437.46811-3-laoar.shao@gmail.com> <20250212004009.ijs4bdbn6h55p7xd@jpoimboe> Precedence: bulk X-Mailing-List: live-patching@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed 2025-02-12 19:54:21, Yafang Shao wrote: > On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao wrote: > > > > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf wrote: > > > > > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote: > > > > void klp_try_complete_transition(void) > > > > { > > > > + unsigned long timeout, proceed_pending_processes; > > > > unsigned int cpu; > > > > struct task_struct *g, *task; > > > > struct klp_patch *patch; > > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void) > > > > * unless the patch includes changes to a very common function. > > > > */ > > > > read_lock(&tasklist_lock); > > > > - for_each_process_thread(g, task) > > > > + timeout = jiffies + HZ; > > > > + proceed_pending_processes = 0; > > > > + for_each_process_thread(g, task) { > > > > + /* check if this task has already switched over */ > > > > + if (task->patch_state == klp_target_state) > > > > + continue; > > > > + > > > > + proceed_pending_processes++; > > > > + > > > > if (!klp_try_switch_task(task)) > > > > complete = false; > > > > + > > > > + /* > > > > + * Prevent hardlockup by not blocking tasklist_lock for too long. > > > > + * But guarantee the forward progress by making sure at least > > > > + * some pending processes were checked. > > > > + */ > > > > + if (rwlock_is_contended(&tasklist_lock) && > > > > + time_after(jiffies, timeout) && > > > > + proceed_pending_processes > 100) { > > > > + complete = false; > > > > + break; > > > > + } > > > > + } > > > > read_unlock(&tasklist_lock); > > > > > > Instead of all this can we not just use rcu_read_lock() instead of > > > tasklist_lock? > > > > I’m concerned that there’s a potential race condition in fork() if we > > use RCU, as illustrated below: > > > > CPU0 CPU1 > > > > write_lock_irq(&tasklist_lock); > > klp_copy_process(p); > > > > parent->patch_state=klp_target_state > > > > list_add_tail_rcu(&p->tasks, &init_task.tasks); > > write_unlock_irq(&tasklist_lock); > > > > In this scenario, after the parent executes klp_copy_process(p) to > > copy its patch_state to the child, but before adding the child to the > > task list, the parent’s patch_state might be updated by the KLP > > transition. This could result in the child being left with an outdated > > state. Similar race actually existed even before. > > We need to ensure that klp_copy_process() and list_add_tail_rcu() are > > treated as a single atomic operation. > > Before the newly forked task is added to the task list, it doesn’t > execute any code and can always be considered safe during the KLP > transition. Therefore, we could replace klp_copy_process() with > klp_init_process(), where we simply set patch_state to > KLP_TRANSITION_IDLE, as shown below: This might work when a new process is created, for example, using execve(). But it would fail when the process is just forked (man 2 fork) and both parent and child continue running the same code. > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process( > p->exit_signal = args->exit_signal; > } > > - klp_copy_process(p); > + // klp_init_process(p); > + clear_tsk_thread_flag(child, TIF_PATCH_PENDING); > + child->patch_state = KLP_TRANSITION_IDLE; This is exactly the loction where we depend on the sychronization with the tasklist_lock. It allows us to synchronize both TIF_PATCH_PENDING flag and p->patch_state between the parent and child, see the comment in klp_copy_process(): /* Called from copy_process() during fork */ void klp_copy_process(struct task_struct *child) { /* * The parent process may have gone through a KLP transition since * the thread flag was copied in setup_thread_stack earlier. Bring * the task flag up to date with the parent here. * * The operation is serialized against all klp_*_transition() * operations by the tasklist_lock. The only exceptions are * klp_update_patch_state(current) and __klp_sched_try_switch(), but we * cannot race with them because we are current. */ if (test_tsk_thread_flag(current, TIF_PATCH_PENDING)) set_tsk_thread_flag(child, TIF_PATCH_PENDING); else clear_tsk_thread_flag(child, TIF_PATCH_PENDING); child->patch_state = current->patch_state; } When using rcu_read_lock() in klp_try_complete_transition, child->patch_state might get an outdated information. The following race commes to my mind: CPU1 CPU1 klp_try_complete_transition() taskA: + fork() + klp_copy_process() child->patch_state = KLP_PATCH_UNPATCHED klp_try_switch_task(taskA) // safe child->patch_state = KLP_PATCH_PATCHED all processes patched klp_finish_transition() list_add_tail_rcu(&p->thread_node, &p->signal->thread_head); BANG: The forked task has KLP_PATCH_UNPATCHED so that klp_ftrace_handler() will redirect it to the old code. But CPU1 thinks that all tasks are migrated and is going to finish the transition My opinion: I am afraid that using rcu_read_lock() in klp_try_complete_transition() might cause more harm than good. The code already is complicated and this might make it even more tricky. I would first like to understand how exactly the stall happens. It is possible that even rcu_read_lock() won't help here! If the it takes too long time to check backtraces of all pending processes then even rcu_read_lock() might trigger the RCU stall warning as well. Best Regards, Petr