Subject: Re: [PATCH] SUNRPC: Set tk_rpc_status when RPC_TASK_SIGNALLED is detected
From: Li Lingfeng
Date: Sun, 26 Jan 2025 10:20:05 +0800
To: Trond Myklebust, tom@talpey.com, davem@davemloft.net, chuck.lever@oracle.com, kuba@kernel.org, okorniev@redhat.com, anna@kernel.org, pabeni@redhat.com, horms@kernel.org, Dai.Ngo@oracle.com, yangerkun@huawei.com, jlayton@kernel.org, edumazet@google.com, neilb@suse.de
CC: houtao1@huawei.com, linux-nfs@vger.kernel.org, yi.zhang@huawei.com, linux-kernel@vger.kernel.org, lilingfeng@huaweicloud.com, yukuai1@huaweicloud.com, netdev@vger.kernel.org
References: <20250114144101.2511043-1-lilingfeng3@huawei.com> <642413c4bdbe296db722f0091ffa5190c992eb8e.camel@hammerspace.com> <58bf9d83-b58d-e5a6-4096-64eb96f3854a@huawei.com> <4d3e8d4385a511860ec9018b3ca864e7ef3a7b48.camel@hammerspace.com>

On 2025/1/17 11:15, Trond Myklebust wrote:
> On Fri, 2025-01-17 at 10:29 +0800, yangerkun wrote:
>>
>> On 2025/1/17 4:52, Trond Myklebust wrote:
>>> On Thu, 2025-01-16 at 19:43 +0800, yangerkun wrote:
>>>> Hi,
>>>>
>>>> Thanks for the patch.
>>>>
>>>> Before 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()"),
>>>> every time RPC_TASK_SIGNALLED was set, the rpc_task would break out
>>>> of __rpc_execute and exit immediately.
>>>>
>>>> After that commit, however, __rpc_execute no longer checks
>>>> RPC_TASK_SIGNALLED. So in the case you point out below, even though
>>>> rpc_check_timeout will eventually make the task break out and exit,
>>>> the rpc_task has already done some work by then. I would prefer to
>>>> reintroduce the RPC_TASK_SIGNALLED check in __rpc_execute so that
>>>> the task exits immediately.
>>>>
>>> Better yet... Let's get rid of the RPC_TASK_SIGNALLED flag
>>> altogether and just replace it with
>>>
>>> #define RPC_TASK_SIGNALLED(task) (READ_ONCE(task->tk_rpc_status) == -ERESTARTSYS)

Hi, I'm not quite clear on how this resolves the issue. If we remove
the RPC_TASK_SIGNALLED flag, and replace setting RPC_TASK_SIGNALLED in
tk_runstate with setting tk_rpc_status to -ERESTARTSYS in
rpc_signal_task, wouldn't setting tk_rpc_status back to 0 in
__rpc_restart_call still lead to an infinite loop in the rpc_task?
(A small sketch at the end of this mail illustrates the scenario I
have in mind.) Could you please provide a more detailed explanation?
Thanks.

>> Hi,
>>
>> Thanks for your reply! Yes, if every place that updates
>> tk_rpc_status did so by calling rpc_task_set_rpc_status, we could
>> use task->tk_rpc_status == -ERESTARTSYS to determine whether the
>> rpc_task has been signalled. But in the case Li described,
>> __rpc_restart_call does not do this and overwrites tk_rpc_status
>> unconditionally, so that alone would not be a stable solution.
>> Maybe it would be better to change __rpc_restart_call to call
>> rpc_task_set_rpc_status too? Then the check in __rpc_execute would
>> be enough to solve this case.
>>
> That would break __rpc_restart_call() to the point of rendering it
> completely useless.
> The whole purpose of that call is to give the NFS layer a chance to
> handle errors in the exit callback, and then kick off a fresh call.
> Your suggestion would mean that any RPC-level error sticks around and
> causes the new call to fail immediately.
>
> I see no point in doing anything more than fixing the looping
> behaviour. Eliminating the redundant flag will do that.
>
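
To make the question I raised above concrete, here is a minimal
user-space sketch of the scenario. This is not the real
net/sunrpc/sched.c code: the struct and the function bodies are
simplified stand-ins for the kernel symbols discussed in this thread,
and ERESTARTSYS is defined by hand because it is kernel-internal.

/*
 * Minimal user-space model, NOT kernel code. The names mirror
 * net/sunrpc/sched.c, but the bodies are deliberately simplified.
 */
#include <stdio.h>

#define ERESTARTSYS 512	/* kernel-internal value, include/linux/errno.h */

struct rpc_task {
	int tk_rpc_status;	/* models task->tk_rpc_status */
};

/* The proposed replacement: derive "signalled" from tk_rpc_status alone. */
static int rpc_task_signalled(const struct rpc_task *task)
{
	return task->tk_rpc_status == -ERESTARTSYS;
}

/* rpc_signal_task() would record the fatal signal this way. */
static void rpc_signal_task(struct rpc_task *task)
{
	task->tk_rpc_status = -ERESTARTSYS;
}

/*
 * Model of __rpc_restart_call(): the retried call starts from a clean
 * slate, so tk_rpc_status is reset unconditionally. If the signal
 * state lives only in tk_rpc_status, the reset also erases it.
 */
static void restart_call(struct rpc_task *task)
{
	task->tk_rpc_status = 0;
}

int main(void)
{
	struct rpc_task task = { .tk_rpc_status = 0 };

	rpc_signal_task(&task);
	printf("after signal:  signalled=%d\n", rpc_task_signalled(&task));

	restart_call(&task);
	printf("after restart: signalled=%d\n", rpc_task_signalled(&task));
	/* Prints signalled=0: a status-only check no longer sees the
	 * signal after the restart. */
	return 0;
}

With a status-only definition of "signalled", the restart erases the
only record of the signal, so __rpc_execute would keep running the
task instead of letting it exit; that is the loop I am worried about.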