From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 16 May 2025 09:33:15 -0400
User-Agent: Mozilla Thunderbird
Subject: Re: non-stop kworker NFS/RPC write traffic even after unmount
To: Rik Theys
Cc: Daniel Kobras, Linux Nfs
References: <79767ded-466f-44a5-b15a-fde5af1b03c7@esat.kuleuven.be>
 <2c1a60a7-051f-4952-84fe-c3a4b6b0327e@puzzle-itc.de>
 <42c84eb6-ede0-4e68-ae70-334365e2ae7f@esat.kuleuven.be>
 <62cb66ff-b718-4369-a7f1-fd3bb01a7b16@puzzle-itc.de>
 <4d4f781a-d668-4f49-9cfd-2e9e94a8cb71@esat.kuleuven.be>
 <8abc8a16-cbdb-4285-a2da-62f57fbbb165@esat.kuleuven.be>
 <9c446dc2-fe2e-4bd2-9ad5-f4015b0e2ffa@esat.kuleuven.be>
 <3c1acadf-b2ed-42b8-926e-662df5a8aa4c@oracle.com>
 <547993bc-80ed-47ee-b1b7-cbe83da1eae3@esat.kuleuven.be>
Content-Language: en-US
From: Chuck Lever
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org
MIME-Version: 1.0

On 5/16/25 9:09 AM, Rik Theys wrote:
> Hi,
>
> On 5/16/25 2:59 PM, Chuck Lever wrote:
>> On 5/16/25 8:36 AM, Rik Theys wrote:
>>> Hi,
>>>
>>> On 5/16/25 2:19 PM, Chuck Lever wrote:
>>>> On 5/16/25 7:32 AM, Rik Theys wrote:
>>>>> Hi,
>>>>>
>>>>> On 5/16/25 11:47 AM, Rik Theys wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 5/16/25 8:17 AM, Rik Theys wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 5/16/25 7:51 AM, Rik Theys
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 4/18/25 3:31 PM, Daniel Kobras wrote:
>>>>>>>>> Hi Rik!
>>>>>>>>>
>>>>>>>>> On 01.04.25 at 14:15, Rik Theys wrote:
>>>>>>>>>> On 4/1/25 2:05 PM, Daniel Kobras wrote:
>>>>>>>>>>> On 15.12.24 at 13:38, Rik Theys wrote:
>>>>>>>>>>>> Suddenly, a number of clients start to send an abnormal amount
>>>>>>>>>>>> of NFS traffic to the server that saturates their link and never
>>>>>>>>>>>> seems to stop. Running iotop on the clients shows
>>>>>>>>>>>> kworker-{rpciod,nfsiod,xprtiod} processes generating the write
>>>>>>>>>>>> traffic. On the server side, the system seems to process the
>>>>>>>>>>>> traffic as the disks are processing the write requests.
>>>>>>>>>>>>
>>>>>>>>>>>> This behavior continues even after stopping all user processes
>>>>>>>>>>>> on the clients and unmounting the NFS mount on the client. Is
>>>>>>>>>>>> this normal? I was under the impression that once the NFS mount
>>>>>>>>>>>> is unmounted no further traffic to the server should be visible?
>>>>>>>>>>> I'm currently looking at an issue that resembles your description
>>>>>>>>>>> above (excess traffic to the server for data that was already
>>>>>>>>>>> written and committed), and part of the packet capture also looks
>>>>>>>>>>> roughly similar to what you've sent in a followup. Before I dig
>>>>>>>>>>> any deeper: Did you manage to pinpoint or resolve the problem in
>>>>>>>>>>> the meantime?
>>>>>>>>>> Our server is currently running the 6.12 LTS kernel and we haven't
>>>>>>>>>> had this specific issue any more. But we were never able to
>>>>>>>>>> reproduce it, so unfortunately I can't say for sure if it's fixed,
>>>>>>>>>> or what fixed it :-/.
>>>>>>>>> Thanks for the update!
>>>>>>>>> Indeed, in the meantime the affected
>>>>>>>>> environment here stopped showing the reported behavior as well
>>>>>>>>> after a few days, and I don't have a clear indication what might
>>>>>>>>> have been the fix, either.
>>>>>>>>>
>>>>>>>>> When the issue still occurred, it could (once) be provoked by
>>>>>>>>> dd'ing 4GB of /dev/zero to a test file on an NFSv4.2 mount. The
>>>>>>>>> network trace shows that the file is completely written at wire
>>>>>>>>> speed. But after a five second pause, the client then starts
>>>>>>>>> sending the same file again in smaller chunks of a few hundred MB
>>>>>>>>> at five second intervals. So it appears that the file's pages are
>>>>>>>>> background-flushed to storage again, even though they've already
>>>>>>>>> been written out. On the NFS layer, none of the passes look
>>>>>>>>> conspicuous to me: WRITE and COMMIT operations all get NFS4_OK'ed
>>>>>>>>> by the server.
>>>>>>>>>
>>>>>>>>>> Which kernel version(s) are your server and clients running?
>>>>>>>>> The systems in the affected environment run Debian-packaged
>>>>>>>>> kernels. The servers are on Debian's 6.1.0-32 which corresponds to
>>>>>>>>> upstream's 6.1.129. The issue was seen on clients running the same
>>>>>>>>> kernel version, but also on older systems running Debian's
>>>>>>>>> 5.10.0-33, corresponding to 5.10.226 upstream. I've skimmed the
>>>>>>>>> list of patches that went into either of these kernel versions, but
>>>>>>>>> nothing stood out as clearly related.
>>>>>>>>>
>>>>>>>> Our server and clients are currently showing the same behavior
>>>>>>>> again: clients are sending abnormal amounts of write traffic to the
>>>>>>>> NFS server and the server is actually processing it as the writes
>>>>>>>> end up on the disk (which fills up our replication journals). iotop
>>>>>>>> shows that the kworker-{rpciod,nfsiod,xprtiod} are responsible for
>>>>>>>> this traffic. A reboot of the server does not solve the issue.
>>>>>>>> Also rebooting individual clients that are participating in this does
>>>>>>>> not help. After a few minutes of user traffic they show the same
>>>>>>>> behavior again. We also see this on multiple clients at the same
>>>>>>>> time.
>>>>>>>>
>>>>>>>> The NFS operations that are being sent are mostly putfh, sequence
>>>>>>>> and getattr.
>>>>>>>>
>>>>>>>> The server is running upstream 6.12.25 and the clients are running
>>>>>>>> Rocky 8 (4.18.0-553.51.1.el8_10) and 9 (5.14.0-503.38.1.el9_5).
>>>>>>>>
>>>>>>>> What are some of the steps we can take to debug the root cause of
>>>>>>>> this? Any idea on how to stop this traffic flood?
>>>>>>>>
>>>>>>> I took a tcpdump on one of the clients that was doing this. The pcap
>>>>>>> was stored on the local disk of the server. When I tried to copy the
>>>>>>> pcap to our management server over scp it now hangs at 95%. The
>>>>>>> target disk on the management server is also an NFS mount of the
>>>>>>> affected server. The scp had copied 565MB and our management server
>>>>>>> has now also started to flood the server with non-stop traffic
>>>>>>> (basically saturating its link).
>>>>>>>
>>>>>>> The management server is running Debian's 6.1.135 kernel.
>>>>>>>
>>>>>>> It seems that once a client has triggered some bad state in the
>>>>>>> server, other clients that write a large file to the server also
>>>>>>> start to participate in this behavior. Rebooting the server does not
>>>>>>> seem to help as the same state is triggered almost immediately again
>>>>>>> by some client.
>>>>>>>
>>>>>> Now that the server is in this state, I can very easily reproduce this
>>>>>> on a client. I've installed the 6.14.6 kernel on a Rocky 9 client.
>>>>>>
>>>>>> 1. On a different machine, create an empty 3M file using
>>>>>> "dd if=/dev/zero of=3M bs=3M count=1"
>>>>>>
>>>>>> 2. Reboot the Rocky 9 client and log in as root. Verify that there are
>>>>>> no active NFS mounts to the server.
>>>>>> Start dstat and watch the output.
>>>>>>
>>>>>> 3. From the machine where you created the 3M file, scp the 3M file to
>>>>>> the Rocky 9 client in a location that is an NFS mount of the server.
>>>>>> In this case it's my home directory which is automounted.
>>>>> I've reproduced the issue with rpcdebug on for rpc and nfs calls (see
>>>>> attachment).
>>>>>> The file copies normally, but when you look at the amount of data
>>>>>> transferred out of the client to the server it seems more than the 3M
>>>>>> file size.
>>>>> The client seems to copy the file twice in the initial copy. The first
>>>>> line on line 13623, which results in a lot of commit mismatch error
>>>>> messages.
>>>>>
>>>>> Then again on line 13842 which results in the same commit mismatch
>>>>> errors.
>>>>>
>>>>> These two attempts happen without any delay. This confirms my previous
>>>>> observation that the outbound traffic to the server is twice the file
>>>>> size.
>>>>>
>>>>> Then there's an NFS release call on the file.
>>>>>
>>>>> 30s later on line 14106, there's another attempt to write the file.
>>>>> This again results in the same commit mismatch errors.
>>>>>
>>>>> This process repeats itself every 30s.
>>>>>
>>>>> So it seems the server always returns a mismatch? Now, how can I solve
>>>>> this situation? I've tried rebooting the server last night, but the
>>>>> situation reappears as soon as clients start to perform writes.
>>>> Usually the write verifier will mismatch only after a server restart.
>>>>
>>>> However, there are some other rare cases where NFSD will bump the
>>>> write verifier. If an error occurs when the server tries to sync
>>>> unstable NFS WRITEs to persistent storage, NFSD will change the
>>>> write verifier to force the client to send the write payloads again.
>>>>
>>>> A writeback error might include a failing disk or a full file system,
>>>> so that's the first place you should look.
>>>>
>>>> But why the clients don't switch to FILE_SYNC when retrying the
>>>> writes is still a mystery. When they do that, the disk errors will
>>>> be exposed to the client and application and you can figure out
>>>> immediately what is going wrong.
>>>>
>>> There are no indications of a failing disk on the system (and the disks
>>> are FC attached SAN disks) and the file systems that have the high write
>>> I/O have sufficient free space available. Or can a "disk full" message
>>> also be caused by disk quota being exceeded? As we do use disk quotas.
>> That seems like something to explore.
>
> It's also strange that this would affect clients that are writing to the
> same NFS filesystem but as a user that doesn't have any quota limits
> exceeded, no? Or does the server interpret the "quota exceeded" for one
> user on that filesystem as a global error for that filesystem?
>
>>
>> The problem is that the NFS protocol does not have a mechanism to expose
>> write errors that occur on the server after it responds to an NFS
>> UNSTABLE WRITE: NFS_OK, we received your data, but before the COMMIT
>> occurs.
>>
>> When a COMMIT fails in this way, clients are supposed to change to
>> FILE_SYNC and try the writes again. A FILE_SYNC WRITE flushes all the
>> way to disk so any recurring error appears as part of the NFS
>> operation's status code. The client is supposed to treat this as a
>> permanent error and stop the loop.
>>
> Then there's probably a bug in the client code somewhere as the
> client(s) did not do that...
>>> Based on your last paragraph I conclude this is a client side issue? The
>>> client should switch to FILE_SYNC instead? We do export the NFS share
>>> "async". Does that make a difference?
>> I don't know, because anyone who uses async is asking for trouble
>> so we don't test it as an option that should be deployed in a
>> production environment. All I can say is "don't do that."
>>
>>> So it's a normal operation for the server to change the write verifier?
>> It's not a protocol compliance issue, if that's what you mean. Clients
>> are supposed to be prepared for a write verifier change on COMMIT, full
>> stop. That's why the verifier is there in the protocol.
>>
>>
>>> The clients that showed this behavior ran a lot of different kernel
>>> versions, from the RHEL 8/9 kernels, the Debian 12 (6.1 series), Fedora
>>> 42 kernel and the 6.14.6 kernel on a Rocky 9 userland. So this must be
>>> an issue that is present in the client code for a very long time now.
>>>
>>> Since approx 14:00 the issue has suddenly disappeared as suddenly as it
>>> started. I can no longer reproduce it now.
>> Then hitting a capacity or metadata quota might be the proximate cause.
>>
>> If this is an NFSv4.2 mount, is it possible that the clients are
>> trying to do a large COPY operation but falling back to read/write?
>>
> All mounts are indeed 4.2 mounts. It's possible that some clients were
> doing COPY operations, but when I reproduced the issue on my test client
> it did not perform a COPY operation (as far as I understand the
> pcap/rpcdebug output).
>
> Are there any debugging steps we can take when this issue happens again?

Since v5.17, there is a trace point in NFSD called
trace_nfsd_writeverf_reset.

I really don't have much experience to share regarding your filesystem
and storage specific failures. You will have to use Google and
StackOverflow for those details.


-- 
Chuck Lever
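
For anyone wanting to watch that trace point the next time the flood
starts, here is a minimal sketch rather than a tested procedure. It
assumes tracefs is mounted at /sys/kernel/tracing on the NFS server, and
that the event shows up in the "nfsd" group as nfsd_writeverf_reset (the
name above without the trace_ prefix). The helper names
event_enable_path and set_event are made up for this illustration. It
has to run as root on the server.

```python
from pathlib import Path

# Assumption: tracefs is mounted here (older setups may use
# /sys/kernel/debug/tracing instead).
TRACEFS_ROOT = Path("/sys/kernel/tracing")


def event_enable_path(subsystem, event, root=TRACEFS_ROOT):
    """Return the tracefs 'enable' control file for one trace event."""
    return Path(root) / "events" / subsystem / event / "enable"


def set_event(subsystem, event, on, root=TRACEFS_ROOT):
    """Write '1' or '0' to the event's enable file (hypothetical helper)."""
    path = event_enable_path(subsystem, event, root)
    path.write_text("1" if on else "0")
    return path


if __name__ == "__main__":
    # Assumed event name: nfsd:nfsd_writeverf_reset.
    set_event("nfsd", "nfsd_writeverf_reset", True)
    try:
        # trace_pipe blocks until events arrive; stop with Ctrl-C.
        with open(TRACEFS_ROOT / "trace_pipe") as pipe:
            for line in pipe:
                print(line, end="")
    except KeyboardInterrupt:
        pass
    finally:
        set_event("nfsd", "nfsd_writeverf_reset", False)
```

Each line printed would correspond to NFSD resetting its write verifier,
so the timestamps can be lined up against the moment clients start
resending their writes.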