From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4168C332604 for ; Thu, 25 Jun 2026 07:51:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=192.198.163.12 ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782373901; cv=fail; b=EGxqY7w1RL/PXYylOaF6PCCtYpL+ldkxa99wogR1zaoAe/ot99yh7ZHAbIcRRqFlTS9dJfkmupTNLJAteg73GdThokLxfNkbPUaSD2srU3CKy2w0p8rEoVETs9QbSPnT19RVO/Bpl1eCeAsXMsim9OG09r4jrNJzjK+w8dB3o/w= ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782373901; c=relaxed/simple; bh=PBJYErrS/KUXoJQLmsW+YqFQ1cup7rVkCs++5dEkt/Q=; h=Message-ID:Date:Subject:To:CC:References:From:In-Reply-To: Content-Type:MIME-Version; b=ZG9wKhS2eZ+/JFlnKEiCEDa9V+tptkGsToZK3qV84BAq/3daFM8jje3jsnzWLAH8/Mwzj/7KoP8/Ic/qBc0JmTcH4wEfC25Qssaxddnjh+r+rcLnq1il3QWt12EZIC1jxCvuIzNNef6WxVJL3b+zzKd8W7mMP9m6/vDBv26CphI= ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=D2yyhuFh; arc=fail smtp.client-ip=192.198.163.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="D2yyhuFh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1782373899; x=1813909899; h=message-id:date:subject:to:cc:references:from: in-reply-to:content-transfer-encoding:mime-version; bh=PBJYErrS/KUXoJQLmsW+YqFQ1cup7rVkCs++5dEkt/Q=; b=D2yyhuFhjdusDPgKwxCm4q3oszsXBengWgXOsLQHCR8TEC8k9JEXIYPy K7xVNy33MsRp/PPYVyFEelCYtk2+THdBJ87INSRIL/ZtRNkOobW7I2e24 76eDVKjudxUOciDB5SyvVjsawmQNBQjcZgS9/CyC9zKQcDp9XOIN7wrJd hcVovm685hnwhlu/I3kaMAnApTmN7grxPwL0ozQNocJHVgQetFwALNiG6 qk1fw+a6BaaEDy6jOzWQMslv/wE7XGsMuEaVnbEhydWPGbT9qtV53Vugp tO0nhgbv2MOpemxl7VxBLRsivOVH4/CYQp04qiQ3Wb7/zJi2l0FSqZjYr A==; X-CSE-ConnectionGUID: 7u5G4+MPSp2wvX6K0QW/eg== X-CSE-MsgGUID: k32hRirxQLCt0ttAb8tk1A== X-IronPort-AV: E=McAfee;i="6800,10657,11827"; a="86991184" X-IronPort-AV: E=Sophos;i="6.24,224,1774335600"; d="scan'208";a="86991184" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jun 2026 00:51:38 -0700 X-CSE-ConnectionGUID: I10b5wiBTaKFXgJ9c+nfLQ== X-CSE-MsgGUID: UqmDwUGhRquQQ1+SaelpwQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,224,1774335600"; d="scan'208";a="255357120" Received: from orsmsx903.amr.corp.intel.com ([10.22.229.25]) by fmviesa005.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jun 2026 00:51:39 -0700 Received: from ORSMSX903.amr.corp.intel.com (10.22.229.25) by ORSMSX903.amr.corp.intel.com (10.22.229.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Thu, 25 Jun 2026 00:51:38 -0700 Received: from ORSEDG901.ED.cps.intel.com (10.7.248.11) by ORSMSX903.amr.corp.intel.com (10.22.229.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37 via Frontend Transport; Thu, 25 Jun 2026 00:51:38 -0700 Received: from BL2PR02CU003.outbound.protection.outlook.com (52.101.52.57) by edgegateway.intel.com (134.134.137.111) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Thu, 25 Jun 2026 00:51:37 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=oN5MuDQTFDNwmp3KFUswuUMDpoN+pTufNJxagr0RDJ/cA0EMWre/g3/dT/YQIXcCmRlHbZvJyvvQU0kGASr0Bx3Ptxy2as4r2XaGS7OdkeYkgPElXx52n00kmEy6KtalkMqFyOu8pAvqbRttQ5yF+vR+CVOv6dupc1z9OrsQHXi4qKm/BDW8m8WdZalxg1IALhmgNTPi5ov4XnGMYerIzlDDj0gMfTZPkj+gB3rAZkI5n1U/BmCynWtxEOlD69ZOhEF22vBGgANPOjA4HbpBw9gUJmccHsRjRseM9i2OZ9Mj4HKq5CW5OfsgdeuMtpA37U4zyw9UBTDpDqfPRw/NxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=loQ2eajmtVvqNAvULnitAxpvmFsercXz+ytzpokkE2o=; b=kOGPDIqjb275DrOO3IAJGLOInBSgKvXIP3i14254htHJcNz5iRDjLvXpd4G5y+SwS7VIT6seq8fHA7GHwm+Yfx/J6C0QAa2tveKoJEUqRC/i/r3LgcrRoQzARDnVdTdcTfee6KQNUZEp3gQad1QQ9qf1yHjxbUsWPbVXQ5gg9Z1b9Xqg6Qp7ut6Or05nsB7alFQTXUvBLHu9HWxJFwoDwp2zLmo+SLTsTtHI0SV8mhhYPj+Octn8jhcDOH0otbdJZdLryZS+cAJq9PmoyLhNyQXEH5e9jPhV++RjBzdmxW3RO37vpHTcRqUoXgtROhwCVStvaRxc/dMy+boOni/8oA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; Received: from LV3PR11MB8508.namprd11.prod.outlook.com (2603:10b6:408:1b4::8) by DS4PPF11E6CAE14.namprd11.prod.outlook.com (2603:10b6:f:fc02::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.21.159.16; Thu, 25 Jun 2026 07:51:35 +0000 Received: from LV3PR11MB8508.namprd11.prod.outlook.com ([fe80::a1e8:1786:e5d1:8e51]) by LV3PR11MB8508.namprd11.prod.outlook.com ([fe80::a1e8:1786:e5d1:8e51%5]) with mapi id 15.21.0139.018; Thu, 25 Jun 2026 07:51:35 +0000 Message-ID: <5658849b-0425-4132-ba32-5801e2907c60@intel.com> Date: Thu, 25 Jun 2026 09:53:39 +0200 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY To: Robert Malz , Simon Horman , Grzegorz Nitka CC: , , References: <20260617120753.1785565-1-robert.malz@canonical.com> <20260618152003.909400-1-horms@kernel.org> From: Przemek Kitszel Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: TL0P290CA0013.ISRP290.PROD.OUTLOOK.COM (2603:1096:950:5::20) To LV3PR11MB8508.namprd11.prod.outlook.com (2603:10b6:408:1b4::8) Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: LV3PR11MB8508:EE_|DS4PPF11E6CAE14:EE_ X-MS-Office365-Filtering-Correlation-Id: 74517fed-588b-47bb-833a-08ded28e948c X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|366016|1800799024|23010399003|376014|18002099003|22082099003|4143699003|6133799003|3023799007|11063799006|5023799004|56012099006; X-Microsoft-Antispam-Message-Info: LZf+9nUI2Axn+OO0BVUMpp73/RTQHXLGTGmvDi3ZiExDmHbjH0w/FsZyYbX99zU9Aqz5V+rFqIMKafXJtPEOMEMWQF3+eLOFUUJIVZmKkTn/uIN8xN8xF1bwA6IdnPha3mphgmH1fEL026JrSoF9MUj1M1/y/CPxw+jen4WXXNnkshv4gG57jM8OD2CkqI3zT2ZvJyj8PwMK6qF+bFnxCrl49NyxhgVIx17imK9BR/69+gqurODmrZu3yV23qF1eVSpwFjNhzqbki6Fc+KEaQ9dgUlCVFFSXLOiI/iI+fNDeYsMj7JbqWahgc26BmTZ3Tp8O4rMzaejMjG0VPCG/vSatCfNN3tGfMlU1efHA0VX/PmmL18llmM+mZxlCa0Yaet7dZbWnQZ2n/al7JuRjjR5qV+U7UHc481hwirQbrZkfO34vI0EHjWZ/d2L9YptelOSy4tIqH1VfvlN5+ivGjq98IdL8faB/8ruXLYpTj2H/qvbIYf90t0kYmSn2+x+4yGMnLkh1z9T1ZVN35MX0gHQcAvIOrPmVdgux0TC1U9otGWXT7CCVdcyMGfXV1mQGQdct9QAbBux4g+hoEjgx9Yd0mvesywT2w779+un9I5soAk8BkI0NJXCd1pdhNvC15j1SRwlB4ZXVL+R4Kyp3+0XK5CIzf7GyIYvXs95wa4w= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:LV3PR11MB8508.namprd11.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(366016)(1800799024)(23010399003)(376014)(18002099003)(22082099003)(4143699003)(6133799003)(3023799007)(11063799006)(5023799004)(56012099006);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?T05XdXNKQmtUdERvNHUwLzBNeHJsUG52QjVYTlVHN0lPSkhOQTVhSm9XcSth?= =?utf-8?B?Z3dNeG9POURpYUtHS2VCVmp6eVZTb3RobjFsMHUxRWYwSlQwR3JFYjlGa2lT?= =?utf-8?B?SFNYQ1diV2RWVjVQcG1qOUZyN1JCOWxBK2lrejIwV3hiR21oM3BkTXU1K3RG?= =?utf-8?B?VERrSG5jQWhXd1ZTd0ZjSS9kbFhTQmxhbjRST2YwbUp6eCtuMlJTaG9mb2pv?= =?utf-8?B?bXJOSjNsV2hxbDNLVmpRRmJta2RGMUw5YlVQNTQ1VVdUZElVSXFRTDQ2ZVFw?= =?utf-8?B?TnVoVzZNQS85MTFvd3B6dDN4YWp3MFUzYkFNTkVBNjZkWkdNZFltSG8zWVNT?= =?utf-8?B?dWNYNU8rZTQrWVJJVTRnQktWRGkrU3RBNzcxSWNEQjlFOGVoWGpQSExlVVZs?= =?utf-8?B?TVlvVFB0dHBTVDBaeTloR25kM21zZUYyajJETkNYL0hjWmQwbnE1eE9vbjZs?= =?utf-8?B?d2RlWnJ3VmlTVnFHVEJLdUxlMTN5N2J2WmhQaHIzeEkyWXlqaHhrN1pYR2o2?= =?utf-8?B?Y1NaVm9IYmx1M0k0T2ZFN1BnREhuVGNJc25RSGh6L2t3WHo2a3h3QURmdC9i?= =?utf-8?B?VTlBV1YvWVpxWUdycVJ6TGQ4SUwxQi9yZG11b2JBeEhaQ1hDeUR4b1lrVTNJ?= =?utf-8?B?dGhweSt1UDBLbkJhRys4R0k3VHV5dGhvOWplaFRJZkdCcDNOWkM0SUtaVVd2?= =?utf-8?B?Q3dRcmFieGFoNVZOWU9iL0NSaDNud0J1N3Q0S2VCVWd2dGRSUXpvTXJ2eFJ4?= =?utf-8?B?NHhFVTR0RE9nTFlIZVJqS3hOcmNtZ2Q3Ylh4MHFnSVRNMmg1NGo5T0d5V0Fw?= =?utf-8?B?SzNCY2o3Tmc1aS9MV3R2ckdTaG82NFB2ZnJhWVRqZENuRVRudFFDU1dUcTB5?= =?utf-8?B?L0VjMkJjZkZjWW9KVnNrYXVzdHVPVFpMWnZvZFdjZjNWR2RYQ1VTamJDZUhq?= =?utf-8?B?d1BpaFlCejk4NXlEWEtEY0NZL2lLQjhZaWpsWXROaDZDQ1FkR21FZzhsNlRM?= =?utf-8?B?VGJhTUoxSzV2Z3RIWmZFdVdVejNCSitmcEowak1SVHlrZUMyaDliOUR0MSs0?= =?utf-8?B?RElvY2ttWHpZMm92N2pNWEdFQ3N4UWdOVEJkZXJiS1hibDE1VTZEeDZMY0NE?= =?utf-8?B?QkUrUWtWa1hrT1FQUHBWcHQ2YnlDTW9hSGk3TFJoSnl2WHd6ZFFtUXdEZnpn?= =?utf-8?B?amVVeWxxME1xNzRqc2lsOVg0ZHJ0VzZSQkkycjdCd1I3Vmd0UWxDVlBHSG1z?= =?utf-8?B?ZGs1SEhqdGkwelByMFZLSmVTL1BqYnFXd2hWK3hyek4yblJzQ1BIUERta1FS?= =?utf-8?B?eUpNV3R1blhlRVEwaENoU0tiQW9GU1RVM3VuRlFyOGlxTFMwOWNkSnZZdDc3?= =?utf-8?B?SEplOVpZemNYeWNTcGYxSDUxNkdiWEFlWm5xUXpvTjg2SjRaYW1JSitiOFZT?= =?utf-8?B?dEM4UHhraktONFd2V0xkK3NNVm5Ea3BUWFJGd2hiOFlISCtkd0J5VnFOeXpB?= =?utf-8?B?UGRvT2FuRGpvY2pZWkNaN2ZFTXQ0cVhHSC9QdC9WNUpzWktRY0dtMzhtT1Z3?= =?utf-8?B?a1p4ZmlwckR6TkphakhVNDdSY0lPZ055Z1J1dzFIMExSOXVpcUI0TTNZN0VV?= =?utf-8?B?UE1oSVduL1BlUENmK2d2MGx2b1lRYWpJNzFKZTE3ajJ0UWs0MVkyeWlSWlhr?= =?utf-8?B?am9mNEEzZGZ4cm9GZi9pcU04c2Jpb0hGTEo5Q1FpMHoxdkVib3lDdnJSR0Jk?= =?utf-8?B?YXlMTjA0d2lnUzRaZUhBck5CSWVoWWxxSFMzM0RjUHRzenZjd0lWMS8yMEVy?= =?utf-8?B?UklTdnpZdzJhMWZHVnRFSkZ0L2JpWVkxVm5ueTg0YzRPajNmZElreG0xZGxS?= =?utf-8?B?UldjbEdLaHFZeGtkU2JycTA2TXAxTkVtdm50cytiRCtmRFNrazZTM0pseEFv?= =?utf-8?B?T003WkprVlc0UC9Uc2h5UzlMQlhTWVFGZFg2L2ZnTFJ0SmlvZW54aUN4YTkw?= =?utf-8?B?eDNEaTAzb0UzcnJiOXVxSllhcEIrOVIyLzVxdFdjQWllaHRZWFBGMENQNU5X?= =?utf-8?B?WllybnY1dzd3K2JVcnNDVERYdi8vSjR0SEQwWHhSWjRMZjRrQS9sYk82UGRp?= =?utf-8?B?ekM1ZjlaSUZEbmlYNVJTalg0UmF6eU9QSW1QRG5GUE5RcUx4YkNkcmFLWEt5?= =?utf-8?B?NmhtSGRaWWgzWWFkWGh2TlE5a3NvQ2x2eEhVQ21hek9tYWJRcFJtWFlxbndL?= =?utf-8?B?TEQ1UHNrTmN0VmhEYlcxbE4yVzV0cnM2clVqWVduYStoOUlhYkx1aG5FUGsx?= =?utf-8?B?SS9IWklFbDU0Qlc1bms0eGtkVGM2S1l2cHd5UVlDMlpMNGppYVNjSmd4MHBj?= =?utf-8?Q?+svYnUhNLjkiAdPE=3D?= X-Exchange-RoutingPolicyChecked: Pj9EyxO/coznYr3lvBSWW09sbTP14/A8u2oU9ymNh+ggnR6vqBlZ5kjaxCiF7PT216Xa7+biOJ7PbGxvKHwK5fklWzQUSYceuT5g+HiwBMRBzV7zJYaq7TyY/fDL7fMwobDI0kpdB0hUV/Ks5Qb2s7VutqTHYp5JZ4p5I1EnQr62XkWWBexMjHTDbAKN4L3b/Q6TVLyt4aNA5lXjwEgfzMhr1kVu/K5OSftPAvjUizaC0gqs3y2ipKTzjftPOH0u9o1ssTQ6O/nOZ1v9ihF00fGHoqZYfFoQq57LG3IQXbP9oYCOLdZqXhIdfBqo1qzShUyvViWJ5BiTz47GBa3LTg== X-MS-Exchange-CrossTenant-Network-Message-Id: 74517fed-588b-47bb-833a-08ded28e948c X-MS-Exchange-CrossTenant-AuthSource: LV3PR11MB8508.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 25 Jun 2026 07:51:35.3451 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: NcQEbQ0E+ofmmjTJmeNFaH0FUMmoykQEX6pHCLz6Z06D3y+hOIciQaVMwkLA74NxF0UQO/YMVVCtbU125F0rOmFTZ1kMevhcFOTUFvlG10M= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS4PPF11E6CAE14 X-OriginatorOrg: intel.com >> This is an AI-generated review of your patch. The human sending this >> [Severity: Low] >> Does this check allow an extra retry execution? Since retry_cnt starts at 0, >> using a strictly greater-than check allows the loop to retry 4 times >> (5 total executions) instead of the 3 executions intended by >> ICE_SQ_SEND_MAX_EXECUTE. >> >>> + break; >>> + ice_debug(hw, ICE_DBG_NVM, >>> + "NVM read EBUSY error, retry %d\n", >>> + retry_cnt + 1); >>> + last_cmd = false; >>> + ice_release_nvm(hw); >>> + msleep(ICE_SQ_SEND_DELAY_TIME_MS); >> > The review is correct that this allows one extra execution. However, I > intentionally kept this verbatim from the OOT driver (1.15.4) to This OOT change was made entirely by Grzegorz, for E825C. Would be great to acknowledge his authorship (+ SoB ofc, below). Signed-off-by: Grzegorz Nitka What was the platform that you have encountered the bug on? > retain the significant test exposure it has there. The absolute worst > case is one harmless extra retry before giving up. If you strongly > prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the > condition in a v4. Please update (to have one attempt less). > >> [Severity: High] >> Does dropping the NVM lock here violate the caller's assumption of mutual >> exclusion? >> >> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to >> ensure an atomic read of multiple sectors without interleaved writes. By >> explicitly dropping the lock during the msleep() window, another thread, VF, >> or PF could acquire the NVM lock and modify the NVM before the read resumes, >> potentially resulting in reading torn or inconsistent NVM data. >> >>> + status = ice_acquire_nvm(hw, ICE_RES_READ); >>> + if (status) >>> + break; >> > > I don't think this is an issue. The NVM resource (req_res 0x0008) is > an ownership lease, not a content snapshot. The FW grants it with a > hold timeout (see the timeout field handling in ice_aq_req_res). It > never guaranteed a point-in-time view of the flash. > Very similar behavior already exists in ice_devlink_nvm_snapshot(), > which reads the NVM in chunks via ice_read_flat_nvm() and deliberately > acquires/releases the lock between chunks (with a comment explaining > the FW reclaim). > Additionally, the lock is only dropped on the failure path, where > neither bytes_read nor offset have been advanced. The driver simply > re-reads the failed chunk after re-acquiring. So no data that is > returned to the caller is committed across the drop window. > Overall the lock drop does not weaken any guarantee callers actually > have. If we wanted to introduce a true snapshot guarantee, that would > probably require different approach and separate patch. > > >> [Severity: High] >> Can this lead to an unbalanced NVM resource lock release and potential data >> corruption? >> >> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the >> lock currently unheld. However, callers like ice_read_flash_module() are >> unaware of this internal lock state change and will unconditionally call >> ice_release_nvm() upon failure: >> >> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() { >> status = ice_acquire_nvm(hw, ICE_RES_READ); >> if (status) >> return status; >> >> status = ice_read_flat_nvm(hw, start + offset, &length, data, false); >> >> ice_release_nvm(hw); >> ... >> >> Because firmware tracks resource locks per-PF rather than per-OS-thread, this >> errant second release could forcefully strip the lock from another thread on >> the same PF that successfully acquired it, exposing the NVM to concurrent >> modification. >> > > Agreed, this might be a real bug, and the one of the three I think is > worth investigating. > If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns > with the lock unheld while callers unconditionally call > ice_release_nvm(), so a stray release is issued. > > On probability, though, the window is very small. Reaching it requires > sustained EBUSY across the retry budget plus a failed re-acquire > (which itself polls up to ICE_NVM_TIMEOUT), and concurrently another > requester taking the lock. Most reads happen during init (ice_probe, > and reset/rebuild via ice_init_nvm), and NVM writes only happen on an > already initialized driver. The devlink/ethtool nvm_read paths are > also exposed, but hitting this race would require precise timing > against a concurrent NVM owner on the device. > > I'd prefer to keep the scope of this patch limited to the EBUSY retry > path and not take on the unbalanced-release fix here. A proper fix > should change the lock-ownership contract of ice_read_flat_nvm() (on > error, the lock must be released by ice_read_flat_nvm(), callers > release only on success) and update all callers. Code change sould be > simple for all callers but ice_discover_flash_size(), it intentionally > holds one lease across a read loop and would need to re-acquire after > each expected boundary failure. > > Given how small the original window is, I'd rather not trade tested > OOT behavior for the risk of a complex unbalanced NVM lock fix. I > actually have a patch mostly ready that fixes the lock-ownership > contract, but I really don't like it. It changes the design of > ice_read_flat_nvm(), making it less intuitive for callers. More > importantly, I just don't have the resources or test coverage right > now to properly verify such change. > > However, I can modify the failure path for ice_acquire_nvm inside > ice_read_flat_nvm. Instead of bailing out immediately, we can just > retry it within the existing retry budget. In this case, the > probability of leaving ice_read_flat_nvm without holding the lock is > reduced even further without needing a refactor. > > Please let me know what you think about my thought process on this. I think that both AI-reported issues against the lock are valid concerns. I think that sleep was the actual fix, and re-locking were merely a necessity due to their expiration (as you said). A proper fix would be to just increase lock-timeout to accommodate all attempts (and still do the retries&sleep, but without unlocking). > > > >>> + retry_cnt++; >>> + } else { >>> + bytes_read += read_size; >>> + offset += read_size; >>> + retry_cnt = 0; >>> + } >>> } while (!last_cmd); >>> >>> *length = bytes_read; > > Thanks, > Robert