From: Ilpo Järvinen
Date: Tue, 7 Jan 2025 16:29:18 +0200 (EET)
To: Lukas Wunner
cc: Bjorn Helgaas, Krzysztof Wilczynski, linux-pci@vger.kernel.org, Niklas Schnelle, Jonathan Cameron, Mika Westerberg, "Maciej W. Rozycki", Mario Limonciello, Evert Vorster
Subject: Re: [PATCH for-linus] PCI/bwctrl: Fix NULL pointer deref on unbind and bind
References: <0ee5faf5395cad8d29fb66e1ec444c8d882a4201.1735852688.git.lukas@wunner.de>

On Tue, 7 Jan 2025, Lukas Wunner wrote:

> On Sun, Jan 05, 2025 at 06:54:24PM +0200, Ilpo Järvinen wrote:
> > Indeed, it certainly didn't occur to me while arranging the code the way
> > it is that there are other sources for the same irq. However, there is a
> > reason those lines were within the same critical section (I also realized
> > it's not documented anywhere):
> >
> > As bwctrl has two operating modes, one with BW notifications and the other
> > without them, there are races when switching between those modes during
> > probe wrt. call to lbms counting accessor, and I reused those rw
> > semaphores to prevent those races (the race fixes were noted only in a
> > history bullet of the bwctrl series).
>
> Could you add code comment(s) to document this?

Sure, I'll do that once I've been able to clear the holiday-induced logjam
on the pdx86 maintainership front. :-) (For now, I've added a bullet to my
todo list so I don't forget it.)

> I've respun the patch, but of course yesterday was a holiday in Finland.
> So I'm hoping you get a chance to review the v2 patch today.

Done.

> It seems pcie_bwctrl_setspeed_rwsem is only needed because
> pcie_retrain_link() calls pcie_reset_lbms_count(), which
> would recursively acquire pcie_bwctrl_lbms_rwsem.
>
> There are only two callers of pcie_retrain_link(), so I'm
> wondering if the invocation of pcie_reset_lbms_count()
> can be moved to them, thus avoiding the recursive lock
> acquisition and allowing to get rid of pcie_bwctrl_setspeed_rwsem.
>
> An alternative would be to have a __pcie_retrain_link() helper
> which doesn't call pcie_reset_lbms_count().
>
> Right now there are no less than three locks used by bwctrl
> (the two global rwsem plus the per-port mutex).
> That doesn't look elegant and makes it difficult to reason about the code,
> so simplifying the locking would be desirable I think.

I considered a __pcie_retrain_link() variant, but it felt like locking
details that are internal to bwctrl would leak elsewhere into the code, so
I had some dislike of that solution, though I'm not strictly against it.
It would seem the most straightforward approach, and it wouldn't force
moving the LBMS reset to the callers, which feels even less
elegant/obvious. I previously just chose to keep that complexity internal
to bwctrl, but if you think adding __pcie_retrain_link() would be more
elegant, we can certainly move in that direction.

> I'm also wondering if the IRQ handler really needs to run in
> hardirq context. Is there a reason it can't run in thread
> context? Note that CONFIG_PREEMPT_RT=y (as well as the
> "threadirqs" command line option) cause the handler to be run
> in thread context, so it must work properly in that situation
> as well.

If thread context would work now, why was the fix in commit 3e82a7f9031f
("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked") needed
(the commit is from the bwnotif era)? What has changed since that fix?

I'm open to your suggestion, but the existence of that fix is holding me
back. I just don't understand why it would work now when it didn't back
then.

> Another oddity that caught my eye is the counting of the
> interrupts. It seems the only place where lbms_count is read
> is the pcie_failed_link_retrain() quirk, and it only cares
> about the count being non-zero. So this could be a bit in
> pci_dev->priv_flags that's accessed with set_bit() / test_bit()
> similar to pci_dev_assign_added() / pci_dev_is_added().
>
> Are you planning on using the count for something else in the
> future? If not, using a flag would be simpler and more economical
> memory-wise.

Somebody requested having the count exposed.
For troubleshooting HW problems (IIRC), it was requested privately when I
posted one of the early versions of the bwctrl series (so quite a long time
ago). I just haven't yet written the change that exposes it under sysfs.

> I'm also worried about the lbms_count overflowing.

Should I perhaps simply do a pci_warn() if it happens?

> Because there's hardware which signals an interrupt before actually
> setting one of the two bits in the Link Status Register, I'm
> wondering if it would make sense to poll the register a couple
> of times in the irq handler. Obviously this is only an option
> if the handler is running in thread context. What was the maximum
> time you saw during testing that it took to set the LBMS bit belatedly?

Is there some misunderstanding between us here? I don't think I've
noticed a delayed LBMS assertion. What I saw was that the new Link Speed
was not yet updated when Link Training was already 0. In that case, the
Link Status register was read inside the handler, so I'd assume LBMS was
already set (that is what triggered the interrupt), i.e., it was not set
belatedly.

I only recall testing by reading the value again inside the set speed
functions, and the Link Speed was always correct by then. I might have
also tried polling it inside the handler, but I'm sorry, I no longer
recall whether I did or what the end result was.

> If you don't poll for the LBMS bit, then you definitely should clear
> it on unbind in case it contains a stale 1. Or probably clear it in
> any case.

-- 
 i.
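On the counter vs. flag question, the two options can be contrasted in a small userspace sketch using C11 atomics. The bit number, the function names and the choice to saturate the counter are illustrative assumptions; in the kernel, the flag variant would be set_bit()/test_bit() on pci_dev->priv_flags, and the overflow policy (e.g. a pci_warn()) is still an open question in the thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; a userspace model, not bwctrl code. */

/* Option 1: a single "LBMS seen" bit, like set_bit()/test_bit(). */
#define LBMS_SEEN_BIT 0u /* hypothetical bit number */

void lbms_mark_seen(atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1ul << LBMS_SEEN_BIT);
}

bool lbms_was_seen(atomic_ulong *flags)
{
	return atomic_load(flags) & (1ul << LBMS_SEEN_BIT);
}

/*
 * Option 2: keep a count (e.g. for a later sysfs attribute) but
 * saturate at UINT_MAX instead of wrapping, since a wrap back to 0
 * would make a "count != 0" check miss earlier LBMS events.
 * Returns false once saturated so the caller can decide to warn.
 */
bool lbms_count_inc(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	do {
		if (old == UINT_MAX)
			return false;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));
	return true;
}
```

The flag is the cheaper choice if only "non-zero" is ever needed; the saturating counter keeps the data someone asked to expose while still making the quirk's non-zero test safe.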
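Lukas's last point relies on LBMS and LABS being RW1C (write-1-to-clear) bits in the Link Status register. The sketch below models that in userspace: the register is a plain variable standing in for the config-space accessors, and bw_notif_handle() and bw_notif_unbind() are hypothetical names. The two bit values match the PCI_EXP_LNKSTA_LBMS/PCI_EXP_LNKSTA_LABS definitions in the kernel's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS 0x8000 /* Link Autonomous Bandwidth Status */
#define LNKSTA_RW1C (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS)

/* Simulated Link Status register (stand-in for config-space access). */
static uint16_t lnksta;

static uint16_t lnksta_read(void)
{
	return lnksta;
}

/* RW1C model: writing 1 to LBMS/LABS clears them; other bits ignore writes. */
static void lnksta_write(uint16_t val)
{
	lnksta &= (uint16_t)~(val & LNKSTA_RW1C);
}

/* Interrupt handler model: returns true if the interrupt was ours. */
bool bw_notif_handle(unsigned int *lbms_count)
{
	uint16_t events = lnksta_read() & LNKSTA_RW1C;

	if (!events)
		return false; /* shared IRQ raised by another source */

	if (events & PCI_EXP_LNKSTA_LBMS)
		(*lbms_count)++;
	lnksta_write(events); /* write the set bits back to clear them */
	return true;
}

/* Unbind model: clear a possibly stale LBMS so it is not misread later. */
void bw_notif_unbind(void)
{
	lnksta_write(PCI_EXP_LNKSTA_LBMS);
}
```

Writing back only the bits that were observed set avoids clearing an event that arrives between the read and the write.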