Date: Tue, 18 Nov 2025 13:49:45 -0700
From: Keith Busch
To: Thomas ten Cate
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: "controller is down; will reset" on SK Hynix NVMe drive in Lenovo IdeaPad Pro 5

On Mon, Nov 17, 2025 at 02:39:17PM +0100, Thomas ten Cate wrote:
> The log suggests to add the kernel arguments
> "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
> pcie_port_pm=off", which indeed makes all issues go away.
>
> I haven't found a reliable way to trigger the latter error
> specifically, though doing something I/O heavy like compiling a kernel
> seems to make it more likely. This makes bisect difficult to do, but
> it's clear that something was going on in previous versions as well,
> so I wouldn't necessarily call this a regression. Either way, the
> issue is still present in mainline 6.17.8.
>
> Since it happens only after some idle time, and disabling PM fixes it,
> this seems related to power states. But of course, I cannot completely
> rule out faulty hardware either.
>
> Machine: Lenovo IdeaPad Pro 5 16APH8
> Architecture: x86_64
> NVMe drive: SK Hynix HFS001TEJ4X112N
> Full lshw output:
> https://gist.github.com/ttencate/5540c81454bbe1fa679955effba65eba
>
> Distribution: Arch Linux
> Kernel version: 6.17.8 (vanilla from commit 8ac42a6)
> Kernel configuration:
> https://gitlab.archlinux.org/archlinux/packaging/packages/linux-lts/-/blob/b0cac6a69041703bbe1aba4a2a269585d77b108b/config
> (plus `make olddefconfig`)
> GCC version: 15.2.1
>
> This is my first kernel bug report, so I hope I didn't miss anything;
> if I did, please let me know. I'd be happy to experiment or try out
> patches.

The "report a bug" part of that log message was originally aimed at hardware vendors rather than at the kernel. If the power-saving features cause the endpoint to drop off the bus, something is wrong with the SSD, the PCIe slot, or both.

The only recourse we have in the nvme driver is a quirk to disable APST (Autonomous Power State Transitions) for the device. The driver doesn't control the PCIe ASPM settings, though, so that would have to be a different quirk if it's really necessary.

Do you need all three of those parameters, or is disabling the nvme driver's APST (nvme_core.default_ps_max_latency_us=0) sufficient on its own? These parameters have a real cost in your machine's power consumption, so you'd usually want to narrow down whether only the deepest power state is the problem or whether every power-saving feature really needs to be disabled.
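As a rough illustration of that narrowing-down step, here is a minimal Python sketch under stated assumptions. It models APST as allowed to idle into any non-operational power state whose exit latency fits within nvme_core.default_ps_max_latency_us (kernel default 100000 us, with 0 disabling APST), and computes a nonzero budget that keeps the shallower idle states but excludes the deepest one. The PowerState class, both helper functions, and the latency numbers are hypothetical. The real descriptors come from the drive (for example via nvme id-ctrl from nvme-cli), and the exact comparison the driver makes (see nvme_configure_apst() in drivers/nvme/host/core.c) may differ between kernel versions.

```python
from dataclasses import dataclass

DEFAULT_PS_MAX_LATENCY_US = 100_000  # nvme_core default latency budget


@dataclass(frozen=True)
class PowerState:
    """Hypothetical, trimmed-down NVMe power-state descriptor."""
    index: int
    non_operational: bool
    exit_lat_us: int


def apst_targets(states, max_latency_us=DEFAULT_PS_MAX_LATENCY_US):
    """Simplified model: non-operational states APST may idle into.

    Assumption: a state qualifies when its exit latency fits the budget,
    and a budget of 0 turns APST off entirely.
    """
    if max_latency_us == 0:
        return []
    return [s.index for s in states
            if s.non_operational and s.exit_lat_us <= max_latency_us]


def budget_without_deepest(states):
    """Smallest budget that keeps every idle state except the slowest one(s).

    Returns 0 (APST off) when there is only one distinct exit latency,
    since excluding the deepest state then means excluding all of them.
    """
    lats = sorted({s.exit_lat_us for s in states if s.non_operational})
    return lats[-2] if len(lats) >= 2 else 0


if __name__ == "__main__":
    # Hypothetical descriptor table; read the real one from the drive.
    table = [
        PowerState(0, False, 0),
        PowerState(1, False, 0),
        PowerState(2, False, 0),
        PowerState(3, True, 1_200),
        PowerState(4, True, 10_000),
    ]
    print(apst_targets(table))                  # [3, 4]
    budget = budget_without_deepest(table)
    print(budget, apst_targets(table, budget))  # 1200 [3]
```

If the drive stays up with such a value passed as nvme_core.default_ps_max_latency_us=<value> on the kernel command line, that would suggest only the deepest state is implicated, which points toward a narrower fix than disabling APST outright. The PCIe ASPM parameters would still need to be tested separately.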