Message-ID: <7a0cfc66-3131-4b94-87f2-cbb96595ebb6@kernel.org>
Date: Thu, 2 Apr 2026 05:02:22 +0900
X-Mailing-List: linux-block@vger.kernel.org
Subject: Re: [PATCH v2] block: Increase BLK_DEF_MAX_SECTORS_CAP
To: Mira Limbeck
Cc: axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org, martin.petersen@oracle.com, Friedrich Weber
References: <20250618060045.37593-1-dlemoal@kernel.org> <291f78bf-4b4a-40dd-867d-053b36c564b3@proxmox.com>
From: Damien Le Moal
Organization: Western Digital Research

On 4/1/26 19:32, Mira Limbeck wrote:
> Sorry if I wasn't clear enough, we did test the following mainline
> kernels without any downstream patches (git tags):
> v6.16 (unaffected)
> v6.17 (affected)
> v7.0-rc5 (affected)
>
> Afterwards we started to bisect between mainline 6.16
> (038d61fd642278bab63ee8ef722c50d10ab01e8f) and mainline 6.17
> (e5f0a698b34ed76002dc5cff3804a61c80233a7a) without any downstream
> patches, which led us to this commit as the first bad one:
> 9b8b84879d4adc506b0d3944e20b28d9f3f6994b

Note: the proper way to reference a patch is to use the 12-character
commit ID and the patch title:

9b8b84879d4a ("block: Increase BLK_DEF_MAX_SECTORS_CAP")

as that makes it easier to know what one is talking about without having
to go look up which patch that ID references.

> Building our downstream kernel 6.17 with this commit reverted, fixed it.

Nope, this is likely not fixing anything but rather hiding the issue.
With this patch reverted, the default max_sectors_kb will be 1280, so all
requests will be chunked to that size at most, and your devices will not
see large commands.
However, simply doing something like:

echo 4096 > /sys/block//queue/max_sectors_kb

will put your system in a state that is equivalent to the patch being
applied and you will likely see the issue again. Try.

> To make sure that's also the case for the current mainline kernel, we've
> tried 7.0-rc6 today (v7.0-rc6, 7aaa8047eafd0bd628065b15757d9b48c5f9c07d,
> affected), and again with this commit reverted (unaffected).

Same comment as above.

> Here the logs from 7.0-rc6:
>
> Apr 01 11:41:19 pve-test-hba kernel: sd 9:2:2:0: [sdc] tag#3962 page boundary curr_buff: 0x00000000f4d7cfce
> Apr 01 11:41:19 pve-test-hba kernel: BUG: unable to handle page fault for address: ff3a241243d70000
> Apr 01 11:41:19 pve-test-hba kernel: #PF: supervisor write access in kernel mode
> Apr 01 11:41:19 pve-test-hba kernel: #PF: error_code(0x0002) - not-present page
> Apr 01 11:41:19 pve-test-hba kernel: PGD 100010067 P4D 10066d067 PUD 10066e067 PMD 11f0fa067 PTE 0
> Apr 01 11:41:19 pve-test-hba kernel: Oops: Oops: 0002 [#3] SMP NOPTI
> Apr 01 11:41:19 pve-test-hba kernel: CPU: 15 UID: 0 PID: 6695 Comm: vgs Tainted: G D W E 7.0.0-rc6 #19 PREEMPT(full)
> Apr 01 11:41:19 pve-test-hba kernel: Tainted: [D]=DIE, [W]=WARN, [E]=UNSIGNED_MODULE
> Apr 01 11:41:19 pve-test-hba kernel: Hardware name:
> Apr 01 11:41:19 pve-test-hba kernel: RIP: 0010:_base_build_sg_scmd_ieee+0x478/0x590 [mpt3sas]

There may be an issue with the mpt3sas driver with large commands.
However, I am using that driver all day long and doing lots of testing
with gigantic read/write commands all the time. I have never seen any
issues. The difference is that I am using the SAS-SATA FW for the
Broadcom HBA, so no NVMe support, and my target devices are SAS or SATA
HDDs, not SSDs. Something may be wrong with the NVMe support in that HBA,
or, your SSDs do not like large commands and cause issues.
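
To put numbers on the max_sectors_kb point above, here is a minimal
Python sketch. It assumes 512-byte block-layer sectors and only uses the
two limits mentioned in this thread (1280 and 4096). The helper names and
the "sdX" device name are hypothetical, chosen for illustration only.

```python
from pathlib import Path

SECTOR_SIZE = 512  # bytes per block-layer sector (assumption stated above)


def kib_to_sectors(kib: int) -> int:
    """Convert a max_sectors_kb value (KiB) into 512-byte sectors."""
    return kib * 1024 // SECTOR_SIZE


def commands_needed(io_kib: int, max_sectors_kb: int) -> int:
    """Minimum number of requests a contiguous I/O is split into by the limit."""
    return (io_kib + max_sectors_kb - 1) // max_sectors_kb  # integer ceiling


def read_max_sectors_kb(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Read the current limit for a block device, e.g. dev="sdX" (hypothetical)."""
    path = Path(sysfs_root) / dev / "queue" / "max_sectors_kb"
    return int(path.read_text())


if __name__ == "__main__":
    io_kib = 4096  # one 4 MiB contiguous I/O
    for limit in (1280, 4096):
        print(f"max_sectors_kb={limit}: {kib_to_sectors(limit)} sectors per "
              f"request, {commands_needed(io_kib, limit)} request(s) for a "
              f"{io_kib} KiB I/O")
```

With the limit at 1280, a 4 MiB I/O has to be issued as at least four
requests, so the device never sees a single large command. With the limit
at 4096 it can go out as one request, which is the same situation whether
the value comes from the new default or from writing to sysfs by hand.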
That is easy to test: try connecting your SSDs directly to PCI and test
them by issuing large read/write commands with fio (you will need to use
the iomem=mmaphuge option so that hugepages are used for the IO buffers,
to ensure that you do not get the IOs chunked into small commands due to
memory fragmentation).

At least from my point of view and my tests, that commit is perfectly
fine. As mentioned above, it only changes the default value, and that is
something that can be done manually even without this patch. So this is
definitely not the root cause: it is simply exposing a problem that
already existed. We have seen that for several devices in the ata
subsystem and had to quirk many drives that choke on large commands.

We just need to figure out where the problem is (HBA and/or SSD) and can
then look into how to avoid that problem.

-- 
Damien Le Moal
Western Digital Research