From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Niklas Cassel <niklas.cassel@wdc.com>,
Damien Le Moal <dlemoal@kernel.org>,
Sasha Levin <sashal@kernel.org>,
linux-ide@vger.kernel.org
Subject: [PATCH AUTOSEL 6.5 40/41] ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
Date: Sun, 24 Sep 2023 09:15:28 -0400
Message-ID: <20230924131529.1275335-40-sashal@kernel.org>
In-Reply-To: <20230924131529.1275335-1-sashal@kernel.org>
From: Niklas Cassel <niklas.cassel@wdc.com>
[ Upstream commit 80cc944eca4f0baa9c381d0706f3160e491437f2 ]
ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).
If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.
Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.
The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.
The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").
Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2
We can see the following happening:
[ 1.074659] ata3: XXX port freeze
[ 1.074700] ata3: XXX hardresetting link, stopping engine
[ 1.074746] ata3: XXX flipping SControl
[ 1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[ 1.411475] ata3: XXX port freeze
[ 1.420049] ata3: XXX starting engine
[ 1.420096] ata3: XXX rc=0, class=1
[ 1.420142] ata3: XXX clearing IRQs for thawing
[ 1.420188] ata3: XXX port thawed
[ 1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."
Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
So IS.IPS(x) only gets set if both the PxIS bit and the corresponding PxIE bit are set.
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."
So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.
Since PxIE is cleared, the only way to get an interrupt while the port is
frozen is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set while
the port is frozen is if it was already set before the port was frozen.
However, since commit 737dd811a3db ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
clear eh_info on reset completion") fixed can no longer happen.
Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/ata/libata-eh.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 35e03679b0bfe..d7914c7d1a0d1 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2822,18 +2822,11 @@ int ata_eh_reset(struct ata_link *link, int classify,
 		}
 	}
 
-	/*
-	 * Some controllers can't be frozen very well and may set spurious
-	 * error conditions during reset. Clear accumulated error
-	 * information and re-thaw the port if frozen. As reset is the
-	 * final recovery action and we cross check link onlineness against
-	 * device classification later, no hotplug event is lost by this.
-	 */
+	/* clear cached SError */
 	spin_lock_irqsave(link->ap->lock, flags);
-	memset(&link->eh_info, 0, sizeof(link->eh_info));
+	link->eh_info.serror = 0;
 	if (slave)
-		memset(&slave->eh_info, 0, sizeof(link->eh_info));
-	ap->pflags &= ~ATA_PFLAG_EH_PENDING;
+		slave->eh_info.serror = 0;
 	spin_unlock_irqrestore(link->ap->lock, flags);
 
 	if (ata_port_is_frozen(ap))
--
2.40.1