* [PATCH 1/2] ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
@ 2023-09-13 22:19 Niklas Cassel
2023-09-13 22:19 ` [PATCH 2/2] ata: libata-eh: do not thaw the port twice " Niklas Cassel
0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2023-09-13 22:19 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Li Nan, Li Nan, linux-ide, Niklas Cassel
From: Niklas Cassel <niklas.cassel@wdc.com>
ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).
If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.
Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.
The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.
The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").
Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2
We can see the following happening:
[ 1.074659] ata3: XXX port freeze
[ 1.074700] ata3: XXX hardresetting link, stopping engine
[ 1.074746] ata3: XXX flipping SControl
[ 1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[ 1.411475] ata3: XXX port freeze
[ 1.420049] ata3: XXX starting engine
[ 1.420096] ata3: XXX rc=0, class=1
[ 1.420142] ata3: XXX clearing IRQs for thawing
[ 1.420188] ata3: XXX port thawed
[ 1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."
Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
So IS.IPS(x) only gets set if PxIS and PxIE is set.
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."
So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.
Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.
However, since commit 94152042eaa9 ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
clear eh_info on reset completion") fixed can no longer happen.
Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
drivers/ata/libata-eh.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 159ba6ba19eb..5c493b6316eb 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2796,18 +2796,11 @@ int ata_eh_reset(struct ata_link *link, int classify,
}
}
- /*
- * Some controllers can't be frozen very well and may set spurious
- * error conditions during reset. Clear accumulated error
- * information and re-thaw the port if frozen. As reset is the
- * final recovery action and we cross check link onlineness against
- * device classification later, no hotplug event is lost by this.
- */
+ /* clear cached SError */
spin_lock_irqsave(link->ap->lock, flags);
- memset(&link->eh_info, 0, sizeof(link->eh_info));
+ link->eh_info.serror = 0;
if (slave)
- memset(&slave->eh_info, 0, sizeof(link->eh_info));
- ap->pflags &= ~ATA_PFLAG_EH_PENDING;
+ slave->eh_info.serror = 0;
spin_unlock_irqrestore(link->ap->lock, flags);
if (ata_port_is_frozen(ap))
--
2.41.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-13 22:19 [PATCH 1/2] ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() Niklas Cassel
@ 2023-09-13 22:19 ` Niklas Cassel
2023-09-13 23:51 ` Damien Le Moal
2023-09-15 6:30 ` Damien Le Moal
0 siblings, 2 replies; 8+ messages in thread
From: Niklas Cassel @ 2023-09-13 22:19 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Li Nan, Li Nan, linux-ide, Niklas Cassel
From: Niklas Cassel <niklas.cassel@wdc.com>
commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.
Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
He and even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.
Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().
This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
drivers/ata/libata-eh.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 5c493b6316eb..4cf4f57e57b8 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2803,9 +2803,6 @@ int ata_eh_reset(struct ata_link *link, int classify,
slave->eh_info.serror = 0;
spin_unlock_irqrestore(link->ap->lock, flags);
- if (ata_port_is_frozen(ap))
- ata_eh_thaw_port(ap);
-
/*
* Make sure onlineness and classification result correspond.
* Hotplug could have happened during reset and some
--
2.41.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-13 22:19 ` [PATCH 2/2] ata: libata-eh: do not thaw the port twice " Niklas Cassel
@ 2023-09-13 23:51 ` Damien Le Moal
2023-09-14 9:06 ` Niklas Cassel
2023-09-15 6:30 ` Damien Le Moal
1 sibling, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2023-09-13 23:51 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Li Nan, Li Nan, linux-ide, Niklas Cassel
On 9/14/23 07:19, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
> a workaround that broke the retry mechanism in ATA EH.
>
> Tejun himself suggested to remove this workaround when it was identified
> to cause additional problems:
> https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
>
> He and even said:
> "Hmm... it seems I wasn't thinking straight when I added that work around."
> https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
>
> While removing the workaround solved the issue, however, the workaround was
> kept to avoid "spurious hotplug events during reset", and instead another
> workaround was added on top of the existing workaround in commit
> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>
> Because these IRQs happened when the port was frozen, we know that they
> were actually a side effect of PxIS and IS.IPS(x) not being cleared before
> the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
> pending interrupt status"), so these workarounds can now be removed.
>
> Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
> now been reverted, the ATA EH retry mechanism is functional again, so there
> is once again no need to thaw the port more than once in ata_eh_reset().
>
> This reverts "the workaround on top of the workaround" introduced in commit
> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
We need a fixes tag. Same for patch 1.
> ---
> drivers/ata/libata-eh.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 5c493b6316eb..4cf4f57e57b8 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2803,9 +2803,6 @@ int ata_eh_reset(struct ata_link *link, int classify,
> slave->eh_info.serror = 0;
> spin_unlock_irqrestore(link->ap->lock, flags);
>
> - if (ata_port_is_frozen(ap))
> - ata_eh_thaw_port(ap);
> -
> /*
> * Make sure onlineness and classification result correspond.
> * Hotplug could have happened during reset and some
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-13 23:51 ` Damien Le Moal
@ 2023-09-14 9:06 ` Niklas Cassel
2023-09-14 10:02 ` Damien Le Moal
0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2023-09-14 9:06 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Niklas Cassel, Li Nan, Li Nan, linux-ide@vger.kernel.org
On Thu, Sep 14, 2023 at 08:51:06AM +0900, Damien Le Moal wrote:
> On 9/14/23 07:19, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
> > a workaround that broke the retry mechanism in ATA EH.
> >
> > Tejun himself suggested to remove this workaround when it was identified
> > to cause additional problems:
> > https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
> >
> > He and even said:
> > "Hmm... it seems I wasn't thinking straight when I added that work around."
> > https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
> >
> > While removing the workaround solved the issue, however, the workaround was
> > kept to avoid "spurious hotplug events during reset", and instead another
> > workaround was added on top of the existing workaround in commit
> > 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
> >
> > Because these IRQs happened when the port was frozen, we know that they
> > were actually a side effect of PxIS and IS.IPS(x) not being cleared before
> > the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
> > pending interrupt status"), so these workarounds can now be removed.
> >
> > Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
> > now been reverted, the ATA EH retry mechanism is functional again, so there
> > is once again no need to thaw the port more than once in ata_eh_reset().
> >
> > This reverts "the workaround on top of the workaround" introduced in commit
> > 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>
> We need a fixes tag. Same for patch 1.
The workaround introduced in commit 1e641060c4b5 ("libata: clear eh_info on
reset completion") broke ATA EH retry logic, so the proper commit that we
fix is that commit.
However, if we put a Fixes tag with that commit, then this patch will get
backported to all possible stable kernels that has that commit, something
that we do _not_ want.
We can only remove this workaround for kernels that has commit 94152042eaa9
("ata: libahci: clear pending interrupt status").
Do we really need a Fixes tag?
The workaround (which broke ATA EH retry logic) has been in the kernel for
14 years, since then, we've only seen two complaints..
the one by Bruce Stenning 12 years ago (see commit log for this patch),
and the complaint from Huawei folks this year..
I guess we could set the Fixes tag to 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), since we depend on that commit.
However, that is basically a lie, since we are not fixing that commit.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-14 9:06 ` Niklas Cassel
@ 2023-09-14 10:02 ` Damien Le Moal
2023-09-14 10:11 ` Niklas Cassel
0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2023-09-14 10:02 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Niklas Cassel, Li Nan, Li Nan, linux-ide@vger.kernel.org
On 9/14/23 18:06, Niklas Cassel wrote:
> On Thu, Sep 14, 2023 at 08:51:06AM +0900, Damien Le Moal wrote:
>> On 9/14/23 07:19, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@wdc.com>
>>>
>>> commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
>>> a workaround that broke the retry mechanism in ATA EH.
>>>
>>> Tejun himself suggested to remove this workaround when it was identified
>>> to cause additional problems:
>>> https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
>>>
>>> He and even said:
>>> "Hmm... it seems I wasn't thinking straight when I added that work around."
>>> https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
>>>
>>> While removing the workaround solved the issue, however, the workaround was
>>> kept to avoid "spurious hotplug events during reset", and instead another
>>> workaround was added on top of the existing workaround in commit
>>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>>>
>>> Because these IRQs happened when the port was frozen, we know that they
>>> were actually a side effect of PxIS and IS.IPS(x) not being cleared before
>>> the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
>>> pending interrupt status"), so these workarounds can now be removed.
>>>
>>> Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
>>> now been reverted, the ATA EH retry mechanism is functional again, so there
>>> is once again no need to thaw the port more than once in ata_eh_reset().
>>>
>>> This reverts "the workaround on top of the workaround" introduced in commit
>>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>>>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>
>> We need a fixes tag. Same for patch 1.
>
> The workaround introduced in commit 1e641060c4b5 ("libata: clear eh_info on
> reset completion") broke ATA EH retry logic, so the proper commit that we
> fix is that commit.
>
> However, if we put a Fixes tag with that commit, then this patch will get
> backported to all possible stable kernels that has that commit, something
> that we do _not_ want.
>
> We can only remove this workaround for kernels that has commit 94152042eaa9
> ("ata: libahci: clear pending interrupt status").
Squash the 2 fixes together in a single commit ?
>
> Do we really need a Fixes tag?
> The workaround (which broke ATA EH retry logic) has been in the kernel for
> 14 years, since then, we've only seen two complaints..
> the one by Bruce Stenning 12 years ago (see commit log for this patch),
> and the complaint from Huawei folks this year..
>
> I guess we could set the Fixes tag to 94152042eaa9 ("ata: libahci: clear
> pending interrupt status"), since we depend on that commit.
> However, that is basically a lie, since we are not fixing that commit.
>
>
> Kind regards,
> Niklas
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-14 10:02 ` Damien Le Moal
@ 2023-09-14 10:11 ` Niklas Cassel
2023-09-14 10:38 ` Damien Le Moal
0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2023-09-14 10:11 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Niklas Cassel, Li Nan, Li Nan, linux-ide@vger.kernel.org
On Thu, Sep 14, 2023 at 07:02:38PM +0900, Damien Le Moal wrote:
> On 9/14/23 18:06, Niklas Cassel wrote:
> > On Thu, Sep 14, 2023 at 08:51:06AM +0900, Damien Le Moal wrote:
> >> On 9/14/23 07:19, Niklas Cassel wrote:
> >>> From: Niklas Cassel <niklas.cassel@wdc.com>
> >>>
> >>> commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
> >>> a workaround that broke the retry mechanism in ATA EH.
> >>>
> >>> Tejun himself suggested to remove this workaround when it was identified
> >>> to cause additional problems:
> >>> https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
> >>>
> >>> He and even said:
> >>> "Hmm... it seems I wasn't thinking straight when I added that work around."
> >>> https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
> >>>
> >>> While removing the workaround solved the issue, however, the workaround was
> >>> kept to avoid "spurious hotplug events during reset", and instead another
> >>> workaround was added on top of the existing workaround in commit
> >>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
> >>>
> >>> Because these IRQs happened when the port was frozen, we know that they
> >>> were actually a side effect of PxIS and IS.IPS(x) not being cleared before
> >>> the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
> >>> pending interrupt status"), so these workarounds can now be removed.
> >>>
> >>> Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
> >>> now been reverted, the ATA EH retry mechanism is functional again, so there
> >>> is once again no need to thaw the port more than once in ata_eh_reset().
> >>>
> >>> This reverts "the workaround on top of the workaround" introduced in commit
> >>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
> >>>
> >>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> >>
> >> We need a fixes tag. Same for patch 1.
> >
> > The workaround introduced in commit 1e641060c4b5 ("libata: clear eh_info on
> > reset completion") broke ATA EH retry logic, so the proper commit that we
> > fix is that commit.
> >
> > However, if we put a Fixes tag with that commit, then this patch will get
> > backported to all possible stable kernels that has that commit, something
> > that we do _not_ want.
> >
> > We can only remove this workaround for kernels that has commit 94152042eaa9
> > ("ata: libahci: clear pending interrupt status").
>
> Squash the 2 fixes together in a single commit ?
We can do that, but the problem would be the same.
commit 94152042eaa9 ("ata: libahci: clear pending interrupt status") is
currently in your for-next branch. Patch 1 and patch 2 in this series
depend on this commit.
Both of these fixes (patch 1 and patch 2 in this series) fix issues caused
by commit 1e641060c4b5 ("libata: clear eh_info on reset completion"),
a 14 year old commit.
We could add a Fixes that on 1e641060c4b5 ("libata: clear eh_info on reset
completion").
But might get this patch to get backported to all old kernels.
We don't want that, as we depend on 94152042eaa9 ("ata: libahci: clear
pending interrupt status").
So... skip a Fixes tag or add a Fixes against on the commit that we depend
on? (Even tough we are not "fixing" that commit.)
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-14 10:11 ` Niklas Cassel
@ 2023-09-14 10:38 ` Damien Le Moal
0 siblings, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2023-09-14 10:38 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Niklas Cassel, Li Nan, Li Nan, linux-ide@vger.kernel.org
On 9/14/23 19:11, Niklas Cassel wrote:
> On Thu, Sep 14, 2023 at 07:02:38PM +0900, Damien Le Moal wrote:
>> On 9/14/23 18:06, Niklas Cassel wrote:
>>> On Thu, Sep 14, 2023 at 08:51:06AM +0900, Damien Le Moal wrote:
>>>> On 9/14/23 07:19, Niklas Cassel wrote:
>>>>> From: Niklas Cassel <niklas.cassel@wdc.com>
>>>>>
>>>>> commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
>>>>> a workaround that broke the retry mechanism in ATA EH.
>>>>>
>>>>> Tejun himself suggested to remove this workaround when it was identified
>>>>> to cause additional problems:
>>>>> https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
>>>>>
>>>>> He and even said:
>>>>> "Hmm... it seems I wasn't thinking straight when I added that work around."
>>>>> https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
>>>>>
>>>>> While removing the workaround solved the issue, however, the workaround was
>>>>> kept to avoid "spurious hotplug events during reset", and instead another
>>>>> workaround was added on top of the existing workaround in commit
>>>>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>>>>>
>>>>> Because these IRQs happened when the port was frozen, we know that they
>>>>> were actually a side effect of PxIS and IS.IPS(x) not being cleared before
>>>>> the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
>>>>> pending interrupt status"), so these workarounds can now be removed.
>>>>>
>>>>> Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
>>>>> now been reverted, the ATA EH retry mechanism is functional again, so there
>>>>> is once again no need to thaw the port more than once in ata_eh_reset().
>>>>>
>>>>> This reverts "the workaround on top of the workaround" introduced in commit
>>>>> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>>>>>
>>>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>>>
>>>> We need a fixes tag. Same for patch 1.
>>>
>>> The workaround introduced in commit 1e641060c4b5 ("libata: clear eh_info on
>>> reset completion") broke ATA EH retry logic, so the proper commit that we
>>> fix is that commit.
>>>
>>> However, if we put a Fixes tag with that commit, then this patch will get
>>> backported to all possible stable kernels that has that commit, something
>>> that we do _not_ want.
>>>
>>> We can only remove this workaround for kernels that has commit 94152042eaa9
>>> ("ata: libahci: clear pending interrupt status").
>>
>> Squash the 2 fixes together in a single commit ?
>
> We can do that, but the problem would be the same.
>
> commit 94152042eaa9 ("ata: libahci: clear pending interrupt status") is
> currently in your for-next branch. Patch 1 and patch 2 in this series
> depend on this commit.
That patch is in for next for testing only. I can remove it and change it.
>
> Both of these fixes (patch 1 and patch 2 in this series) fix issues caused
> by commit 1e641060c4b5 ("libata: clear eh_info on reset completion"),
> a 14 year old commit.
>
> We could add a Fixes that on 1e641060c4b5 ("libata: clear eh_info on reset
> completion").
> But might get this patch to get backported to all old kernels.
> We don't want that, as we depend on 94152042eaa9 ("ata: libahci: clear
> pending interrupt status").
>
>
> So... skip a Fixes tag or add a Fixes against on the commit that we depend
> on? (Even tough we are not "fixing" that commit.)
I can add the 2 patches to for-6.6-fixes without fixes tag and just cc stable.
There are not that many LTS versions. Oldest still maintained is 4.14.
>
>
> Kind regards,
> Niklas
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] ata: libata-eh: do not thaw the port twice in ata_eh_reset()
2023-09-13 22:19 ` [PATCH 2/2] ata: libata-eh: do not thaw the port twice " Niklas Cassel
2023-09-13 23:51 ` Damien Le Moal
@ 2023-09-15 6:30 ` Damien Le Moal
1 sibling, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2023-09-15 6:30 UTC (permalink / raw)
To: Niklas Cassel; +Cc: Li Nan, Li Nan, linux-ide, Niklas Cassel
On 9/14/23 07:19, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
> a workaround that broke the retry mechanism in ATA EH.
>
> Tejun himself suggested to remove this workaround when it was identified
> to cause additional problems:
> https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
>
> He and even said:
> "Hmm... it seems I wasn't thinking straight when I added that work around."
> https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
>
> While removing the workaround solved the issue, however, the workaround was
> kept to avoid "spurious hotplug events during reset", and instead another
> workaround was added on top of the existing workaround in commit
> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>
> Because these IRQs happened when the port was frozen, we know that they
> were actually a side effect of PxIS and IS.IPS(x) not being cleared before
> the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
> pending interrupt status"), so these workarounds can now be removed.
>
> Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
> now been reverted, the ATA EH retry mechanism is functional again, so there
> is once again no need to thaw the port more than once in ata_eh_reset().
>
> This reverts "the workaround on top of the workaround" introduced in commit
> 8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
I queued this and patch 1 also in fo-6.6.-fixes. Thanks !
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-09-15 6:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-13 22:19 [PATCH 1/2] ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() Niklas Cassel
2023-09-13 22:19 ` [PATCH 2/2] ata: libata-eh: do not thaw the port twice " Niklas Cassel
2023-09-13 23:51 ` Damien Le Moal
2023-09-14 9:06 ` Niklas Cassel
2023-09-14 10:02 ` Damien Le Moal
2023-09-14 10:11 ` Niklas Cassel
2023-09-14 10:38 ` Damien Le Moal
2023-09-15 6:30 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox