* [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops
@ 2016-08-02 1:50 Stewart Smith
2016-08-03 7:12 ` Michael Ellerman
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Stewart Smith @ 2016-08-02 1:50 UTC (permalink / raw)
To: linuxppc-dev; +Cc: neelegup, Stewart Smith, stable
According to the OPAL docs:
https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
indicates either a transient or permanent error.
Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
permanent error particularly well, in that you could end up in a busy
loop.
This was not too hard to trigger on an AMI BMC based OpenPOWER machine
doing a continuous "ipmitool mc reset cold" to the BMC, the result of
that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
We now retry a few times before returning the error higher up the stack.
Cc: stable@vger.kernel.org
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
drivers/rtc/rtc-opal.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index 9c18d6fd8107..fab19e3e2fba 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms)
static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
{
long rc = OPAL_BUSY;
+ int retries = 10;
u32 y_m_d;
u64 h_m_s_ms;
__be32 __y_m_d;
@@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
if (rc == OPAL_BUSY_EVENT)
opal_poll_events(NULL);
- else
+ else if (retries-- && (rc == OPAL_HARDWARE
+ || rc == OPAL_INTERNAL_ERROR))
msleep(10);
+ else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
+ break;
}
if (rc != OPAL_SUCCESS)
@@ -84,6 +88,7 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm)
{
long rc = OPAL_BUSY;
+ int retries = 10;
u32 y_m_d = 0;
u64 h_m_s_ms = 0;
@@ -92,8 +97,11 @@ static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm)
rc = opal_rtc_write(y_m_d, h_m_s_ms);
if (rc == OPAL_BUSY_EVENT)
opal_poll_events(NULL);
- else
+ else if (retries-- && (rc == OPAL_HARDWARE
+ || rc == OPAL_INTERNAL_ERROR))
msleep(10);
+ else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
+ break;
}
return rc == OPAL_SUCCESS ? 0 : -EIO;
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops
2016-08-02 1:50 [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops Stewart Smith
@ 2016-08-03 7:12 ` Michael Ellerman
2018-01-29 4:13 ` Michael Ellerman
2018-02-06 2:26 ` Alexandre Belloni
2 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2016-08-03 7:12 UTC (permalink / raw)
To: Stewart Smith, linuxppc-dev; +Cc: neelegup, Stewart Smith, stable
Stewart Smith <stewart@linux.vnet.ibm.com> writes:
> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
>
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
>
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>
> We now retry a few times before returning the error higher up the stack.
Looks like this has always been broken, so:
Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV platform")
> Cc: stable@vger.kernel.org
And therefore that should be:
Cc: stable@vger.kernel.org # v3.19+
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
> drivers/rtc/rtc-opal.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
> index 9c18d6fd8107..fab19e3e2fba 100644
> --- a/drivers/rtc/rtc-opal.c
> +++ b/drivers/rtc/rtc-opal.c
> @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms)
> static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
> {
> long rc = OPAL_BUSY;
> + int retries = 10;
> u32 y_m_d;
> u64 h_m_s_ms;
> __be32 __y_m_d;
> @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
> rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> if (rc == OPAL_BUSY_EVENT)
> opal_poll_events(NULL);
> - else
> + else if (retries-- && (rc == OPAL_HARDWARE
> + || rc == OPAL_INTERNAL_ERROR))
> msleep(10);
> + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
> + break;
> }
This is a pretty gross API at this point.
That's basically a score of 2 on Rusty's API usability index ("Read the implementation
and you'll get it right" - http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html).
The docs don't mention OPAL_INTERNAL_ERROR being transient, nor do they mention
OPAL_BUSY.
Can we at least do a wrapper function in opal.h for drivers to use that handles
some or all of these cases?
cheers
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2016-08-02 1:50 [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops Stewart Smith
2016-08-03 7:12 ` Michael Ellerman
@ 2018-01-29 4:13 ` Michael Ellerman
2018-02-06 2:26 ` Alexandre Belloni
2 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-01-29 4:13 UTC (permalink / raw)
To: Stewart Smith, linuxppc-dev; +Cc: neelegup, Stewart Smith, stable
On Tue, 2016-08-02 at 01:50:16 UTC, Stewart Smith wrote:
> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
>
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
>
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>
> We now retry a few times before returning the error higher up the stack.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/5b8b58063029f02da573120ef4dc90
cheers
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2016-08-02 1:50 [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops Stewart Smith
2016-08-03 7:12 ` Michael Ellerman
2018-01-29 4:13 ` Michael Ellerman
@ 2018-02-06 2:26 ` Alexandre Belloni
2018-02-06 5:22 ` Michael Ellerman
2018-02-06 7:57 ` Stewart Smith
2 siblings, 2 replies; 8+ messages in thread
From: Alexandre Belloni @ 2018-02-06 2:26 UTC (permalink / raw)
To: Stewart Smith, Michael Ellerman; +Cc: linuxppc-dev, neelegup, stable
Hi,
On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
> According to the OPAL docs:
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
> indicates either a transient or permanent error.
>
> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
> permanent error particularly well, in that you could end up in a busy
> loop.
>
> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>
> We now retry a few times before returning the error higher up the stack.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
> drivers/rtc/rtc-opal.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
Just a note to let you know that this patch should have gone through my
tree but it was not sent to linux-rtc or me.
I guess what happened is that Michael cleaned up the Linux PPC patchwork
queue.
--
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2018-02-06 2:26 ` Alexandre Belloni
@ 2018-02-06 5:22 ` Michael Ellerman
2018-02-06 12:42 ` Alexandre Belloni
2018-02-06 7:57 ` Stewart Smith
1 sibling, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2018-02-06 5:22 UTC (permalink / raw)
To: Alexandre Belloni, Stewart Smith; +Cc: linuxppc-dev, neelegup, stable
Alexandre Belloni <alexandre.belloni@bootlin.com> writes:
> Hi,
>
> On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
>> According to the OPAL docs:
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
>> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
>> indicates either a transient or permanent error.
>>
>> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
>> permanent error particularly well, in that you could end up in a busy
>> loop.
>>
>> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
>> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
>> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>>
>> We now retry a few times before returning the error higher up the stack.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
>> ---
>> drivers/rtc/rtc-opal.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>
> Just a note to let you know that this patch should have gone through my
> tree but it was not sent to linux-rtc or me.
Sorry, I saw it had been languishing for a long time and assumed you'd
missed it.
Happy to revert/rework it if you're not happy with it.
> I guess what happened is that Michael cleaned up the Linux PPC patchwork
> queue.
Yeah I did.
In future I'll ping you if there's something that seems to have fallen
through the cracks.
cheers
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2018-02-06 2:26 ` Alexandre Belloni
2018-02-06 5:22 ` Michael Ellerman
@ 2018-02-06 7:57 ` Stewart Smith
1 sibling, 0 replies; 8+ messages in thread
From: Stewart Smith @ 2018-02-06 7:57 UTC (permalink / raw)
To: Alexandre Belloni, Michael Ellerman; +Cc: linuxppc-dev, neelegup, stable
Alexandre Belloni <alexandre.belloni@bootlin.com> writes:
> On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote:
>> According to the OPAL docs:
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt
>> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt
>> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this
>> indicates either a transient or permanent error.
>>
>> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a
>> permanent error particularly well, in that you could end up in a busy
>> loop.
>>
>> This was not too hard to trigger on an AMI BMC based OpenPOWER machine
>> doing a continuous "ipmitool mc reset cold" to the BMC, the result of
>> that being that we'd get stuck in an infinite loop in opal_get_rtc_time.
>>
>> We now retry a few times before returning the error higher up the stack.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
>> ---
>> drivers/rtc/rtc-opal.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>
> Just a note to let you know that this patch should have gone through my
> tree but it was not sent to linux-rtc or me.
>
> I guess what happened is that Michael cleaned up the Linux PPC patchwork
> queue.
Apologies for not sending there. My (18 month ago self) bad.
--
Stewart Smith
OPAL Architect, IBM.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2018-02-06 5:22 ` Michael Ellerman
@ 2018-02-06 12:42 ` Alexandre Belloni
2018-02-06 14:06 ` Michael Ellerman
0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Belloni @ 2018-02-06 12:42 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Stewart Smith, linuxppc-dev, neelegup, stable
On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote:
> > Just a note to let you know that this patch should have gone through my
> > tree but it was not sent to linux-rtc or me.
>
> Sorry, I saw it had been languishing for a long time and assumed you'd
> missed it.
>
> Happy to revert/rework it if you're not happy with it.
>
No, that's fine. It's just that the commit title stands out when using
git log --oneline and that triggered my OCD ;)
> > I guess what happened is that Michael cleaned up the Linux PPC patchwork
> > queue.
>
> Yeah I did.
>
> In future I'll ping you if there's something that seems to have fallen
> through the cracks.
>
Great thanks!
--
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: rtc-opal: Fix handling of firmware error codes, prevent busy loops
2018-02-06 12:42 ` Alexandre Belloni
@ 2018-02-06 14:06 ` Michael Ellerman
0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-02-06 14:06 UTC (permalink / raw)
To: Alexandre Belloni; +Cc: Stewart Smith, linuxppc-dev, neelegup, stable
Alexandre Belloni <alexandre.belloni@bootlin.com> writes:
> On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote:
>> > Just a note to let you know that this patch should have gone through my
>> > tree but it was not sent to linux-rtc or me.
>>
>> Sorry, I saw it had been languishing for a long time and assumed you'd
>> missed it.
>>
>> Happy to revert/rework it if you're not happy with it.
>>
>
> No, that's fine. It's just that the commit title stands out when using
> git log --oneline and that triggered my OCD ;)
Oh no! Now I'm *really* sorry, that's the worst! :)
cheers
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-02-06 14:06 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-02 1:50 [PATCH] rtc-opal: Fix handling of firmware error codes, prevent busy loops Stewart Smith
2016-08-03 7:12 ` Michael Ellerman
2018-01-29 4:13 ` Michael Ellerman
2018-02-06 2:26 ` Alexandre Belloni
2018-02-06 5:22 ` Michael Ellerman
2018-02-06 12:42 ` Alexandre Belloni
2018-02-06 14:06 ` Michael Ellerman
2018-02-06 7:57 ` Stewart Smith
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).