* [PATCH v2 0/2] fsi and hwmon (occ): Prevent occasional checksum failures
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight

Due to the OCC communication design with a shared SRAM area, checksum
errors are expected due to a corrupted buffer from OCC communications
with other system components. Therefore, use a unique errno for
checksum failures and retry the command twice in that case.

Changes since v1:
 - Refactor the retry loop

Eddie James (2):
  fsi: occ: Fix checksum failure mode
  hwmon (occ): Retry for checksum failure

 drivers/fsi/fsi-occ.c      |  7 +++++--
 drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

--
2.27.0

* [PATCH v2 1/2] fsi: occ: Fix checksum failure mode
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight

Change the checksum errno to something different than the errno
used for a bad SBE message. In addition, don't set the user's
response length to the data length in this case, since it's not
SBE FFDC.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/fsi/fsi-occ.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index c9cc75fbdfb9..3d04e8baecbb 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -246,7 +246,7 @@ static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
 	if (checksum != checksum_resp) {
 		dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
 			checksum_resp);
-		return -EBADMSG;
+		return -EBADE;
 	}
 
 	return 0;
@@ -575,8 +575,11 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
 	dev_dbg(dev, "resp_status=%02x resp_data_len=%d\n",
 		resp->return_status, resp_data_length);
 
-	occ->client_response_size = resp_data_length + 7;
 	rc = occ_verify_checksum(occ, resp, resp_data_length);
+	if (rc)
+		goto done;
+
+	occ->client_response_size = resp_data_length + 7;
 
  done:
 	*resp_len = occ->client_response_size;
--
2.27.0

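The effect of patch 1/2 can be illustrated with a minimal userspace sketch. This is not the driver code: `struct sketch_occ`, `sketch_checksum()` and `sketch_finish()` are hypothetical names, and the plain 16-bit byte sum is an assumption, since the diff does not show how the checksum is computed. Only the two behaviours described in the commit message are modelled: a checksum mismatch returns -EBADE rather than -EBADMSG, and the caller's response length is only updated for a verified response.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EBADE
#define EBADE 52	/* Linux value; fallback for hosts that lack it */
#endif

/* Hypothetical stand-in for the driver state touched by the patch. */
struct sketch_occ {
	size_t client_response_size;
};

/* Assumption: a plain 16-bit sum of the bytes, for illustration only. */
static uint16_t sketch_checksum(const uint8_t *data, size_t len)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < len; ++i)
		sum += data[i];

	return sum;
}

/*
 * Models the tail of the submit path after the patch: verify first,
 * and record the response size only when the checksum matches.
 */
static int sketch_finish(struct sketch_occ *occ, const uint8_t *data,
			 size_t len, uint16_t checksum_resp, size_t *resp_len)
{
	int rc = 0;

	if (sketch_checksum(data, len) != checksum_resp) {
		/* Distinct from -EBADMSG, which signals a bad SBE message. */
		rc = -EBADE;
		goto done;
	}

	/* 7 is the constant the diff adds to the data length. */
	occ->client_response_size = len + 7;

done:
	*resp_len = occ->client_response_size;
	return rc;
}
```

With a zero-initialised `struct sketch_occ`, a mismatch leaves `*resp_len` at 0, so a caller can tell a checksum failure from a failure that carries FFDC data.
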
* [PATCH v2 2/2] hwmon (occ): Retry for checksum failure
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight

Due to the OCC communication design with a shared SRAM area,
checksum errors are expected due to a corrupted buffer from OCC
communications with other system components. Therefore, retry
the command twice in the event of a checksum failure.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
 drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
index 49b13cc01073..e6ccef2af659 100644
--- a/drivers/hwmon/occ/p9_sbe.c
+++ b/drivers/hwmon/occ/p9_sbe.c
@@ -14,6 +14,8 @@
 
 #include "common.h"
 
+#define OCC_CHECKSUM_RETRIES 3
+
 struct p9_sbe_occ {
 	struct occ occ;
 	bool sbe_error;
@@ -83,17 +85,22 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
 	struct occ_response *resp = &occ->resp;
 	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
 	size_t resp_len = sizeof(*resp);
+	int i;
 	int rc;
 
-	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
-	if (rc < 0) {
+	for (i = 0; i < OCC_CHECKSUM_RETRIES; ++i) {
+		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
+		if (rc >= 0)
+			break;
 		if (resp_len) {
 			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
 				sysfs_notify(&occ->bus_dev->kobj, NULL,
 					     bin_attr_ffdc.attr.name);
-		}
 
-		return rc;
+			return rc;
+		}
+		if (rc != -EBADE)
+			return rc;
 	}
 
 	switch (resp->return_status) {
--
2.27.0

* Re: [PATCH v2 2/2] hwmon (occ): Retry for checksum failure
From: Joel Stanley @ 2022-04-27 8:34 UTC
To: Eddie James
Cc: linux-fsi, linux-hwmon, Linux Kernel Mailing List, Jean Delvare,
	Guenter Roeck, Jeremy Kerr, David Laight

On Tue, 26 Apr 2022 at 15:50, Eddie James <eajames@linux.ibm.com> wrote:
>
> Due to the OCC communication design with a shared SRAM area,
> checksum errors are expected due to a corrupted buffer from OCC
> communications with other system components. Therefore, retry
> the command twice in the event of a checksum failure.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
> Acked-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
> index 49b13cc01073..e6ccef2af659 100644
> --- a/drivers/hwmon/occ/p9_sbe.c
> +++ b/drivers/hwmon/occ/p9_sbe.c
> @@ -14,6 +14,8 @@
>
>  #include "common.h"
>
> +#define OCC_CHECKSUM_RETRIES 3
> +
>  struct p9_sbe_occ {
>  	struct occ occ;
>  	bool sbe_error;
> @@ -83,17 +85,22 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
>  	struct occ_response *resp = &occ->resp;
>  	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
>  	size_t resp_len = sizeof(*resp);
> +	int i;
>  	int rc;
>
> -	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> -	if (rc < 0) {
> +	for (i = 0; i < OCC_CHECKSUM_RETRIES; ++i) {
> +		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> +		if (rc >= 0)
> +			break;
>  		if (resp_len) {
>  			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
>  				sysfs_notify(&occ->bus_dev->kobj, NULL,
>  					     bin_attr_ffdc.attr.name);
> -		}
>
> -		return rc;
> +			return rc;
> +		}
> +		if (rc != -EBADE)
> +			return rc;

Future you might appreciate a comment above the EBADE check
clarifying why that error is being special-cased.

>  	}
>
>  	switch (resp->return_status) {
> --
> 2.27.0
>
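
One way the comment requested in the review might read is sketched below in self-contained userspace C. It is an illustration under stated assumptions, not the driver code: `sketch_send_cmd()`, `sketch_submit_fn`, `SKETCH_CHECKSUM_RETRIES` and the `sketch_flaky` stub are hypothetical names, and the callback stands in for `fsi_occ_submit()`. The sketch also resets the response length before each attempt and returns the last error once the attempts run out; both are choices made for the sketch, not statements about what the patch does.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52	/* Linux value; fallback for hosts that lack it */
#endif

#define SKETCH_CHECKSUM_RETRIES 3	/* total attempts, as in the patch */

/* Hypothetical transport callback standing in for fsi_occ_submit(). */
typedef int (*sketch_submit_fn)(void *ctx, size_t *resp_len);

static int sketch_send_cmd(sketch_submit_fn submit, void *ctx, size_t resp_cap)
{
	size_t resp_len;
	int rc = -EBADE;
	int i;

	for (i = 0; i < SKETCH_CHECKSUM_RETRIES; ++i) {
		resp_len = resp_cap;
		rc = submit(ctx, &resp_len);
		if (rc >= 0)
			break;

		/* A failure that carries response data is FFDC: not retried. */
		if (resp_len)
			return rc;

		/*
		 * -EBADE means the response checksum did not match. The
		 * response buffer lives in SRAM shared with other system
		 * components, so occasional corruption is expected and is
		 * usually transient. Retry on that error only; anything
		 * else is a real failure.
		 */
		if (rc != -EBADE)
			return rc;
	}

	return rc;
}

/* Example stub: fails with -EBADE `failures` times, then succeeds. */
struct sketch_flaky {
	int failures;
	int calls;
};

static int sketch_flaky_submit(void *ctx, size_t *resp_len)
{
	struct sketch_flaky *f = ctx;

	f->calls++;
	if (f->failures > 0) {
		f->failures--;
		*resp_len = 0;	/* a checksum failure reports no data */
		return -EBADE;
	}

	return 0;
}
```

With the stub set to fail twice, `sketch_send_cmd()` succeeds on the third attempt; with more failures than attempts it gives up and returns -EBADE.
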