* [PATCH 0/4] Improve error handling for qnap-mcu transfers
@ 2025-09-23 16:08 Heiko Stuebner
2025-09-23 16:08 ` [PATCH 1/4] mfd: qnap-mcu: Calculate the checksum on the actual number of bytes received Heiko Stuebner
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Heiko Stuebner @ 2025-09-23 16:08 UTC (permalink / raw)
To: lee; +Cc: linux-kernel, heiko
Digging deeper into how that MCU behaves, I found out it can return
more status codes than the "@0" for "ok".
The additional codes can report a failed checksum verification and
some "general" error for the command execution.
This also explains sporadic command timeout messages, I have seen
over time, when the controller sends an error code while we expect
a longer reply from it.
So while I'm not sure yet why it reports an error, with these changes
we at least handle the error return code in a meaningful way.
Heiko Stuebner (4):
mfd: qnap-mcu: Calculate the checksum on the actual number of bytes
received
mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors
mfd: qnap-mcu: Move checksum verification to its own function
mfd: qnap-mcu: Add proper error handling for command errors
drivers/mfd/qnap-mcu.c | 55 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 4 deletions(-)
--
2.47.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/4] mfd: qnap-mcu: Calculate the checksum on the actual number of bytes received
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
@ 2025-09-23 16:08 ` Heiko Stuebner
2025-09-23 16:08 ` [PATCH 2/4] mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors Heiko Stuebner
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Stuebner @ 2025-09-23 16:08 UTC (permalink / raw)
To: lee; +Cc: linux-kernel, heiko
In the case of an error message, the number of received bytes can be
less than originally expected but still contain a valid message.
If the transfer itself ended in an error we would exit earlier already.
So calculate the checksum on the number of received bytes and not the
number of expected bytes.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
drivers/mfd/qnap-mcu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mfd/qnap-mcu.c b/drivers/mfd/qnap-mcu.c
index 2be429a50611..e3210541ec56 100644
--- a/drivers/mfd/qnap-mcu.c
+++ b/drivers/mfd/qnap-mcu.c
@@ -175,8 +175,8 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
return -ETIMEDOUT;
}
- crc = qnap_mcu_csum(rx, reply_data_size);
- if (crc != rx[reply_data_size]) {
+ crc = qnap_mcu_csum(rx, reply->received - QNAP_MCU_CHECKSUM_SIZE);
+ if (crc != rx[reply->received - QNAP_MCU_CHECKSUM_SIZE]) {
dev_err(&mcu->serdev->dev, "Invalid Checksum received\n");
return -EIO;
}
--
2.47.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4] mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
2025-09-23 16:08 ` [PATCH 1/4] mfd: qnap-mcu: Calculate the checksum on the actual number of bytes received Heiko Stuebner
@ 2025-09-23 16:08 ` Heiko Stuebner
2025-09-23 16:08 ` [PATCH 3/4] mfd: qnap-mcu: Move checksum verification to its own function Heiko Stuebner
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Stuebner @ 2025-09-23 16:08 UTC (permalink / raw)
To: lee; +Cc: linux-kernel, heiko
EPROTO stands for protocol error and a lot of driver already use it
to designate errors in the sent or received data from a peripheral.
So use it in the qnap-mcu as well for checksum errors.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
drivers/mfd/qnap-mcu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mfd/qnap-mcu.c b/drivers/mfd/qnap-mcu.c
index e3210541ec56..b4b630f7d413 100644
--- a/drivers/mfd/qnap-mcu.c
+++ b/drivers/mfd/qnap-mcu.c
@@ -178,7 +178,7 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
crc = qnap_mcu_csum(rx, reply->received - QNAP_MCU_CHECKSUM_SIZE);
if (crc != rx[reply->received - QNAP_MCU_CHECKSUM_SIZE]) {
dev_err(&mcu->serdev->dev, "Invalid Checksum received\n");
- return -EIO;
+ return -EPROTO;
}
memcpy(reply_data, rx, reply_data_size);
--
2.47.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4] mfd: qnap-mcu: Move checksum verification to its own function
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
2025-09-23 16:08 ` [PATCH 1/4] mfd: qnap-mcu: Calculate the checksum on the actual number of bytes received Heiko Stuebner
2025-09-23 16:08 ` [PATCH 2/4] mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors Heiko Stuebner
@ 2025-09-23 16:08 ` Heiko Stuebner
2025-09-23 16:08 ` [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors Heiko Stuebner
2025-10-08 15:57 ` [PATCH 0/4] Improve error handling for qnap-mcu transfers Lee Jones
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Stuebner @ 2025-09-23 16:08 UTC (permalink / raw)
To: lee; +Cc: linux-kernel, heiko
We'll need the checksum check in a second place in the future, so
move the verification code to a separate function.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
drivers/mfd/qnap-mcu.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/mfd/qnap-mcu.c b/drivers/mfd/qnap-mcu.c
index b4b630f7d413..cd836bdd44a8 100644
--- a/drivers/mfd/qnap-mcu.c
+++ b/drivers/mfd/qnap-mcu.c
@@ -78,6 +78,13 @@ static u8 qnap_mcu_csum(const u8 *buf, size_t size)
return csum;
}
+static bool qnap_mcu_verify_checksum(const u8 *buf, size_t size)
+{
+ u8 crc = qnap_mcu_csum(buf, size - QNAP_MCU_CHECKSUM_SIZE);
+
+ return crc == buf[size - QNAP_MCU_CHECKSUM_SIZE];
+}
+
static int qnap_mcu_write(struct qnap_mcu *mcu, const u8 *data, u8 data_size)
{
unsigned char tx[QNAP_MCU_TX_BUFFER_SIZE];
@@ -150,7 +157,6 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
size_t length = reply_data_size + QNAP_MCU_CHECKSUM_SIZE;
struct qnap_mcu_reply *reply = &mcu->reply;
int ret = 0;
- u8 crc;
if (length > sizeof(rx)) {
dev_err(&mcu->serdev->dev, "expected data too big for receive buffer");
@@ -175,8 +181,7 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
return -ETIMEDOUT;
}
- crc = qnap_mcu_csum(rx, reply->received - QNAP_MCU_CHECKSUM_SIZE);
- if (crc != rx[reply->received - QNAP_MCU_CHECKSUM_SIZE]) {
+ if (!qnap_mcu_verify_checksum(rx, reply->received)) {
dev_err(&mcu->serdev->dev, "Invalid Checksum received\n");
return -EPROTO;
}
--
2.47.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
` (2 preceding siblings ...)
2025-09-23 16:08 ` [PATCH 3/4] mfd: qnap-mcu: Move checksum verification to its own function Heiko Stuebner
@ 2025-09-23 16:08 ` Heiko Stuebner
2025-10-08 15:54 ` Lee Jones
2025-10-08 15:57 ` [PATCH 0/4] Improve error handling for qnap-mcu transfers Lee Jones
4 siblings, 1 reply; 7+ messages in thread
From: Heiko Stuebner @ 2025-09-23 16:08 UTC (permalink / raw)
To: lee; +Cc: linux-kernel, heiko
Further investigation revealed that the MCU in QNAP devices may return
two error states. One "@8" for a checksum error in the submitted command
and one "@9" for any generic (and sadly unspecified) error.
These error codes with 2 data character can of course also be shorter
then the expected reply length for the submitted command, so we'll
need to check the received data for error codes and exit the receive
portion early in that case.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
drivers/mfd/qnap-mcu.c | 44 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/drivers/mfd/qnap-mcu.c b/drivers/mfd/qnap-mcu.c
index cd836bdd44a8..fb78609a8433 100644
--- a/drivers/mfd/qnap-mcu.c
+++ b/drivers/mfd/qnap-mcu.c
@@ -19,6 +19,7 @@
/* The longest command found so far is 5 bytes long */
#define QNAP_MCU_MAX_CMD_SIZE 5
#define QNAP_MCU_MAX_DATA_SIZE 36
+#define QNAP_MCU_ERROR_SIZE 2
#define QNAP_MCU_CHECKSUM_SIZE 1
#define QNAP_MCU_RX_BUFFER_SIZE \
@@ -103,6 +104,24 @@ static int qnap_mcu_write(struct qnap_mcu *mcu, const u8 *data, u8 data_size)
return serdev_device_write(mcu->serdev, tx, length, HZ);
}
+static bool qnap_mcu_reply_is_generic_error(unsigned char *buf, size_t size)
+{
+ return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
+ buf[0] == '@' && buf[1] == '9');
+}
+
+static bool qnap_mcu_reply_is_checksum_error(unsigned char *buf, size_t size)
+{
+ return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
+ buf[0] == '@' && buf[1] == '8');
+}
+
+static bool qnap_mcu_reply_is_any_error(unsigned char *buf, size_t size)
+{
+ return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
+ buf[0] == '@' && (buf[1] == '8' || buf[1] == '9'));
+}
+
static size_t qnap_mcu_receive_buf(struct serdev_device *serdev, const u8 *buf, size_t size)
{
struct device *dev = &serdev->dev;
@@ -136,6 +155,19 @@ static size_t qnap_mcu_receive_buf(struct serdev_device *serdev, const u8 *buf,
}
}
+ /*
+ * We received everything the uart had to offer for now.
+ * Check for a possible error reply in the received data.
+ */
+ if (reply->received == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
+ qnap_mcu_verify_checksum(reply->data, reply->received) &&
+ qnap_mcu_reply_is_any_error(reply->data, reply->received)) {
+ /* The reply was an error code, we're done */
+ reply->length = 0;
+
+ complete(&reply->done);
+ }
+
/*
* The only way to get out of the above loop and end up here
* is through consuming all of the supplied data, so here we
@@ -182,10 +214,20 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
}
if (!qnap_mcu_verify_checksum(rx, reply->received)) {
- dev_err(&mcu->serdev->dev, "Invalid Checksum received\n");
+ dev_err(&mcu->serdev->dev, "Invalid Checksum received from controller\n");
+ return -EPROTO;
+ }
+
+ if (qnap_mcu_reply_is_checksum_error(rx, reply->received)) {
+ dev_err(&mcu->serdev->dev, "Controller received invalid Checksum\n");
return -EPROTO;
}
+ if (qnap_mcu_reply_is_generic_error(rx, reply->received)) {
+ dev_err(&mcu->serdev->dev, "Generic error received from controller\n");
+ return -EIO;
+ }
+
memcpy(reply_data, rx, reply_data_size);
return 0;
--
2.47.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors
2025-09-23 16:08 ` [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors Heiko Stuebner
@ 2025-10-08 15:54 ` Lee Jones
0 siblings, 0 replies; 7+ messages in thread
From: Lee Jones @ 2025-10-08 15:54 UTC (permalink / raw)
To: Heiko Stuebner; +Cc: linux-kernel
On Tue, 23 Sep 2025, Heiko Stuebner wrote:
> Further investigation revealed that the MCU in QNAP devices may return
> two error states. One "@8" for a checksum error in the submitted command
> and one "@9" for any generic (and sadly unspecified) error.
>
> These error codes with 2 data character can of course also be shorter
> then the expected reply length for the submitted command, so we'll
> need to check the received data for error codes and exit the receive
> portion early in that case.
>
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> ---
> drivers/mfd/qnap-mcu.c | 44 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 43 insertions(+), 1 deletion(-)
I can't help but think that there is a bunch of avoidable duplication in
here.
> diff --git a/drivers/mfd/qnap-mcu.c b/drivers/mfd/qnap-mcu.c
> index cd836bdd44a8..fb78609a8433 100644
> --- a/drivers/mfd/qnap-mcu.c
> +++ b/drivers/mfd/qnap-mcu.c
> @@ -19,6 +19,7 @@
> /* The longest command found so far is 5 bytes long */
> #define QNAP_MCU_MAX_CMD_SIZE 5
> #define QNAP_MCU_MAX_DATA_SIZE 36
> +#define QNAP_MCU_ERROR_SIZE 2
> #define QNAP_MCU_CHECKSUM_SIZE 1
>
> #define QNAP_MCU_RX_BUFFER_SIZE \
> @@ -103,6 +104,24 @@ static int qnap_mcu_write(struct qnap_mcu *mcu, const u8 *data, u8 data_size)
> return serdev_device_write(mcu->serdev, tx, length, HZ);
> }
>
> +static bool qnap_mcu_reply_is_generic_error(unsigned char *buf, size_t size)
> +{
> + return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
I'd place this check into a separate, concisely named function, then
call it at the start of qnap_mcu_reply_is_generic_error() and
qnap_mcu_reply_is_checksum_error(). Perhaps something like:
static bool qnap_mcu_is_error_msg(size) {
return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE);
}
Then at the start of both functions have:
if (!qnap_mcu_is_error_msg(size))
return false;
if (buf[0] == '@' && buf[1] == '9')(
return true;
return false;
> + buf[0] == '@' && buf[1] == '9');
> +}
> +
> +static bool qnap_mcu_reply_is_checksum_error(unsigned char *buf, size_t size)
> +{
> + return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
> + buf[0] == '@' && buf[1] == '8');
> +}
> +
> +static bool qnap_mcu_reply_is_any_error(unsigned char *buf, size_t size)
> +{
Then this should simply call each of the functions above:
if (qnap_mcu_reply_is_generic_error(buf, size)) {
dev_err(&mcu->serdev->dev, "Controller received invalid Checksum\n");
return true;
}
if (qnap_mcu_reply_is_checksum_error(buf, size)) {
dev_err(&mcu->serdev->dev, "Controller received invalid Checksum\n");
return true;
}
return false;
Note: all of this is just hand-written psudo-code that was knocked-up in
about 2-mins. None of it has even vaguely been checked or tested.
> + return (size == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
> + buf[0] == '@' && (buf[1] == '8' || buf[1] == '9'));
> +}
> +
> static size_t qnap_mcu_receive_buf(struct serdev_device *serdev, const u8 *buf, size_t size)
> {
> struct device *dev = &serdev->dev;
> @@ -136,6 +155,19 @@ static size_t qnap_mcu_receive_buf(struct serdev_device *serdev, const u8 *buf,
> }
> }
>
> + /*
> + * We received everything the uart had to offer for now.
> + * Check for a possible error reply in the received data.
> + */
> + if (reply->received == QNAP_MCU_ERROR_SIZE + QNAP_MCU_CHECKSUM_SIZE &&
If qnap_mcu_verify_checksum() needs this, it too should be inserted into
the function itself.
> + qnap_mcu_verify_checksum(reply->data, reply->received) &&
> + qnap_mcu_reply_is_any_error(reply->data, reply->received)) {
> + /* The reply was an error code, we're done */
> + reply->length = 0;
> +
> + complete(&reply->done);
> + }
> +
> /*
> * The only way to get out of the above loop and end up here
> * is through consuming all of the supplied data, so here we
> @@ -182,10 +214,20 @@ int qnap_mcu_exec(struct qnap_mcu *mcu,
> }
>
> if (!qnap_mcu_verify_checksum(rx, reply->received)) {
> - dev_err(&mcu->serdev->dev, "Invalid Checksum received\n");
> + dev_err(&mcu->serdev->dev, "Invalid Checksum received from controller\n");
As you can see above, I moved the messages into the checking functions too.
> + return -EPROTO;
> + }
> +
> + if (qnap_mcu_reply_is_checksum_error(rx, reply->received)) {
> + dev_err(&mcu->serdev->dev, "Controller received invalid Checksum\n");
> return -EPROTO;
> }
>
> + if (qnap_mcu_reply_is_generic_error(rx, reply->received)) {
> + dev_err(&mcu->serdev->dev, "Generic error received from controller\n");
> + return -EIO;
> + }
> +
> memcpy(reply_data, rx, reply_data_size);
>
> return 0;
> --
> 2.47.2
>
--
Lee Jones [李琼斯]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4] Improve error handling for qnap-mcu transfers
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
` (3 preceding siblings ...)
2025-09-23 16:08 ` [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors Heiko Stuebner
@ 2025-10-08 15:57 ` Lee Jones
4 siblings, 0 replies; 7+ messages in thread
From: Lee Jones @ 2025-10-08 15:57 UTC (permalink / raw)
To: Heiko Stuebner; +Cc: linux-kernel
On Tue, 23 Sep 2025, Heiko Stuebner wrote:
> Digging deeper into how that MCU behaves, I found out it can return
> more status codes than the "@0" for "ok".
>
> The additional codes can report a failed checksum verification and
> some "general" error for the command execution.
>
> This also explains sporadic command timeout messages, I have seen
> over time, when the controller sends an error code while we expect
> a longer reply from it.
>
> So while I'm not sure yet why it reports an error, with these changes
> we at least handle the error return code in a meaningful way.
>
>
> Heiko Stuebner (4):
> mfd: qnap-mcu: Calculate the checksum on the actual number of bytes
> received
> mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors
> mfd: qnap-mcu: Move checksum verification to its own function
> mfd: qnap-mcu: Add proper error handling for command errors
>
> drivers/mfd/qnap-mcu.c | 55 +++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 4 deletions(-)
Note to self: patches 1-3 look good.
--
Lee Jones [李琼斯]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-10-08 15:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-23 16:08 [PATCH 0/4] Improve error handling for qnap-mcu transfers Heiko Stuebner
2025-09-23 16:08 ` [PATCH 1/4] mfd: qnap-mcu: Calculate the checksum on the actual number of bytes received Heiko Stuebner
2025-09-23 16:08 ` [PATCH 2/4] mfd: qnap-mcu: Use EPROTO in stead of EIO on checksum errors Heiko Stuebner
2025-09-23 16:08 ` [PATCH 3/4] mfd: qnap-mcu: Move checksum verification to its own function Heiko Stuebner
2025-09-23 16:08 ` [PATCH 4/4] mfd: qnap-mcu: Add proper error handling for command errors Heiko Stuebner
2025-10-08 15:54 ` Lee Jones
2025-10-08 15:57 ` [PATCH 0/4] Improve error handling for qnap-mcu transfers Lee Jones
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox