* [PATCH v2 0/3] serial: qcom-geni: fix lockups
@ 2024-07-04 10:18 Johan Hovold
2024-07-04 10:18 ` [PATCH v2 1/3] serial: qcom-geni: fix soft lockup on sw flow control and suspend Johan Hovold
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Johan Hovold @ 2024-07-04 10:18 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Douglas Anderson, Konrad Dybcio, Bjorn Andersson, linux-arm-msm,
linux-serial, linux-kernel, Johan Hovold
Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
hard, for example, when stopping a getty on reboot.
This was triggered by the kfifo conversion, which turned an existing bug
that caused the driver to print discarded characters after a buffer
flush into a hard lockup.
This series fixes the regression and a related soft lockup issue that
can be triggered on software flow control and on suspend.
Doug has posted an alternative series of fixes here that depends on
reworking the driver a fair bit here:
https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
This rework has a significant impact on performance on some platforms,
but fortunately it seems such a rework can be avoided.
There are further bugs in the console code (e.g. that can lead to lost
characters) that this series does not address, but those can be fixed
separately (and I've started working on that).
Johan
Changes in v2
- restart tx by issuing a transfer command when there is stale data in
the fifo
- reorder and rename patches
- rename cancel tx helper
Johan Hovold (3):
serial: qcom-geni: fix soft lockup on sw flow control and suspend
serial: qcom-geni: fix hard lockup on buffer flush
serial: qcom-geni: do not kill the machine on fifo underrun
drivers/tty/serial/qcom_geni_serial.c | 51 ++++++++++++++++++++-------
1 file changed, 38 insertions(+), 13 deletions(-)
--
2.44.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/3] serial: qcom-geni: fix soft lockup on sw flow control and suspend
2024-07-04 10:18 [PATCH v2 0/3] serial: qcom-geni: fix lockups Johan Hovold
@ 2024-07-04 10:18 ` Johan Hovold
2024-07-04 10:18 ` [PATCH v2 2/3] serial: qcom-geni: fix hard lockup on buffer flush Johan Hovold
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Johan Hovold @ 2024-07-04 10:18 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Douglas Anderson, Konrad Dybcio, Bjorn Andersson, linux-arm-msm,
linux-serial, linux-kernel, Johan Hovold, stable
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable@vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 33 +++++++++++++++++++--------
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
--
2.44.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/3] serial: qcom-geni: fix hard lockup on buffer flush
2024-07-04 10:18 [PATCH v2 0/3] serial: qcom-geni: fix lockups Johan Hovold
2024-07-04 10:18 ` [PATCH v2 1/3] serial: qcom-geni: fix soft lockup on sw flow control and suspend Johan Hovold
@ 2024-07-04 10:18 ` Johan Hovold
2024-07-04 10:18 ` [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun Johan Hovold
2024-07-04 10:30 ` [PATCH v2 0/3] serial: qcom-geni: fix lockups Greg Kroah-Hartman
3 siblings, 0 replies; 11+ messages in thread
From: Johan Hovold @ 2024-07-04 10:18 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Douglas Anderson, Konrad Dybcio, Bjorn Andersson, linux-arm-msm,
linux-serial, linux-kernel, Johan Hovold, stable
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
--
2.44.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
2024-07-04 10:18 [PATCH v2 0/3] serial: qcom-geni: fix lockups Johan Hovold
2024-07-04 10:18 ` [PATCH v2 1/3] serial: qcom-geni: fix soft lockup on sw flow control and suspend Johan Hovold
2024-07-04 10:18 ` [PATCH v2 2/3] serial: qcom-geni: fix hard lockup on buffer flush Johan Hovold
@ 2024-07-04 10:18 ` Johan Hovold
2024-07-08 23:59 ` Doug Anderson
2024-07-04 10:30 ` [PATCH v2 0/3] serial: qcom-geni: fix lockups Greg Kroah-Hartman
3 siblings, 1 reply; 11+ messages in thread
From: Johan Hovold @ 2024-07-04 10:18 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jiri Slaby
Cc: Douglas Anderson, Konrad Dybcio, Bjorn Andersson, linux-arm-msm,
linux-serial, linux-kernel, Johan Hovold
The Qualcomm GENI serial driver did not handle buffer flushing and used
to print discarded characters when the circular buffer was cleared.
Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
this instead resulted in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
The underlying bugs have now been fixed, but make sure to output NUL
characters instead of killing the machine if a similar driver bug is
ever reintroduced.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index b2bbd2d79dbb..69a632fefc41 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
memset(buf, 0, sizeof(buf));
tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
- tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
+ uart_fifo_out(uport, buf, tx_bytes);
iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);
--
2.44.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/3] serial: qcom-geni: fix lockups
2024-07-04 10:18 [PATCH v2 0/3] serial: qcom-geni: fix lockups Johan Hovold
` (2 preceding siblings ...)
2024-07-04 10:18 ` [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun Johan Hovold
@ 2024-07-04 10:30 ` Greg Kroah-Hartman
2024-07-08 23:57 ` Doug Anderson
3 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-04 10:30 UTC (permalink / raw)
To: Johan Hovold
Cc: Jiri Slaby, Douglas Anderson, Konrad Dybcio, Bjorn Andersson,
linux-arm-msm, linux-serial, linux-kernel
On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> hard, for example, when stopping a getty on reboot.
>
> This was triggered by the kfifo conversion, which turned an existing bug
> that caused the driver to print discarded characters after a buffer
> flush into a hard lockup.
>
> This series fixes the regression and a related soft lockup issue that
> can be triggered on software flow control and on suspend.
>
> Doug has posted an alternative series of fixes here that depends on
> reworking the driver a fair bit here:
>
> https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
>
> This rework has a significant impact on performance on some platforms,
> but fortunately it seems such a rework can be avoided.
>
> There are further bugs in the console code (e.g. that can lead to lost
> characters) that this series does not address, but those can be fixed
> separately (and I've started working on that).
I'll take these now, thanks!
greg k-h
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/3] serial: qcom-geni: fix lockups
2024-07-04 10:30 ` [PATCH v2 0/3] serial: qcom-geni: fix lockups Greg Kroah-Hartman
@ 2024-07-08 23:57 ` Doug Anderson
2024-07-09 9:32 ` Johan Hovold
0 siblings, 1 reply; 11+ messages in thread
From: Doug Anderson @ 2024-07-08 23:57 UTC (permalink / raw)
To: Greg Kroah-Hartman, Johan Hovold
Cc: Jiri Slaby, Konrad Dybcio, Bjorn Andersson, linux-arm-msm,
linux-serial, linux-kernel
Johan,
On Thu, Jul 4, 2024 at 3:31 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> > Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> > hard, for example, when stopping a getty on reboot.
> >
> > This was triggered by the kfifo conversion, which turned an existing bug
> > that caused the driver to print discarded characters after a buffer
> > flush into a hard lockup.
> >
> > This series fixes the regression and a related soft lockup issue that
> > can be triggered on software flow control and on suspend.
> >
> > Doug has posted an alternative series of fixes here that depends on
> > reworking the driver a fair bit here:
> >
> > https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> >
> > This rework has a significant impact on performance on some platforms,
> > but fortunately it seems such a rework can be avoided.
> >
> > There are further bugs in the console code (e.g. that can lead to lost
> > characters) that this series does not address, but those can be fixed
> > separately (and I've started working on that).
>
> I'll take these now, thanks!
Are you going to continue to work on the driver? There are still some
pretty bad bugs including ones that are affecting Collabora's test
labs. Unless you want to try to tackle it some other way, I'm going to
keep pushing for something like my original series to land. I can
re-post them atop your patches since they've landed. This will regress
your performance but correctness trumps performance.
-Doug
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
2024-07-04 10:18 ` [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun Johan Hovold
@ 2024-07-08 23:59 ` Doug Anderson
2024-07-09 9:44 ` Johan Hovold
0 siblings, 1 reply; 11+ messages in thread
From: Doug Anderson @ 2024-07-08 23:59 UTC (permalink / raw)
To: Johan Hovold
Cc: Greg Kroah-Hartman, Jiri Slaby, Konrad Dybcio, Bjorn Andersson,
linux-arm-msm, linux-serial, linux-kernel
Hi,
On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> The Qualcomm GENI serial driver did not handle buffer flushing and used
> to print discarded characters when the circular buffer was cleared.
> Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> this instead resulted in a hard lockup due to
> qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> interrupt handler.
>
> The underlying bugs have now been fixed, but make sure to output NUL
> characters instead of killing the machine if a similar driver bug is
> ever reintroduced.
>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
> drivers/tty/serial/qcom_geni_serial.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index b2bbd2d79dbb..69a632fefc41 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> memset(buf, 0, sizeof(buf));
> tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
>
> - tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> + uart_fifo_out(uport, buf, tx_bytes);
FWIW I would have rather we output something much more obviously wrong
in this case instead of a NUL byte. Maybe we should fill it with "@"
characters or something? As you said: the driver shouldn't get into
this error condition so it shouldn't matter, but if we have a bug in
the future I'd rather it be an obvious bug instead of a subtle bug.
I'm happy to post a patch or provide a Reviewed-by if you want to post
a patch. Let me know.
-Doug
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/3] serial: qcom-geni: fix lockups
2024-07-08 23:57 ` Doug Anderson
@ 2024-07-09 9:32 ` Johan Hovold
0 siblings, 0 replies; 11+ messages in thread
From: Johan Hovold @ 2024-07-09 9:32 UTC (permalink / raw)
To: Doug Anderson
Cc: Greg Kroah-Hartman, Johan Hovold, Jiri Slaby, Konrad Dybcio,
Bjorn Andersson, linux-arm-msm, linux-serial, linux-kernel
Hi Doug,
Hope you had a good holiday.
On Mon, Jul 08, 2024 at 04:57:55PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:31 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> > > Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> > > hard, for example, when stopping a getty on reboot.
> > >
> > > This was triggered by the kfifo conversion, which turned an existing bug
> > > that caused the driver to print discarded characters after a buffer
> > > flush into a hard lockup.
> > >
> > > This series fixes the regression and a related soft lockup issue that
> > > can be triggered on software flow control and on suspend.
> > >
> > > Doug has posted an alternative series of fixes here that depends on
> > > reworking the driver a fair bit here:
> > >
> > > https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> > >
> > > This rework has a significant impact on performance on some platforms,
> > > but fortunately it seems such a rework can be avoided.
> > >
> > > There are further bugs in the console code (e.g. that can lead to lost
> > > characters) that this series does not address, but those can be fixed
> > > separately (and I've started working on that).
> >
> > I'll take these now, thanks!
>
> Are you going to continue to work on the driver? There are still some
> pretty bad bugs including ones that are affecting Collabora's test
> labs. Unless you want to try to tackle it some other way, I'm going to
> keep pushing for something like my original series to land. I can
> re-post them atop your patches since they've landed. This will regress
> your performance but correctness trumps performance.
Yes, I have a working fix for the console issue that could lead to lost
characters. I need to spend some time on other things this week, but I
intend to follow with fixes for the remaining issues as well.
Johan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
2024-07-08 23:59 ` Doug Anderson
@ 2024-07-09 9:44 ` Johan Hovold
2024-07-09 12:55 ` Johan Hovold
0 siblings, 1 reply; 11+ messages in thread
From: Johan Hovold @ 2024-07-09 9:44 UTC (permalink / raw)
To: Doug Anderson
Cc: Johan Hovold, Greg Kroah-Hartman, Jiri Slaby, Konrad Dybcio,
Bjorn Andersson, linux-arm-msm, linux-serial, linux-kernel
On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> >
> > The Qualcomm GENI serial driver did not handle buffer flushing and used
> > to print discarded characters when the circular buffer was cleared.
> > Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > this instead resulted in a hard lockup due to
> > qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> > interrupt handler.
> >
> > The underlying bugs have now been fixed, but make sure to output NUL
> > characters instead of killing the machine if a similar driver bug is
> > ever reintroduced.
> >
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> > ---
> > drivers/tty/serial/qcom_geni_serial.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> > index b2bbd2d79dbb..69a632fefc41 100644
> > --- a/drivers/tty/serial/qcom_geni_serial.c
> > +++ b/drivers/tty/serial/qcom_geni_serial.c
> > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > memset(buf, 0, sizeof(buf));
> > tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> >
> > - tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > + uart_fifo_out(uport, buf, tx_bytes);
>
> FWIW I would have rather we output something much more obviously wrong
> in this case instead of a NUL byte. Maybe we should fill it with "@"
> characters or something? As you said: the driver shouldn't get into
> this error condition so it shouldn't matter, but if we have a bug in
> the future I'd rather it be an obvious bug instead of a subtle bug.
Yeah, I've been running with a patch like that locally in my tests, and
went a bit back and forth whether I should post it. My reasoning for not
doing so was that the bugs have been fixed so we don't need to spend
cycles on memsetting the buffer to anything but NUL (I used 'X' in my
testing).
I guess that can be avoided by only padding the buffer if we ever hit an
underrun, but I still thinks it's questionable to spend the effort as
this is not something that should be needed. In any case, I didn't want
to spend time on it to fix the 6.10 regressions.
Killing the machine is perhaps an effective way to get attention to an
issue, but I'd much rather have an occasional NUL character in the log
*if* this ever becomes an issue at all again.
> I'm happy to post a patch or provide a Reviewed-by if you want to post
> a patch. Let me know.
If you feel strongly about this, I can either fill the buffer with
something else than NUL or add error handling for any such future
hypothetical bugs. What do you prefer?
Johan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
2024-07-09 9:44 ` Johan Hovold
@ 2024-07-09 12:55 ` Johan Hovold
2024-07-09 23:30 ` Doug Anderson
0 siblings, 1 reply; 11+ messages in thread
From: Johan Hovold @ 2024-07-09 12:55 UTC (permalink / raw)
To: Doug Anderson
Cc: Johan Hovold, Greg Kroah-Hartman, Jiri Slaby, Konrad Dybcio,
Bjorn Andersson, linux-arm-msm, linux-serial, linux-kernel
On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > > memset(buf, 0, sizeof(buf));
> > > tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > >
> > > - tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > + uart_fifo_out(uport, buf, tx_bytes);
> >
> > FWIW I would have rather we output something much more obviously wrong
> > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > characters or something? As you said: the driver shouldn't get into
> > this error condition so it shouldn't matter, but if we have a bug in
> > the future I'd rather it be an obvious bug instead of a subtle bug.
>
> Yeah, I've been running with a patch like that locally in my tests, and
> went a bit back and forth whether I should post it. My reasoning for not
> doing so was that the bugs have been fixed so we don't need to spend
> cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> testing).
>
> I guess that can be avoided by only padding the buffer if we ever hit an
> underrun, but I still thinks it's questionable to spend the effort as
> this is not something that should be needed. In any case, I didn't want
> to spend time on it to fix the 6.10 regressions.
>
> Killing the machine is perhaps an effective way to get attention to an
> issue, but I'd much rather have an occasional NUL character in the log
> *if* this ever becomes an issue at all again.
>
> > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > a patch. Let me know.
>
> If you feel strongly about this, I can either fill the buffer with
> something else than NUL or add error handling for any such future
> hypothetical bugs. What do you prefer?
Actually we just need to clear the buffer on entry, which would do away
with the unnecessary memset() that is there today. This should also give
you a printable indication that something is wrong in case a similar bug
is ever reintroduced (e.g. the last four characters would be repeated
until the transfer is complete instead of a fixed char like '@').
Perhaps that's good enough as a compromise?
Johan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
2024-07-09 12:55 ` Johan Hovold
@ 2024-07-09 23:30 ` Doug Anderson
0 siblings, 0 replies; 11+ messages in thread
From: Doug Anderson @ 2024-07-09 23:30 UTC (permalink / raw)
To: Johan Hovold
Cc: Johan Hovold, Greg Kroah-Hartman, Jiri Slaby, Konrad Dybcio,
Bjorn Andersson, linux-arm-msm, linux-serial, linux-kernel
Hi,
On Tue, Jul 9, 2024 at 5:55 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> > On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> > > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > > > memset(buf, 0, sizeof(buf));
> > > > tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > > >
> > > > - tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > > + uart_fifo_out(uport, buf, tx_bytes);
> > >
> > > FWIW I would have rather we output something much more obviously wrong
> > > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > > characters or something? As you said: the driver shouldn't get into
> > > this error condition so it shouldn't matter, but if we have a bug in
> > > the future I'd rather it be an obvious bug instead of a subtle bug.
> >
> > Yeah, I've been running with a patch like that locally in my tests, and
> > went a bit back and forth whether I should post it. My reasoning for not
> > doing so was that the bugs have been fixed so we don't need to spend
> > cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> > testing).
> >
> > I guess that can be avoided by only padding the buffer if we ever hit an
> > underrun, but I still thinks it's questionable to spend the effort as
> > this is not something that should be needed. In any case, I didn't want
> > to spend time on it to fix the 6.10 regressions.
> >
> > Killing the machine is perhaps an effective way to get attention to an
> > issue, but I'd much rather have an occasional NUL character in the log
> > *if* this ever becomes an issue at all again.
> >
> > > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > > a patch. Let me know.
> >
> > If you feel strongly about this, I can either fill the buffer with
> > something else than NUL or add error handling for any such future
> > hypothetical bugs. What do you prefer?
>
> Actually we just need to clear the buffer on entry, which would do away
> with the unnecessary memset() that is there today. This should also give
> you a printable indication that something is wrong in case a similar bug
> is ever reintroduced (e.g. the last four characters would be repeated
> until the transfer is complete instead of a fixed char like '@').
>
> Perhaps that's good enough as a compromise?
IMO initting 32-bits of data should be fine to do each time through
the loop. I've sent a patch:
https://lore.kernel.org/r/20240709162841.1.I93bf39f29d1887c46c74fbf8d4b937f6497cdfaa@changeid
-Doug
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-07-09 23:31 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-04 10:18 [PATCH v2 0/3] serial: qcom-geni: fix lockups Johan Hovold
2024-07-04 10:18 ` [PATCH v2 1/3] serial: qcom-geni: fix soft lockup on sw flow control and suspend Johan Hovold
2024-07-04 10:18 ` [PATCH v2 2/3] serial: qcom-geni: fix hard lockup on buffer flush Johan Hovold
2024-07-04 10:18 ` [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun Johan Hovold
2024-07-08 23:59 ` Doug Anderson
2024-07-09 9:44 ` Johan Hovold
2024-07-09 12:55 ` Johan Hovold
2024-07-09 23:30 ` Doug Anderson
2024-07-04 10:30 ` [PATCH v2 0/3] serial: qcom-geni: fix lockups Greg Kroah-Hartman
2024-07-08 23:57 ` Doug Anderson
2024-07-09 9:32 ` Johan Hovold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).