public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Johan Hovold <johan@kernel.org>
To: Douglas Anderson <dianders@chromium.org>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Yicong Yang" <yangyicong@hisilicon.com>,
	"Tony Lindgren" <tony@atomide.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Johan Hovold" <johan+linaro@kernel.org>,
	"John Ogness" <john.ogness@linutronix.de>,
	linux-arm-msm@vger.kernel.org,
	"Bjorn Andersson" <andersson@kernel.org>,
	"Konrad Dybcio" <konrad.dybcio@linaro.org>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Stephen Boyd" <swboyd@chromium.org>,
	linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
	"Thomas Gleixner" <tglx@linutronix.de>
Subject: Re: [PATCH v4 7/8] serial: qcom-geni: Fix suspend while active UART xfer
Date: Mon, 24 Jun 2024 14:12:04 +0200	[thread overview]
Message-ID: <ZnlilDj5UrvrVasv@hovoldconsulting.com> (raw)
In-Reply-To: <20240610152420.v4.7.I0f81a5baa37d368f291c96ee4830abca337e3c87@changeid>

On Mon, Jun 10, 2024 at 03:24:25PM -0700, Douglas Anderson wrote:
> On devices using Qualcomm's GENI UART it is possible to get the UART
> stuck such that it no longer outputs data. Specifically, logging in
> via an agetty on the debug serial port (which was _not_ used for
> kernel console) and running:
>   cat /var/log/messages
> ...and then (via an SSH session) forcing a few suspend/resume cycles
> causes the UART to stop transmitting.

An easier way to trigger this old bug is to just run a command like
dmesg and hit ctrl-s in a serial console to stop tx. Interrupting the
command or hitting ctrl-q to restart tx then triggers the soft lockup.

> The root of the problems was with qcom_geni_serial_stop_tx_fifo()
> which is called as part of the suspend process. Specific problems with
> that function:
> - When an in-progress "tx" command is cancelled it doesn't appear to
>   fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
>   continued to report that the FIFO wasn't empty. The
>   qcom_geni_serial_start_tx_fifo() function didn't re-enable
>   interrupts in this case so the driver would never start transferring
>   again.
> - When the driver cancelled the current "tx" command but it forgot to
>   zero out "tx_remaining". This confused logic elsewhere in the
>   driver.
> - From experimentation, it appears that cancelling the "tx" command
>   could drop some of the queued up bytes.
> 
> While qcom_geni_serial_stop_tx_fifo() could be fixed to drain the FIFO
> and shut things down properly, stop_tx() isn't supposed to be a slow
> function. It is run with local interrupts off and is documented to
> stop transmitting "as soon as possible". Change the function to just
> stop new bytes from being queued. In order to make this work, change
> qcom_geni_serial_start_tx_fifo() to remove some conditions. It's
> always safe to enable the watermark interrupt and the IRQ handler will
> disable it if it's not needed.
> 
> For system suspend the queue still needs to be drained. Failure to do
> so means that the hardware won't provide new interrupts until a
> "cancel" command is sent. Add draining logic (fixing the issues noted
> above) at suspend time.

So I spent the better part of the weekend looking at this driver and
this is one of the bits I worry about with your approach as relying on
draining anything won't work with hardware flow control.

Cancelling commands can result stalled TX in a number of ways and
there's still at least one that you don't handle. If you end up with
data in in the FIFO, the watermark interrupt may never fire when you try
to restart tx.

I'm leaning towards fixing the immediate hard lockup regression
separately and then we can address the older bugs and rework driver
without having to rush things.

I've prepared a minimal three patch series which fixes most of the
discussed issues (hard and soft lockup and garbage characters) and that
should be backportable as well.

Currently, the diffstat is just:

	 drivers/tty/serial/qcom_geni_serial.c | 36 +++++++++++++++++++++++++-----------
	 1 file changed, 25 insertions(+), 11 deletions(-)

Fixing the hard lockup 6.10-rc1 regression is just a single line.

Johan

  parent reply	other threads:[~2024-06-24 12:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-10 22:24 [PATCH v4 0/8] serial: qcom-geni: Overhaul TX handling to fix crashes/hangs Douglas Anderson
2024-06-10 22:24 ` [PATCH v4 1/8] soc: qcom: geni-se: Add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers Douglas Anderson
2024-06-10 22:24 ` [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms() Douglas Anderson
2024-06-12  7:38   ` Ilpo Järvinen
2024-06-12 18:45     ` Doug Anderson
2024-06-13  6:56       ` Ilpo Järvinen
2024-06-13 14:02         ` Doug Anderson
2024-06-10 22:24 ` [PATCH v4 3/8] serial: qcom-geni: Fix the timeout in qcom_geni_serial_poll_bit() Douglas Anderson
2024-06-17 18:46   ` Konrad Dybcio
2024-06-10 22:24 ` [PATCH v4 4/8] serial: qcom-geni: Fix arg types for qcom_geni_serial_poll_bit() Douglas Anderson
2024-06-17 18:47   ` Konrad Dybcio
2024-06-10 22:24 ` [PATCH v4 5/8] serial: qcom-geni: Introduce qcom_geni_serial_poll_bitfield() Douglas Anderson
2024-06-17 18:53   ` Konrad Dybcio
2024-06-10 22:24 ` [PATCH v4 6/8] serial: qcom-geni: Just set the watermark level once Douglas Anderson
2024-06-10 22:24 ` [PATCH v4 7/8] serial: qcom-geni: Fix suspend while active UART xfer Douglas Anderson
2024-06-17 19:02   ` Konrad Dybcio
2024-06-24 12:12   ` Johan Hovold [this message]
2024-06-24 16:54     ` Johan Hovold
2024-06-24 20:58     ` Doug Anderson
2024-06-25  8:46       ` Johan Hovold
2024-06-10 22:24 ` [PATCH v4 8/8] serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups Douglas Anderson
2024-06-17 19:10   ` Konrad Dybcio
2024-06-17 19:37     ` Doug Anderson
2024-06-17 19:54       ` Konrad Dybcio
2024-06-24 12:43   ` Johan Hovold
2024-06-24 21:15     ` Doug Anderson
2024-06-25 11:21       ` Johan Hovold
2024-06-25 14:29         ` Doug Anderson
2024-06-26  8:20           ` Johan Hovold
2024-06-18 10:19 ` [PATCH v4 0/8] serial: qcom-geni: Overhaul TX handling to fix crashes/hangs Konrad Dybcio
2024-06-19  8:25 ` neil.armstrong
2024-06-19  8:50 ` Johan Hovold
2024-06-20 23:13 ` Nícolas F. R. A. Prado

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZnlilDj5UrvrVasv@hovoldconsulting.com \
    --to=johan@kernel.org \
    --cc=andersson@kernel.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=dianders@chromium.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jirislaby@kernel.org \
    --cc=johan+linaro@kernel.org \
    --cc=john.ogness@linutronix.de \
    --cc=konrad.dybcio@linaro.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=swboyd@chromium.org \
    --cc=tglx@linutronix.de \
    --cc=tony@atomide.com \
    --cc=u.kleine-koenig@pengutronix.de \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox