All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	"open list:SERIAL DRIVERS" <linux-serial@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx
Date: Wed, 15 Mar 2023 16:47:45 +0200 (EET)	[thread overview]
Message-ID: <322489-e120-dcbf-fffc-d9df31d8c499@linux.intel.com> (raw)
In-Reply-To: <00de13b2-0ed9-efe2-e8d8-c9370c04e80b@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 9175 bytes --]

On Tue, 14 Mar 2023, Hans de Goede wrote:
> On 3/14/23 17:55, Ilpo Järvinen wrote:
> > On Tue, 14 Mar 2023, Hans de Goede wrote:
> >> On 3/14/23 12:48, Ilpo Järvinen wrote:
> >>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>> On 3/14/23 11:55, Ilpo Järvinen wrote:
> >>>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>>>
> >>>>>> I have spend the last couple of days debugging a problem with Bluetooth
> >>>>>> adapters (HCIs) connected over an UART connection on Intel Bay Trail
> >>>>>> and Cherry Trail devices.
> >>>>>>
> >>>>>> After much debugging I found out that sometimes the first byte of
> >>>>>> a received packet (data[0]) would be overwritten with a 0 byte.
> >>>>>>
> >>>>>> E.g. this packet received during init of a BCM4324B3 (1) Bluetooth HCI:
> >>>>>>
> >>>>>> 04 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> would sometimes turn into:
> >>>>>>
> >>>>>> 00 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> Further investigation revealed that this goes away if I stop
> >>>>>> the dw_dmac module from loading, leading to:
> >>>>>> "dw-apb-uart 80860F0A:00: failed to request DMA"
> >>>>>> and the UART working without DMA support.
> >>>>>>
> >>>>>> Testing various kernels showed that this problem was introduced
> >>>>>> in v5.19, v5.18 - v5.18.19 are fine. An a git bisect points to:
> >>>>>>
> >>>>>> e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
> >>>>>>
> >>>>>> And reverting that on top of v6.3-rc2 indeed solves the problem.
> >>>>>
> >>>>>> So it seems that that commit somehow interferes with DMA based
> >>>>
> >>>>> Maybe the the extra interrupt that the tx side change will trigger somehow 
> >>>>> causes the confusion on the rx side. So that would be an extra call into 
> >>>>> handle_rx_dma() that could either do an extra flush or start DMA Rx that 
> >>>>> would not occur w/o that tx side change.
> >>>>
> >>>> That sounds like a likely candidate for causing this yes, as said
> >>>> I'm unfamiliar with the serial-port code, but I did already suspect
> >>>> that the change was causing some extra interrupt which somehow
> >>>> interfered with the rx side.
> >>>>
> >>>>>> The issue has been seen on and the revert has been tested on
> >>>>>> the following HW:
> >>>>>>
> >>>>>> Asus T100TA
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Tablet 2 1051L
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Book X91F
> >>>>>> Soc: Cherry Trail UART: 8086228A WIFI: brcmfmac4356-pcie BT: BCM4356A2
> >>>>>>
> >>>>>> I have more hw which I believe is affected but these are the models
> >>>>>> where I have done detailed testing.
> >>>>>>
> >>>>>> I would be happy to test any patches, or run a kernel with some extra
> >>>>>> debugging added, just let me know what you need to help fixing this.

> > From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Subject: [PATCH] serial: 8250: Prevent starting up DMA Rx on THRI interrupt
> > 
> > Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
> > connection failed due corrupted Rx payload. The problem was narrowed
> > down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs 
> > despite LSR having DR bit set, which is precondition for attempting to 
> > start DMA Rx.
> > 
> > This problem became apparent after e8ffbb71f783 ("serial: 8250: use
> > THRE & __stop_tx also with DMA") changed Tx stopping to get triggered
> > using THRI interrupt.
> > 
> > Don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
> > interrupt which has Rx related IIR value.
> > 
> > Reported-by: Hans de Goede <hdegoede@redhat.com>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> I can confirm that this fixes things for me:
> 
> Tested-by: Hans de Goede <hdegoede@redhat.com>

Okay, thanks for testing.

Here's is a slightly improved debug patch which will count how many 
characters are received by DMA and non-DMA rx. It should be tested 
WITHOUT the fix.

I'm mostly interested in knowing if it's purely DMA Rx issue or whether 
the non-DMA rx is muddling the waters too. While investigating other 
issues I've seen UART_IIR_TIMEOUT (inter-character timeout) to often come 
so early from UART that the tail of characters is left there to be handled 
by the non-DMA path.


From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Subject: [PATCH] DEBUG: Dma rx-problem

A DEBUG PATCH, not intended for upstream inclusing.

---
 drivers/tty/serial/8250/8250_dma.c  | 17 ++++++++++++-----
 drivers/tty/serial/8250/8250_port.c | 19 +++++++++++++++++++
 include/linux/serial_8250.h         |  3 +++
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7fa66501792d..705c26941429 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -32,13 +32,15 @@ static void __dma_tx_complete(void *param)
 		uart_write_wakeup(&p->port);
 
 	ret = serial8250_tx_dma(p);
-	if (ret || !dma->tx_running)
+	if (ret || !dma->tx_running) {
+		p->irq_debug = -5;
 		serial8250_set_THRI(p);
+	}
 
 	spin_unlock_irqrestore(&p->port.lock, flags);
 }
 
-static void __dma_rx_complete(struct uart_8250_port *p)
+static int __dma_rx_complete(struct uart_8250_port *p)
 {
 	struct uart_8250_dma	*dma = p->dma;
 	struct tty_port		*tty_port = &p->port.state->port;
@@ -53,7 +55,7 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	 */
 	dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
 	if (dma_status == DMA_IN_PROGRESS)
-		return;
+		return -1;
 
 	count = dma->rx_size - state.residue;
 
@@ -62,6 +64,8 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	dma->rx_running = 0;
 
 	tty_flip_buffer_push(tty_port);
+
+	return count;
 }
 
 static void dma_rx_complete(void *param)
@@ -69,10 +73,11 @@ static void dma_rx_complete(void *param)
 	struct uart_8250_port *p = param;
 	struct uart_8250_dma *dma = p->dma;
 	unsigned long flags;
+	int count = -1;
 
 	spin_lock_irqsave(&p->port.lock, flags);
 	if (dma->rx_running)
-		__dma_rx_complete(p);
+		count = __dma_rx_complete(p);
 
 	/*
 	 * Cannot be combined with the previous check because __dma_rx_complete()
@@ -81,6 +86,8 @@ static void dma_rx_complete(void *param)
 	if (!dma->rx_running && (serial_lsr_in(p) & UART_LSR_DR))
 		p->dma->rx_dma(p);
 	spin_unlock_irqrestore(&p->port.lock, flags);
+
+	pr_err("8250dma: rx complete %d\n", count);
 }
 
 int serial8250_tx_dma(struct uart_8250_port *p)
@@ -172,7 +179,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
 
 	if (dma->rx_running) {
 		dmaengine_pause(dma->rxchan);
-		__dma_rx_complete(p);
+		p->dma_rx_count = __dma_rx_complete(p);
 		dmaengine_terminate_async(dma->rxchan);
 	}
 }
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..0c950ff7a8ee 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1811,6 +1811,7 @@ u16 serial8250_rx_chars(struct uart_8250_port *up, u16 lsr)
 	} while (lsr & (UART_LSR_DR | UART_LSR_BI));
 
 	tty_flip_buffer_push(&port->state->port);
+	up->rx_chars_count = 256 - max_count;
 	return lsr;
 }
 EXPORT_SYMBOL_GPL(serial8250_rx_chars);
@@ -1923,6 +1924,10 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 	struct uart_8250_port *up = up_to_u8250p(port);
 	bool skip_rx = false;
 	unsigned long flags;
+	long irq_debug;
+	unsigned int tx_running = 0, rx_running = 0, tx_err = 0;
+	unsigned int ier;
+	int rx_count, dma_count;
 	u16 status;
 
 	if (iir & UART_IIR_NO_INT)
@@ -1931,6 +1936,14 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 	spin_lock_irqsave(&port->lock, flags);
 
 	status = serial_lsr_in(up);
+	up->dma_rx_count = 0;
+	up->rx_chars_count = 0;
+	if (up->dma) {
+		rx_running = up->dma->rx_running;
+		tx_running = up->dma->tx_running;
+		tx_err = up->dma->tx_err;
+	}
+	ier = up->ier;
 
 	/*
 	 * If port is stopped and there are no error conditions in the
@@ -1957,7 +1970,13 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 			__stop_tx(up);
 	}
 
+	irq_debug = up->irq_debug++;
+	rx_count = up->rx_chars_count;
+	dma_count = up->dma_rx_count;
 	uart_unlock_and_check_sysrq_irqrestore(port, flags);
+	if (irq_debug < 0)
+		pr_err("8250irq: iir=%02x lsr+saved=%02x received=%d/%d ier=%02x dma_t/rx/err=%u/%u/%u cnt=%ld\n",
+		       iir, status, dma_count, rx_count, ier, tx_running, rx_running, tx_err, irq_debug);
 
 	return 1;
 }
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 741ed4807a9c..4a0c0d549d7b 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -107,6 +107,9 @@ struct uart_8250_port {
 	unsigned char		mcr;
 	unsigned char		cur_iotype;	/* Running I/O type */
 	unsigned int		rpm_tx_active;
+	long			irq_debug;
+	int			rx_chars_count;
+	int			dma_rx_count;
 	unsigned char		canary;		/* non-zero during system sleep
 						 *   if no_console_suspend
 						 */

-- 
tg: (571079f5ba93..) debug/dma-rx-problem (depends on: tty-next)

  reply	other threads:[~2023-03-15 14:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-14 10:20 [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx Hans de Goede
2023-03-14 10:55 ` Ilpo Järvinen
2023-03-14 11:11   ` Hans de Goede
2023-03-14 11:48     ` Ilpo Järvinen
2023-03-14 16:04       ` Hans de Goede
2023-03-14 16:55         ` Ilpo Järvinen
2023-03-14 19:02           ` Hans de Goede
2023-03-15 14:47             ` Ilpo Järvinen [this message]
2023-03-16 20:42               ` Hans de Goede
2023-03-17 10:38                 ` Ilpo Järvinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=322489-e120-dcbf-fffc-d9df31d8c499@linux.intel.com \
    --to=ilpo.jarvinen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hdegoede@redhat.com \
    --cc=jirislaby@kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.