linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] tty: amba-pl011: Decruft and streamline earlycon output
@ 2017-10-18 14:14 Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 1/3] tty: amba-pl011: earlycon: Switch to relaxed I/O Dave Martin
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Dave Martin @ 2017-10-18 14:14 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andre Przywara, Stephen Boyd, Russell King, Andy Gross,
	linux-serial, Greg Kroah-Hartman, Bhupinder Thakur

This series avoids waiting for the UART transmitter to go idle in
between every earlycon char written, and does some refactoring so that
this change doesn't have to be implemented multiple times.

The initial motiviation for this is a performance issue encountered
during Xen virtual UART development (see [1] and patch 3).

It's debatable whether this is Linux's responsibility or the
responsibility of the virtual UART: this is a situation where the
virtualised behaviour is inevitably so different from hardware that
it's hard to make a ruling on exactly what behaviour is reasonable.
The best course is probably to work around it on _both_ ends, in the
interest of best interoperation.

Currently amba-pl011 has two distinct earlycon implementations: a
generic pl011 implementation and a quirked implementation that works
around the lack of a usable BUSY status bit in QDF2400 SoCs.

I don't like applying the same obscure fix in multiple places if I can
avoid it, so I've had a go at unifying the earlycons here (patches 1-2).


I'd appreciate people's views, particularly on:

 * (pl011) Whether to refactor harder: the caching of vendor->reg_offset
   vendor->access_32b in uart_amba_port is convenient and avoids chasing
   an extra pointer in the common case, but this the refactoring for the
   earlycon case a bit ugly.  Maybe we can do better.

 * (pl011, serial_core) Whether the new use of port->private_data will
   conflict with existing code (no, AFAICT).

 * (earlycon) Whether overriding port->iotype for 32-bit-only
   implementations is the right thing to do.

 * (earlycon) Whether there is some reason to drain the transmitter
   after every char that I've overlooked.


Currently I don't consider this to be a candidate for stable, but it
could be backported.  Due to the steady quirkage churn in amba-pl011,
it would be better to drop the refactoring stuff for backporting
purposes.

Testing:

 * tested rather casually no ARM Juno r0 (real PL011) and the ARM VFP
   Base Model (emulated PL011)

 * *Not tested* on QDF2400, since I don't have access to one.
   Someone probably ought to test that.


[1] Re: [Xen-devel] [PATCH 26/27 v12] arm/xen: vpl011: Fix the slow
early console SBSA UART output
https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg01949.html

Dave Martin (3):
  tty: amba-pl011: earlycon: Switch to relaxed I/O
  tty: amba-pl011: earlycon: Unify earlycon backends
  tty: amba-pl011: earlycon: Don't drain the transmitter after each char

 drivers/tty/serial/amba-pl011.c | 108 +++++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 47 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH 1/3] tty: amba-pl011: earlycon: Switch to relaxed I/O
  2017-10-18 14:14 [RFC PATCH 0/3] tty: amba-pl011: Decruft and streamline earlycon output Dave Martin
@ 2017-10-18 14:14 ` Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 2/3] tty: amba-pl011: earlycon: Unify earlycon backends Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 3/3] tty: amba-pl011: earlycon: Don't drain the transmitter after each char Dave Martin
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Martin @ 2017-10-18 14:14 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andre Przywara, Stephen Boyd, Russell King, Andy Gross,
	linux-serial, Greg Kroah-Hartman, Bhupinder Thakur

The current pl011 earlycon implementation uses the regular I/O
accessors, but this is unnecessary because the architecture
enforces ordering of accesses to the same device anyway.

Using relaxed I/O brings the added bonus that the generic pl011
helpers can be used instead of having to open-code: this allows
some duplicate logic to be unified.

This patch does some refactoring so that pl011_{read,write,
tx_empty}() are split into a frontend that does the same thing as
before, and a backend __* that can be used with a uart_port that
has no corresponding uart_amba_port structure yet (i.e., the
earlycon scenario).

__pl011_tx_empty() can now be used in place of an open-coded poll
that differs between the generic and qdf2400_e44 earlycon
implementations, because __pl011_tx_empty() handles the necessary
quirkage transparently.

Moving to relaxed I/O loses wmb() semantics at the start of an
earlycon write, and this may be important for some debugging
scenarios, so an explicit wmb() is inserted at the start of each
earlycon write implementation.

Because qdf2400_e44 only supports 32-bit I/O, this patch also
explicitly sets port->iotype == UPIO_MEM32 so that __pl011_{write,
read}() use the correct I/O size.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 drivers/tty/serial/amba-pl011.c | 69 ++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 111e6a9..fd9e08c 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -300,26 +300,38 @@ static unsigned int pl011_reg_to_offset(const struct uart_amba_port *uap,
 	return uap->reg_offset[reg];
 }
 
-static unsigned int pl011_read(const struct uart_amba_port *uap,
-	unsigned int reg)
+static unsigned int __pl011_read(const struct uart_port *port,
+	const u16 *reg_offset, unsigned int reg)
 {
-	void __iomem *addr = uap->port.membase + pl011_reg_to_offset(uap, reg);
+	void __iomem *addr = port->membase + reg_offset[reg];
 
-	return (uap->port.iotype == UPIO_MEM32) ?
+	return (port->iotype == UPIO_MEM32) ?
 		readl_relaxed(addr) : readw_relaxed(addr);
 }
 
-static void pl011_write(unsigned int val, const struct uart_amba_port *uap,
+static unsigned int pl011_read(const struct uart_amba_port *uap,
 	unsigned int reg)
 {
-	void __iomem *addr = uap->port.membase + pl011_reg_to_offset(uap, reg);
+	return __pl011_read(&uap->port, uap->reg_offset, reg);
+}
 
-	if (uap->port.iotype == UPIO_MEM32)
+static void __pl011_write(unsigned int val, const struct uart_port *port,
+	const u16 *reg_offset, unsigned int reg)
+{
+	void __iomem *addr = port->membase + reg_offset[reg];
+
+	if (port->iotype == UPIO_MEM32)
 		writel_relaxed(val, addr);
 	else
 		writew_relaxed(val, addr);
 }
 
+static void pl011_write(unsigned int val, const struct uart_amba_port *uap,
+	unsigned int reg)
+{
+	__pl011_write(val, &uap->port, uap->reg_offset, reg);
+}
+
 /*
  * Reads up to 256 characters from the FIFO or until it's empty and
  * inserts them into the TTY layer. Returns the number of characters
@@ -1537,16 +1549,23 @@ static irqreturn_t pl011_int(int irq, void *dev_id)
 	return IRQ_RETVAL(handled);
 }
 
-static unsigned int pl011_tx_empty(struct uart_port *port)
+static unsigned int __pl011_tx_empty(struct uart_port *port,
+	const u16 *reg_offset, const struct vendor_data *vendor)
 {
-	struct uart_amba_port *uap =
-	    container_of(port, struct uart_amba_port, port);
+	unsigned int status = __pl011_read(port, reg_offset, REG_FR);
 
 	/* Allow feature register bits to be inverted to work around errata */
-	unsigned int status = pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr;
+	status ^= vendor->inv_fr;
+	status &= vendor->fr_busy | UART01x_FR_TXFF;
+	return status ? 0 : TIOCSER_TEMT;
+}
 
-	return status & (uap->vendor->fr_busy | UART01x_FR_TXFF) ?
-							0 : TIOCSER_TEMT;
+static unsigned int pl011_tx_empty(struct uart_port *port)
+{
+	struct uart_amba_port *uap =
+	    container_of(port, struct uart_amba_port, port);
+
+	return __pl011_tx_empty(port, uap->reg_offset, uap->vendor);
 }
 
 static unsigned int pl011_get_mctrl(struct uart_port *port)
@@ -2419,10 +2438,13 @@ static struct console amba_console = {
 
 static void qdf2400_e44_putc(struct uart_port *port, int c)
 {
-	while (readl(port->membase + UART01x_FR) & UART01x_FR_TXFF)
+	const struct vendor_data *vendor = &vendor_qdt_qdf2400_e44;
+	const u16 *reg_offset = vendor->reg_offset;
+
+	while (__pl011_read(port, reg_offset, REG_FR) & UART01x_FR_TXFF)
 		cpu_relax();
-	writel(c, port->membase + UART01x_DR);
-	while (!(readl(port->membase + UART01x_FR) & UART011_FR_TXFE))
+	__pl011_write(c, port, reg_offset, REG_DR);
+	while (!__pl011_tx_empty(port, reg_offset, vendor))
 		cpu_relax();
 }
 
@@ -2430,18 +2452,19 @@ static void qdf2400_e44_early_write(struct console *con, const char *s, unsigned
 {
 	struct earlycon_device *dev = con->data;
 
+	wmb();
 	uart_console_write(&dev->port, s, n, qdf2400_e44_putc);
 }
 
 static void pl011_putc(struct uart_port *port, int c)
 {
-	while (readl(port->membase + UART01x_FR) & UART01x_FR_TXFF)
+	const struct vendor_data *vendor = &vendor_arm;
+	const u16 *reg_offset = vendor->reg_offset;
+
+	while (__pl011_read(port, reg_offset, REG_FR) & UART01x_FR_TXFF)
 		cpu_relax();
-	if (port->iotype == UPIO_MEM32)
-		writel(c, port->membase + UART01x_DR);
-	else
-		writeb(c, port->membase + UART01x_DR);
-	while (readl(port->membase + UART01x_FR) & UART01x_FR_BUSY)
+	__pl011_write(c, port, reg_offset, REG_DR);
+	while (!__pl011_tx_empty(port, reg_offset, vendor))
 		cpu_relax();
 }
 
@@ -2449,6 +2472,7 @@ static void pl011_early_write(struct console *con, const char *s, unsigned n)
 {
 	struct earlycon_device *dev = con->data;
 
+	wmb();
 	uart_console_write(&dev->port, s, n, pl011_putc);
 }
 
@@ -2494,6 +2518,7 @@ qdf2400_e44_early_console_setup(struct earlycon_device *device,
 	if (!device->port.membase)
 		return -ENODEV;
 
+	device->port.iotype = UPIO_MEM32;
 	device->con->write = qdf2400_e44_early_write;
 	return 0;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH 2/3] tty: amba-pl011: earlycon: Unify earlycon backends
  2017-10-18 14:14 [RFC PATCH 0/3] tty: amba-pl011: Decruft and streamline earlycon output Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 1/3] tty: amba-pl011: earlycon: Switch to relaxed I/O Dave Martin
@ 2017-10-18 14:14 ` Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 3/3] tty: amba-pl011: earlycon: Don't drain the transmitter after each char Dave Martin
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Martin @ 2017-10-18 14:14 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andre Przywara, Stephen Boyd, Russell King, Andy Gross,
	linux-serial, Greg Kroah-Hartman, Bhupinder Thakur

Now that the generic pl011 and qdf2400_e44 earlycon backends differ
only in the choice of vendor_data and minor initialisation detatils
that are detectable based on vendor_data, there's no strong need
for them to be separate.

This patch factors common earlyon setup into a new function
pl011_early_console_setup_common(), which does generic setup and
stashes a pointer to the relevant vendor_data struct in
port->private_data (which appears otherwise unused by the
amba-pl011 driver).

The qdf2400_e44-specific earlycon write/putc functions would now be
identical to the pl011 versions, so they are deleted.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 drivers/tty/serial/amba-pl011.c | 53 ++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index fd9e08c..084ed3f 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2436,29 +2436,9 @@ static struct console amba_console = {
 
 #define AMBA_CONSOLE	(&amba_console)
 
-static void qdf2400_e44_putc(struct uart_port *port, int c)
-{
-	const struct vendor_data *vendor = &vendor_qdt_qdf2400_e44;
-	const u16 *reg_offset = vendor->reg_offset;
-
-	while (__pl011_read(port, reg_offset, REG_FR) & UART01x_FR_TXFF)
-		cpu_relax();
-	__pl011_write(c, port, reg_offset, REG_DR);
-	while (!__pl011_tx_empty(port, reg_offset, vendor))
-		cpu_relax();
-}
-
-static void qdf2400_e44_early_write(struct console *con, const char *s, unsigned n)
-{
-	struct earlycon_device *dev = con->data;
-
-	wmb();
-	uart_console_write(&dev->port, s, n, qdf2400_e44_putc);
-}
-
 static void pl011_putc(struct uart_port *port, int c)
 {
-	const struct vendor_data *vendor = &vendor_arm;
+	const struct vendor_data *vendor = port->private_data;
 	const u16 *reg_offset = vendor->reg_offset;
 
 	while (__pl011_read(port, reg_offset, REG_FR) & UART01x_FR_TXFF)
@@ -2476,6 +2456,22 @@ static void pl011_early_write(struct console *con, const char *s, unsigned n)
 	uart_console_write(&dev->port, s, n, pl011_putc);
 }
 
+static int __init
+pl011_early_console_setup_common(struct earlycon_device *device,
+				 const struct vendor_data *vendor)
+{
+	if (!device->port.membase)
+		return -ENODEV;
+
+	device->port.private_data = (void *)vendor;
+	if (vendor->access_32b)
+		device->port.iotype = UPIO_MEM32;
+
+	device->con->write = pl011_early_write;
+
+	return 0;
+}
+
 /*
  * On non-ACPI systems, earlycon is enabled by specifying
  * "earlycon=pl011,<address>" on the kernel command line.
@@ -2491,12 +2487,7 @@ static void pl011_early_write(struct console *con, const char *s, unsigned n)
 static int __init pl011_early_console_setup(struct earlycon_device *device,
 					    const char *opt)
 {
-	if (!device->port.membase)
-		return -ENODEV;
-
-	device->con->write = pl011_early_write;
-
-	return 0;
+	return pl011_early_console_setup_common(device, &vendor_arm);
 }
 OF_EARLYCON_DECLARE(pl011, "arm,pl011", pl011_early_console_setup);
 OF_EARLYCON_DECLARE(pl011, "arm,sbsa-uart", pl011_early_console_setup);
@@ -2515,12 +2506,8 @@ static int __init
 qdf2400_e44_early_console_setup(struct earlycon_device *device,
 				const char *opt)
 {
-	if (!device->port.membase)
-		return -ENODEV;
-
-	device->port.iotype = UPIO_MEM32;
-	device->con->write = qdf2400_e44_early_write;
-	return 0;
+	return pl011_early_console_setup_common(device,
+						&vendor_qdt_qdf2400_e44);
 }
 EARLYCON_DECLARE(qdf2400_e44, qdf2400_e44_early_console_setup);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH 3/3] tty: amba-pl011: earlycon: Don't drain the transmitter after each char
  2017-10-18 14:14 [RFC PATCH 0/3] tty: amba-pl011: Decruft and streamline earlycon output Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 1/3] tty: amba-pl011: earlycon: Switch to relaxed I/O Dave Martin
  2017-10-18 14:14 ` [RFC PATCH 2/3] tty: amba-pl011: earlycon: Unify earlycon backends Dave Martin
@ 2017-10-18 14:14 ` Dave Martin
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Martin @ 2017-10-18 14:14 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Andre Przywara, Stephen Boyd, Russell King, Andy Gross,
	linux-serial, Greg Kroah-Hartman, Bhupinder Thakur

Currently, the pl011 earlycon implementation waits for the UART
transmitter to drain completely and become idle after each
character is written.

This can result in (mostly harmless) delays and stuttering output
on the wire, and can also lead to poor performance in virtualised
UART implementations: thrashing between VMs can occur as each
character gets forcibly drained to the remove sink (dom0 in the Xen
case) before the source guest writes another character.

However, the semantics of earlycon don't allow callers to write one
character at a time with the expectation of completion: the only
interface to exposed to earlycon writes a whole string before
returning.

So, this patch eliminates the draining of the transmitted from
pl011_putc() and moves if to the the and of pl011_early_write()
instead.  From the earlycon caller's point of view, the effect
should be faster but otherwise identical: either way, all the
characters have gone out onto the wire before the write method
returns.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

For background on the virtualisation issue, see [1].  Although
virtual UART implementations should be working around this issue
somehow for compatibility with current and older kernels, this patch
should promote better interoperation, and earlycon throughput should
improve a bit on real hardware too.

[1] Re: [Xen-devel] [PATCH 26/27 v12] arm/xen: vpl011: Fix the slow
early console SBSA UART output
https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg01949.html
---
 drivers/tty/serial/amba-pl011.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 084ed3f..e27059a 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2444,16 +2444,18 @@ static void pl011_putc(struct uart_port *port, int c)
 	while (__pl011_read(port, reg_offset, REG_FR) & UART01x_FR_TXFF)
 		cpu_relax();
 	__pl011_write(c, port, reg_offset, REG_DR);
-	while (!__pl011_tx_empty(port, reg_offset, vendor))
-		cpu_relax();
 }
 
 static void pl011_early_write(struct console *con, const char *s, unsigned n)
 {
 	struct earlycon_device *dev = con->data;
+	const struct vendor_data *vendor = dev->port.private_data;
+	const u16 *reg_offset = vendor->reg_offset;
 
 	wmb();
 	uart_console_write(&dev->port, s, n, pl011_putc);
+	while (!__pl011_tx_empty(&dev->port, reg_offset, vendor))
+		cpu_relax();
 }
 
 static int __init
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-10-18 14:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-18 14:14 [RFC PATCH 0/3] tty: amba-pl011: Decruft and streamline earlycon output Dave Martin
2017-10-18 14:14 ` [RFC PATCH 1/3] tty: amba-pl011: earlycon: Switch to relaxed I/O Dave Martin
2017-10-18 14:14 ` [RFC PATCH 2/3] tty: amba-pl011: earlycon: Unify earlycon backends Dave Martin
2017-10-18 14:14 ` [RFC PATCH 3/3] tty: amba-pl011: earlycon: Don't drain the transmitter after each char Dave Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).