* [PATCH 0/2] serial: sc16is7xx: fix TX inter-frame gaps on SPI UARTs
@ 2026-06-23 11:22 Paul Mbewe
2026-06-23 11:22 ` [PATCH 1/2] serial: sc16is7xx: fix TX gap caused by kfifo circular buffer wrap-around Paul Mbewe
2026-06-23 11:22 ` [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns Paul Mbewe
0 siblings, 2 replies; 15+ messages in thread
From: Paul Mbewe @ 2026-06-23 11:22 UTC (permalink / raw)
To: linux-serial, linux-kernel; +Cc: gregkh, jirislaby, hvilleneuve, Paul Mbewe
This series fixes two related TX path issues in the sc16is7xx SPI UART
driver that cause visible inter-frame gaps on the wire, breaking Modbus
RTU communication.
Patch 1 fixes a data loss / gap caused by kfifo_out_linear_ptr() only
returning one contiguous segment of the circular buffer. When TX data
wraps around the kfifo boundary, the remainder is delayed until the next
TX interrupt, producing a silence gap that violates the Modbus 1.5
character-time limit.
Patch 2 raises the TX FIFO trigger level from 8 to 32 free spaces via
the TLR register. The default trigger of 8 causes excessive SPI
round-trips on slow single-core hosts, leading to TX FIFO underruns
between bursts.
Tested on SC16IS752 (SPI, 1 MHz) on i.MX6ULL driving RS-485 at 115200
baud. Oscilloscope confirmed both gap types are eliminated after the
fix.
Paul Mbewe (2):
serial: sc16is7xx: fix TX gap caused by kfifo circular buffer
wrap-around
serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent
underruns
drivers/tty/serial/sc16is7xx.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 1/2] serial: sc16is7xx: fix TX gap caused by kfifo circular buffer wrap-around 2026-06-23 11:22 [PATCH 0/2] serial: sc16is7xx: fix TX inter-frame gaps on SPI UARTs Paul Mbewe @ 2026-06-23 11:22 ` Paul Mbewe 2026-06-23 11:22 ` [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns Paul Mbewe 1 sibling, 0 replies; 15+ messages in thread From: Paul Mbewe @ 2026-06-23 11:22 UTC (permalink / raw) To: linux-serial, linux-kernel Cc: gregkh, jirislaby, hvilleneuve, Paul Mbewe, stable, Tobias Gannert, Joachim Knorr kfifo_out_linear_ptr() returns only one contiguous linear segment of the circular kfifo buffer. When transmit data wraps around the end of the buffer, only the first segment (up to the buffer end) is sent. The remaining data at the start of the buffer is not sent until the next TX interrupt fires, resulting in a visible inter-frame gap on the wire. This gap violates the Modbus RTU 1.5 character-time inter-character silence limit. Receivers interpret any silence exceeding 1.5 character times as an end-of-frame marker, splitting a single valid frame into two malformed fragments and corrupting communication on the bus. The incomplete transfer also causes unnecessary TX interrupts: instead of draining the full available FIFO space in one pass, the driver fires an extra interrupt per wrap-around just to send the remaining bytes. The pre-kfifo code handled wrap-around by copying bytes one at a time from the circ_buf into a linear staging buffer. The conversion to kfifo replaced this with a single kfifo_out_linear_ptr() call, losing the wrap-around handling. The max310x driver (a similar SPI UART) correctly handles this with a while loop. Fix this by calling kfifo_out_linear_ptr() in a loop, advancing through all contiguous segments until the available TX FIFO space is exhausted or the kfifo is empty. Tested on SC16IS752 (SPI) driving RS-485 at 115200 baud 8N1 on an i.MX6ULL based board. Oscilloscope confirmed mid-frame breaks at the kfifo wrap-around boundary before the fix; no breaks observed after. Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") Cc: stable@vger.kernel.org Reported-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> Tested-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> Reviewed-by: Joachim Knorr <joachim.knorr@ziehl-abegg.de> Signed-off-by: Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> --- drivers/tty/serial/sc16is7xx.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 1a2c4c14f6aa..395a219280be 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -730,9 +730,17 @@ static void sc16is7xx_handle_tx(struct uart_port *port) txlen = 0; } - txlen = kfifo_out_linear_ptr(&tport->xmit_fifo, &tail, txlen); - sc16is7xx_fifo_write(port, tail, txlen); - uart_xmit_advance(port, txlen); + /* Handle circular buffer wrap-around by sending in contiguous segments */ + while (txlen > 0 && !kfifo_is_empty(&tport->xmit_fifo)) { + unsigned int to_send; + + to_send = kfifo_out_linear_ptr(&tport->xmit_fifo, &tail, txlen); + if (!to_send) + break; + sc16is7xx_fifo_write(port, tail, to_send); + uart_xmit_advance(port, to_send); + txlen -= to_send; + } uart_port_lock_irqsave(port, &flags); if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS) -- 2.43.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 11:22 [PATCH 0/2] serial: sc16is7xx: fix TX inter-frame gaps on SPI UARTs Paul Mbewe 2026-06-23 11:22 ` [PATCH 1/2] serial: sc16is7xx: fix TX gap caused by kfifo circular buffer wrap-around Paul Mbewe @ 2026-06-23 11:22 ` Paul Mbewe 2026-06-23 12:45 ` David Laight 1 sibling, 1 reply; 15+ messages in thread From: Paul Mbewe @ 2026-06-23 11:22 UTC (permalink / raw) To: linux-serial, linux-kernel Cc: gregkh, jirislaby, hvilleneuve, Paul Mbewe, stable, Tobias Gannert, Joachim Knorr The THRI interrupt (IER[1]) fires when the TX FIFO free space reaches the configured threshold. With the reset default (TLR=0), the chip falls back to the FCR TX trigger of 8 free spaces (FCR[5:4]=00), causing THRI to assert after every 8 bytes drain from the FIFO. At 115200 baud 8N1, 8 bytes drain in 694 us. On slow single-core SPI hosts, the combined latency of an SPI IIR read, TXLVL read, and 8-byte THR write per interrupt, plus kthread scheduling jitter, can exceed this window on a loaded system. When the kthread cannot refill the FIFO within 694 us, the FIFO empties and produces an idle gap on the TX line. This violates the Modbus RTU specification, which treats any intra-frame silence longer than 1.5 character times (~130 us at 115200 baud) as a frame boundary, causing receivers to fragment frames and report CRC errors. Oscilloscope measurements confirmed a 757 us inter-burst gap during continuous transmission without this fix. Setting the TX trigger to 32 free spaces (half FIFO) via TLR[3:0]=8 widens the refill window to 2778 us at 115200 baud, reducing THRI events per 256-byte frame from ~32 to ~8 and eliminating the underrun. Only TLR[3:0] is written; TLR[7:4] is left at zero, so the RX trigger retains its FCR default. Only TX interrupt timing is affected. While increasing the SPI clock would also reduce per-round-trip latency, the driver should work correctly regardless of SPI speed. The fix belongs in the driver. Tested on i.MX6ULL (ARM Cortex-A7, single-core) with SC16IS752IBS over SPI at 1 MHz, 115200 baud 8N1, 256-byte Modbus RTU frames under production load: IRQ thread (irq/134-spi2.0): ~15-17% -> ~5% (~67% reduction) sys CPU: ~51-61% -> ~19-28% (~55% reduction) load average: ~2.0-2.2 -> ~0.65-1.3 No mid-frame gaps observed after fix. Without fix, oscilloscope confirmed 757 us inter-burst gaps causing Modbus frame fragmentation. Cc: stable@vger.kernel.org Reported-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> Tested-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> Reviewed-by: Joachim Knorr <joachim.knorr@ziehl-abegg.de> Signed-off-by: Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> --- drivers/tty/serial/sc16is7xx.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 395a219280be..476e0dd3fa7f 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -1196,6 +1196,16 @@ static int sc16is7xx_startup(struct uart_port *port) SC16IS7XX_TCR_RX_RESUME(24) | SC16IS7XX_TCR_RX_HALT(48)); + /* + * Set TX FIFO trigger level to 32 spaces (half FIFO) via TLR. The reset + * default (TLR=0) falls back to the FCR TX trigger of 8 free spaces, + * requiring ~8 SPI round-trips per 64-byte FIFO load. On slow single-core + * SPI hosts, this accumulated latency can cause a TX FIFO underrun gap + * between bursts. + */ + sc16is7xx_port_write(port, SC16IS7XX_TLR_REG, + SC16IS7XX_TLR_TX_TRIGGER(32)); + /* Disable TCR/TLR access */ sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0); -- 2.43.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 11:22 ` [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns Paul Mbewe @ 2026-06-23 12:45 ` David Laight 2026-06-23 14:01 ` Paul Mbewe 0 siblings, 1 reply; 15+ messages in thread From: David Laight @ 2026-06-23 12:45 UTC (permalink / raw) To: Paul Mbewe Cc: linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, stable, Tobias Gannert, Joachim Knorr On Tue, 23 Jun 2026 13:22:25 +0200 Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > The THRI interrupt (IER[1]) fires when the TX FIFO free space reaches the > configured threshold. With the reset default (TLR=0), the chip falls back > to the FCR TX trigger of 8 free spaces (FCR[5:4]=00), causing THRI to > assert after every 8 bytes drain from the FIFO. > > At 115200 baud 8N1, 8 bytes drain in 694 us. On slow single-core SPI > hosts, the combined latency of an SPI IIR read, TXLVL read, and 8-byte > THR write per interrupt, plus kthread scheduling jitter, can exceed this > window on a loaded system. When the kthread cannot refill the FIFO within > 694 us, the FIFO empties and produces an idle gap on the TX line. That seems strange. The earlier interrupt (8 free fifo spaces) ought to make it less likely that the fifo will underrun. Are you sure it isn't the other way around? So the interrupt happens when there are 8 bytes left in the fifo. With a 64 byte fifo each interrupt would fill at least 56 bytes. Increasing it to 32 (bytes left in the fifo) gives more time for the interrupt latency (etc), but reduces the number of bytes written by each isr to at least 32. The other possibility is some error passing the level sensitive 'fifo empty' state through to scheduling the kthread. It may well be that the system interrupt latency is small enough that you can take the interrupt at 'half full', and that the associated reduction in the number of interrupts makes the system behave better. But this isn't what the commit message says. David > > This violates the Modbus RTU specification, which treats any intra-frame > silence longer than 1.5 character times (~130 us at 115200 baud) as a > frame boundary, causing receivers to fragment frames and report CRC errors. > Oscilloscope measurements confirmed a 757 us inter-burst gap during > continuous transmission without this fix. > > Setting the TX trigger to 32 free spaces (half FIFO) via TLR[3:0]=8 > widens the refill window to 2778 us at 115200 baud, reducing THRI events > per 256-byte frame from ~32 to ~8 and eliminating the underrun. > > Only TLR[3:0] is written; TLR[7:4] is left at zero, so the RX trigger > retains its FCR default. Only TX interrupt timing is affected. > > While increasing the SPI clock would also reduce per-round-trip latency, > the driver should work correctly regardless of SPI speed. The fix belongs > in the driver. > > Tested on i.MX6ULL (ARM Cortex-A7, single-core) with SC16IS752IBS over > SPI at 1 MHz, 115200 baud 8N1, 256-byte Modbus RTU frames under production > load: > > IRQ thread (irq/134-spi2.0): ~15-17% -> ~5% (~67% reduction) > sys CPU: ~51-61% -> ~19-28% (~55% reduction) > load average: ~2.0-2.2 -> ~0.65-1.3 > > No mid-frame gaps observed after fix. Without fix, oscilloscope confirmed > 757 us inter-burst gaps causing Modbus frame fragmentation. > > Cc: stable@vger.kernel.org > Reported-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> > Tested-by: Tobias Gannert <tobias.gannert@ziehl-abegg.de> > Reviewed-by: Joachim Knorr <joachim.knorr@ziehl-abegg.de> > Signed-off-by: Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> > --- > drivers/tty/serial/sc16is7xx.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c > index 395a219280be..476e0dd3fa7f 100644 > --- a/drivers/tty/serial/sc16is7xx.c > +++ b/drivers/tty/serial/sc16is7xx.c > @@ -1196,6 +1196,16 @@ static int sc16is7xx_startup(struct uart_port *port) > SC16IS7XX_TCR_RX_RESUME(24) | > SC16IS7XX_TCR_RX_HALT(48)); > > + /* > + * Set TX FIFO trigger level to 32 spaces (half FIFO) via TLR. The reset > + * default (TLR=0) falls back to the FCR TX trigger of 8 free spaces, > + * requiring ~8 SPI round-trips per 64-byte FIFO load. On slow single-core > + * SPI hosts, this accumulated latency can cause a TX FIFO underrun gap > + * between bursts. > + */ > + sc16is7xx_port_write(port, SC16IS7XX_TLR_REG, > + SC16IS7XX_TLR_TX_TRIGGER(32)); > + > /* Disable TCR/TLR access */ > sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 12:45 ` David Laight @ 2026-06-23 14:01 ` Paul Mbewe 2026-06-23 15:07 ` David Laight 0 siblings, 1 reply; 15+ messages in thread From: Paul Mbewe @ 2026-06-23 14:01 UTC (permalink / raw) To: david.laight.linux Cc: linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, stable, tobias.gannert, joachim.knorr, Paul Mbewe Hi David, Thanks for the detailed review. According to the SC16IS7xx datasheet, the TX trigger level is defined in terms of free FIFO spaces, not remaining data. So with the default configuration (FCR[5:4] = 00), the THRI interrupt fires when the FIFO has 8 free entries, i.e. when it still contains 56 bytes. While this in theory leaves enough data in the FIFO, in practice the system has to service many small refill cycles (~8 bytes per interrupt). On slow SPI hosts, each cycle involves threaded interrupt handling and multiple SPI transactions, and the cumulative latency plus scheduling jitter can exceed the available margin between refills under load. Increasing the trigger level to 32 free spaces reduces the number of refill cycles significantly (from ~8 per FIFO load to ~2), and increases the amount of data written per cycle. This reduces scheduling pressure and, in practice, avoids the FIFO draining to empty between bursts. The current commit message focuses too much on the "refill window" and does not explain this aspect clearly. I can rephrase it to better describe the interaction between trigger level, refill granularity and system latency. Thanks, Paul ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 14:01 ` Paul Mbewe @ 2026-06-23 15:07 ` David Laight 2026-06-23 17:13 ` Paul Mbewe 0 siblings, 1 reply; 15+ messages in thread From: David Laight @ 2026-06-23 15:07 UTC (permalink / raw) To: Paul Mbewe Cc: linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, stable, tobias.gannert, joachim.knorr On Tue, 23 Jun 2026 16:01:55 +0200 Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > Hi David, > > Thanks for the detailed review. > > According to the SC16IS7xx datasheet, the TX trigger level is defined > in terms of free FIFO spaces, not remaining data. So with the default > configuration (FCR[5:4] = 00), the THRI interrupt fires when the FIFO > has 8 free entries, i.e. when it still contains 56 bytes. > > While this in theory leaves enough data in the FIFO, in practice the > system has to service many small refill cycles (~8 bytes per interrupt). > On slow SPI hosts, each cycle involves threaded interrupt handling and > multiple SPI transactions, and the cumulative latency plus scheduling > jitter can exceed the available margin between refills under load. But that cost/time is much the same regardless of the trigger level. Changing the level from 8 to 32 significantly reduces the allowable latency. > Increasing the trigger level to 32 free spaces reduces the number of > refill cycles significantly (from ~8 per FIFO load to ~2), and increases > the amount of data written per cycle. This reduces scheduling pressure > and, in practice, avoids the FIFO draining to empty between bursts. But shouldn't it should all catch up. The isr thread should start finding more than 8 bytes space in the fifo, write enough bytes to fill it and the next interrupt should happen about 8 byte times after the previous one finishes. That does rely on nothing 'going wrong' between the hardware interrupt and the isr thread. What priority does the isr thread run at? If it isn't running at an RT priority then the scheduler might decide to reduce its priority which could easily generate what you are seeing. I'll agree that changing the threshold reduces system load - so should give extra time for 'other work'. But I don't really see why it be the correct way to stop underruns. David > > The current commit message focuses too much on the "refill window" and > does not explain this aspect clearly. I can rephrase it to better > describe the interaction between trigger level, refill granularity and > system latency. > > Thanks, > Paul > > _______________________________________ > > ZIEHL-ABEGG > > Executive Board: Joachim Ley (Chairman), Marco Altherr, Wolfgang Mayer > Supervisory Board: Dennis Ziehl (Chairman) > > Court of Registry: District Court Stuttgart HRB 746188 > Company Seat: Künzelsau, Germany > > Der Inhalt dieser E-Mail und/oder jegliche Anhänge können vertrauliche Mitteilungen enthalten und sind ausschließlich für den bezeichneten Adressaten bestimmt. > Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veröffentlichung, > Vervielfältigung oder Weitergabe des Inhalts dieser E-Mail einschließlich Anhängen unzulässig ist. > Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. > > The content of this e-mail and/or attachments may contain confidential information and is intended solely for the named recipient. > If you are not the intended recipient of this e-mail or on its distribution list, please note that any type of disclosure, publication, > copying or distribution of the content of this e-mail including attachments is strictly forbidden. > In this case, we would kindly ask you to notify the sender of the e-mail. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 15:07 ` David Laight @ 2026-06-23 17:13 ` Paul Mbewe 2026-06-23 18:42 ` David Laight 0 siblings, 1 reply; 15+ messages in thread From: Paul Mbewe @ 2026-06-23 17:13 UTC (permalink / raw) To: david.laight.linux Cc: linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, stable, tobias.gannert, joachim.knorr, Paul Mbewe Hi David, On the test system, the relevant threads already run with RT priority: irq/134-spi2.0 SCHED_FIFO priority 50 sc16is7xx SCHED_FIFO priority 50 So this is not caused by the IRQ thread running as a normal SCHED_OTHER task. That does not remove all latency sources, of course; on this single-core SPI system the threaded IRQ can still be affected by other RT/kernel activity, IRQ/preemption-disabled sections, SPI transfer time, or lock contention. I agree, changing the TX trigger from 8 to 32 free spaces reduces the per-interrupt time-to-empty margin. With an 8-space trigger the FIFO still contains 56 bytes when THRI asserts, while with a 32-space trigger it contains 32 bytes. So the v1 commit message is misleading when it describes this as increasing the refill window. The observed effect is instead that the 8-space trigger causes many small TX refill events. Each event has roughly the same cost, as you said. If the handler runs in time, it can catch up by seeing more than 8 free spaces and writing more data. The failure happens when one event is delayed long enough for the FIFO to drain. Using a 32-space trigger reduces the number of refill events and the associated IRQ/SPI load. It also reduces the chance that one delayed event lets the FIFO drain completely. On the tested setup this reduced irq/134-spi2.0 CPU usage from about 15-17% to about 5%, sys CPU from about 51-61% to about 19-28%, and load average from about 2.0-2.2 to about 0.65-1.3. With that change, the observed TX gaps disappeared. So I agree the commit message should be reworked to describe this as reducing TX refill events and IRQ/SPI load, not as increasing the per-interrupt latency margin. If changing the default trigger globally is considered too broad, I can also look at making the TX trigger configurable or limiting the change to SPI-backed devices. Thanks Paul ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 17:13 ` Paul Mbewe @ 2026-06-23 18:42 ` David Laight 2026-06-25 12:59 ` Maarten Brock 0 siblings, 1 reply; 15+ messages in thread From: David Laight @ 2026-06-23 18:42 UTC (permalink / raw) To: Paul Mbewe Cc: linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, stable, tobias.gannert, joachim.knorr On Tue, 23 Jun 2026 19:13:28 +0200 Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > Hi David, > > On the test system, the relevant threads already run with RT priority: > > irq/134-spi2.0 SCHED_FIFO priority 50 > sc16is7xx SCHED_FIFO priority 50 > > So this is not caused by the IRQ thread running as a normal SCHED_OTHER > task. That does not remove all latency sources, of course; on this > single-core SPI system the threaded IRQ can still be affected by other > RT/kernel activity, IRQ/preemption-disabled sections, SPI transfer time, > or lock contention. > > I agree, changing the TX trigger from 8 to 32 free spaces reduces the > per-interrupt time-to-empty margin. With an 8-space trigger the FIFO > still contains 56 bytes when THRI asserts, while with a 32-space trigger > it contains 32 bytes. So the v1 commit message is misleading when it > describes this as increasing the refill window. > > The observed effect is instead that the 8-space trigger causes many > small TX refill events. Each event has roughly the same cost, as you > said. If the handler runs in time, it can catch up by seeing more than > 8 free spaces and writing more data. The failure happens when one event > is delayed long enough for the FIFO to drain. > > Using a 32-space trigger reduces the number of refill events and the > associated IRQ/SPI load. It also reduces the chance that one delayed > event lets the FIFO drain completely. On the tested setup this reduced > irq/134-spi2.0 CPU usage from about 15-17% to about 5%, sys CPU from > about 51-61% to about 19-28%, and load average from about 2.0-2.2 to > about 0.65-1.3. With that change, the observed TX gaps disappeared. Even with those figures (and the cpu % differences seem reasonable) I still don't see why it stops the underruns. An interesting exercise would be to call to ftrace_stop() when the tx fifo is unexpectedly nearly empty. Then enable all the ftrace scheduler and interrupt events (they don't normally slow things down that much) and run the test with ftrace enabled. When the trace gets stopped (you can just 'cat tracing_on' to find out when it has been stopped) just 'less trace' to see what happened. I think that will show that the isr thread is scheduled but doesn't run for a while because something else is 'hogging' the cpu. I've used this set of events - seemed to be useful. Clear old events echo >set_event System call entry and exit: echo 'syscalls:*' >>set_event IRQ entry and exit: echo irq:irq_handler_entry irq:irq_handler_exit >>set_event echo irq:softirq_entry irq:softirq_exit >>set_event Scheduler process switch events: echo sched:sched_switch >>set_event echo sched:sched_wakeup sched:sched_wakeup_new >>set_event echo sched:sched_migrate_task >>set_event David > > So I agree the commit message should be reworked to describe this as > reducing TX refill events and IRQ/SPI load, not as increasing the > per-interrupt latency margin. > > If changing the default trigger globally is considered too broad, I can > also look at making the TX trigger configurable or limiting the change to > SPI-backed devices. > > Thanks > Paul > > _______________________________________ > > ZIEHL-ABEGG > > Executive Board: Joachim Ley (Chairman), Marco Altherr, Wolfgang Mayer > Supervisory Board: Dennis Ziehl (Chairman) > > Court of Registry: District Court Stuttgart HRB 746188 > Company Seat: Künzelsau, Germany > > Der Inhalt dieser E-Mail und/oder jegliche Anhänge können vertrauliche Mitteilungen enthalten und sind ausschließlich für den bezeichneten Adressaten bestimmt. > Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veröffentlichung, > Vervielfältigung oder Weitergabe des Inhalts dieser E-Mail einschließlich Anhängen unzulässig ist. > Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. > > The content of this e-mail and/or attachments may contain confidential information and is intended solely for the named recipient. > If you are not the intended recipient of this e-mail or on its distribution list, please note that any type of disclosure, publication, > copying or distribution of the content of this e-mail including attachments is strictly forbidden. > In this case, we would kindly ask you to notify the sender of the e-mail. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-23 18:42 ` David Laight @ 2026-06-25 12:59 ` Maarten Brock 2026-07-01 16:40 ` Paul Mbewe 0 siblings, 1 reply; 15+ messages in thread From: Maarten Brock @ 2026-06-25 12:59 UTC (permalink / raw) To: David Laight, Paul Mbewe Cc: linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org, jirislaby@kernel.org, hvilleneuve@dimonoff.com, stable@vger.kernel.org, tobias.gannert@ziehl-abegg.de, joachim.knorr@ziehl-abegg.de > From: David Laight <david.laight.linux@gmail.com> > Sent: Tuesday 23 June 2026 20:42 > > On Tue, 23 Jun 2026 19:13:28 +0200 > Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > > > Hi David, > > > > On the test system, the relevant threads already run with RT priority: > > > > irq/134-spi2.0 SCHED_FIFO priority 50 > > sc16is7xx SCHED_FIFO priority 50 > > > > So this is not caused by the IRQ thread running as a normal SCHED_OTHER > > task. That does not remove all latency sources, of course; on this > > single-core SPI system the threaded IRQ can still be affected by other > > RT/kernel activity, IRQ/preemption-disabled sections, SPI transfer time, > > or lock contention. > > > > I agree, changing the TX trigger from 8 to 32 free spaces reduces the > > per-interrupt time-to-empty margin. With an 8-space trigger the FIFO > > still contains 56 bytes when THRI asserts, while with a 32-space trigger > > it contains 32 bytes. So the v1 commit message is misleading when it > > describes this as increasing the refill window. I would say plain wrong and as such cannot be accepted. > > The observed effect is instead that the 8-space trigger causes many > > small TX refill events. Each event has roughly the same cost, as you > > said. If the handler runs in time, it can catch up by seeing more than > > 8 free spaces and writing more data. The failure happens when one event > > is delayed long enough for the FIFO to drain. True. > > Using a 32-space trigger reduces the number of refill events and the > > associated IRQ/SPI load. It also reduces the chance that one delayed > > event lets the FIFO drain completely. Not true. The necessary delay to trigger a drained FIFO is reduced. So the chance that such a delay occurs is increased. > > On the tested setup this reduced > > irq/134-spi2.0 CPU usage from about 15-17% to about 5%, sys CPU from > > about 51-61% to about 19-28%, and load average from about 2.0-2.2 to > > about 0.65-1.3. With that change, the observed TX gaps disappeared. > > Even with those figures (and the cpu % differences seem reasonable) > I still don't see why it stops the underruns. It probably stops the underruns because the lower CPU load from the IRQ gives the CPU more time to handle the overhead and other high priority tasks. It seems like the scheduling overhead is bigger than the actual interrupt handling. > > > > So I agree the commit message should be reworked to describe this as > > reducing TX refill events and IRQ/SPI load, not as increasing the > > per-interrupt latency margin. > > > > If changing the default trigger globally is considered too broad, I can > > also look at making the TX trigger configurable or limiting the change to > > SPI-backed devices. I consider this to be the right approach. The user is the one that knows about the platform and its properties (#cores, cpu freq, scheduling latency, spi freq, uart baudrate). Still, I think it is good to change the default to halfway the fifo depth. It reduces the interrupt frequency by a factor 4 (8/32) at the cost of a less than halved allowed latency (32/56). And this is probably equally useful for the RX FIFO when data comes in at full speed. With a halved allowed latency before overruns happen the interrupt frequency can also be divided by 4 resulting in an similar lower cpu load. > > > > Thanks > > Paul Kind regards, Maarten ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-06-25 12:59 ` Maarten Brock @ 2026-07-01 16:40 ` Paul Mbewe 2026-07-01 17:41 ` David Laight 0 siblings, 1 reply; 15+ messages in thread From: Paul Mbewe @ 2026-07-01 16:40 UTC (permalink / raw) To: david.laight.linux Cc: Maarten.Brock, linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, tobias.gannert, joachim.knorr Hi David, Maarten, I rechecked this with ftrace, and I think the right explanation is narrower than my v1 commit message. The underrun trace shows the SC16IS7xx refill path running with the hardware TX FIFO already empty while the tty xmit FIFO still had data queued, for example: sc16is7xx delayed refill: line=0 txlvl=64 pending=172 So the conclusions for me are: - The driver can catch up if the FIFO is only partially drained when the THRI path runs. TXLVL will report more than the nominal trigger amount and the driver can write more bytes. - Once TXLVL is 64, the FIFO has already emptied and the TX gap has already occurred. - The v1 explanation was wrong to describe a 32-free-space trigger as increasing the per-interrupt refill window. It does not. With the default trigger, the FIFO has about 56 character times left when THRI fires; with a 32-free-space trigger, it has about 32 character times left. - As Maarten pointed out, the benefit is the other side of the trade-off: the refill event rate is reduced by 4x, from roughly 1 refill per 8 transmitted bytes to 1 refill per 32 transmitted bytes. - On the tested SPI-backed system, that lower refill rate reduces the IRQ/SPI load enough to avoid the observed underruns. The change is limited to SPI-attached parts so the I2C case is unaffected. For v2 I will: - rework the commit message around Maarten's trade-off calculation: 4x fewer TX refill events, at the cost of reducing the time-to-empty margin from 56 to 32 character times; - limit the TX trigger change to SPI-backed SC16IS7xx devices; - leave the RX trigger level unchanged. If maintainers would prefer this to be board-configurable instead, I can look at a follow-up DT binding/property, but I would prefer to keep v2 focused on the TX/SPI case. Thanks, Paul ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-07-01 16:40 ` Paul Mbewe @ 2026-07-01 17:41 ` David Laight 2026-07-02 12:17 ` Maarten Brock 0 siblings, 1 reply; 15+ messages in thread From: David Laight @ 2026-07-01 17:41 UTC (permalink / raw) To: Paul Mbewe Cc: Maarten.Brock, linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, tobias.gannert, joachim.knorr On Wed, 1 Jul 2026 18:40:56 +0200 Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > Hi David, Maarten, > > I rechecked this with ftrace, and I think the right explanation is > narrower than my v1 commit message. > > The underrun trace shows the SC16IS7xx refill path running with the > hardware TX FIFO already empty while the tty xmit FIFO still had data > queued, for example: > > sc16is7xx delayed refill: line=0 txlvl=64 pending=172 > > So the conclusions for me are: > > - The driver can catch up if the FIFO is only partially drained when > the THRI path runs. TXLVL will report more than the nominal trigger > amount and the driver can write more bytes. > > - Once TXLVL is 64, the FIFO has already emptied and the TX gap has > already occurred. Right, but there must be something happening after the previous time the fifo was filled that makes this one happen too late. Either the hardware ISR was very late or it didn't schedule the thread. Thinks..... Maybe you are getting a fifo empty interrupt (rather than the fifo threshold) interrupt, with the threshold one getting lost. So what happens if the threshold is reached while the threaded ISR is still running? When that happens the ISR needs to loop, either in software or by re-enabling a level-sensitive interrupt so that another hardware interrupt happens immediately. I'm not at all sure how that works with threaded ISR on the SPI interface. Especially since you really don't want to read any more SPI registers than absolutely necessary (you need to avoid PCIe reads as well!). For a 'normal' ethernet driver rx path it is important to check the next rx ring entry a second time after enabling the hardware interrupt. Otherwise you miss a wakeup. I think you are getting (effectively) the same problem. You might need to use (say) ktime_get_ns() to work out how many bytes were transmitted between the fifo space being read and the end of the fifo writes to see if you need to do a second refill. (Or, rather, to detect that you definitely don't need to do one.) With the threshold set to 32 it is much less likely that it will be hit during the fifo writes, so you are much less likely to lose an interrupt. But scheduling delays could still make it happen. David > > - The v1 explanation was wrong to describe a 32-free-space trigger as > increasing the per-interrupt refill window. It does not. With the > default trigger, the FIFO has about 56 character times left when THRI > fires; with a 32-free-space trigger, it has about 32 character times > left. > > - As Maarten pointed out, the benefit is the other side of the > trade-off: the refill event rate is reduced by 4x, from roughly > 1 refill per 8 transmitted bytes to 1 refill per 32 transmitted bytes. > > - On the tested SPI-backed system, that lower refill rate reduces the > IRQ/SPI load enough to avoid the observed underruns. > > The change is limited to SPI-attached parts so the I2C case is > unaffected. > > For v2 I will: > > - rework the commit message around Maarten's trade-off calculation: > 4x fewer TX refill events, at the cost of reducing the time-to-empty > margin from 56 to 32 character times; > - limit the TX trigger change to SPI-backed SC16IS7xx devices; > - leave the RX trigger level unchanged. > > If maintainers would prefer this to be board-configurable instead, I can > look at a follow-up DT binding/property, but I would prefer to keep v2 > focused on the TX/SPI case. > > Thanks, > Paul > > _______________________________________ > > ZIEHL-ABEGG > > Executive Board: Joachim Ley (Chairman), Marco Altherr, Wolfgang Mayer > Supervisory Board: Dennis Ziehl (Chairman) > > Court of Registry: District Court Stuttgart HRB 746188 > Company Seat: Künzelsau, Germany > > Der Inhalt dieser E-Mail und/oder jegliche Anhänge können vertrauliche Mitteilungen enthalten und sind ausschließlich für den bezeichneten Adressaten bestimmt. > Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veröffentlichung, > Vervielfältigung oder Weitergabe des Inhalts dieser E-Mail einschließlich Anhängen unzulässig ist. > Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. > > The content of this e-mail and/or attachments may contain confidential information and is intended solely for the named recipient. > If you are not the intended recipient of this e-mail or on its distribution list, please note that any type of disclosure, publication, > copying or distribution of the content of this e-mail including attachments is strictly forbidden. > In this case, we would kindly ask you to notify the sender of the e-mail. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-07-01 17:41 ` David Laight @ 2026-07-02 12:17 ` Maarten Brock 2026-07-02 21:24 ` David Laight 0 siblings, 1 reply; 15+ messages in thread From: Maarten Brock @ 2026-07-02 12:17 UTC (permalink / raw) To: David Laight, Paul Mbewe, Crescent Hsieh Cc: linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org, jirislaby@kernel.org, hvilleneuve@dimonoff.com, tobias.gannert@ziehl-abegg.de, joachim.knorr@ziehl-abegg.de > From: David Laight <david.laight.linux@gmail.com> > Sent: Wednesday 1 July 2026 19:42 > > On Wed, 1 Jul 2026 18:40:56 +0200 > Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > > > Hi David, Maarten, > > > > I rechecked this with ftrace, and I think the right explanation is > > narrower than my v1 commit message. > > > > The underrun trace shows the SC16IS7xx refill path running with the > > hardware TX FIFO already empty while the tty xmit FIFO still had data > > queued, for example: > > > > sc16is7xx delayed refill: line=0 txlvl=64 pending=172 > > > > So the conclusions for me are: > > > > - The driver can catch up if the FIFO is only partially drained when > > the THRI path runs. TXLVL will report more than the nominal trigger > > amount and the driver can write more bytes. > > > > - Once TXLVL is 64, the FIFO has already emptied and the TX gap has > > already occurred. > > Right, but there must be something happening after the previous time the > fifo was filled that makes this one happen too late. > Either the hardware ISR was very late or it didn't schedule the thread. > > Thinks..... > Maybe you are getting a fifo empty interrupt (rather than the fifo threshold) > interrupt, with the threshold one getting lost. > So what happens if the threshold is reached while the threaded ISR is still > running? > When that happens the ISR needs to loop, either in software or by re-enabling > a level-sensitive interrupt so that another hardware interrupt happens > immediately. The loop is already present in sc16is7xx_irq() with do { ... } while (keep_polling); > I'm not at all sure how that works with threaded ISR on the SPI interface. > Especially since you really don't want to read any more SPI registers > than absolutely necessary (you need to avoid PCIe reads as well!). > > For a 'normal' ethernet driver rx path it is important to check the next > rx ring entry a second time after enabling the hardware interrupt. > Otherwise you miss a wakeup. > I think you are getting (effectively) the same problem. > > You might need to use (say) ktime_get_ns() to work out how many bytes > were transmitted between the fifo space being read and the end of the > fifo writes to see if you need to do a second refill. > (Or, rather, to detect that you definitely don't need to do one.) > > With the threshold set to 32 it is much less likely that it will be hit > during the fifo writes, so you are much less likely to lose an interrupt. > But scheduling delays could still make it happen. > > David > > > > > - The v1 explanation was wrong to describe a 32-free-space trigger as > > increasing the per-interrupt refill window. It does not. With the > > default trigger, the FIFO has about 56 character times left when THRI > > fires; with a 32-free-space trigger, it has about 32 character times > > left. > > > > - As Maarten pointed out, the benefit is the other side of the > > trade-off: the refill event rate is reduced by 4x, from roughly > > 1 refill per 8 transmitted bytes to 1 refill per 32 transmitted bytes. > > > > - On the tested SPI-backed system, that lower refill rate reduces the > > IRQ/SPI load enough to avoid the observed underruns. > > > > The change is limited to SPI-attached parts so the I2C case is > > unaffected. > > > > For v2 I will: > > > > - rework the commit message around Maarten's trade-off calculation: > > 4x fewer TX refill events, at the cost of reducing the time-to-empty > > margin from 56 to 32 character times; > > - limit the TX trigger change to SPI-backed SC16IS7xx devices; > > - leave the RX trigger level unchanged. I see no reason to limit this to SPI or to the transmit case only. > > If maintainers would prefer this to be board-configurable instead, I can > > look at a follow-up DT binding/property, but I would prefer to keep v2 > > focused on the TX/SPI case. > > > > Thanks, > > Paul Further, I think this should be combined with this recent patch proposal from Crescent Hsieh: [PATCH v2 14/15] serial: 8250: allow UART drivers to override rx_trig_bytes handling which introduces a way to set/get the rxtrig count per port. A similar thing can be added for txtrig. Kind regards, Maarten Brock ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-07-02 12:17 ` Maarten Brock @ 2026-07-02 21:24 ` David Laight 2026-07-03 18:19 ` Paul Mbewe 0 siblings, 1 reply; 15+ messages in thread From: David Laight @ 2026-07-02 21:24 UTC (permalink / raw) To: Maarten Brock Cc: Paul Mbewe, Crescent Hsieh, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org, jirislaby@kernel.org, hvilleneuve@dimonoff.com, tobias.gannert@ziehl-abegg.de, joachim.knorr@ziehl-abegg.de On Thu, 2 Jul 2026 12:17:01 +0000 Maarten Brock <Maarten.Brock@sttls.nl> wrote: > > From: David Laight <david.laight.linux@gmail.com> > > Sent: Wednesday 1 July 2026 19:42 > > > > On Wed, 1 Jul 2026 18:40:56 +0200 > > Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > > > > > Hi David, Maarten, > > > > > > I rechecked this with ftrace, and I think the right explanation is > > > narrower than my v1 commit message. > > > > > > The underrun trace shows the SC16IS7xx refill path running with the > > > hardware TX FIFO already empty while the tty xmit FIFO still had data > > > queued, for example: > > > > > > sc16is7xx delayed refill: line=0 txlvl=64 pending=172 > > > > > > So the conclusions for me are: > > > > > > - The driver can catch up if the FIFO is only partially drained when > > > the THRI path runs. TXLVL will report more than the nominal trigger > > > amount and the driver can write more bytes. > > > > > > - Once TXLVL is 64, the FIFO has already emptied and the TX gap has > > > already occurred. > > > > Right, but there must be something happening after the previous time the > > fifo was filled that makes this one happen too late. > > Either the hardware ISR was very late or it didn't schedule the thread. > > > > Thinks..... > > Maybe you are getting a fifo empty interrupt (rather than the fifo threshold) > > interrupt, with the threshold one getting lost. > > So what happens if the threshold is reached while the threaded ISR is still > > running? > > When that happens the ISR needs to loop, either in software or by re-enabling > > a level-sensitive interrupt so that another hardware interrupt happens > > immediately. > > The loop is already present in sc16is7xx_irq() with > do { ... } while (keep_polling); That loop also processes multiple interrupt sources. I've been looking at the code (but don't have the data sheet). I'd look at what: sc16is7xx_port_read(port, SC16IS7XX_TXLVL_REG); returns both at the bottom of sc16is7xx_handle_tx() and much later when the IER is re-written by the other work item. You might find trace_printf() useful - writes into the ftrace buffer. Did ftrace show you a hardware interrupt at the time when the fifo would have been expected to have 8 free entries? Also how many byte times later was the interrupt (real and threaded) that found the fifo empty? I also noticed that the code re-enabled the tx interrupt at the end of the threaded isr - is this actually needed? or is it just there because the same code is called to send the first buffer. If this is needed then it is possible that it won't generate an interrupt if the fifo already has enough space. There is also a question of how the hardware interrupt itself is handled. I think there will be a single level-sensitive output from the device itself. This will trigger the hardware interrupt which then schedules the threaded isr, but the hardware interrupt has to be masked until the threaded isr finishes. When the hardware interrupt is re-enabled you need the hardware ISR to run if the irq line is still (or again) asserted. If that is edge triggered things will go wrong in the way you are seeing. David > > > I'm not at all sure how that works with threaded ISR on the SPI interface. > > Especially since you really don't want to read any more SPI registers > > than absolutely necessary (you need to avoid PCIe reads as well!). > > > > For a 'normal' ethernet driver rx path it is important to check the next > > rx ring entry a second time after enabling the hardware interrupt. > > Otherwise you miss a wakeup. > > I think you are getting (effectively) the same problem. > > > > You might need to use (say) ktime_get_ns() to work out how many bytes > > were transmitted between the fifo space being read and the end of the > > fifo writes to see if you need to do a second refill. > > (Or, rather, to detect that you definitely don't need to do one.) > > > > With the threshold set to 32 it is much less likely that it will be hit > > during the fifo writes, so you are much less likely to lose an interrupt. > > But scheduling delays could still make it happen. > > > > David > > > > > > > > - The v1 explanation was wrong to describe a 32-free-space trigger as > > > increasing the per-interrupt refill window. It does not. With the > > > default trigger, the FIFO has about 56 character times left when THRI > > > fires; with a 32-free-space trigger, it has about 32 character times > > > left. > > > > > > - As Maarten pointed out, the benefit is the other side of the > > > trade-off: the refill event rate is reduced by 4x, from roughly > > > 1 refill per 8 transmitted bytes to 1 refill per 32 transmitted bytes. > > > > > > - On the tested SPI-backed system, that lower refill rate reduces the > > > IRQ/SPI load enough to avoid the observed underruns. > > > > > > The change is limited to SPI-attached parts so the I2C case is > > > unaffected. > > > > > > For v2 I will: > > > > > > - rework the commit message around Maarten's trade-off calculation: > > > 4x fewer TX refill events, at the cost of reducing the time-to-empty > > > margin from 56 to 32 character times; > > > - limit the TX trigger change to SPI-backed SC16IS7xx devices; > > > - leave the RX trigger level unchanged. > > I see no reason to limit this to SPI or to the transmit case only. > > > > If maintainers would prefer this to be board-configurable instead, I can > > > look at a follow-up DT binding/property, but I would prefer to keep v2 > > > focused on the TX/SPI case. > > > > > > Thanks, > > > Paul > > Further, I think this should be combined with this recent patch proposal from Crescent Hsieh: > [PATCH v2 14/15] serial: 8250: allow UART drivers to override rx_trig_bytes handling > > which introduces a way to set/get the rxtrig count per port. A similar thing can be added for txtrig. > > Kind regards, > Maarten Brock > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-07-02 21:24 ` David Laight @ 2026-07-03 18:19 ` Paul Mbewe 2026-07-03 20:36 ` David Laight 0 siblings, 1 reply; 15+ messages in thread From: Paul Mbewe @ 2026-07-03 18:19 UTC (permalink / raw) To: david.laight.linux Cc: Maarten.Brock, crescentcy.hsieh, linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, tobias.gannert, joachim.knorr Hi David, Maarten, short status update before v2. David, your ethernet-rx double-check analogy reinforced the direction I had started investigating from the ftrace data. With ftrace plus a scope on the INT line, I now see two refill-latency modes: 1. Under-fill, common case. sc16is7xx reads TXLVL once and writes that many bytes. On a slow SPI write, the wire can drain about as fast as the driver fills. The FIFO free space then stays at or above the trigger level, no new threshold crossing occurs, and no further THRI is generated until the FIFO is empty. The scope confirms that the INT line stays deasserted in this case, so this is not a lost SoC edge. max310x avoids this pattern by re-reading the FIFO level each refill iteration and filling until the xmit FIFO is empty or the hardware FIFO is full. I am testing the same refill pattern for sc16is7xx. 2. Preempted refill, rarer case. Some traces show the IRQ thread being delayed during the refill path. In those cases the FIFO does appear to be pushed below the trigger, but the next THRI still does not arrive until the FIFO is empty. That looks closer to your "threshold reached while the threaded ISR is still running and the re-arm is lost" case. I want to see whether this still survives the TXLVL re-read fix before adding a TX watchdog. On your specific questions: - The trailing sc16is7xx_ier_set(THRI) is a 1->1 update mid-message. The bit is already set, so it does not create a fresh event. The driver relies on the FIFO being refilled far enough for the chip to produce a new threshold crossing. - trigger=32 is not a wider refill window. It gives about 4x fewer refills, lowering IRQ/SPI load and reducing exposure to these latency outliers. I will keep it as a separate load-reduction patch with the corrected wording, no longer as the root fix. Maarten, I will drop the SPI-only guard. I also agree that configurable RX/TX trigger levels belong with Crescent's rx_trig_bytes work as a follow-up, rather than blocking this underrun fix. I will follow up with the A/B numbers on Monday. Thanks, Paul ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns 2026-07-03 18:19 ` Paul Mbewe @ 2026-07-03 20:36 ` David Laight 0 siblings, 0 replies; 15+ messages in thread From: David Laight @ 2026-07-03 20:36 UTC (permalink / raw) To: Paul Mbewe Cc: Maarten.Brock, crescentcy.hsieh, linux-serial, linux-kernel, gregkh, jirislaby, hvilleneuve, tobias.gannert, joachim.knorr On Fri, 3 Jul 2026 20:19:39 +0200 Paul Mbewe <paultyson.mbewe@ziehl-abegg.de> wrote: > Hi David, Maarten, > > short status update before v2. > > David, your ethernet-rx double-check analogy reinforced the direction I > had started investigating from the ftrace data. With ftrace plus a scope > on the INT line, I now see two refill-latency modes: > > 1. Under-fill, common case. > > sc16is7xx reads TXLVL once and writes that many bytes. On a slow SPI > write, the wire can drain about as fast as the driver fills. The FIFO > free space then stays at or above the trigger level, no new threshold > crossing occurs, and no further THRI is generated until the FIFO is > empty. That doesn't sound very good at all. You need the refill path to be much faster then the drain path. It might be worth testing at a much lower baud rate to see how it 'should' operate. (I've run serial code at 'single digit' baud rates in the past, needed to see exactly how the zilog scc handled sending crc at the end of hdlc frames and just how broken its reporting of received aborts was.) That path also sounds like it doesn't depend on the fifo trigger level. One thing I did notice is that the initial write doesn't loop and only puts one block of data into the fifo. The isr isn't enabled until the end (and by a diffrent thread). The block of data also isn't guaranteed to fill the hardware fifo, it only contains data to the end of the software ring buffer - so might only be 1 byte. After a partial write to the fifo, the code ought to try to get more data from the ring buffer before doing anything else. > > The scope confirms that the INT line stays deasserted in this case, Are you sure? I'd expect it to be active low and asserted throughout. I think the 'threaded ISR' code masks the interrupt on the interrupt controller until the ISR thread finishes. (I've NFI how this actually works, especially if PCIe MSI(-X) interrupts are involved.) > so this is not a lost SoC edge. max310x avoids this pattern by > re-reading the FIFO level each refill iteration and filling until the > xmit FIFO is empty or the hardware FIFO is full. I am testing the same > refill pattern for sc16is7xx. I'd look up what causes the IIR to report THRI. It might be based in the fifo level (and whether the ISR is enabled), but might be cleared by reads and set when the fifo crosses the level. It seems odd that you are seeing THRI read multiple times but the IRQ line is inactive, perhaps the IRQ line gets pulsed - that will make life hard. There is a timing window between the read of the IIR that return 'nothing to do' and the system re-enabling the hardware interrupt when the threaded ISR finishes. The hardware interrupt has to fire at that point. I also noticed that the code seems to be doing slow RMW cycles at the end of every tx to enable/disable the tx interrupt. The read shouldn't be needed (the driver knows the current value) and the write is almost certainly not needed unless the value has changed. The whole operation is also deferred to a different worker - that can't help. > > 2. Preempted refill, rarer case. > > Some traces show the IRQ thread being delayed during the refill path. > In those cases the FIFO does appear to be pushed below the trigger, > but the next THRI still does not arrive until the FIFO is empty. That > looks closer to your "threshold reached while the threaded ISR is > still running and the re-arm is lost" case. > > I want to see whether this still survives the TXLVL re-read fix before > adding a TX watchdog. > > On your specific questions: > > - The trailing sc16is7xx_ier_set(THRI) is a 1->1 update mid-message. The > bit is already set, so it does not create a fresh event. The driver > relies on the FIFO being refilled far enough for the chip to produce a > new threshold crossing. So you mean the the chip is generating 'threshold crossing' interrupt rather than a 'threshold level' one. That relies on the driver filling the fifo enough. Given it only writes the bytes that are contiguous in the ring buffer it might only write one byte. > - trigger=32 is not a wider refill window. It gives about 4x fewer > refills, lowering IRQ/SPI load and reducing exposure to these latency > outliers. I will keep it as a separate load-reduction patch with the > corrected wording, no longer as the root fix. That was my original thought. Something subtle makes it happen less often, but whatever it is can still happen. OTOH if the interrupt latency is 'good enough' for trigger=32 the reduced system/isr load is probably a good idea. David > > Maarten, I will drop the SPI-only guard. I also agree that configurable > RX/TX trigger levels belong with Crescent's rx_trig_bytes work as a > follow-up, rather than blocking this underrun fix. > > I will follow up with the A/B numbers on Monday. > > Thanks, > Paul > > _______________________________________ > > ZIEHL-ABEGG > > Executive Board: Joachim Ley (Chairman), Marco Altherr, Wolfgang Mayer > Supervisory Board: Dennis Ziehl (Chairman) > > Court of Registry: District Court Stuttgart HRB 746188 > Company Seat: Künzelsau, Germany > > Der Inhalt dieser E-Mail und/oder jegliche Anhänge können vertrauliche Mitteilungen enthalten und sind ausschließlich für den bezeichneten Adressaten bestimmt. > Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veröffentlichung, > Vervielfältigung oder Weitergabe des Inhalts dieser E-Mail einschließlich Anhängen unzulässig ist. > Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. > > The content of this e-mail and/or attachments may contain confidential information and is intended solely for the named recipient. > If you are not the intended recipient of this e-mail or on its distribution list, please note that any type of disclosure, publication, > copying or distribution of the content of this e-mail including attachments is strictly forbidden. > In this case, we would kindly ask you to notify the sender of the e-mail. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-07-03 20:37 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-23 11:22 [PATCH 0/2] serial: sc16is7xx: fix TX inter-frame gaps on SPI UARTs Paul Mbewe 2026-06-23 11:22 ` [PATCH 1/2] serial: sc16is7xx: fix TX gap caused by kfifo circular buffer wrap-around Paul Mbewe 2026-06-23 11:22 ` [PATCH 2/2] serial: sc16is7xx: set TX FIFO trigger level to half FIFO to prevent underruns Paul Mbewe 2026-06-23 12:45 ` David Laight 2026-06-23 14:01 ` Paul Mbewe 2026-06-23 15:07 ` David Laight 2026-06-23 17:13 ` Paul Mbewe 2026-06-23 18:42 ` David Laight 2026-06-25 12:59 ` Maarten Brock 2026-07-01 16:40 ` Paul Mbewe 2026-07-01 17:41 ` David Laight 2026-07-02 12:17 ` Maarten Brock 2026-07-02 21:24 ` David Laight 2026-07-03 18:19 ` Paul Mbewe 2026-07-03 20:36 ` David Laight
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox