From: Peter Hurley <peter@hurleysoftware.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: gregkh@linuxfoundation.org, vinod.koul@intel.com,
	dan.j.williams@intel.com, bigeasy@linutronix.de,
	tony@atomide.com, nsekhar@ti.com, peter.ujfalusi@ti.com,
	dmaengine@vger.kernel.org, linux-serial@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
Date: Tue, 2 Feb 2016 17:21:07 -0800
Message-ID: <56B15603.80807@hurleysoftware.com>
In-Reply-To: <87y4b85onp.fsf@linutronix.de>

On 01/29/2016 08:35 AM, John Ogness wrote:
> Hi Peter,
> 
> On 2016-01-25, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> The DMA-enabled OMAP UART driver in its current form queues 48 bytes
>>> for a DMA-RX transfer. After the transfer is complete, a new transfer
>>> of 48 bytes is queued. The DMA completion callback runs in tasklet
>>> context, so a reschedule with context switch is required for the
>>> completion to be processed and the next 48 bytes to be queued.
>>>
>>> When running at a high speed such as 3Mbit, the CPU has 128us between
>>> when the DMA hardware transfer completes and when the DMA hardware
>>> must be fully prepared for the next transfer. For an embedded board
>>> running applications, this does not give the CPU much time. If the
>>> UART is using hardware flow control, this situation results in a
>>> dramatic decrease in real transfer speeds. If flow control is not
>>> used, the CPU will almost certainly be forced to drop data.
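
For what it's worth, the 128us figure presumably assumes 8 bits per byte
on the wire:

  48 bytes *  8 bits / 3 Mbit/s = 128 us

With the more typical 10-bit frame the window would be closer to:

  48 bytes * 10 bits / 3 Mbit/s = 160 us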
>>
>> I'm not convinced by this logic at all.
>> Tasklets are not affected by the scheduler because they run in softirq.
>> Or is this -RT?
> 
> Softirq runs as SCHED_OTHER. It is quite easy to create a scenario where
> DMA completion tasklets for this driver are not being serviced fast
> enough.
> 
>> I'm not seeing this problem on other platforms at this baud rate,
> 
> Do you run 3Mbit on other platforms without hardware flow control?

Yes, but only unidirectionally.


> I mention this because turning on hardware flow control can cover up the
> driver shortcomings by slowing down the transfers. What good is 3Mbit
> hardware if the driver never lets it get above 500Kbit on bulk
> transfers?

That's interesting. I wonder why there's a 6x hit when using h/w flow
control. Any thoughts on that?

>> and on this platform, all I see is lockups with DMA.
> 
> I have seen (and fixed) interesting issues with the AM335x eDMA, but I
> have not experienced lockups in any of my testing. I'd be curious how
> you trigger that.

I haven't tested it since 4.1. I'll go back and re-enable DMA and retest.


>> What is the test setup to reproduce these results?
> 
> Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw mode
> at 3Mbit.
> 
> sender:   cat bigfile > /dev/ttyS1
> receiver: cat /dev/ttyS1 > bigfile

Ok, I can repro something similar.
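
For anyone else trying this, here's a minimal sketch of the receive-side
port setup in C (just an illustration, assuming stock termios; something
like "stty -F /dev/ttyS1 3000000 raw -crtscts" does much the same):

#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
        struct termios tio;
        int fd = open("/dev/ttyS1", O_RDWR | O_NOCTTY | O_NONBLOCK);

        if (fd < 0 || tcgetattr(fd, &tio) < 0)
                return 1;

        cfmakeraw(&tio);                  /* raw mode: no echo, no line editing */
        cfsetispeed(&tio, B3000000);      /* 3Mbit, assuming the port supports it */
        cfsetospeed(&tio, B3000000);
        tio.c_cflag |= CREAD | CLOCAL;    /* enable RX, ignore modem lines */
        tio.c_cflag &= ~CRTSCTS;          /* no hardware flow control for this test */

        if (tcsetattr(fd, TCSANOW, &tio) < 0)
                return 1;

        close(fd);
        return 0;
}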


> I am working on creating concrete examples that demonstrate not only
> that this patch series reduces system load (and thus can increase
> throughput on heavy system loads with hardware flow control), but also
> that it is able to handle baud rates without data loss well beyond the
> current implementation when no flow control is used.
> 
> I wanted to wait until I had all the results before answering your
> email. But I'm getting caught up in other tasks right now, so it may
> take a few more weeks.

Ok. So just to be clear here: this patchset is really all about
performance improvement and not correct operation?


>>> This patch series modifies the UART driver to use cyclic DMA transfers
>>> with a growable ring buffer to accommodate baud rates. The ring buffer is
>>> large enough to hold at least 1s of RX-data. 
>>> (At 3Mbit that is 367KiB.)
>>
>> Math slightly off because the frame is typically 10 bits, not 8.
> 
> I was thinking 8 was the minimal frame size. Thanks for pointing that
> out. A frame can contain 7-12 bits so I will modify the code to create a
> buffer appropriate for the UART settings. At 3Mbit with 5n1 the driver
> would require a 419KiB ring buffer (8929 DMA periods of 48 bytes).
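
Just so I'm following the arithmetic, something like this, I assume (a
rough sketch, not your actual code; names invented):

#include <linux/kernel.h>       /* DIV_ROUND_UP */

#define RX_DMA_PERIOD   48      /* bytes per DMA period */

/* ~1s worth of data at the current settings, rounded up to whole periods */
static unsigned int rx_ring_size(unsigned int baud, unsigned int frame_bits)
{
        unsigned int bytes_per_sec = baud / frame_bits;

        return DIV_ROUND_UP(bytes_per_sec, RX_DMA_PERIOD) * RX_DMA_PERIOD;
}

/*
 * 3000000 baud, 5n1 -> 7 bits/frame -> 428571 bytes/s
 * -> 8929 periods of 48 bytes -> 428592 bytes (~419 KiB)
 */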

More about this below.


>>> In order to ensure that data in the ring buffer is not overwritten before
>>> being processed by the tty layer, a hrtimer is used as a watchdog.
>>
>> How'd it go from "We're just missing 128us window" to "This holds 1s
>> of data"?
> 
> First, you need to recognize that DMA completion tasklets can be delayed
> significantly due to interrupt loads or rtprio processes (even on non-RT
> systems). And at 3Mbit we are talking about >12000 interrupts per
> second!

Not sure I see 12000 ints/sec, unless you're talking about full-duplex at
max rate in both directions?  3Mbit/sec / 10-bit frame / 48 bytes/dma =
6250 ints/sec.
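(Full duplex at max rate in both directions would be roughly 2 x 6250 =
12500 completions/sec, which would line up with the >12000 figure.)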

But again, interrupt load is not going to result in 100ms service intervals.
So I think we're really talking about a (misbehaved) rtprio process that's
starving i/o.


> When using cyclic transfers, the only real concern is that the DMA
> overwrites data in the ring buffer before the CPU has processed it due
> to tasklet delays. That is what the hrtimer watchdog is for.
> 
> Assuming the DMA is zooming along at full speed, the watchdog must be
> able to trigger before the ring buffer can fill up. If the watchdog sees
> the ring buffer is getting full, it pauses the DMA engine. But with
> cyclic DMA we never know if the DMA is zooming or sitting idle. So even
> on an idle system, the watchdog must assume DMA zooming and continually
> fire to check the status.
> 
> I chose 1 second buffer sizes and set the watchdog to fire at half
> that. On an idle system you will see at most 2 new interrupts per second
> due to this patch series. I thought that would be an acceptable trade
> off. Whether the watchdog should fire at 50% buffer full or say 90%
> buffer full is something that could be debated. But to answer your
> question, the big ring buffer is really to keep the watchdog interrupts
> low frequency.
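
So if I'm reading the design right, the watchdog amounts to roughly this
(my own sketch to be sure I follow, not your code; rx_ring_bytes_pending()
is a made-up helper that would presumably come from the DMA residue):

#include <linux/dmaengine.h>
#include <linux/hrtimer.h>
#include <linux/kernel.h>

struct uart_rx_dma {
        struct hrtimer wdt;
        ktime_t wdt_interval;           /* ~0.5s: half the ring-fill time */
        struct dma_chan *chan;
        unsigned int ring_size;         /* ~1s of data at the current baud */
        bool paused;
};

/* hypothetical: bytes written by DMA that the CPU has not yet consumed */
static unsigned int rx_ring_bytes_pending(struct uart_rx_dma *rx);

static enum hrtimer_restart uart_rx_watchdog(struct hrtimer *t)
{
        struct uart_rx_dma *rx = container_of(t, struct uart_rx_dma, wdt);

        if (rx_ring_bytes_pending(rx) >= rx->ring_size / 2) {
                /* CPU has fallen behind; stop the cyclic DMA before it
                 * overwrites unread data */
                dmaengine_pause(rx->chan);
                rx->paused = true;
                return HRTIMER_NORESTART;
        }

        /* DMA may be idle or zooming; re-arm and check again later */
        hrtimer_forward_now(t, rx->wdt_interval);
        return HRTIMER_RESTART;
}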

Ok, but your fundamental premise is that all of this is an acceptable
space-time tradeoff for everyone using this platform, when it's not.

So I'm trying to understand the actual use case you're trying to address.
I doubt that's 5n1, full-duplex.


>> And with a latency hit this bad, you'll never get the data to the
>> process because the tty buffer kworker will buffer-overflow too and
>> its much more susceptible to timing latency (although not as bad now
>> that it's exclusively on the unbounded workqueue).
> 
> Yes, you are correct. But I think that is a problem that should be
> addressed at the tty layer.

I disagree. I think you should fix the source of the 500ms latency.

Regards,
Peter Hurley

