From: Peter Hurley <peter@hurleysoftware.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: gregkh@linuxfoundation.org, vinod.koul@intel.com,
dan.j.williams@intel.com, bigeasy@linutronix.de,
tony@atomide.com, nsekhar@ti.com, peter.ujfalusi@ti.com,
dmaengine@vger.kernel.org, linux-serial@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
Date: Tue, 2 Feb 2016 17:21:07 -0800
Message-ID: <56B15603.80807@hurleysoftware.com>
In-Reply-To: <87y4b85onp.fsf@linutronix.de>

On 01/29/2016 08:35 AM, John Ogness wrote:
> Hi Peter,
>
> On 2016-01-25, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> The DMA-enabled OMAP UART driver in its current form queues 48 bytes
>>> for a DMA-RX transfer. After the transfer is complete, a new transfer
>>> of 48 bytes is queued. The DMA completion callback runs in tasklet
>>> context, so a reschedule with context switch is required for the
>>> completion to be processed and the next 48 bytes to be queued.
>>>
>>> When running at a high speed such as 3Mbit, the CPU has 128us between
>>> when the DMA hardware transfer completes and when the DMA hardware
>>> must be fully prepared for the next transfer. For an embedded board
>>> running applications, this does not give the CPU much time. If the
>>> UART is using hardware flow control, this situation results in a
>>> dramatic decrease in real transfer speeds. If flow control is not
>>> used, the CPU will almost certainly be forced to drop data.
>>
>> I'm not convinced by this logic at all.
>> Tasklets are not affected by the scheduler because they run in softirq.
>> Or is this -RT?
>
> Softirq runs as SCHED_OTHER. It is quite easy to create a scenario where
> DMA completion tasklets for this driver are not being serviced fast
> enough.
>
>> I'm not seeing this problem on other platforms at this baud rate,
>
> Do you run 3Mbit on other platforms without hardware flow control?

Yes, but only unidirectionally.
> I mention this because turning on hardware flow control can cover up the
> driver shortcomings by slowing down the transfers. What good is 3Mbit
> hardware if the driver never lets it get above 500Kbit on bulk
> transfers?

That's interesting. I wonder why the 6x hit when using h/w flow control.
Any thoughts on that?
>> and on this platform, all I see is lockups with DMA.
>
> I have seen (and fixed) interesting issues with the AM335x eDMA, but I
> have not experienced lockups in any of my testing. I'd be curious how
> you trigger that.

I haven't tested it since 4.1. I'll go back and re-enable DMA and retest.
>> What is the test setup to reproduce these results?
>
> Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw mode
> at 3Mbit.
>
> sender: cat bigfile > /dev/ttyS1
> receiver: cat /dev/ttyS1 > bigfile

Ok, I can repro something similar.
> I am working on creating concrete examples that demonstrate not only
> that this patch series reduces system load (and thus can increase
> throughput on heavy system loads with hardware flow control), but also
> that it is able to handle baud rates without data loss well beyond the
> current implementation when no flow control is used.
>
> I wanted to wait until I had all the results before answering your
> email. But I'm getting caught up in other tasks right now, so it may
> take a few more weeks.

Ok. So just to be clear here: this patchset is really all about
performance improvement and not correct operation?
>>> This patch series modifies the UART driver to use cyclic DMA transfers
>>> with a growable ring buffer to accommodate baud rates. The ring buffer is
>>> large enough to hold at least 1s of RX-data.
>>> (At 3Mbit that is 367KiB.)
>>
>> Math slightly off because the frame is typically 10 bits, not 8.
>
> I was thinking 8 was the minimal frame size. Thanks for pointing that
> out. A frame can contain 7-12 bits so I will modify the code to create a
> buffer appropriate for the UART settings. At 3Mbit with 5n1 the driver
> would require a 419KiB ring buffer (8929 DMA periods of 48 bytes).

More about this below.
>>> In order to ensure that data in the ring buffer is not overwritten before
>>> being processed by the tty layer, a hrtimer is used as a watchdog.
>>
>> How'd it go from "We're just missing 128us window" to "This holds 1s
>> of data"?
>
> First, you need to recognize that DMA completion tasklets can be delayed
> significantly due to interrupt loads or rtprio processes (even on non-RT
> systems). And at 3Mbit we are talking about >12000 interrupts per
> second!

Not sure I see 12000 ints/sec unless you're talking about full-duplex
at max rate in both directions: 3Mbit/sec / 10-bit frame / 48 bytes per
DMA period = 6250 ints/sec.
But again, interrupt load is not going to result in 100ms service intervals.
So I think we're really talking about a (misbehaved) rtprio process that's
starving i/o.
> When using cyclic transfers, the only real concern is that the DMA
> overwrites data in the ring buffer before the CPU has processed it due
> to tasklet delays. That is what the hrtimer watchdog is for.
>
> Assuming the DMA is zooming along at full speed, the watchdog must be
> able to trigger before the ring buffer can fill up. If the watchdog sees
> the ring buffer is getting full, it pauses the DMA engine. But with
> cyclic DMA we never know if the DMA is zooming or sitting idle. So even
> on an idle system, the watchdog must assume DMA zooming and continually
> fire to check the status.
>
> I chose 1 second buffer sizes and set the watchdog to fire at half
> that. On an idle system you will see at most 2 new interrupts per second
> due to this patch series. I thought that would be an acceptable trade
> off. Whether the watchdog should fire at 50% buffer full or say 90%
> buffer full is something that could be debated. But to answer your
> question, the big ring buffer is really to keep the watchdog interrupts
> low frequency.

Ok, but your fundamental premise is that all of this is an acceptable
space-time tradeoff for everyone using this platform, when it's not.
So I'm trying to understand the actual use case you're trying to address.
I doubt that's 5n1, full-duplex.
>> And with a latency hit this bad, you'll never get the data to the
>> process because the tty buffer kworker will buffer-overflow too and
>> its much more susceptible to timing latency (although not as bad now
>> that it's exclusively on the unbounded workqueue).
>
> Yes, you are correct. But I think that is a problem that should be
> addressed at the tty layer.

I disagree. I think you should fix the source of the 500ms latency.

Regards,
Peter Hurley