Re: [PATCH 0/4] serial: omap: robustify for high speed transfers

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Hurley <peter@hurleysoftware.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: gregkh@linuxfoundation.org, vinod.koul@intel.com,
	dan.j.williams@intel.com, bigeasy@linutronix.de,
	tony@atomide.com, nsekhar@ti.com, peter.ujfalusi@ti.com,
	dmaengine@vger.kernel.org, linux-serial@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
Date: Tue, 2 Feb 2016 17:21:07 -0800	[thread overview]
Message-ID: <56B15603.80807@hurleysoftware.com> (raw)
In-Reply-To: <87y4b85onp.fsf@linutronix.de>

On 01/29/2016 08:35 AM, John Ogness wrote:
> Hi Peter,
> 
> On 2016-01-25, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> The DMA-enabled OMAP UART driver in its current form queues 48 bytes
>>> for a DMA-RX transfer. After the transfer is complete, a new transfer
>>> of 48 bytes is queued. The DMA completion callback runs in tasklet
>>> context, so a reschedule with context switch is required for the
>>> completion to be processed and the next 48 bytes to be queued.
>>>
>>> When running at a high speed such as 3Mbit, the CPU has 128us between
>>> when the DMA hardware transfer completes and when the DMA hardware
>>> must be fully prepared for the next transfer. For an embedded board
>>> running applications, this does not give the CPU much time. If the
>>> UART is using hardware flow control, this situation results in a
>>> dramatic decrease in real transfer speeds. If flow control is not
>>> used, the CPU will almost certainly be forced to drop data.
>>
>> I'm not convinced by this logic at all.
>> Tasklets are not affected by the scheduler because they run in softirq.
>> Or is this -RT?
> 
> Softirq runs as SCHED_OTHER. It is quite easy to create a scenario where
> DMA completion tasklets for this driver are not being serviced fast
> enough.
> 
>> I'm not seeing this problem on other platforms at this baud rate,
> 
> Do you run 3Mbit on other platforms without hardware flow control?

Yes, but only unidirectionally.


> I mention this because turning on hardware flow control can cover up the
> driver shortcomings by slowing down the transfers. What good is 3Mbit
> hardware if the driver never lets it get above 500Kbit on bulk
> transfers?

That's interesting. I wonder why the 6x hit when using h/w flow control.
Any thoughts on that?

>> and on this platform, all I see is lockups with DMA.
> 
> I have seen (and fixed) interesting issues with the AM335x eDMA, but I
> have not experienced lockups in any of my testing. I'd be curious how
> you trigger that.

I haven't tested it since 4.1. I'll go back and re-enable DMA and retest.


>> What is the test setup to reproduce these results?
> 
> Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw mode
> at 3Mbit.
> 
> sender:   cat bigfile > /dev/ttyS1
> receiver: cat /dev/ttyS1 > bigfile

Ok, I can repro something similar.


> I am working on creating concrete examples that demonstrate not only
> that this patch series reduces system load (and thus can increase
> throughput on heavy system loads with hardware flow control), but also
> that it is able to handle baud rates without data loss well beyond the
> current implementation when no flow control is used.
> 
> I wanted to wait until I had all the results before answering your
> email. But I'm getting caught up in other tasks right now, so it may
> take a few more weeks.

Ok. So just to be clear here: this patchset is really all about
performance improvement and not correct operation?


>>> This patch series modifies the UART driver to use cyclic DMA transfers
>>> with a growable ring buffer to accommodate baud rates. The ring buffer is
>>> large enough to hold at least 1s of RX-data. 
>>> (At 3Mbit that is 367KiB.)
>>
>> Math slightly off because the frame is typically 10 bits, not 8.
> 
> I was thinking 8 was the minimal frame size. Thanks for pointing that
> out. A frame can contain 7-12 bits so I will modify the code to create a
> buffer appropriate for the UART settings. At 3Mbit with 5n1 the driver
> would require a 419KiB ring buffer (8929 DMA periods of 48 bytes).

More about this below.


>>> In order to ensure that data in the ring buffer is not overwritten before
>>> being processed by the tty layer, a hrtimer is used as a watchdog.
>>
>> How'd it go from "We're just missing 128us window" to "This holds 1s
>> of data"?
> 
> First, you need to recognize that DMA completion tasklets can be delayed
> significantly due to interrupt loads or rtprio processes (even on non-RT
> systems). And at 3Mbit we are talking about >12000 interrupts per
> second!

Not sure I see 12000 ints/sec. unless you're talking about full-duplex
at max rate in both directions?  3Mbit/sec / 10-bit frame / 48 bytes/dma = 
6250 ints/sec.

But again, interrupt load is not going to result in 100ms service intervals.
So I think we're really talking about a (misbehaved) rtprio process that's
starving i/o.


> When using cyclic transfers, the only real concern is that the DMA
> overwrites data in the ring buffer before the CPU has processed it due
> to tasklet delays. That is what the hrtimer watchdog is for.
> 
> Assuming the DMA is zooming along at full speed, the watchdog must be
> able to trigger before the ring buffer can fill up. If the watchdog sees
> the ring buffer is getting full, it pauses the DMA engine. But with
> cyclic DMA we never know if the DMA is zooming or sitting idle. So even
> on an idle system, the watchdog must assume DMA zooming and continually
> fire to check the status.
> 
> I chose 1 second buffer sizes and set the watchdog to fire at half
> that. On an idle system you will see at most 2 new interrupts per second
> due to this patch series. I thought that would be an acceptable trade
> off. Whether the watchdog should fire at 50% buffer full or say 90%
> buffer full is something that could be debated. But to answer your
> question, the big ring buffer is really to keep the watchdog interrupts
> low frequency.

Ok, but your fundamental premise is that all of this is an acceptable
space-time tradeoff for everyone using this platform, when it's not.

So I'm trying to understand the actual use case you're trying to address.
I doubt that's 5n1, full-duplex.


>> And with a latency hit this bad, you'll never get the data to the
>> process because the tty buffer kworker will buffer-overflow too and
>> its much more susceptible to timing latency (although not as bad now
>> that it's exclusively on the unbounded workqueue).
> 
> Yes, you are correct. But I think that is a problem that should be
> addressed at the tty layer.

I disagree. I think you should fix the source of 500ms latency.

Regards,
Peter Hurley

next prev parent reply	other threads:[~2016-02-03  1:21 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-22 10:27 [PATCH 0/4] serial: omap: robustify for high speed transfers John Ogness
2016-01-22 10:27 ` John Ogness
2016-01-25 18:56 ` Peter Hurley
2016-01-29 16:35   ` John Ogness
2016-02-03  1:21     ` Peter Hurley [this message]
2016-02-11 12:02       ` John Ogness
2016-02-11 21:00         ` Tony Lindgren
2016-02-22 15:30           ` John Ogness
2016-02-22 19:38             ` Tony Lindgren
2016-02-23  9:59             ` Sekhar Nori
2016-02-23  9:59               ` Sekhar Nori
2016-02-23 12:43               ` Sebastian Andrzej Siewior
2016-02-23 16:56                 ` Andy Shevchenko
2016-02-24  3:20               ` Peter Hurley
2016-02-24 15:37                 ` Sekhar Nori
2016-02-24 15:37                   ` Sekhar Nori
2016-02-24 15:46                   ` Sebastian Andrzej Siewior
2016-03-07 20:23                   ` Peter Hurley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B15603.80807@hurleysoftware.com \
    --to=peter@hurleysoftware.com \
    --cc=bigeasy@linutronix.de \
    --cc=dan.j.williams@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=nsekhar@ti.com \
    --cc=peter.ujfalusi@ti.com \
    --cc=tony@atomide.com \
    --cc=vinod.koul@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.