Re: Hardware spec prevents optimal performance in device driver

From: Mason <slash.tmp@free.fr>
To: Mans Rullgard <mans@mansr.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	linux-serial@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Peter Hurley <peter@hurleysoftware.com>
Subject: Re: Hardware spec prevents optimal performance in device driver
Date: Sun, 10 May 2015 18:46:00 +0200
Message-ID: <554F8B48.204@free.fr>
In-Reply-To: <yw1xbnhsk9ow.fsf@unicorn.mansr.com>

On 10/05/2015 12:29, Måns Rullgård wrote:

> Mason writes:
> 
>> One Thousand Gnomes wrote:
>>
>>> Mason wrote:
>>>
>>>> I'm writing a device driver for a serial-ish kind of device.
>>>> I'm interested in the TX side of the problem. (I'm working on
>>>> an ARM Cortex A9 system by the way.)
>>>>
>>>> There's a 16-byte TX FIFO. Data is queued to the FIFO by writing
>>>> {1,2,4} bytes to a TX{8,16,32} memory-mapped register.
>>>> Reading the TX_DEPTH register returns the current queue depth.
>>>>
>>>> The TX_READY IRQ is asserted when (and only when) TX_DEPTH
>>>> transitions from 1 to 0.
>>>
>>> If the last statement is correct then your performance is probably always
>>> going to suck unless there is additional invisible queueing beyond the
>>> visible FIFO.
>>
>> Do you agree with my assessment that the current semantics for
>> TX_READY lead to a race condition, unless we limit ourselves
>> to a single (atomic) write between interrupts?
> 
> No.  To get best throughput, you can simply busy-wait until TX_DEPTH
> indicates the FIFO is almost empty, then write a few words, but no more
> than you know fit in the FIFO.  Repeat until all data has been written.
> Use the IRQ only to signal completion of the entire packet.
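
A minimal sketch of the polled scheme described above, written as a
user-space simulation rather than real driver code. The struct sim_dev,
the helpers read_tx_depth() and write_tx8(), and the LOW_WATER threshold
are all hypothetical stand-ins for the memory-mapped TX_DEPTH and TX8
registers; only the 16-byte FIFO size comes from the thread. Note that
a stale TX_DEPTH reading is harmless here, assuming the driver is the
only writer: the hardware can only drain the FIFO between the read and
the writes, so the computed room is a safe lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SIZE 16u   /* from the thread: 16-byte TX FIFO */
#define LOW_WATER 4u    /* assumption: what counts as "almost empty" */

/* Hypothetical simulated device standing in for the real registers. */
struct sim_dev {
    unsigned depth;      /* what a TX_DEPTH read would return */
    unsigned max_depth;  /* highest depth reached, to detect overflow */
    size_t sent;         /* bytes shifted out on the wire so far */
};

/* Simulated TX_DEPTH read: the "hardware" drains one byte per poll. */
static unsigned read_tx_depth(struct sim_dev *d)
{
    if (d->depth > 0) {
        d->depth--;
        d->sent++;
    }
    return d->depth;
}

/* Simulated TX8 write: queue one byte, never past the FIFO size. */
static void write_tx8(struct sim_dev *d, uint8_t byte)
{
    (void)byte;
    assert(d->depth < FIFO_SIZE);
    d->depth++;
    if (d->depth > d->max_depth)
        d->max_depth = d->depth;
}

/* Busy-wait until the FIFO is almost empty, then top it up. */
static void tx_polled(struct sim_dev *d, const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned depth;

        do {
            depth = read_tx_depth(d);
        } while (depth > LOW_WATER);

        /* Write no more than is known to fit. */
        unsigned room = FIFO_SIZE - depth;
        while (room-- > 0 && i < len)
            write_tx8(d, buf[i++]);
    }
    /* The last bytes are still queued on return; on real hardware the
     * 1 -> 0 transition of TX_DEPTH would then raise TX_READY once. */
}
```

In a real driver the two helpers would be register accesses, and the
busy-wait would need some bound or a cpu_relax()-style pause.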

Would you fill the FIFO with TX_READY disabled, or with all
interrupts masked?

I will show with pseudo-code where (I think) the race condition
breaks the algorithm you suggest. (When using IRQs, not busy wait.)

> If the transmit rate is low, you can save some CPU time by filling the
> FIFO, then sleeping until it should be almost empty, fill again, etc.

For one data point, my test app sets the TX rate to 128 kbps, so a
full 16-byte FIFO (128 bits) takes 1 ms to drain. The CPU runs at
100-1000 MHz depending on the mood of cpufreq.
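
For reference, the arithmetic behind these figures can be sketched as
below. The helper names drain_time_us() and cycles_in() are made up for
illustration, and the calculation assumes 8 bits on the wire per byte
with no framing overhead, which may not hold for the actual device.

```c
#include <stdint.h>

#define FIFO_BYTES 16u  /* from the thread: 16-byte TX FIFO */

/* Microseconds needed to shift out `bytes` bytes at `bits_per_sec`.
 * Assumption: 8 bits per byte, no start/stop or other framing bits. */
static uint64_t drain_time_us(unsigned bytes, uint64_t bits_per_sec)
{
    if (bits_per_sec == 0)
        return 0;
    return (uint64_t)bytes * 8u * 1000000u / bits_per_sec;
}

/* CPU cycles that elapse in `us` microseconds at `cpu_hz`. */
static uint64_t cycles_in(uint64_t us, uint64_t cpu_hz)
{
    return us * cpu_hz / 1000000u;
}
```

With these, drain_time_us(FIFO_BYTES, 128000) gives 1000 us, and that
1 ms corresponds to between 100,000 cycles (at 100 MHz) and 1,000,000
cycles (at 1 GHz). Sleeping until only 4 bytes remain, for example,
would mean waiting drain_time_us(12, 128000), i.e. 750 us.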

> Whether busy-waiting or sleeping, this approach keeps the data flowing
> as fast as possible.
> 
> With the hardware you describe, there is unfortunately a trade-off
> between throughput and CPU efficiency.  You'll have to decide which is
> more important to you.

I can ask the hardware designer to change the behavior for the next
iteration of the SoC.

Regards.

Thread overview: 5+ messages
2015-05-09 10:22 Hardware spec prevents optimal performance in device driver Mason
2015-05-09 17:32 ` One Thousand Gnomes
2015-05-09 20:48   ` Mason
2015-05-10 10:29     ` Måns Rullgård
2015-05-10 16:46       ` Mason [this message]
