Hardware spec prevents optimal performance in device driver

From: Mason <slash.tmp@free.fr>
To: linux-serial@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Hurley <peter@hurleysoftware.com>,
	Mans Rullgard <mans@mansr.com>
Subject: Hardware spec prevents optimal performance in device driver
Date: Sat, 09 May 2015 12:22:43 +0200
Message-ID: <554DDFF3.5060906@free.fr>

Hello everyone,

I'm writing a device driver for a serial-ish kind of device.
I'm interested in the TX side of the problem. (I'm working on
an ARM Cortex A9 system by the way.)

There's a 16-byte TX FIFO. Data is queued to the FIFO by writing
{1,2,4} bytes to a TX{8,16,32} memory-mapped register.
Reading the TX_DEPTH register returns the current queue depth.

The TX_READY IRQ is asserted when (and only when) TX_DEPTH
transitions from 1 to 0.
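
To make the TX_READY rule concrete, here is a minimal software model of the FIFO as described above. It is only a sketch: the names tx_model, tx_model_write and tx_model_drain_one are made up for illustration, and rejecting a write that would overflow the FIFO is an assumption (the spec quoted here does not say what the hardware does in that case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_FIFO_SIZE 16u

/* Hypothetical model of the TX side; not a real driver structure. */
struct tx_model {
    unsigned depth;     /* what a read of TX_DEPTH would return */
    unsigned irq_count; /* number of times TX_READY has been asserted */
};

/* Software side: queue 1, 2 or 4 bytes, as a write to TX8/TX16/TX32 would.
 * Returns false for an unsupported size or when the FIFO lacks room
 * (assumed behavior; the real hardware may differ). */
bool tx_model_write(struct tx_model *m, unsigned nbytes)
{
    assert(m != NULL);
    if (nbytes != 1 && nbytes != 2 && nbytes != 4)
        return false;
    if (m->depth + nbytes > TX_FIFO_SIZE)
        return false;
    m->depth += nbytes;
    return true;
}

/* Hardware side: remove one byte from the FIFO.
 * TX_READY is counted only on the 1 -> 0 transition of the depth. */
void tx_model_drain_one(struct tx_model *m)
{
    assert(m != NULL);
    if (m->depth == 0)
        return;
    m->depth--;
    if (m->depth == 0)
        m->irq_count++;
}
```

Draining an already empty FIFO does not raise another TX_READY in this model, which matches the "when (and only when)" wording above.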

With this spec in mind, I don't see how it is possible to
attain optimal TX performance in the driver. There's a race
between the SW thread filling the queue and the HW thread
emptying it.

My first attempt went along these lines:

SW thread pseudo-code (blocking write)

while (bytes_to_send > 16) {
  write 16 bytes to the queue /* NON ATOMIC */
  bytes_to_send -= 16;
  wait for semaphore
}
write the last bytes to the queue
wait for semaphore

The simplest way to "write 16 bytes to the queue" is
a byte-access loop.

for (i = 0; i < 16; ++i) write buf[i] to TX8
or -- just slightly more complex
for (i = 0; i < 4; ++i) write buf[4i .. 4i+3] to TX32

But here is the problem: I write a byte, and then, for some
reason (a low CPU frequency set by cpufreq, an interrupt), the
CPU takes a very long time to get to the next one. The FIFO
drains in the meantime, so TX_READY fires before I even write
the next byte.

In short, TX_READY could fire at any point while filling the queue.
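
The race can be illustrated with a small timing model. This is a sketch under stated assumptions, not a description of the real hardware: software writes byte i at time i * sw_gap, the hardware needs hw_gap time units per byte, and a byte leaves the FIFO when its transmission completes. The function name count_early_tx_ready is hypothetical. It counts how many times the FIFO runs empty (and so TX_READY fires) before the software has finished queuing its bytes.

```c
#include <assert.h>

/* Count TX_READY assertions that happen while software is still filling
 * the FIFO. Assumes nbytes <= 16, so FIFO capacity is never the limit. */
unsigned count_early_tx_ready(unsigned nbytes, unsigned sw_gap,
                              unsigned hw_gap)
{
    unsigned long hw_done_at = 0; /* time the last queued byte is sent */
    unsigned early = 0;

    assert(nbytes <= 16);

    for (unsigned i = 0; i < nbytes; i++) {
        unsigned long written_at = (unsigned long)i * sw_gap;

        /* Every earlier byte was sent before this one arrived:
         * the depth went 1 -> 0 and TX_READY fired mid-fill. */
        if (i > 0 && hw_done_at < written_at)
            early++;

        /* The hardware starts this byte once it is both written
         * and the previous byte has been sent. */
        unsigned long start = hw_done_at > written_at ? hw_done_at
                                                      : written_at;
        hw_done_at = start + hw_gap;
    }
    return early;
}
```

With a fast writer (sw_gap = 1, hw_gap = 10) the model reports no early interrupt for a 16-byte fill. With a slow writer (sw_gap = 10, hw_gap = 1) every byte but the last triggers one, so 15 interrupts fire before the fill is complete.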

In my opinion, the semantics of TX_READY are fuzzy. When I hit
the ISR, I just know that "the TX queue reached 0 at some point
in time" but the HW might still be working on sending some bytes.

Seems the best one can do is:

while (bytes_to_send >= 4) {
  write 4 bytes to TX32 /* ATOMIC */
  bytes_to_send -= 4;
  wait for semaphore
}
while (bytes_to_send > 0) {
  write 1 byte to TX8 /* ATOMIC */
  bytes_to_send -= 1;
  wait for semaphore
}

(This ignores the fact that the original buffer to send
may not be word-aligned; I will have to investigate
misaligned loads, or handle the first 0-3 bytes manually.)
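
One way to handle the first 0-3 bytes manually is to split the buffer into an unaligned head, a run of aligned 32-bit words, and a tail. The sketch below only computes that split; struct tx_plan and plan_tx are hypothetical names, and how the bytes of a word written to TX32 are ordered on the wire is not covered here because the spec above does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a buffer would be pushed out. */
struct tx_plan {
    size_t head_bytes; /* bytes sent one at a time to reach alignment */
    size_t words;      /* aligned 32-bit words */
    size_t tail_bytes; /* leftover bytes sent one at a time */
};

struct tx_plan plan_tx(const void *buf, size_t len)
{
    struct tx_plan p = {0, 0, 0};
    size_t misalign, head;

    assert(buf != NULL || len == 0);

    /* Distance from buf up to the next 4-byte boundary (0 if aligned). */
    misalign = (size_t)((uintptr_t)buf & 3u);
    head = misalign ? 4 - misalign : 0;
    if (head > len)
        head = len; /* short buffer that ends before the boundary */

    p.head_bytes = head;
    len -= head;
    p.words = len / 4;
    p.tail_bytes = len % 4;
    return p;
}
```

For a 10-byte buffer starting one byte past a word boundary, this gives 3 head bytes, 1 word and 3 tail bytes.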

In the solution proposed above, using atomic writes to
the device, I know that TX_READY signals "the work you
requested is now complete". But I have sacrificed
performance, as I will take an IRQ for every 4 bytes,
instead of one for every 16 bytes.
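
The cost can be put in numbers with two small helpers. They are illustrative only (the function names are made up), they ignore buffer alignment, and they assume that a full word goes to TX32 whenever at least 4 bytes remain.

```c
/* IRQs taken when the FIFO is refilled 16 bytes at a time:
 * one per (possibly partial) 16-byte chunk. */
unsigned irqs_fifo_fill(unsigned nbytes)
{
    return (nbytes + 15) / 16;
}

/* IRQs taken with one atomic register write per wait:
 * one per 32-bit word, plus one per leftover byte. */
unsigned irqs_atomic_writes(unsigned nbytes)
{
    return nbytes / 4 + nbytes % 4;
}
```

For a 64-byte buffer this gives 4 interrupts for the first scheme against 16 for the second.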

Is this making any sense? Or am I completely mistaken?

Regards.
