Memory copy between Linux-managed RAM and other RAM


From: Mason <mpeg.blue@free.fr>
To: Linux ARM <linux-arm-kernel@lists.infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: Memory copy between Linux-managed RAM and other RAM
Date: Sun, 28 Dec 2014 19:24:43 +0100
Message-ID: <54A04AEB.1070502@free.fr>

Hello everyone,

I'm working on a Cortex-A9 SoC equipped with 2 GB of RAM.

However, Linux is only given a fraction (typically 256 MB) of the RAM
to manage (via the mem= bootparam) while the rest is managed using
"OS-agnostic software". This "other memory" is meant to be shared
between different hardware blocks of the SoC.

We have a custom "memory_copy" kernel module to copy between
"Linux-managed RAM" and "SoC-wide RAM". However, the performance
of this routine is disappointing (8.5 MB/s).

Taking a closer look at the implementation, I spotted some
inefficiencies.

1) Data is first copied (in chunks) to a temporary kernel buffer.

2) For each word, a hardware remap is set up, then the word
is copied, then the hardware remap is reset. (This hardware
remap technique dates back to when we used MIPS.)

I thought I could both simplify the implementation and boost
the performance.

A) I use ioremap to have Linux map the "SoC-wide RAM" physical
addresses to virtual addresses that can be used in the module.

B) I then use copy_{to,from}_user directly between the user-space
buffer and the "SoC-wide RAM".

This approach is ~20x faster than the original.
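
To make the difference between the two schemes concrete, here is a
small userland model (not kernel code) that counts how often the remap
window gets reprogrammed. Everything in it is an assumption made for
illustration: the names remap_model, remap_set, remap_reset,
copy_per_word and copy_direct are hypothetical, and the real remap
hardware and its register interface are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a remap window onto "SoC-wide RAM". */
struct remap_model {
    uint8_t *soc_ram;         /* stands in for the SoC-wide RAM */
    size_t window_base;       /* offset currently exposed by the window */
    int window_open;          /* 1 while a remap is set up */
    unsigned long remap_ops;  /* number of times the window was reprogrammed */
};

void remap_set(struct remap_model *m, size_t off)
{
    m->window_base = off;
    m->window_open = 1;
    m->remap_ops++;
}

void remap_reset(struct remap_model *m)
{
    m->window_open = 0;
    m->remap_ops++;
}

/* Original scheme: set up the remap, copy one 32-bit word, reset the remap. */
void copy_per_word(struct remap_model *m, size_t dst_off,
                   const uint8_t *src, size_t len)
{
    assert(len % 4 == 0);  /* the model only handles whole words */
    for (size_t i = 0; i < len; i += 4) {
        remap_set(m, dst_off + i);
        memcpy(m->soc_ram + m->window_base, src + i, 4);
        remap_reset(m);
    }
}

/* Proposed scheme: one long-lived mapping, one bulk copy, no window traffic. */
void copy_direct(struct remap_model *m, size_t dst_off,
                 const uint8_t *src, size_t len)
{
    memcpy(m->soc_ram + dst_off, src, len);
}
```

In this model a 64-byte copy through copy_per_word reprograms the
window 32 times (two operations per word), while copy_direct never
touches it. The model says nothing about real bus timings; it only
shows where the per-word overhead comes from.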

My main question is:

Is this safe/guaranteed to work all the time? (as long as the
"SoC-wide RAM" is indeed RAM, not memory-mapped registers)

Secondary thoughts/questions:

We have routines for accesses in units of {8,16,32} bits.
Since we're dealing with memory, I don't think the width
of the accesses is important, right? (for correctness)
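
The width question can be illustrated with a minimal host-side sketch:
for ordinary RAM, copying the same buffer with 8-, 16- or 32-bit
accesses should leave identical bytes behind. The helper names
(copy_u8, copy_u16, copy_u32, widths_agree) are made up for this
example, and it runs on normal cached host memory, so it does not
demonstrate anything about a device-type mapping, where alignment and
access-size restrictions may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_BYTES 64

/* One buffer viewed at three access widths. */
union buf {
    uint8_t  b[BUF_BYTES];
    uint16_t h[BUF_BYTES / 2];
    uint32_t w[BUF_BYTES / 4];
};

/* volatile keeps the compiler from merging or splitting the accesses. */
void copy_u8(volatile uint8_t *dst, const volatile uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

void copy_u16(volatile uint16_t *dst, const volatile uint16_t *src, size_t len)
{
    assert(len % 2 == 0);
    for (size_t i = 0; i < len / 2; i++)
        dst[i] = src[i];
}

void copy_u32(volatile uint32_t *dst, const volatile uint32_t *src, size_t len)
{
    assert(len % 4 == 0);
    for (size_t i = 0; i < len / 4; i++)
        dst[i] = src[i];
}

/* Returns 1 if all three widths leave the same bytes in the destination. */
int widths_agree(void)
{
    union buf src, d8, d16, d32;

    for (size_t i = 0; i < BUF_BYTES; i++)
        src.b[i] = (uint8_t)(i * 7 + 1);
    memset(&d8, 0, sizeof d8);
    memset(&d16, 0, sizeof d16);
    memset(&d32, 0, sizeof d32);

    copy_u8(d8.b, src.b, BUF_BYTES);
    copy_u16(d16.h, src.h, BUF_BYTES);
    copy_u32(d32.w, src.w, BUF_BYTES);

    return memcmp(d8.b, src.b, BUF_BYTES) == 0 &&
           memcmp(d16.b, src.b, BUF_BYTES) == 0 &&
           memcmp(d32.b, src.b, BUF_BYTES) == 0;
}
```

Calling widths_agree() from a test program is enough to exercise all
three copy routines on naturally aligned buffers.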

AFAIU, ioremap maps as MT_DEVICE, i.e. uncached, no WC,
all memory optimizations disabled, etc. There might be
some performance improvements by using cached accesses,
and manually flushing when the copy is done.

Also, is copy_{to,from}_user optimized using SIMD/NEON?
Maybe there is some performance left on the table there.

Regards.
