All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thierry Reding <thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Cc: Olav Haugan <ohaugan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>,
	Alexandre Courbot
	<gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Paul Walmsley <paul-DWxLp4Yu+b8AvxtiuMwx3w@public.gmane.org>,
	Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>,
	Tomeu Vizoso
	<tomeu.vizoso-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>,
	Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>,
	Antonios Motakis
	<a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>,
	Mikko Perttunen
	<mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Nicolas Iooss <nicolas.iooss_linux-oWGTIYur0i8@public.gmane.org>,
	Russell King <rmk+kernel-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC PATCH 0/3] iommu: Add range flush operation
Date: Tue, 29 Sep 2015 11:27:14 +0200	[thread overview]
Message-ID: <20150929092714.GD9460@ulmo.nvidia.com> (raw)
In-Reply-To: <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 4776 bytes --]

On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page, however it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs unmapping in a page
> by page manner.
> 
> To make mapping operation more suitable for such use cases, iommu_map_sg()
> and .map_sg() callback in iommu_ops struct were introduced, which allowed
> particular IOMMU drivers to directly iterate over SG entries, create
> necessary mappings and flush everything in one go.
> 
> This approach, however, has two drawbacks:
>  1) it does not do anything about unmap performance,
>  2) it requires each driver willing to have fast map to implement its
>     own SG iteration code, even though this is a mostly generic operation.
> 
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might be still necessary for some
> specific platforms, which could have the need to iterate over SG elements
> inside driver code. Proposed solution introduces a new .flush() callback,
> which expects IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> IOVA area mapping change visible to IOMMU clients. Then all the 3 basic
> map/unmap operations are modified to call the .flush() callback at the end
> of the operation. 
> 
> Advantages of proposed approach include:
>  1) ability to use default_iommu_map_sg() helper if all the driver needs
>     for performance optimization is batching the flush,
>  2) completely no effect on existing code - the .flush() callback is made
>     optional and if it isn't implemented drivers are expected to do
>     necessary flushes on a page by page basis in respective (un)mapping
>     callbakcs,
>  3) possibility of exporting the iommu_flush() operation and providing
>     unsynchronized map/unmap operations for subsystems with even higher
>     requirements for performance (e.g. drivers/gpu/drm).

That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.

> 
> The series includes a generic patch implementing necessary changes in
> IOMMU API and two Tegra-specific patches that demonstrate implementation
> on driver side and which can be used for further testing.
> 
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map       | 128K         |         139 |         40 |
> |           |              |         136 |         34 |
> |           |              |         137 |         38 |
> |           |              |         136 |         36 |
> |           | 4M           |        3939 |       1163 |
> |           |              |        3730 |       2389 |
> |           |              |        3613 |        997 |
> |           |              |        3622 |       1620 |
> |           | ~18M         |       18635 |       4741 |
> |           |              |       19261 |       6550 |
> |           |              |       18473 |       9304 |
> |           |              |       18125 |       5120 |
> | Unmap     | 128K         |         128 |          7 |
> |           |              |         122 |          8 |
> |           |              |         119 |         10 |
> |           |              |         123 |         12 |
> |           | 4M           |        3829 |        151 |
> |           |              |        3964 |        150 |
> |           |              |        3908 |        145 |
> |           |              |        3875 |        155 |
> |           | ~18M         |       18570 |        683 |
> |           |              |       18473 |        806 |
> |           |              |       21020 |        643 |
> |           |              |       21764 |        652 |
> +-----------+--------------+-------------+------------+
> The values are obtained by surrounding the calls to iommu_map_sg()
> (with default_iommu_map_sg() helper used as .map_sg() callback) and
> iommu_unmap() with ktime-based time measurement code. Taken 4 samples
> of every buffer size. ~18M means around 17-19M due do the variance
> in requested buffer sizes.

Those are pretty impressive numbers.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



WARNING: multiple messages have this Message-ID (diff)
From: Thierry Reding <thierry.reding@gmail.com>
To: Tomasz Figa <tfiga@chromium.org>
Cc: iommu@lists.linux-foundation.org, Joerg Roedel <joro@8bytes.org>,
	Hiroshi Doyu <hdoyu@nvidia.com>,
	Stephen Warren <swarren@wwwdotorg.org>,
	Alexandre Courbot <gnurou@gmail.com>,
	Vince Hsu <vince.h@nvidia.com>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Paul Walmsley <paul@pwsan.com>,
	Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Mikko Perttunen <mperttunen@nvidia.com>,
	Will Deacon <will.deacon@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Antonios Motakis <a.motakis@virtualopensystems.com>,
	Olav Haugan <ohaugan@codeaurora.org>,
	Nicolas Iooss <nicolas.iooss_linux@m4x.org>,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] iommu: Add range flush operation
Date: Tue, 29 Sep 2015 11:27:14 +0200	[thread overview]
Message-ID: <20150929092714.GD9460@ulmo.nvidia.com> (raw)
In-Reply-To: <1443504379-31841-1-git-send-email-tfiga@chromium.org>

[-- Attachment #1: Type: text/plain, Size: 4776 bytes --]

On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page, however it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs unmapping in a page
> by page manner.
> 
> To make mapping operation more suitable for such use cases, iommu_map_sg()
> and .map_sg() callback in iommu_ops struct were introduced, which allowed
> particular IOMMU drivers to directly iterate over SG entries, create
> necessary mappings and flush everything in one go.
> 
> This approach, however, has two drawbacks:
>  1) it does not do anything about unmap performance,
>  2) it requires each driver willing to have fast map to implement its
>     own SG iteration code, even though this is a mostly generic operation.
> 
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might be still necessary for some
> specific platforms, which could have the need to iterate over SG elements
> inside driver code. Proposed solution introduces a new .flush() callback,
> which expects IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> IOVA area mapping change visible to IOMMU clients. Then all the 3 basic
> map/unmap operations are modified to call the .flush() callback at the end
> of the operation. 
> 
> Advantages of proposed approach include:
>  1) ability to use default_iommu_map_sg() helper if all the driver needs
>     for performance optimization is batching the flush,
>  2) completely no effect on existing code - the .flush() callback is made
>     optional and if it isn't implemented drivers are expected to do
>     necessary flushes on a page by page basis in respective (un)mapping
>     callbakcs,
>  3) possibility of exporting the iommu_flush() operation and providing
>     unsynchronized map/unmap operations for subsystems with even higher
>     requirements for performance (e.g. drivers/gpu/drm).

That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.

> 
> The series includes a generic patch implementing necessary changes in
> IOMMU API and two Tegra-specific patches that demonstrate implementation
> on driver side and which can be used for further testing.
> 
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map       | 128K         |         139 |         40 |
> |           |              |         136 |         34 |
> |           |              |         137 |         38 |
> |           |              |         136 |         36 |
> |           | 4M           |        3939 |       1163 |
> |           |              |        3730 |       2389 |
> |           |              |        3613 |        997 |
> |           |              |        3622 |       1620 |
> |           | ~18M         |       18635 |       4741 |
> |           |              |       19261 |       6550 |
> |           |              |       18473 |       9304 |
> |           |              |       18125 |       5120 |
> | Unmap     | 128K         |         128 |          7 |
> |           |              |         122 |          8 |
> |           |              |         119 |         10 |
> |           |              |         123 |         12 |
> |           | 4M           |        3829 |        151 |
> |           |              |        3964 |        150 |
> |           |              |        3908 |        145 |
> |           |              |        3875 |        155 |
> |           | ~18M         |       18570 |        683 |
> |           |              |       18473 |        806 |
> |           |              |       21020 |        643 |
> |           |              |       21764 |        652 |
> +-----------+--------------+-------------+------------+
> The values are obtained by surrounding the calls to iommu_map_sg()
> (with default_iommu_map_sg() helper used as .map_sg() callback) and
> iommu_unmap() with ktime-based time measurement code. Taken 4 samples
> of every buffer size. ~18M means around 17-19M due do the variance
> in requested buffer sizes.

Those are pretty impressive numbers.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2015-09-29  9:27 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-29  5:25 [RFC PATCH 0/3] iommu: Add range flush operation Tomasz Figa
2015-09-29  5:25 ` Tomasz Figa
     [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29  5:25   ` [RFC PATCH 1/3] iommu: Add support for out of band flushing Tomasz Figa
2015-09-29  5:25     ` Tomasz Figa
     [not found]     ` <1443504379-31841-2-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29  9:32       ` Thierry Reding
2015-09-29  9:32         ` Thierry Reding
     [not found]         ` <20150929093257.GE9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:56           ` Tomasz Figa
2015-09-29 11:56             ` Tomasz Figa
2015-09-29  5:25   ` [RFC PATCH 2/3] memory: tegra: add TLB cache line size Tomasz Figa
2015-09-29  5:25     ` Tomasz Figa
     [not found]     ` <1443504379-31841-3-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29  9:43       ` Thierry Reding
2015-09-29  9:43         ` Thierry Reding
     [not found]         ` <20150929094348.GF9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 12:11           ` Tomasz Figa
2015-09-29 12:11             ` Tomasz Figa
2015-09-29  5:25   ` [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing Tomasz Figa
2015-09-29  5:25     ` Tomasz Figa
2015-09-29  9:27   ` Thierry Reding [this message]
2015-09-29  9:27     ` [RFC PATCH 0/3] iommu: Add range flush operation Thierry Reding
     [not found]     ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:54       ` Tomasz Figa
2015-09-29 11:54         ` Tomasz Figa
2015-09-29 12:22       ` Joerg Roedel
2015-09-29 12:22         ` Joerg Roedel
2015-09-29 12:20   ` Joerg Roedel
2015-09-29 12:20     ` Joerg Roedel
2015-09-29 14:20   ` Robin Murphy
2015-09-29 14:20     ` Robin Murphy
     [not found]     ` <560A9E36.9030903-5wv7dgnIgG8@public.gmane.org>
2015-09-29 14:32       ` Russell King - ARM Linux
2015-09-29 14:32         ` Russell King - ARM Linux
     [not found]         ` <20150929143241.GI21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 16:27           ` Robin Murphy
2015-09-29 16:27             ` Robin Murphy
     [not found]             ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
2015-09-29 16:40               ` Russell King - ARM Linux
2015-09-29 16:40                 ` Russell King - ARM Linux
     [not found]                 ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 17:13                   ` Robin Murphy
2015-09-29 17:13                     ` Robin Murphy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150929092714.GD9460@ulmo.nvidia.com \
    --to=thierry.reding-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
    --cc=a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org \
    --cc=arnd-r2nGTMty4D4@public.gmane.org \
    --cc=gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
    --cc=nicolas.iooss_linux-oWGTIYur0i8@public.gmane.org \
    --cc=ohaugan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org \
    --cc=paul-DWxLp4Yu+b8AvxtiuMwx3w@public.gmane.org \
    --cc=rmk+kernel-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org \
    --cc=swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org \
    --cc=tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org \
    --cc=tomeu.vizoso-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org \
    --cc=vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
    --cc=will.deacon-5wv7dgnIgG8@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.