Re: Generic IOMMU pooled allocator

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: cascardo@linux.vnet.ibm.com
Cc: aik@au1.ibm.com, aik@ozlabs.ru, anton@au1.ibm.com,
	paulus@samba.org, sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, David Miller <davem@davemloft.net>
Subject: Re: Generic IOMMU pooled allocator
Date: Thu, 26 Mar 2015 06:56:01 -0400
Message-ID: <20150326105601.GK31861@oracle.com>
In-Reply-To: <20150326004342.GB4925@oc0812247204.ltc.br.ibm.com>

On (03/25/15 21:43), cascardo@linux.vnet.ibm.com wrote:
> However, when using large TCP send/recv (I used uperf with 64KB
> writes/reads), I noticed that on the transmit side, largealloc is not
> used, but on the receive side, cxgb4 almost only uses largealloc, while
> qlge seems to have a 1/1 usage of largealloc/non-largealloc mappings.
> When turning GRO off, that ratio is closer to 1/10, meaning there is
> still some fair use of largealloc in that scenario.
> 
> I confess my experiments are not complete. I would like to test a couple
> of other drivers as well, including mlx4_en and bnx2x, and test with
> small packet sizes. I suspected that MTU size could make a difference,
> but in the case of ICMP, with MTU 9000 and payload of 8000 bytes, I
> didn't notice any significant hit of largepool with either qlge or
> cxgb4.

I guess we also need to consider the "average use-case", i.e.,
something that interleaves small packets and interactive data with
jumbo/bulk data. In those cases, the largepool would not get many
hits, and might actually be undesirable?
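
To make the pool-selection point concrete, here is a minimal Python
sketch, not the allocator code itself. It assumes a request goes to the
large pool when it spans more than 15 IOMMU pages, and it assumes 4 KB
IOMMU pages; both constants and all function names are illustrative
assumptions.

```python
# Toy model of large-pool selection. The threshold and page size are
# assumptions made for illustration, not values taken from this thread.
LARGE_ALLOC_PAGES = 15   # assumed: requests spanning more pages use the large pool
IOMMU_PAGE_SIZE = 4096   # assumed IOMMU page size in bytes


def npages(nbytes, page_size=IOMMU_PAGE_SIZE):
    """Number of IOMMU pages needed to cover nbytes (rounded up)."""
    return -(-nbytes // page_size)


def uses_large_pool(nbytes, threshold=LARGE_ALLOC_PAGES):
    """True if a mapping of nbytes would be served from the large pool."""
    return npages(nbytes) > threshold


def large_pool_share(sizes):
    """Fraction of the mapping requests in sizes that hit the large pool."""
    if not sizes:
        return 0.0
    hits = sum(1 for size in sizes if uses_large_pool(size))
    return hits / len(sizes)
```

Under those assumptions a 64 KB mapping spans 16 pages and is served
from the large pool, while an 8000-byte payload spans 2 pages and is
not, so a mix dominated by small packets rarely touches the large pool.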

> But I believe that on the receive side, all drivers should map entire
> pages, using some allocation strategy similar to mlx4_en, in order to
> avoid DMA mapping all the time. 

Good point. I think in the early phase of my perf investigation,
it was brought up that Solaris does pre-mapped DMA buffers (they
have to do this carefully, to avoid resource-starvation vulnerabilities;
see http://www.spinics.net/lists/sparclinux/msg13217.html
and the threads leading to it).

This is not something that the common iommu-arena allocator
can/should get involved in, of course. The scope of the arena-allocator
is much more rigorously defined.

I don't know if there is a way to set up a generalized DMA premapped
buffer infrastructure for Linux today.
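
For illustration only, here is a toy Python model of the pre-mapping
idea: a fixed number of buffers are mapped once at setup and recycled,
and the pool falls back to per-request mapping when it runs dry, so the
premapped set stays bounded. The class and its names are hypothetical;
this is not how Solaris or any Linux driver implements it.

```python
class PremappedPool:
    """Hypothetical bounded pool of buffers that are DMA-mapped once and reused."""

    def __init__(self, capacity):
        # Handles of buffers mapped at setup time; capacity is the hard cap.
        self.free = list(range(capacity))
        # Count of (simulated) map operations: one per premapped buffer so far.
        self.map_calls = capacity

    def get(self):
        """Return a premapped buffer if one is free, else map on demand."""
        if self.free:
            return ("premapped", self.free.pop())
        # Pool exhausted: fall back to a per-request mapping instead of
        # growing the premapped set without bound.
        self.map_calls += 1
        return ("dynamic", None)

    def put(self, buf):
        """Recycle a premapped buffer; dynamic ones are simply dropped here."""
        kind, handle = buf
        if kind == "premapped":
            self.free.append(handle)
```

In this model the number of map operations only grows while the pool is
exhausted; recycled buffers cost no further mappings.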

FWIW, when I instrumented this for Solaris (there are hooks to disable
the pre-mapped buffers), the impact on a T5-2 (8 sockets, 2 NUMA nodes,
64 CPUs) was not very significant for a single 10G ixgbe port: approx.
8 Gbps instead of 9.X Gbps. I think the DMA buffer pre-mapping is
only significant when you start trying to scale to multiple Ethernet
ports.

--Sowmini
