All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: konrad.wilk@oracle.com, tglx@linutronix.de, mingo@redhat.com,
	hpa@zytor.com, rob@landley.net, akpm@linux-foundation.org,
	joerg.roedel@amd.com, bhelgaas@google.com, shuahkhan@gmail.com,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	x86@kernel.org, torvalds@linux-foundation.org
Subject: Re: [RFC PATCH 0/7] Improve swiotlb performance by using physical addresses
Date: Fri, 05 Oct 2012 16:23:30 -0700	[thread overview]
Message-ID: <506F6BF2.8030500@intel.com> (raw)
In-Reply-To: <20121005200245.GQ16230@one.firstfloor.org>

On 10/05/2012 01:02 PM, Andi Kleen wrote:
>> I was thinking the issue was all of the calls to relatively small
>> functions occurring in quick succession.  The way most of this code is
>> setup it seems like it is one small function call in turn calling
>> another, and then another, and I would imagine the code fragmentation
>> can have a significant negative impact.
> Maybe. Can you just inline everything and see if it it's faster then?
>
> This was out of line when the "text cost at all costs" drive was still
> envogue, but luckily we're not doing  that anymore.
>
> -Andiu
>

Inlining everything did speed things up a bit, but I still didn't reach
the same speed I achieved using the patch set.  However I did notice the
resulting swiotlb code was considerably larger.

I did a bit more digging and the issue may actually be simple repetition
of the calls.  By my math it would seem we would end up calling
is_swiotlb_buffer 3 times per packet in the routing test case, once in
sync_for_cpu and once for sync_for_device in the Rx cleanup path, and
once in unmap_page in the Tx cleanup path.  Each call to
is_swiotlb_buffer will result in 2 calls to __phys_addr.  In freeing the
skb we end up doing a call to virt_to_head_page which will call
__phys_addr.  In addition we end up mapping the skb using map_single so
we end up using __phys_addr to do a virt_to_page translation in the
xmit_frame_ring path, and then call __phys_addr when we check
dma_mapping_error.  So in total that ends up being 3 calls to
is_swiotlb_buffer, and 9 calls to __phys_addr per packet routed.

With the patches the is_swiotlb_buffer function, which was 25 lines of
assembly, is replaced with 8 lines of assembly and becomes inline.  In
addition we drop the number of calls to __phys_addr from 9 to 2 by
dropping them all from swiotlb.  By my math I am probably saving about
120 instructions per packet.  I suspect all of that would probably be
cutting the number of instructions per packet enough to probably account
for a 5% difference when you consider I am running at about 1.5Mpps per
core on a 2.7Ghz processor.

Thanks,

Alex

  reply	other threads:[~2012-10-05 23:23 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-04  0:38 [RFC PATCH 0/7] Improve swiotlb performance by using physical addresses Alexander Duyck
2012-10-04  0:38 ` [RFC PATCH 1/7] swiotlb: Instead of tracking the end of the swiotlb region just calculate it Alexander Duyck
2012-10-04 13:01   ` Konrad Rzeszutek Wilk
2012-10-04 15:54     ` Alexander Duyck
2012-10-04 16:31       ` Konrad Rzeszutek Wilk
2012-10-04  0:38 ` [RFC PATCH 2/7] swiotlb: Make io_tlb_start a physical address instead of a virtual address Alexander Duyck
2012-10-04 13:18   ` Konrad Rzeszutek Wilk
2012-10-04 17:11     ` Alexander Duyck
2012-10-04 17:19       ` Konrad Rzeszutek Wilk
2012-10-04 20:22         ` Alexander Duyck
2012-10-09 16:43           ` Konrad Rzeszutek Wilk
2012-10-09 19:11             ` Alexander Duyck
2012-10-04  0:38 ` [RFC PATCH 3/7] swiotlb: Make io_tlb_overflow_buffer a physical address Alexander Duyck
2012-10-04  0:39 ` [RFC PATCH 4/7] swiotlb: Return physical addresses when calling swiotlb_tbl_map_single Alexander Duyck
2012-10-04  0:39 ` [RFC PATCH 5/7] swiotlb: Use physical addresses for swiotlb_tbl_unmap_single Alexander Duyck
2012-10-04  0:39 ` [RFC PATCH 6/7] swiotlb: Use physical addresses instead of virtual in swiotlb_tbl_sync_single Alexander Duyck
2012-10-04  0:39 ` [RFC PATCH 7/7] swiotlb: Do not export swiotlb_bounce since there are no external consumers Alexander Duyck
2012-10-04 12:55 ` [RFC PATCH 0/7] Improve swiotlb performance by using physical addresses Konrad Rzeszutek Wilk
2012-10-04 15:50   ` Alexander Duyck
2012-10-04 13:33 ` Konrad Rzeszutek Wilk
2012-10-04 17:57   ` Alexander Duyck
2012-10-05 16:55 ` Andi Kleen
2012-10-05 19:35   ` Alexander Duyck
2012-10-05 20:02     ` Andi Kleen
2012-10-05 23:23       ` Alexander Duyck [this message]
2012-10-06 17:57         ` Andi Kleen
2012-10-06 18:56           ` H. Peter Anvin
2012-10-08 15:43           ` Alexander Duyck
2012-10-09 19:05             ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=506F6BF2.8030500@intel.com \
    --to=alexander.h.duyck@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=bhelgaas@google.com \
    --cc=devel@linuxdriverproject.org \
    --cc=hpa@zytor.com \
    --cc=joerg.roedel@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rob@landley.net \
    --cc=shuahkhan@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.