From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
iommu@lists.linux-foundation.org, tariqt@mellanox.com,
ilias.apalodimas@linaro.org, toke@toke.dk,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
brouer@redhat.com
Subject: Re: [RFC] avoid indirect calls for DMA direct mappings
Date: Fri, 7 Dec 2018 16:44:35 +0100
Message-ID: <20181207164435.18f8ffed@redhat.com>
In-Reply-To: <20181207012141.GA4256@lst.de>
On Fri, 7 Dec 2018 02:21:42 +0100
Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Dec 06, 2018 at 08:24:38PM +0000, Robin Murphy wrote:
> > On 06/12/2018 20:00, Christoph Hellwig wrote:
> >> On Thu, Dec 06, 2018 at 06:54:17PM +0000, Robin Murphy wrote:
> >>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at
> >>> the point we detected the ACPI properties are wrong - that shouldn't be too
> >>> much of a headache to go back to.
> >>
> >> Ok. I've cooked up a patch to use NULL as the go direct marker.
> >> This cleans up a few things nicely, but also means we now need to
> >> do the bypass scheme for all ops, not just the fast path. But we
> >> probably should just move the slow path ops out of line anyway,
> >> so I'm not worried about it. This has survived some very basic
> >> testing on x86, and really needs to be cleaned up and split into
> >> multiple patches..
> >
> > I've also just finished hacking something up to keep the arm64 status quo -
> > I'll need to actually test it tomorrow, but the overall diff looks like the
> > below.
>
> Nice. I created a branch that picked up your bits and also the ideas
> from Linus, and the result looks really nice. I'll still need a signoff
> for your bits, though.
>
> Jesper, can you give this a spin if it changes the number even further?
>
> git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2
I'll test it soon...
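For context, the "NULL as the go direct marker" idea quoted above can be pictured roughly as follows. This is only a userspace sketch under my own assumptions, not the code in the dma-direct-calls.2 branch: every name ending in _sketch is hypothetical, the "direct" mapping is modelled as an identity mapping, and the fake IOMMU just sets a high bit so the two paths are distinguishable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;   /* hypothetical stand-in for dma_addr_t */

struct device_sketch;

/* Hypothetical ops table: one function pointer per DMA operation. */
struct dma_ops_sketch {
	dma_addr_sketch_t (*map_page)(struct device_sketch *dev, uint64_t phys);
};

struct device_sketch {
	/* NULL means "go direct": no ops table, so no indirect call. */
	const struct dma_ops_sketch *dma_ops;
};

/* Direct mapping, modelled here as the identity (an assumption). */
dma_addr_sketch_t direct_map_page_sketch(struct device_sketch *dev, uint64_t phys)
{
	(void)dev;
	return phys;
}

/* Fake IOMMU mapping: tags the address so the path taken is visible. */
dma_addr_sketch_t iommu_map_page_sketch(struct device_sketch *dev, uint64_t phys)
{
	(void)dev;
	return phys | (1ULL << 40);
}

const struct dma_ops_sketch iommu_ops_sketch = {
	.map_page = iommu_map_page_sketch,
};

/*
 * Dispatch: the common dma-direct case is a plain direct call, which the
 * compiler does not turn into a retpoline thunk. Only devices that have an
 * ops table pay for the indirect call.
 */
dma_addr_sketch_t map_page_sketch(struct device_sketch *dev, uint64_t phys)
{
	assert(dev != NULL);

	if (dev->dma_ops == NULL)
		return direct_map_page_sketch(dev, phys);

	return dev->dma_ops->map_page(dev, phys);
}
```

With a device whose dma_ops is NULL, map_page_sketch() returns the physical address unchanged through a direct call; pointing dma_ops at iommu_ops_sketch sends the same call through the function pointer instead.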
I looked at my perf stat recording on my existing tests[1] and there
seems to be significantly more I-cache usage.
Copy-paste from my summary[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#summary-of-results
* Summary of results
Using XDP_REDIRECT between drivers RX ixgbe(10G) redirect TX i40e(40G),
via BPF devmap (used samples/bpf/xdp_redirect_map). (Note: I chose a
higher TX link speed to ensure that we don't have a TX bottleneck.)
The baseline kernel is at commit https://git.kernel.org/torvalds/c/ef78e5ec9214,
which is the commit just before Hellwig's changes in this tree.
Performance numbers in packets/sec (XDP_REDIRECT ixgbe -> i40e):
- 11913154 (11,913,154) pps - baseline compiled without retpoline
- 7438283 (7,438,283) pps - regression due to CONFIG_RETPOLINE
- 9610088 (9,610,088) pps - mitigation via Hellwig dma-direct-calls
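To put the three numbers above in relative terms, a small helper like the one below can be used. It is just a sketch of the arithmetic; the function and constant names are mine and are not part of the test setup. It gives a drop of roughly 38% from retpolines, with the dma-direct-calls kernel still roughly 19% below the baseline, and a per-packet cost of about 84 ns, 134 ns and 104 ns respectively.

```c
#include <assert.h>

/* The three packets/sec figures from the list above. */
static const double PPS_BASELINE  = 11913154.0; /* no retpoline        */
static const double PPS_RETPOLINE = 7438283.0;  /* CONFIG_RETPOLINE    */
static const double PPS_DIRECT    = 9610088.0;  /* dma-direct-calls    */

/* Relative change of `value` against `base` in percent (negative = slower). */
double percent_change(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}

/* Average time spent per packet in nanoseconds at a given packet rate. */
double ns_per_packet(double pps)
{
	assert(pps > 0.0);
	return 1e9 / pps;
}
```

For example, percent_change(PPS_BASELINE, PPS_RETPOLINE) gives the retpoline regression, and ns_per_packet(PPS_RETPOLINE) - ns_per_packet(PPS_BASELINE) gives the extra nanoseconds per packet that the retpolines cost in this test.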
From the insn per cycle, it is clear that retpolines are stalling the CPU
pipeline:
| pps        | insn per cycle |
|------------+----------------|
| 11,913,154 |           2.39 |
|  7,438,283 |           1.54 |
|  9,610,088 |           2.04 |
Strangely the Instruction-Cache is also under heavier pressure:
| pps        | l2_rqsts.all_code_rd | l2_rqsts.code_rd_hit | l2_rqsts.code_rd_miss |
|------------+----------------------+----------------------+-----------------------|
| 11,913,154 |              874,547 |              742,335 |               132,198 |
|  7,438,283 |              649,513 |              547,581 |               101,945 |
|  9,610,088 |            2,568,064 |            2,001,369 |               566,683 |
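The code-read counters above can be compared as a miss ratio (code_rd_miss divided by all_code_rd). The sketch below only does that arithmetic on the table values; the struct and function names are hypothetical and the counters are taken as-is from the table, without normalizing for measurement time. The ratio goes from roughly 15% on the first two kernels to roughly 22% on the dma-direct-calls kernel, which also issues about 2.9 times as many code reads as the baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the l2_rqsts table above (hypothetical container type). */
struct code_rd_sample {
	uint64_t all_code_rd;
	uint64_t code_rd_hit;
	uint64_t code_rd_miss;
};

/* Values copied from the table, in the same row order. */
const struct code_rd_sample SAMPLE_BASELINE  = {  874547,  742335, 132198 };
const struct code_rd_sample SAMPLE_RETPOLINE = {  649513,  547581, 101945 };
const struct code_rd_sample SAMPLE_DIRECT    = { 2568064, 2001369, 566683 };

/* Fraction of L2 code reads that missed, in the range 0.0 to 1.0. */
double code_rd_miss_ratio(const struct code_rd_sample *s)
{
	assert(s != NULL && s->all_code_rd > 0);
	return (double)s->code_rd_miss / (double)s->all_code_rd;
}

/* How many times more code reads `s` issued compared with `base`. */
double code_rd_growth(const struct code_rd_sample *base,
		      const struct code_rd_sample *s)
{
	assert(base != NULL && s != NULL && base->all_code_rd > 0);
	return (double)s->all_code_rd / (double)base->all_code_rd;
}
```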
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer