From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Robin Murphy <robin.murphy@arm.com>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>, "hch@lst.de" <hch@lst.de>,
"m.szyprowski@samsung.com" <m.szyprowski@samsung.com>
Cc: "shuah@kernel.org" <shuah@kernel.org>,
"will@kernel.org" <will@kernel.org>,
Linuxarm <linuxarm@huawei.com>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>
Subject: RE: [PATCH 1/2] dma-mapping: add benchmark support for streaming DMA APIs
Date: Sat, 31 Oct 2020 09:44:58 +0000 [thread overview]
Message-ID: <ebf48b2c57384f37aaaeb02ea0d3beff@hisilicon.com> (raw)
In-Reply-To: <472cf21a-5196-dbb5-caef-c1c0d982fe1c@arm.com>
> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Saturday, October 31, 2020 4:48 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> iommu@lists.linux-foundation.org; hch@lst.de; m.szyprowski@samsung.com
> Cc: joro@8bytes.org; will@kernel.org; shuah@kernel.org; Linuxarm
> <linuxarm@huawei.com>; linux-kselftest@vger.kernel.org
> Subject: Re: [PATCH 1/2] dma-mapping: add benchmark support for streaming
> DMA APIs
>
> On 2020-10-29 21:39, Song Bao Hua (Barry Song) wrote:
> [...]
> >>> +struct map_benchmark {
> >>> + __u64 map_nsec;
> >>> + __u64 unmap_nsec;
> >>> + __u32 threads; /* how many threads will do map/unmap in parallel
> */
> >>> + __u32 seconds; /* how long the test will last */
> >>> + int node; /* which numa node this benchmark will run on */
> >>> + __u64 expansion[10]; /* For future use */
> >>> +};
> >>
> >> I'm no expert on userspace ABIs (and what little experience I do have
> >> is mostly of Win32...), so hopefully someone else will comment if
> >> there's anything of concern here. One thing I wonder is that there's
> >> a fair likelihood of functionality evolving here over time, so might
> >> it be appropriate to have some sort of explicit versioning parameter
> >> for robustness?
> >
> > I copied that from gup_benchmark. There is no this kind of code to
> > compare version.
> > I believe there is a likelihood that kernel module is changed but
> > users are still using old userspace tool, this might lead to the
> > incompatible data structure.
> > But not sure if it is a big problem :-)
>
> Yeah, like I say I don't really have a good feeling for what would be best here,
> I'm just thinking of what I do know and wary of the potential for a "640 bits
> ought to be enough for anyone" issue ;)
>
> >>> +struct map_benchmark_data {
> >>> + struct map_benchmark bparam;
> >>> + struct device *dev;
> >>> + struct dentry *debugfs;
> >>> + atomic64_t total_map_nsecs;
> >>> + atomic64_t total_map_loops;
> >>> + atomic64_t total_unmap_nsecs;
> >>> + atomic64_t total_unmap_loops;
> >>> +};
> >>> +
> >>> +static int map_benchmark_thread(void *data) {
> >>> + struct page *page;
> >>> + dma_addr_t dma_addr;
> >>> + struct map_benchmark_data *map = data;
> >>> + int ret = 0;
> >>> +
> >>> + page = alloc_page(GFP_KERNEL);
> >>> + if (!page)
> >>> + return -ENOMEM;
> >>> +
> >>> + while (!kthread_should_stop()) {
> >>> + ktime_t map_stime, map_etime, unmap_stime, unmap_etime;
> >>> +
> >>> + map_stime = ktime_get();
> >>> + dma_addr = dma_map_page(map->dev, page, 0, PAGE_SIZE,
> >> DMA_BIDIRECTIONAL);
> >>
> >> Note that for a non-coherent device, this will give an underestimate
> >> of the real-world overhead of BIDIRECTIONAL or TO_DEVICE mappings,
> >> since the page will never be dirty in the cache (except possibly the
> >> very first time through).
> >
> > Agreed. I'd like to add a DIRECTION parameter like "-d 0", "-d 1"
> > after we have this basic framework.
>
> That wasn't so much about the direction itself, just that if it's anything other
> than FROM_DEVICE, we should probably do something to dirty the buffer by a
> reasonable amount before each map. Otherwise the measured performance is
> going to be unrealistic on many systems.
Maybe put a memset(buf, 0, PAGE_SIZE) before dma_map will help ?
>
> [...]
> >>> + atomic64_add((long long)ktime_to_ns(ktime_sub(unmap_etime,
> >> unmap_stime)),
> >>> + &map->total_unmap_nsecs);
> >>> + atomic64_inc(&map->total_map_loops);
> >>> + atomic64_inc(&map->total_unmap_loops);
> >>
> >> I think it would be worth keeping track of the variances as well - it
> >> can be hard to tell if a reasonable-looking average is hiding
> >> terrible worst-case behaviour.
> >
> > This is a sensible requirement. I believe it is better to be handled
> > by the existing kernel tracing method.
> >
> > Maybe we need a histogram like:
> > Delay sample count
> > 1-2us 1000 ***
> > 2-3us 2000 *******
> > 3-4us 100 *
> > .....
> > This will be more precise than the maximum latency in the worst case.
> >
> > I'd believe this can be handled by:
> > tracepoint A
> > Map
> > Tracepoint B
> >
> > Tracepoint C
> > Unmap
> > Tracepoint D
> >
> > Let the userspace ebpf to draw the histogram for the delta of B-A and D-C.
> >
> > So I am planning to put this requirement into todo list and write an
> > userspace ebpf/bcc script for histogram and put in tools/ directory.
> >
> > Please give your comments on this.
>
> Right, I wasn't suggesting trying to homebrew a full data collection system here
> - I agree there are better tools for that already - just that it's basically free to
> track a sum of squares alongside a sum, so that we can trivially calculate a
> useful variance (or standard
> deviation) figure alongside the mean at the end.
For this case, I am not sure if it is true. Unless we expose more data such as
min, max etc. to userspace, it makes no difference whether total_(un)map_nsecs
and total_(un)map_loops are exposed or not.
As
total loops = seconds / (avg_map_latency + avg_unmap_latency);
total_map_nsecs = total loop count * avg_map_latency
total_unmap_nsecs = total loop count * avg_unmap_latency
all of seconds, avg_unmap_latency, avg_unmap_latency are known by
userspace tool.
>
> [...]
> >>> + for (i = 0; i < threads; i++) {
> >>> + tsk[i] = kthread_create_on_node(map_benchmark_thread, map,
> >>> + map->bparam.node, "dma-map-benchmark/%d", i);
> >>> + if (IS_ERR(tsk[i])) {
> >>> + dev_err(map->dev, "create dma_map thread failed\n");
> >>> + return PTR_ERR(tsk[i]);
> >>> + }
> >>> +
> >>> + if (node != NUMA_NO_NODE && node_online(node))
> >>> + kthread_bind_mask(tsk[i], cpu_mask);
> >>> +
> >>> + wake_up_process(tsk[i]);
> >>
> >> Might it be better to create all the threads first, *then* start
> >> kicking them?
> >
> > The difficulty is that we don't know how many threads we should create
> > as the thread number is a parameter to test the contention of IOMMU driver.
> > In my test case, I'd like to test things like One thread Two threads
> > ....
> > 8 threads
> > 12 threads
> > 16 threads...
> >
> > On the other hand, I think it is better to drop the memory of
> > task_struct of those test threads while we are not testing dma map.
>
> I simply meant splitting the loop here into two - one to create the threads and
> set their affinity, then another to wake them all up - so we don't start
> unnecessarily thrashing the system while we're still trying to set up the rest of
> the test ;)
Agreed.
>
> Robin.
Thanks
Barry
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
next prev parent reply other threads:[~2020-10-31 9:45 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-27 3:53 [PATCH 0/2] dma-mapping: provide a benchmark for streaming DMA mapping Barry Song
2020-10-27 3:53 ` [PATCH 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
2020-10-29 19:38 ` Robin Murphy
2020-10-29 21:39 ` Song Bao Hua (Barry Song)
2020-10-30 15:48 ` Robin Murphy
2020-10-31 9:44 ` Song Bao Hua (Barry Song) [this message]
2020-10-31 21:42 ` Song Bao Hua (Barry Song)
2020-10-27 3:53 ` [PATCH 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ebf48b2c57384f37aaaeb02ea0d3beff@hisilicon.com \
--to=song.bao.hua@hisilicon.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux-foundation.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=m.szyprowski@samsung.com \
--cc=robin.murphy@arm.com \
--cc=shuah@kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox