From: axboe@kernel.dk (Jens Axboe)
Subject: [PATCH] NVMe: Add rw_page support
Date: Fri, 14 Nov 2014 13:53:44 -0700
Message-ID: <54666BD8.9010901@kernel.dk>
In-Reply-To: <alpine.LNX.2.00.1411141633180.4225@localhost.lm.intel.com>
On 11/14/2014 10:05 AM, Keith Busch wrote:
> On Fri, 14 Nov 2014, Jens Axboe wrote:
>> For the cases where you do indeed end up submitting multiple, it's even
>> more of a shame to bypass the normal IO path. There are various tricks
>> we can do in there to speed things up, like batched doorbell rings. And
>> if we kill that last alloc/free per IO, then I'd really be curious to
>> know why rw_page is faster. Seems it should be possible to fix that up
>> instead.
>
> Here's some perf data of just the kernel from two runs with a simple
> swap testing program. I'm a novice at interpreting this for comparison,
> so I'm not sure if this shows what we're looking for. The test ran for
> the same amount of time in both cases, but perf counted ~16% fewer events
> when using rw_page.
>
> With rw_page disabled:
>
> 7.33% swap [kernel.kallsyms] [k] page_fault
> 5.13% swap [kernel.kallsyms] [k] clear_page_c
> 4.46% swap [kernel.kallsyms] [k] __radix_tree_lookup
> 4.36% swap [kernel.kallsyms] [k] do_raw_spin_lock
> 2.63% swap [kernel.kallsyms] [k] handle_mm_fault
> 2.17% swap [kernel.kallsyms] [k] get_page_from_freelist
> 1.77% swap [kernel.kallsyms] [k] __swap_duplicate
> 1.53% swap [nvme] [k] nvme_queue_rq
> 1.38% swap [kernel.kallsyms] [k] intel_pmu_disable_all
> 1.37% swap [kernel.kallsyms] [k] put_page_testzero
> 1.19% swap [kernel.kallsyms] [k] __do_page_fault
> 1.05% swap [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 0.99% swap [kernel.kallsyms] [k] __free_one_page
> 0.97% swap [kernel.kallsyms] [k] swap_info_get
> 0.90% swap [kernel.kallsyms] [k] __alloc_pages_nodemask
> 0.80% swap [kernel.kallsyms] [k] radix_tree_insert
> 0.78% swap [kernel.kallsyms] [k] test_and_set_bit.constprop.90
> 0.74% swap [kernel.kallsyms] [k] __bt_get
> 0.71% swap [kernel.kallsyms] [k] sg_init_table
> 0.71% swap [kernel.kallsyms] [k] list_del
> 0.70% swap [kernel.kallsyms] [k] ____cache_alloc
> 0.67% swap [kernel.kallsyms] [k] __schedule
> 0.66% swap [kernel.kallsyms] [k] round_jiffies_common
> 0.63% swap [kernel.kallsyms] [k] __wait_on_bit
> 0.61% swap [kernel.kallsyms] [k] __rmqueue
> 0.60% swap [kernel.kallsyms] [k] vmacache_find
> 0.54% swap [kernel.kallsyms] [k] __blk_bios_map_sg
> 0.54% swap [kernel.kallsyms] [k] blk_mq_start_request
> 0.53% swap [kernel.kallsyms] [k] unmap_single_vma
> 0.52% swap [kernel.kallsyms] [k] __update_tg_runnable_avg.isra.23
> 0.52% swap [kernel.kallsyms] [k] __blk_mq_alloc_request
> 0.51% swap [kernel.kallsyms] [k] swiotlb_map_sg_attrs
> 0.49% swap [nvme] [k] nvme_alloc_iod
> 0.49% swap [kernel.kallsyms] [k] update_cfs_shares
> 0.47% swap [kernel.kallsyms] [k] __add_to_swap_cache
> 0.46% swap [kernel.kallsyms] [k] update_curr
> 0.46% swap [kernel.kallsyms] [k] swap_entry_free
> 0.45% swap [kernel.kallsyms] [k] swapin_readahead
> 0.45% swap [kernel.kallsyms] [k] __call_rcu.constprop.62
> 0.44% swap [kernel.kallsyms] [k] page_waitqueue
> 0.44% swap [kernel.kallsyms] [k] tag_get
> 0.43% swap [kernel.kallsyms] [k] next_zones_zonelist
> 0.43% swap [kernel.kallsyms] [k] kmem_cache_alloc
> 0.42% swap [nvme] [k] nvme_process_cq
>
> With rw_page enabled:
>
> 8.33% swap [kernel.kallsyms] [k] page_fault
> 6.36% swap [kernel.kallsyms] [k] clear_page_c
> 5.15% swap [kernel.kallsyms] [k] do_raw_spin_lock
> 5.10% swap [kernel.kallsyms] [k] __radix_tree_lookup
> 3.01% swap [kernel.kallsyms] [k] handle_mm_fault
> 2.57% swap [kernel.kallsyms] [k] get_page_from_freelist
> 2.06% swap [kernel.kallsyms] [k] __swap_duplicate
> 1.57% swap [kernel.kallsyms] [k] put_page_testzero
> 1.44% swap [kernel.kallsyms] [k] intel_pmu_disable_all
> 1.37% swap [kernel.kallsyms] [k] test_and_set_bit.constprop.90
> 1.20% swap [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 1.19% swap [kernel.kallsyms] [k] __free_one_page
> 1.15% swap [kernel.kallsyms] [k] radix_tree_insert
> 1.15% swap [kernel.kallsyms] [k] __do_page_fault
> 1.07% swap [kernel.kallsyms] [k] swap_info_get
> 0.89% swap [kernel.kallsyms] [k] __alloc_pages_nodemask
> 0.85% swap [kernel.kallsyms] [k] list_del
> 0.81% swap [kernel.kallsyms] [k] __bt_get
> 0.78% swap [nvme] [k] nvme_rw_page
> 0.74% swap [kernel.kallsyms] [k] __rmqueue
> 0.74% swap [kernel.kallsyms] [k] __wait_on_bit
> 0.69% swap [kernel.kallsyms] [k] __schedule
> 0.63% swap [kernel.kallsyms] [k] unmap_single_vma
> 0.62% swap [kernel.kallsyms] [k] vmacache_find
> 0.60% swap [kernel.kallsyms] [k] update_cfs_shares
> 0.59% swap [kernel.kallsyms] [k] tag_get
> 0.55% swap [kernel.kallsyms] [k] update_curr
> 0.53% swap [kernel.kallsyms] [k] __update_tg_runnable_avg.isra.23
> 0.51% swap [kernel.kallsyms] [k] next_zones_zonelist
> 0.51% swap [kernel.kallsyms] [k] __radix_tree_create
> 0.50% swap [kernel.kallsyms] [k] __blk_mq_alloc_request
> 0.50% swap [kernel.kallsyms] [k] __call_rcu.constprop.62
> 0.49% swap [kernel.kallsyms] [k] page_waitqueue
> 0.48% swap [kernel.kallsyms] [k] swap_entry_free
> 0.47% swap [kernel.kallsyms] [k] __add_to_swap_cache
> 0.46% swap [kernel.kallsyms] [k] down_read_trylock
> 0.44% swap [kernel.kallsyms] [k] up_read
> 0.43% swap [kernel.kallsyms] [k] __wake_up_bit
> 0.43% swap [kernel.kallsyms] [k] io_schedule
> 0.42% swap [kernel.kallsyms] [k] __mod_zone_page_state
> 0.42% swap [kernel.kallsyms] [k] do_wp_page
> 0.39% swap [kernel.kallsyms] [k] __inc_zone_state
> 0.39% swap [kernel.kallsyms] [k] dequeue_task_fair
> 0.39% swap [kernel.kallsyms] [k] prepare_to_wait
It's hard (impossible) to tell from just this; we'd need performance
data to go with it, too. The number of events is a very vague hint, and I
would not put any value in that.
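
One reason the two listings are awkward to compare by eye is that each
percentage is a share of that run's own event total. A minimal sketch of
putting both on a common baseline follows. It assumes the "~16% fewer
events" figure can be read as a ratio of roughly 0.84 between the totals;
that ratio and the helper names parse_report and compare are illustrative
assumptions, not anything perf provides. It only rescales event shares and
says nothing about throughput or latency.

```python
import re

# Matches quoted perf report lines such as:
#   "> 7.33% swap [kernel.kallsyms] [k] page_fault"
# Lines whose symbol wrapped onto the next line do not match and are skipped.
LINE_RE = re.compile(
    r"^\W*(\d+(?:\.\d+)?)%\s+\S+\s+\[[^\]]+\]\s+\[k\]\s+(\S+)"
)


def parse_report(text):
    """Return {symbol: percent of that run's events} for each matching line."""
    shares = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            shares[m.group(2)] = float(m.group(1))
    return shares


def compare(base, other, event_ratio):
    """Compare symbols present in both runs on the base run's scale.

    event_ratio is (other run's total events) / (base run's total events);
    0.84 is an assumed reading of "~16% fewer events". Symbols missing from
    one listing are left out, because they fell below that listing's cutoff
    rather than being zero.
    """
    rows = []
    for sym in set(base) & set(other):
        scaled = other[sym] * event_ratio
        rows.append((sym, base[sym], scaled, scaled - base[sym]))
    rows.sort(key=lambda r: r[3])  # largest reduction first
    return rows


if __name__ == "__main__":
    disabled = parse_report("> 7.33% swap [kernel.kallsyms] [k] page_fault\n"
                            "> 1.53% swap [nvme] [k] nvme_queue_rq")
    enabled = parse_report("> 8.33% swap [kernel.kallsyms] [k] page_fault\n"
                           "> 0.78% swap [nvme] [k] nvme_rw_page")
    for sym, b, s, delta in compare(disabled, enabled, event_ratio=0.84):
        print(f"{sym:30s} {b:6.2f} {s:6.2f} {delta:+6.2f}")
```

Under that assumed ratio, a symbol whose share rises between the listings
can still account for fewer events in absolute terms, which is part of why
the raw percentages alone are not conclusive.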
If you can describe your workload, I'd love to just run it and see what
happens here!
--
Jens Axboe