[PATCH] NVMe: Add rw_page support

Linux-NVME Archive on lore.kernel.org

From: axboe@kernel.dk (Jens Axboe)
Subject: [PATCH] NVMe: Add rw_page support
Date: Fri, 14 Nov 2014 13:53:44 -0700
Message-ID: <54666BD8.9010901@kernel.dk>
In-Reply-To: <alpine.LNX.2.00.1411141633180.4225@localhost.lm.intel.com>

On 11/14/2014 10:05 AM, Keith Busch wrote:
> On Fri, 14 Nov 2014, Jens Axboe wrote:
>> For the cases where you do indeed end up submitting multiple, it's even
>> more of a shame to bypass the normal IO path. There are various tricks
>> we can do in there to speed things up, like batched doorbell rings. And
>> if we kill that last alloc/free per IO, then I'd really be curious to
>> know why rw_page is faster. Seems it should be possible to fix that up
>> instead.
> 
> Here's some perf data of just the kernel from two runs with a simple
> swap testing program. I'm a novice at interpreting this for comparison,
> so I'm not sure if this shows what we're looking for. The test ran for
> the same amount of time in both cases, but perf counted ~16% fewer events
> when using rw_page.
> 
> With rw_page disabled:
> 
>      7.33%  swap     [kernel.kallsyms]  [k] page_fault
>      5.13%  swap     [kernel.kallsyms]  [k] clear_page_c
>      4.46%  swap     [kernel.kallsyms]  [k] __radix_tree_lookup
>      4.36%  swap     [kernel.kallsyms]  [k] do_raw_spin_lock
>      2.63%  swap     [kernel.kallsyms]  [k] handle_mm_fault
>      2.17%  swap     [kernel.kallsyms]  [k] get_page_from_freelist
>      1.77%  swap     [kernel.kallsyms]  [k] __swap_duplicate
>      1.53%  swap     [nvme]             [k] nvme_queue_rq
>      1.38%  swap     [kernel.kallsyms]  [k] intel_pmu_disable_all
>      1.37%  swap     [kernel.kallsyms]  [k] put_page_testzero
>      1.19%  swap     [kernel.kallsyms]  [k] __do_page_fault
>      1.05%  swap     [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.99%  swap     [kernel.kallsyms]  [k] __free_one_page
>      0.97%  swap     [kernel.kallsyms]  [k] swap_info_get
>      0.90%  swap     [kernel.kallsyms]  [k] __alloc_pages_nodemask
>      0.80%  swap     [kernel.kallsyms]  [k] radix_tree_insert
>      0.78%  swap     [kernel.kallsyms]  [k] test_and_set_bit.constprop.90
>      0.74%  swap     [kernel.kallsyms]  [k] __bt_get
>      0.71%  swap     [kernel.kallsyms]  [k] sg_init_table
>      0.71%  swap     [kernel.kallsyms]  [k] list_del
>      0.70%  swap     [kernel.kallsyms]  [k] ____cache_alloc
>      0.67%  swap     [kernel.kallsyms]  [k] __schedule
>      0.66%  swap     [kernel.kallsyms]  [k] round_jiffies_common
>      0.63%  swap     [kernel.kallsyms]  [k] __wait_on_bit
>      0.61%  swap     [kernel.kallsyms]  [k] __rmqueue
>      0.60%  swap     [kernel.kallsyms]  [k] vmacache_find
>      0.54%  swap     [kernel.kallsyms]  [k] __blk_bios_map_sg
>      0.54%  swap     [kernel.kallsyms]  [k] blk_mq_start_request
>      0.53%  swap     [kernel.kallsyms]  [k] unmap_single_vma
>      0.52%  swap     [kernel.kallsyms]  [k] __update_tg_runnable_avg.isra.23
>      0.52%  swap     [kernel.kallsyms]  [k] __blk_mq_alloc_request
>      0.51%  swap     [kernel.kallsyms]  [k] swiotlb_map_sg_attrs
>      0.49%  swap     [nvme]             [k] nvme_alloc_iod
>      0.49%  swap     [kernel.kallsyms]  [k] update_cfs_shares
>      0.47%  swap     [kernel.kallsyms]  [k] __add_to_swap_cache
>      0.46%  swap     [kernel.kallsyms]  [k] update_curr
>      0.46%  swap     [kernel.kallsyms]  [k] swap_entry_free
>      0.45%  swap     [kernel.kallsyms]  [k] swapin_readahead
>      0.45%  swap     [kernel.kallsyms]  [k] __call_rcu.constprop.62
>      0.44%  swap     [kernel.kallsyms]  [k] page_waitqueue
>      0.44%  swap     [kernel.kallsyms]  [k] tag_get
>      0.43%  swap     [kernel.kallsyms]  [k] next_zones_zonelist
>      0.43%  swap     [kernel.kallsyms]  [k] kmem_cache_alloc
>      0.42%  swap     [nvme]             [k] nvme_process_cq
> 
> With rw_page enabled:
> 
>      8.33%  swap     [kernel.kallsyms]  [k] page_fault
>      6.36%  swap     [kernel.kallsyms]  [k] clear_page_c
>      5.15%  swap     [kernel.kallsyms]  [k] do_raw_spin_lock
>      5.10%  swap     [kernel.kallsyms]  [k] __radix_tree_lookup
>      3.01%  swap     [kernel.kallsyms]  [k] handle_mm_fault
>      2.57%  swap     [kernel.kallsyms]  [k] get_page_from_freelist
>      2.06%  swap     [kernel.kallsyms]  [k] __swap_duplicate
>      1.57%  swap     [kernel.kallsyms]  [k] put_page_testzero
>      1.44%  swap     [kernel.kallsyms]  [k] intel_pmu_disable_all
>      1.37%  swap     [kernel.kallsyms]  [k] test_and_set_bit.constprop.90
>      1.20%  swap     [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      1.19%  swap     [kernel.kallsyms]  [k] __free_one_page
>      1.15%  swap     [kernel.kallsyms]  [k] radix_tree_insert
>      1.15%  swap     [kernel.kallsyms]  [k] __do_page_fault
>      1.07%  swap     [kernel.kallsyms]  [k] swap_info_get
>      0.89%  swap     [kernel.kallsyms]  [k] __alloc_pages_nodemask
>      0.85%  swap     [kernel.kallsyms]  [k] list_del
>      0.81%  swap     [kernel.kallsyms]  [k] __bt_get
>      0.78%  swap     [nvme]             [k] nvme_rw_page
>      0.74%  swap     [kernel.kallsyms]  [k] __rmqueue
>      0.74%  swap     [kernel.kallsyms]  [k] __wait_on_bit
>      0.69%  swap     [kernel.kallsyms]  [k] __schedule
>      0.63%  swap     [kernel.kallsyms]  [k] unmap_single_vma
>      0.62%  swap     [kernel.kallsyms]  [k] vmacache_find
>      0.60%  swap     [kernel.kallsyms]  [k] update_cfs_shares
>      0.59%  swap     [kernel.kallsyms]  [k] tag_get
>      0.55%  swap     [kernel.kallsyms]  [k] update_curr
>      0.53%  swap     [kernel.kallsyms]  [k] __update_tg_runnable_avg.isra.23
>      0.51%  swap     [kernel.kallsyms]  [k] next_zones_zonelist
>      0.51%  swap     [kernel.kallsyms]  [k] __radix_tree_create
>      0.50%  swap     [kernel.kallsyms]  [k] __blk_mq_alloc_request
>      0.50%  swap     [kernel.kallsyms]  [k] __call_rcu.constprop.62
>      0.49%  swap     [kernel.kallsyms]  [k] page_waitqueue
>      0.48%  swap     [kernel.kallsyms]  [k] swap_entry_free
>      0.47%  swap     [kernel.kallsyms]  [k] __add_to_swap_cache
>      0.46%  swap     [kernel.kallsyms]  [k] down_read_trylock
>      0.44%  swap     [kernel.kallsyms]  [k] up_read
>      0.43%  swap     [kernel.kallsyms]  [k] __wake_up_bit
>      0.43%  swap     [kernel.kallsyms]  [k] io_schedule
>      0.42%  swap     [kernel.kallsyms]  [k] __mod_zone_page_state
>      0.42%  swap     [kernel.kallsyms]  [k] do_wp_page
>      0.39%  swap     [kernel.kallsyms]  [k] __inc_zone_state
>      0.39%  swap     [kernel.kallsyms]  [k] dequeue_task_fair
>      0.39%  swap     [kernel.kallsyms]  [k] prepare_to_wait
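
The two tricks mentioned earlier in the thread, batched doorbell rings and removing the last per-IO alloc/free, can be illustrated with a minimal user-space sketch. It assumes a hypothetical, simplified NVMe-style submission queue: `struct sq`, `sq_queue()`, `sq_ring()` and `sq_complete()` are illustrative names, not the nvme driver's or blk-mq's API, and the "doorbell" is only a counter standing in for the MMIO write of the new tail value. Command slots are allocated once with the queue, so submitting an IO allocates nothing, and a single doorbell write covers every command queued since the previous ring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an NVMe-style submission queue; NOT the driver's
 * real structures. doorbell_writes stands in for MMIO writes of the tail. */
#define QUEUE_DEPTH 64u

struct cmd {
	uint64_t lba;      /* starting logical block */
	uint32_t nblocks;  /* transfer length in blocks */
	int      write;    /* 1 = write, 0 = read */
};

struct sq {
	struct cmd slots[QUEUE_DEPTH]; /* allocated once with the queue: no per-IO malloc/free */
	uint32_t tail;                 /* next slot the host will fill */
	uint32_t last_rung;            /* tail value most recently written to the doorbell */
	uint32_t used;                 /* slots queued but not yet completed */
	unsigned doorbell_writes;      /* number of simulated doorbell writes */
};

static void sq_init(struct sq *q)
{
	memset(q, 0, sizeof(*q));
}

/* Fill the next slot in place; does NOT ring the doorbell. */
static void sq_queue(struct sq *q, uint64_t lba, uint32_t nblocks, int write)
{
	/* An NVMe-style ring is full at depth - 1 entries (tail + 1 == head). */
	assert(q->used < QUEUE_DEPTH - 1);
	struct cmd *c = &q->slots[q->tail];
	c->lba = lba;
	c->nblocks = nblocks;
	c->write = write;
	q->tail = (q->tail + 1) % QUEUE_DEPTH;
	q->used++;
}

/* One doorbell write covers every command queued since the last ring. */
static void sq_ring(struct sq *q)
{
	if (q->tail != q->last_rung) {
		q->doorbell_writes++;
		q->last_rung = q->tail;
	}
}

/* Simulated completion of n commands, freeing their slots for reuse. */
static void sq_complete(struct sq *q, uint32_t n)
{
	assert(n <= q->used);
	q->used -= n;
}
```

Queuing eight commands and calling `sq_ring()` after each costs eight doorbell writes; queuing all eight and ringing once costs one. The trade-off is that the first command in a batch waits until the batch is rung, which is why batching only helps when several IOs are actually submitted together.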

It's hard (impossible, really) to tell from this alone; we'd need
performance data to go with it. The number of events is a very vague
hint, and I would not put any value in it.
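
To see why the raw percentages are hard to compare: `perf report` percentages are shares of the samples collected in that run, so when one run collects ~16% fewer samples, a higher percentage can still mean fewer samples. The sketch below assumes a total-sample ratio of 0.84 (taken from the ~16% figure above) and treats a symbol's sample count as proportional to the CPU time spent in it; `scaled_change()` and `report()` are hypothetical helpers, not part of perf.

```c
#include <assert.h>
#include <stdio.h>

/*
 * perf report percentages are relative to the total samples of each run.
 * Given a symbol's percentage in run A and run B, and the ratio of total
 * samples (B / A), return the relative change in that symbol's absolute
 * sample count: 0.0 = unchanged, negative = fewer samples in run B.
 */
static double scaled_change(double pct_a, double pct_b, double total_ratio)
{
	assert(pct_a > 0.0 && total_ratio > 0.0);
	return (pct_b * total_ratio) / pct_a - 1.0;
}

/* Print the estimate for one symbol taken from the two profiles. */
static void report(const char *sym, double pct_a, double pct_b,
		   double total_ratio)
{
	printf("%-20s %5.2f%% -> %5.2f%%  abs. samples %+5.1f%%\n",
	       sym, pct_a, pct_b,
	       100.0 * scaled_change(pct_a, pct_b, total_ratio));
}
```

Under those assumptions, page_fault goes from 7.33% to 8.33% while its estimated sample count drops by about 4.5%, and clear_page_c goes from 5.13% to 6.36% with an estimated rise of about 4%. Even rescaled this way, it remains only a hint about where CPU time went; it says nothing definitive about which path is faster without performance data alongside it.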

If you can describe your workload, I'd love to just run it and see what
happens here!

-- 
Jens Axboe


Thread overview: 14+ messages
2014-11-14  0:05 [PATCH] NVMe: Add rw_page support Keith Busch
2014-11-14  1:29 ` Jens Axboe
2014-11-14 14:58   ` Matthew Wilcox
2014-11-14 15:07     ` Jens Axboe
2014-11-14 15:52       ` Matthew Wilcox
2014-11-14 16:32         ` Jens Axboe
2014-11-14 17:05           ` Keith Busch
2014-11-14 20:53             ` Jens Axboe [this message]
2014-11-14 22:59               ` Keith Busch
2014-11-14 14:55 ` Matthew Wilcox
     [not found] ` <CANvN+ekQTdNgPe33iaM_9=2Hjrfds2B2R3d3XK06K9n=SY+ZKA@mail.gmail.com>
2014-11-14 22:50   ` Keith Busch
2014-11-14 22:56     ` Jens Axboe
2014-11-14 23:04       ` Keith Busch
2014-11-14 23:30         ` Jens Axboe
