NVMe scalability issue

Linux-NVME Archive on lore.kernel.org

From: axboe@fb.com (Jens Axboe)
Subject: NVMe scalability issue
Date: Tue, 2 Jun 2015 13:09:07 -0600
Message-ID: <556DFF53.7030107@fb.com>
In-Reply-To: <CANvN+e=se0_q+eC+KkBenM2asOSy3-9ZNveRm2YBXJ9-qVna1w@mail.gmail.com>

On 06/02/2015 01:03 PM, Andrey Kuzmin wrote:
> On Tue, Jun 2, 2015 at 1:52 AM, Ming Lin <mlin@kernel.org> wrote:
>> Hi list,
>>
>> I'm playing with 8 high performance NVMe devices on a 4 sockets server.
>> Each device can get 730K 4k read IOPS.
>>
>> Kernel: 4.1-rc3
>> fio test shows it doesn't scale well with 4 or more devices.
>> I wonder any possible direction to improve it.
>>
>> devices         theory          actual
>>                  IOPS(K)         IOPS(K)
>> -------         -------         -------
>> 1               733             733
>> 2               1466            1446.8
>> 3               2199            2174.5
>> 4               2932            2354.9
>> 5               3665            3024.5
>> 6               4398            3818.9
>> 7               5131            4526.3
>> 8               5864            4621.2
>>
>> And a graph here:
>> http://minggr.net/pub/20150601/nvme-scalability.jpg
>>
>>
>> With 8 devices, CPU is still 43% idle, so CPU is not the bottleneck.
>>
>> "top" data
>>
>> Tasks: 565 total,  30 running, 535 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 17.5 us, 39.2 sy,  0.0 ni, 43.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> KiB Mem:  52833033+total,  3103032 used, 52522732+free,    18472 buffers
>> KiB Swap:  7999484 total,        0 used,  7999484 free.  1506732 cached Mem
>>
>> "perf top" data
>>
>>     PerfTop:  124581 irqs/sec  kernel:78.6%  exact:  0.0% [4000Hz cycles],  (all, 48 CPUs)
>> -----------------------------------------------------------------------------------------
>>
>>       3.30%  [kernel]       [k] do_blockdev_direct_IO
>>       2.99%  fio            [.] get_io_u
>>       2.79%  fio            [.] axmap_isset
>
> Just a thought as well, but axmap_isset cpu usage is suspiciously
> high, given a read-only workload where it's essentially a noop.

Read or write doesn't matter; either way the block is still marked in
the random map. Both directions maintain that state.
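
To illustrate the point, here is a minimal sketch of a random map, assuming
a flat one-level bitmap. It is not fio's axmap code: the names RandomMap and
next_offset are hypothetical, and the real structure is more elaborate. What
it shows is that the lookup and the marking happen once per I/O, and the data
direction is never consulted.

```python
import random


class RandomMap:
    """Toy one-level bitmap recording which blocks were already issued."""

    def __init__(self, nr_blocks):
        self.nr_blocks = nr_blocks
        self.nr_set = 0
        self.bits = bytearray((nr_blocks + 7) // 8)

    def isset(self, block):
        # Test the bit for this block.
        return bool(self.bits[block >> 3] & (1 << (block & 7)))

    def set(self, block):
        # Mark the block as issued, counting it only the first time.
        if not self.isset(block):
            self.bits[block >> 3] |= 1 << (block & 7)
            self.nr_set += 1


def next_offset(rmap, rng, direction):
    """Return an unissued block number, or None once every block is used.

    `direction` ("read" or "write") is accepted only to show that it
    plays no part in the map lookup or update.
    """
    if rmap.nr_set == rmap.nr_blocks:
        return None
    while True:
        block = rng.randrange(rmap.nr_blocks)
        if not rmap.isset(block):
            rmap.set(block)
            return block
```

With a map like this, a read-only job pays the same isset/set cost per I/O
as a write job would.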

-- 
Jens Axboe
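
For reference, the scaling shortfall in the quoted table can be expressed as
a fraction of linear scaling. The sketch below only restates the numbers
Ming posted; the helper name scaling_efficiency is an assumption made for
illustration.

```python
# Per-device 4k random read rate from the quoted table, in K IOPS.
PER_DEVICE_KIOPS = 733

# devices -> measured aggregate K IOPS, copied from the quoted table.
measured = {
    1: 733.0,
    2: 1446.8,
    3: 2174.5,
    4: 2354.9,
    5: 3024.5,
    6: 3818.9,
    7: 4526.3,
    8: 4621.2,
}


def scaling_efficiency(devices, actual_kiops, per_device=PER_DEVICE_KIOPS):
    """Measured throughput divided by the linear (theoretical) throughput."""
    return actual_kiops / (devices * per_device)


for n, actual in measured.items():
    print(f"{n} devices: {scaling_efficiency(n, actual):.1%} of linear")
```

By this measure the setup stays within about 2% of linear up to three
devices, then drops to roughly 80% at four devices and about 79% at eight.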
