From: axboe@fb.com (Jens Axboe)
Subject: NVMe scalability issue
Date: Tue, 2 Jun 2015 13:14:04 -0600
Message-ID: <556E007C.6040603@fb.com>
In-Reply-To: <CANvN+ekcUU6fCxGKf=6g+yY79qKAbpaDA9UGBnBh1QV0W9nN2w@mail.gmail.com>
On 06/02/2015 01:11 PM, Andrey Kuzmin wrote:
> On Tue, Jun 2, 2015 at 10:09 PM, Jens Axboe <axboe@fb.com> wrote:
>> On 06/02/2015 01:03 PM, Andrey Kuzmin wrote:
>>>
>>> On Tue, Jun 2, 2015 at 1:52 AM, Ming Lin <mlin@kernel.org> wrote:
>>>>
>>>> Hi list,
>>>>
>>>> I'm playing with 8 high performance NVMe devices on a 4 sockets server.
>>>> Each device can get 730K 4k read IOPS.
>>>>
>>>> Kernel: 4.1-rc3
>>>> fio test shows it doesn't scale well with 4 or more devices.
>>>> I wonder whether there is any possible direction to improve it.
>>>>
>>>> devices   theory    actual
>>>>           IOPS(K)   IOPS(K)
>>>> -------   -------   -------
>>>> 1         733       733
>>>> 2         1466      1446.8
>>>> 3         2199      2174.5
>>>> 4         2932      2354.9
>>>> 5         3665      3024.5
>>>> 6         4398      3818.9
>>>> 7         5131      4526.3
>>>> 8         5864      4621.2
>>>>
>>>> And a graph here:
>>>> http://minggr.net/pub/20150601/nvme-scalability.jpg
>>>>
>>>>
>>>> With 8 devices, CPU is still 43% idle, so CPU is not the bottleneck.
>>>>
>>>> "top" data
>>>>
>>>> Tasks: 565 total, 30 running, 535 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 17.5 us, 39.2 sy, 0.0 ni, 43.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>>> KiB Mem: 52833033+total, 3103032 used, 52522732+free, 18472 buffers
>>>> KiB Swap: 7999484 total, 0 used, 7999484 free. 1506732 cached Mem
>>>>
>>>> "perf top" data
>>>>
>>>> PerfTop: 124581 irqs/sec kernel:78.6% exact: 0.0% [4000Hz cycles], (all, 48 CPUs)
>>>>
>>>> -----------------------------------------------------------------------------------------
>>>>
>>>> 3.30% [kernel] [k] do_blockdev_direct_IO
>>>> 2.99% fio [.] get_io_u
>>>> 2.79% fio [.] axmap_isset
>>>
>>>
>>> Just a thought as well, but axmap_isset cpu usage is suspiciously
>>> high, given a read-only workload where it's essentially a noop.
>>
>>
>> Read or write doesn't matter, it's still marked in the random map. Both of
>> them will maintain that state.
>>
>
> Not sure keeping track of blocks read was the intention in the test,
> so it's worth rerunning with norandommap=1.
Right, it doesn't matter for this test. But it's only a few percent of
CPU, and should not impact scaling. I suspect the time keeping would be
a bigger offender.
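
For anyone wanting to rerun with both suspects taken out, here is a minimal sketch in Python that writes out a fio job file. It is not the job file used in the original test, which was not posted in this message. norandommap is the option suggested above; gtod_reduce is fio's option for cutting down on time keeping calls, and using it here is my assumption about how to test that theory. The device paths, iodepth, runtime and the helper name build_fio_job are all placeholders.

```python
def build_fio_job(devices, norandommap=True, gtod_reduce=True):
    """Return the text of a fio job file for a 4k random-read test.

    All values below are illustrative assumptions, not the settings
    from the original test.
    """
    lines = [
        "[global]",
        "ioengine=libaio",
        "direct=1",          # O_DIRECT, bypass the page cache
        "rw=randread",       # 4k random reads, as in the report
        "bs=4k",
        "iodepth=32",        # assumed queue depth
        "time_based=1",
        "runtime=30",        # assumed runtime in seconds
    ]
    if norandommap:
        # Do not track which blocks have been touched (skips the axmap).
        lines.append("norandommap=1")
    if gtod_reduce:
        # Reduce gettimeofday() calls, at the cost of less detailed stats.
        lines.append("gtod_reduce=1")
    # One job section per device.
    for i, dev in enumerate(devices):
        lines += ["", f"[dev{i}]", f"filename={dev}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device names; substitute the real ones.
    print(build_fio_job(["/dev/nvme0n1", "/dev/nvme1n1"]))
```

Comparing IOPS with each option toggled separately should show whether either one contributes to the scaling gap.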
--
Jens Axboe