From: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>,
Christoph Hellwig <hch@lst.de>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
James Bottomley <james.bottomley@hansenpartnership.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
PDL-MPT-FUSIONLINUX <mpt-fusionlinux.pdl@broadcom.com>
Subject: Re: [PATCH 00/10] mpt3sas: full mq support
Date: Wed, 15 Feb 2017 13:57:38 +0530
Message-ID: <CAK=zhgqiCh+5tD+kHD15Xef5MnJM0SG=LaP=+cM-40VtMSH7Ew@mail.gmail.com>
In-Reply-To: <2770c802-b8b2-9035-c760-c5b970a9bd99@suse.de>

On Mon, Feb 13, 2017 at 6:41 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 02/13/2017 07:15 AM, Sreekanth Reddy wrote:
>> On Fri, Feb 10, 2017 at 12:29 PM, Hannes Reinecke <hare@suse.de> wrote:
>>> On 02/10/2017 05:43 AM, Sreekanth Reddy wrote:
>>>> On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>>> On 02/09/2017 02:03 PM, Sreekanth Reddy wrote:
>>> [ .. ]
>>>>>>
>>>>>>
>>>>>> Hannes,
>>>>>>
>>>>>> I have created an md raid0 with 4 SAS SSD drives using the below command:
>>>>>> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh
>>>>>> /dev/sdi /dev/sdj
>>>>>>
>>>>>> And here is the 'mdadm --detail /dev/md0' output:
>>>>>> --------------------------------------------------------------------------------------------------------------------------
>>>>>> /dev/md0:
>>>>>> Version : 1.2
>>>>>> Creation Time : Thu Feb 9 14:38:47 2017
>>>>>> Raid Level : raid0
>>>>>> Array Size : 780918784 (744.74 GiB 799.66 GB)
>>>>>> Raid Devices : 4
>>>>>> Total Devices : 4
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> Update Time : Thu Feb 9 14:38:47 2017
>>>>>> State : clean
>>>>>> Active Devices : 4
>>>>>> Working Devices : 4
>>>>>> Failed Devices : 0
>>>>>> Spare Devices : 0
>>>>>>
>>>>>> Chunk Size : 512K
>>>>>>
>>>>>> Name : host_name
>>>>>> UUID : b63f9da7:b7de9a25:6a46ca00:42214e22
>>>>>> Events : 0
>>>>>>
>>>>>> Number Major Minor RaidDevice State
>>>>>> 0 8 96 0 active sync /dev/sdg
>>>>>> 1 8 112 1 active sync /dev/sdh
>>>>>> 2 8 144 2 active sync /dev/sdj
>>>>>> 3 8 128 3 active sync /dev/sdi
>>>>>> ------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> Then I have used the below fio profile to run 4K sequential read
>>>>>> operations with the nr_hw_queues=1 driver and with the nr_hw_queues=24
>>>>>> driver (as my system has two NUMA nodes, each with 12 CPUs).
>>>>>> -----------------------------------------------------
>>>>>> [global]
>>>>>> ioengine=libaio
>>>>>> group_reporting
>>>>>> direct=1
>>>>>> rw=read
>>>>>> bs=4k
>>>>>> allow_mounted_write=0
>>>>>> iodepth=128
>>>>>> runtime=150s
>>>>>>
>>>>>> [job1]
>>>>>> filename=/dev/md0
>>>>>> -----------------------------------------------------
>>>>>>
>>>>>> Here are the fio results when nr_hw_queues=1 (i.e. single request
>>>>>> queue) with various job counts:
>>>>>> 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec
>>>>>> 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec
>>>>>> 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec
>>>>>> 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec
>>>>>>
>>>>>> Here are the fio results when nr_hw_queues=24 (i.e. multiple request
>>>>>> queues) with various job counts:
>>>>>> 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec
>>>>>> 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec
>>>>>> 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec
>>>>>> 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec
>>>>>>
>>>>>> Here we can see that at lower job counts, the single request
>>>>>> queue (nr_hw_queues=1) gives more IOPS than multiple request
>>>>>> queues (nr_hw_queues=24).
>>>>>>
>>>>>> Can you please share your fio profile, so that I can try the same
>>>>>> thing on my system?
>>>>>>
>>>>> Have you tried with the latest git update from Jens' for-4.11/block (or
>>>>> for-4.11/next) branch?
>>>>
>>>> I am using the below git repo:
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue
>>>>
>>>> Today I will try with Jens' for-4.11/block.
>>>>
>>> By all means, do.
>>>
>>>>> I've found that using the mq-deadline scheduler gives a noticeable
>>>>> performance boost.
>>>>>
>>>>> The fio job I'm using is essentially the same; you should just make
>>>>> sure to specify a 'numjobs=' statement in there.
>>>>> Otherwise fio will just use a single CPU, which of course leads to
>>>>> adverse effects in the multiqueue case.
>>>>
>>>> Yes, I am providing 'numjobs=' on the fio command line as shown below:
>>>>
>>>> # fio md_fio_profile --numjobs=8 --output=fio_results.txt
>>>>
>>> Still, it looks as if you'd be using fewer jobs than you have CPUs,
>>> which means you'll be running into a tag starvation scenario on those
>>> CPUs, especially for the small blocksizes.
>>> What are the results if you set 'numjobs' to the number of CPUs?
>>>
>>
>> Hannes,
>>
>> Tried on Jens' for-4.11/block kernel repo and also set each block PD's
>> scheduler to 'mq-deadline'; here are my results for 4K SR on md0
>> (raid0 with 4 drives). I have 24 CPUs, so I also tried setting
>> numjobs=24.
>>
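
(As an aside, this is how the scheduler can be switched per device via
sysfs -- a minimal sketch, assuming sdg-sdj are the md0 member drives
and that they are running blk-mq, so mq-deadline is offered:)

# set the mq-deadline scheduler on each md0 member drive
for dev in sdg sdh sdi sdj; do
    echo mq-deadline > /sys/block/$dev/queue/scheduler
done
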
>> fio results when nr_hw_queues=1 (i.e. single request queue) with
>> various job counts:
>>
>> 4k read when numjobs=1 : io=215553MB, bw=1437.9MB/s, iops=367874,
>> runt=150001msec
>> 4k read when numjobs=2 : io=307771MB, bw=2051.9MB/s, iops=525258,
>> runt=150001msec
>> 4k read when numjobs=4 : io=300382MB, bw=2002.6MB/s, iops=512644,
>> runt=150002msec
>> 4k read when numjobs=8 : io=320609MB, bw=2137.4MB/s, iops=547162,
>> runt=150003msec
>> 4k read when numjobs=24: io=275701MB, bw=1837.1MB/s, iops=470510,
>> runt=150006msec
>>
>> fio results when nr_hw_queues=24 (i.e. multiple request queues) with
>> various job counts:
>>
>> 4k read when numjobs=1 : io=177600MB, bw=1183.2MB/s, iops=303102,
>> runt=150001msec
>> 4k read when numjobs=2 : io=182416MB, bw=1216.1MB/s, iops=311320,
>> runt=150001msec
>> 4k read when numjobs=4 : io=347553MB, bw=2316.2MB/s, iops=593149,
>> runt=150002msec
>> 4k read when numjobs=8 : io=349995MB, bw=2333.3MB/s, iops=597312,
>> runt=150003msec
>> 4k read when numjobs=24: io=350618MB, bw=2337.4MB/s, iops=598359,
>> runt=150007msec
>>
>> At lower job counts the single queue performs better, whereas at
>> higher job counts multi-queue performs better.
>>
> Thank you for these numbers; they fit very well with my results.
>
> So it's as I suspected; with more parallelism we do gain from
> multiqueue. And with single-issue processes we do suffer a performance
> penalty.
>
> However, I strongly suspect that this is an issue with block-mq itself,
> and not so much with mpt3sas.
> The reason is that block-mq needs to split the tag space into distinct
> ranges for each queue, and hence hits tag starvation far earlier the
> more queues are registered.
> block-mq _can_ work around this by moving the issuing process onto
> another CPU (and thus using the tag space from there), but this involves
> calling 'schedule' in the hot path, which might well account for the
> performance drop here.
>
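
(To make the tag-splitting argument concrete: with a static split, each
hw queue only gets a fraction of the HBA's total tag depth. A minimal
sketch of the arithmetic, with a made-up can_queue value, not the real
mpt3sas depth:)

# illustration only: per-queue tag depth under a static split
can_queue=7680
for nr_hw_queues in 1 4 24; do
    echo "nr_hw_queues=$nr_hw_queues -> $((can_queue / nr_hw_queues)) tags per queue"
done
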
> I will be doing more tests with a high nr_hw_queues count and a low I/O
> issuer count; I really do suspect that it's the block layer which is
> performing suboptimally here.
> In any case, we will be discussing blk-mq performance at LSF/MM this
> year; I will be bringing up the poor single-queue performance there.
>
> At the end of the day, I strongly suspect that every self-respecting
> process doing heavy I/O already _is_ multithreaded, so I would not
> try to optimize for the single-queue case.
>
> Cheers,
>
> Hannes
Hannes,

The results I posted last time were with merging enabled in the block
layer. If I disable merging, then I don't see much improvement from
multiple hw request queues. Here are the results (a sketch of how
merging can be toggled follows the numbers):
fio results when nr_hw_queues=1:
4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
runt=150003msec

fio results when nr_hw_queues=24:
4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
runt=150001msec
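
(For reference, this is how merging can be toggled per queue via the
'nomerges' sysfs attribute -- a minimal sketch, assuming md0 and its
four member drives; 2 disables all merging, 0 restores the default:)

# disable block-layer merging on md0 and its member drives
for dev in md0 sdg sdh sdi sdj; do
    echo 2 > /sys/block/$dev/queue/nomerges
done
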
Thanks,
Sreekanth