linux-mmc.vger.kernel.org archive mirror
From: Adrian Hunter <adrian.hunter@intel.com>
To: Ritesh Harjani <riteshh@codeaurora.org>,
	Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-mmc <linux-mmc@vger.kernel.org>,
	Alex Lemberg <alex.lemberg@sandisk.com>,
	Mateusz Nowak <mateusz.nowak@intel.com>,
	Yuliy Izrailov <Yuliy.Izrailov@sandisk.com>,
	Jaehoon Chung <jh80.chung@samsung.com>,
	Dong Aisheng <dongas86@gmail.com>,
	Das Asutosh <asutoshd@codeaurora.org>,
	Zhangfei Gao <zhangfei.gao@gmail.com>,
	Dorfman Konstantin <kdorfman@codeaurora.org>,
	David Griego <david.griego@linaro.org>,
	Sahitya Tummala <stummala@codeaurora.org>,
	Venu Byravarasu <vbyravarasu@nvidia.com>
Subject: Re: [PATCH V5 00/25] mmc: mmc: Add Software Command Queuing
Date: Wed, 2 Nov 2016 12:09:02 +0200	[thread overview]
Message-ID: <3128dca4-374c-02d3-dda6-645f46e64a23@intel.com> (raw)
In-Reply-To: <4d64111f-7bdb-22ed-5a53-e23f6bfcb28b@codeaurora.org>

On 02/11/16 11:30, Ritesh Harjani wrote:
> Hi Adrian,
> 
> Reply got delayed due to festive season holidays here.
> 
> On 10/28/2016 6:18 PM, Adrian Hunter wrote:
>> On 28/10/16 14:48, Ritesh Harjani wrote:
>>> Hi Adrian,
>>>
>>>
>>> Thanks for the re-base. I am trying to apply this patch series and validate it.
>>> I am seeing some tuning-related errors; they could be due to my setup/device.
>>> I will be working on this.
>>>
>>>
>>> Meanwhile, the perf readout below shows some discrepancy (though I could
>>> be missing something). In your last test, both sequential and random reads
>>> give almost the same throughput, and even the percentage increase is
>>> similar (legacy vs. SW CMDQ). Could that be due to the benchmark itself?
>>
>> Normally sequential reading would result in large transfers due to
>> readahead.  But the test uses Direct I/O (the -I option) to prevent VFS
>> caching from distorting the results.  So the sequential read case ends up
>> being more like a random read case anyway.
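A quick back-of-envelope sketch of why -I changes the picture (the readahead size is an assumed typical default, not a figure from this thread): with buffered I/O the kernel's readahead batches a sequential scan into large requests, while with O_DIRECT each 4 KiB iozone record becomes its own device request:

```shell
# Requests needed to read the 8 MiB iozone file, assuming a 128 KiB
# readahead window for buffered I/O and one device request per 4 KiB
# O_DIRECT record.
file_kb=8192   # iozone -s 8192k
rec_kb=4       # iozone -r 4k
ra_kb=128      # assumed default readahead window
echo "buffered: $((file_kb / ra_kb)) requests of ${ra_kb} KiB"
echo "direct:   $((file_kb / rec_kb)) requests of ${rec_kb} KiB"
```

So under -I the device sees thousands of small requests even in the "sequential" test, much as it does in the random one.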
>>
>>>
>>> Do you think we should test with some other benchmarks as well?
>>> (maybe tio?)
>>>
>>>
>>> Also, could you please add some analysis of where we should expect to see
>>> score improvements with SW CMDQ, and where we may see some decreases
>>> because of it?
>>
>> I would expect random reads to improve, but there is the overhead of the
>> extra cmdq commands, so otherwise a small decrease in performance would be
>> expected.
>>
>>> You did mention in the original cover letter that we should mostly see
>>> improvement in random read scores, but here we are seeing similar or
>>> higher improvement in sequential reads (4 threads), which is a bit
>>> surprising.
>>
>> As explained above: the test is called "sequential", but the limited record
>> size combined with Direct I/O makes it more like random I/O, especially
>> when it is mixed from multiple threads.
> In that case, if we compare sequential read vs. random read for a single
> thread with SW CMDQ, both show the same data. Ideally SR and RR should not
> be the same, no?

Yes, with these iozone tests, SR and RR seem the same.

> 
> 
>>
>>>
>>>
>>>
>>>
>>> On 10/24/2016 2:07 PM, Adrian Hunter wrote:
>>>> Hi
>>>>
>>>> Here is an updated version of the Software Command Queuing patches,
>>>> re-based on next, with patches 1-5 dropped because they have been
>>>> applied, and 2 fixes rolled in (see Changes in V5 below).
>>>>
>>>> Performance results:
>>>>
>>>> Results can vary from run to run, but here are some results showing 1, 2,
>>>> or 4 processes with 4k and 32k record sizes.  They show up to a 40%
>>>> improvement in read performance when there are multiple processes.
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 1 -F /mnt/mmc/iozone1.tmp
>>>>
>>>>     (columns: legacy kB/sec, SW CMDQ kB/sec, % change)
>>>>
>>>>     Children see throughput for 1 initial writers  =  27909.87 kB/sec   24204.14 kB/sec  -13.28 %
>>>>     Children see throughput for 1 rewriters        =  28839.28 kB/sec   25531.92 kB/sec  -11.47 %
>>>>     Children see throughput for 1 readers          =  25889.65 kB/sec   24883.23 kB/sec   -3.89 %
>>>>     Children see throughput for 1 re-readers       =  25558.23 kB/sec   24679.89 kB/sec   -3.44 %
>>>>     Children see throughput for 1 random readers   =  25571.48 kB/sec   24689.52 kB/sec   -3.45 %
>>>>     Children see throughput for 1 mixed workload   =  25758.59 kB/sec   24487.52 kB/sec   -4.93 %
>>>>     Children see throughput for 1 random writers   =  24787.51 kB/sec   19368.99 kB/sec  -21.86 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 1 -F /mnt/mmc/iozone1.tmp
>>>>
>>>>     Children see throughput for 1 initial writers  =   91344.61 kB/sec  102008.56 kB/sec   11.67 %
>>>>     Children see throughput for 1 rewriters        =   87932.36 kB/sec   96630.44 kB/sec    9.89 %
>>>>     Children see throughput for 1 readers          =  134879.82 kB/sec  110292.79 kB/sec  -18.23 %
>>>>     Children see throughput for 1 re-readers       =  147632.13 kB/sec  109053.33 kB/sec  -26.13 %
>>>>     Children see throughput for 1 random readers   =   93547.37 kB/sec  112225.50 kB/sec   19.97 %
>>>>     Children see throughput for 1 mixed workload   =   93560.04 kB/sec  110515.21 kB/sec   18.12 %
>>>>     Children see throughput for 1 random writers   =   92841.84 kB/sec   81153.81 kB/sec  -12.59 %
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 2 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp
>>>>
>>>>     Children see throughput for 2 initial writers  =  31145.43 kB/sec   33771.25 kB/sec    8.43 %
>>>>     Children see throughput for 2 rewriters        =  30592.57 kB/sec   35916.46 kB/sec   17.40 %
>>>>     Children see throughput for 2 readers          =  31669.83 kB/sec   37460.13 kB/sec   18.28 %
>>>>     Children see throughput for 2 re-readers       =  32079.94 kB/sec   37373.33 kB/sec   16.50 %
>>>>     Children see throughput for 2 random readers   =  27731.19 kB/sec   37601.65 kB/sec   35.59 %
>>>>     Children see throughput for 2 mixed workload   =  13927.50 kB/sec   14617.06 kB/sec    4.95 %
>>>>     Children see throughput for 2 random writers   =  31250.00 kB/sec   33106.72 kB/sec    5.94 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 2 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp
>>>>
>>>>     Children see throughput for 2 initial writers  =  123255.84 kB/sec  131252.22 kB/sec    6.49 %
>>>>     Children see throughput for 2 rewriters        =  115234.91 kB/sec  107225.74 kB/sec   -6.95 %
>>>>     Children see throughput for 2 readers          =  128921.86 kB/sec  148562.71 kB/sec   15.23 %
>>>>     Children see throughput for 2 re-readers       =  127815.24 kB/sec  149304.32 kB/sec   16.81 %
>>>>     Children see throughput for 2 random readers   =  125600.46 kB/sec  148406.56 kB/sec   18.16 %
>>>>     Children see throughput for 2 mixed workload   =   44006.94 kB/sec   50937.36 kB/sec   15.75 %
>>>>     Children see throughput for 2 random writers   =  120623.95 kB/sec  103969.05 kB/sec  -13.81 %
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp /mnt/mmc/iozone3.tmp /mnt/mmc/iozone4.tmp
>>>>
>>>>     Children see throughput for 4 initial writers  =  24100.96 kB/sec   33336.58 kB/sec   38.32 %
>>>>     Children see throughput for 4 rewriters        =  31650.20 kB/sec   33091.53 kB/sec    4.55 %
>>>>     Children see throughput for 4 readers          =  33276.92 kB/sec   41799.89 kB/sec   25.61 %
>>>>     Children see throughput for 4 re-readers       =  31786.96 kB/sec   41501.74 kB/sec   30.56 %
>>>>     Children see throughput for 4 random readers   =  31991.65 kB/sec   40973.93 kB/sec   28.08 %
>>>>     Children see throughput for 4 mixed workload   =  15804.80 kB/sec   13581.32 kB/sec  -14.07 %
>>>>     Children see throughput for 4 random writers   =  31231.42 kB/sec   34537.03 kB/sec   10.58 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp /mnt/mmc/iozone3.tmp /mnt/mmc/iozone4.tmp
>>>>
>>>>     Children see throughput for 4 initial writers  =  116567.42 kB/sec  119280.35 kB/sec    2.33 %
>>>>     Children see throughput for 4 rewriters        =  115010.96 kB/sec  120864.34 kB/sec    5.09 %
>>>>     Children see throughput for 4 readers          =  130700.29 kB/sec  177834.21 kB/sec   36.06 %
>>> Do you think sequential read would increase more than random read? It
>>> should mostly benefit random reads, right? Any idea why it's behaving
>>> differently here?
>>
>> Explained above.
>>
>>>
>>>
>>>>     Children see throughput for 4 re-readers       =  125392.58 kB/sec  175975.28 kB/sec   40.34 %
>>>>     Children see throughput for 4 random readers   =  132194.57 kB/sec  176630.46 kB/sec   33.61 %
>>>>     Children see throughput for 4 mixed workload   =   56464.98 kB/sec   54140.61 kB/sec   -4.12 %
>>>>     Children see throughput for 4 random writers   =  109128.36 kB/sec   85359.80 kB/sec  -21.78 %
>>> Similarly, we don't expect random write scores to decrease here. Do you
>>> know why that could be happening?
>>
>> There is a lot of variation from one run to another.
> 
> Any idea what these run-to-run variations are due to?

Not really, but there is no knowing how much internal work the eMMC is
doing: wear-leveling, erasing blocks, cache flushing, etc.  Also, "random"
writes that hit the same erase block can be combined when flushing the
cache, and random writes to the same sectors can also be combined - so the
potential for variation is very large.
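Given that, one pragmatic way to make the legacy vs. SW CMDQ comparison more trustworthy is to repeat each measurement and average. A rough sketch (`avg_runs` and the measurement command are hypothetical, not part of the series):

```shell
# Average N repetitions of a benchmark command to smooth run-to-run
# variation.  "$@" must be a command that prints a single kB/sec figure
# per invocation, e.g. a wrapper that runs iozone and extracts one
# throughput line.
avg_runs() {
    n="$1"; shift
    total=0
    for _ in $(seq "$n"); do
        kb=$("$@")
        total=$(awk -v t="$total" -v k="$kb" 'BEGIN { print t + k }')
    done
    awk -v t="$total" -v n="$n" 'BEGIN { printf "%.2f\n", t / n }'
}
```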

> A 21% variation is huge.
> 
> Can we try to get some data that shows the difference between SeqR and
> RandR more clearly? Also, RandW should not decrease this much. Maybe by
> changing the iozone parameters?
> 
> 
> How about the data below?

Yes, that would be interesting.

> 
> For sequential:
> ./iozone -s 64m -r 32m -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /data/mmc/iozone1.tmp
> /data/mmc/iozone2.tmp /data/mmc/iozone3.tmp /data/mmc/iozone4.tmp
> 
> For random:
> ./iozone -s 64m -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /data/mmc/iozone1.tmp
> /data/mmc/iozone2.tmp /data/mmc/iozone3.tmp /data/mmc/iozone4.tmp
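Those parameters should indeed separate the two cases much more cleanly: under O_DIRECT a 32 MiB record covers the 64 MiB file in two transfers, while 4 KiB records need thousands. A quick sanity check (assuming one device request per O_DIRECT record):

```shell
# Requests per 64 MiB file for the two proposed record sizes, assuming
# one device request per O_DIRECT record.
file_kb=65536                  # -s 64m
for rec_kb in 32768 4; do      # -r 32m vs. -r 4k
    echo "${rec_kb} KiB records: $((file_kb / rec_kb)) requests"
done
```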
> 
> 
> Regards
> Ritesh
> 



Thread overview: 31+ messages
2016-10-24  8:37 [PATCH V5 00/25] mmc: mmc: Add Software Command Queuing Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 01/25] mmc: queue: Fix queue thread wake-up Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 02/25] mmc: queue: Factor out mmc_queue_alloc_bounce_bufs() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 03/25] mmc: queue: Factor out mmc_queue_alloc_bounce_sgs() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 04/25] mmc: queue: Factor out mmc_queue_alloc_sgs() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 05/25] mmc: queue: Factor out mmc_queue_reqs_free_bufs() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 06/25] mmc: queue: Introduce queue depth Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 07/25] mmc: queue: Use queue depth to allocate and free Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 08/25] mmc: queue: Allocate queue of size qdepth Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 09/25] mmc: mmc: Add Command Queue definitions Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 10/25] mmc: mmc: Add functions to enable / disable the Command Queue Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 11/25] mmc: mmc_test: Disable Command Queue while mmc_test is used Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 12/25] mmc: block: Disable Command Queue while RPMB " Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 13/25] mmc: core: Do not prepare a new request twice Adrian Hunter
2016-11-04 12:07   ` [PATCH V6 " Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 14/25] mmc: core: Export mmc_retune_hold() and mmc_retune_release() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 15/25] mmc: block: Factor out mmc_blk_requeue() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 16/25] mmc: block: Fix 4K native sector check Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 17/25] mmc: block: Use local var for mqrq_cur Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 18/25] mmc: block: Pass mqrq to mmc_blk_prep_packed_list() Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 19/25] mmc: block: Introduce queue semantics Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 20/25] mmc: queue: Share mmc request array between partitions Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 21/25] mmc: queue: Add a function to control wake-up on new requests Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 22/25] mmc: block: Add Software Command Queuing Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 23/25] mmc: mmc: Enable " Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 24/25] mmc: sdhci-pci: Enable Software Command Queuing for some Intel controllers Adrian Hunter
2016-10-24  8:37 ` [PATCH V5 25/25] mmc: sdhci-acpi: " Adrian Hunter
2016-10-28 11:48 ` [PATCH V5 00/25] mmc: mmc: Add Software Command Queuing Ritesh Harjani
2016-10-28 12:48   ` Adrian Hunter
2016-11-02  9:30     ` Ritesh Harjani
2016-11-02 10:09       ` Adrian Hunter [this message]
