From mboxrd@z Thu Jan  1 00:00:00 1970
From: merez@codeaurora.org
Subject: Re: [PATCH v2 1/1] mmc: block: Add write packing control
Date: Tue, 24 Jul 2012 13:23:16 -0700 (PDT)
Message-ID: <39c8c611bf7f120d9b1c519d56b44e15.squirrel@www.codeaurora.org>
References: <34e29d774773a7dfa3e1b57232249b68.squirrel@www.codeaurora.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path: <linux-mmc-owner@vger.kernel.org>
In-Reply-To: <34e29d774773a7dfa3e1b57232249b68.squirrel@www.codeaurora.org>
Sender: linux-mmc-owner@vger.kernel.org
Cc: "S, Venkatraman" <svenkatr@ti.com>, merez@codeaurora.org, Chris Ball <cjb@laptop.org>, Muthu Kumar <muthu.lkml@gmail.com>, linux-mmc@vger.kernel.org, linux-arm-msm@vger.kernel.org, open list <linux-kernel@vger.kernel.org>, Seungwon Jeon <tgih.jun@samsung.com>
List-Id: linux-arm-msm@vger.kernel.org

Attached are the trace logs for parallel read and write lmdd operations=
=2E

On Tue, July 24, 2012 1:44 am, merez@codeaurora.org wrote:
> On Mon, July 23, 2012 5:22 am, S, Venkatraman wrote:
>> On Mon, Jul 23, 2012 at 5:13 PM,  <merez@codeaurora.org> wrote:
>>> On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
>>>> Hi,  [removing Jens and the documentation list, since now we're
> talking about the MMC side only]
>>>> On Wed, Jul 18 2012, merez@codeaurora.org wrote:
>>>>> Is there anything else that holds this patch from being pushed to
>>> mmc-next?
>>>> Yes, I'm still uncomfortable with the write packing patchsets for =
a
>>> couple of reasons, and I suspect that the sum of those reasons mean=
s
>>> that
>>> we should probably plan on holding off merging it until after 3.6.
>>>> Here are the open issues; please correct any misunderstandings: Wi=
th
> Seungwon's patchset ("Support packed write command"):
>>>> * I still don't have a good set of representative benchmarks showi=
ng
>>>>   what kind of performance changes come with this patchset.  It se=
ems
>>> like we've had a small amount of testing on one controller/eMMC par=
t
>>> combo
>>> from Seungwon, and an entirely different test from Maya, and the
> results
>>> aren't documented fully anywhere to the level of describing what th=
e
> hardware was, what the test was, and what the results were before and
> after the patchset.
>>> Currently, there is only one card vendor that supports packed comma=
nds.
> Following are our sequential write (LMDD) test results on 2 of our
> targets
>>> (in MB/s):
>>>                        No packing        packing
>>> Target 1 (SDR 50MHz)     15               25
>>> Target 2 (DDR 50MHz)     20               30
>>>> With the reads-during-writes regression:
>>>> * Venkat still has open questions about the nature of the read
>>>>   regression, and thinks we should understand it with blktrace bef=
ore
>>> trying to fix it.  Maya has a theory about writes overwhelming read=
s,
>>> but
>>> Venkat doesn't understand why this would explain the observed
>>> bandwidth drop.
>>> The degradation of read due to writes is not a new behavior and exi=
sts
> also without the write packing feature (which only increases the
> degradation). Our investigation of this phenomenon led us to the
> Conclusion that a new scheduling policy should be used for mobile
> devices,
>>> but this is not related to the current discussion of the write pack=
ing
> feature.
>>> The write packing feature increases the degradation of read due to
> write
>>> since it allows the MMC to fetch many write requests in a row, inst=
ead
>>> of
>>> fetching only one at a time.  Therefore some of the read requests w=
ill
> have to wait for the completion of more write requests before they ca=
n
> be
>>> issued.
>>
>> I am a bit puzzled by this claim. One thing I checked carefully when
> reviewing write packing patches from SJeon was that the code didn't
> plough through a mixed list of reads and writes and selected only
> writes.
>> This section of the code in "mmc_blk_prep_packed_list()", from v8
> patchset..
>> <Quote>
>> +               if (rq_data_dir(cur) !=3D rq_data_dir(next)) {
>> +                       put_back =3D 1;
>> +                       break;
>> +               }
>> </Quote>
>>
>> means that once a read is encountered in the middle of write packing=
,
> the packing is stopped at that point and it is executed. Then the nex=
t
> blk_fetch_request should get the next read and continue as before.
>>
>> IOW, the ordering of reads and writes is _not_ altered when using pa=
cked
> commands.
>> For example if there were 5 write requests, followed by 1 read,
>> followed by 5 more write requests in the request_queue, the first 5
> writes will be executed as one "packed command", then the read will b=
e
> executed, and then the remaining 5 writes will be executed as one
> "packed command". So the read does not have to wait any more than it
> waited before (packing feature)
>
> Let me try to better explain with your example.
> Without packing the MMC layer will fetch 2 write requests and wait fo=
r the
> first write request completion before fetching another write request.
> During this time the read request could be inserted into the CFQ and =
since
> it has higher priority than the async write it will be dispatched in =
the
> next fetch. So, the result would be 2 write requests followed by one =
read
> request and the read would have to wait for completion of only 2 writ=
e
> requests.
> With packing, all the 5 write requests will be fetched in a row, and =
then
> the read will arrive and be dispatched in the next fetch. Then the re=
ad
> will have to wait for the completion of 5 write requests.
>
> Few more clarifications:
> Due to the plug list mechanism in the block layer the applications ca=
n
> "aggregate" several requests to be inserted into the scheduler before
> waking the MMC queue thread.
> This leads to a situation where there are several write requests in t=
he
> CFQ queue when MMC starts to do the fetches.
>
> If the read was inserted while we are building the packed command the=
n I
> agree that we should have seen less effect on the read performance.
> However, the write packing statistics show that in most of the cases =
the
> packing stopped due to an empty queue, meaning that the read was inse=
rted
> to the CFQ after all the pending write requests were fetched and pack=
ed.
>
> Following is an example for write packing statistics of a READ/WRITE
> parallel scenario:
> write packing statistics:
> Packed 1 reqs - 448 times
> Packed 2 reqs - 38 times
> Packed 3 reqs - 23 times
> Packed 4 reqs - 30 times
> Packed 5 reqs - 14 times
> Packed 6 reqs - 8 times
> Packed 7 reqs - 4 times
> Packed 8 reqs - 1 times
> Packed 10 reqs - 1 times
> Packed 34 reqs - 1 times
> stopped packing due to the following reasons:
> 2 times: wrong data direction (meaning a READ was fetched and stopped=
 the
> packing)
> 1 times: flush or discard
> 565 times: empty queue (meaning blk_fetch_request returned NULL)
>
>>
>> And I requested blktrace to confirm that this is indeed the behaviou=
r.
>
> The trace logs show that in case of no packing, there are maximum of =
3-4
> requests issued before a read request, while with packing there are a=
lso
> cases of 6 and 7 requests dispatched before a read request.
>
> I'm waiting for an approval for sharing the block trace logs.
> Since this is a simple test to run you can collect the trace logs and=
 let
> us know if you reach other conclusions.
>
> Thanks,
> Maya
>
>>
>> Your rest of the arguments anyway depend on this assertion, so can y=
ou
> please clarify this.
>>
>>> To overcome this behavior, the solution would be to stop the write
>>> packing
>>> when a read request is fetched, and this is the algorithm suggested=
 by
>>> the
>>> write packing control.
>>> Let's also keep in mind that lmdd benchmarking doesn't fully reflec=
t
> the
>>> real life in which there are not many scenarios that cause massive =
read
> and write operations. In our user-common-scenarios tests we saw that =
in
> many cases the write packing decreases the read latency. It can happe=
n
> in
>>> cases where the same amount of write requests is fetched with and
>>> without
>>> packing. In such a case the write packing decreases the transfer ti=
me
> of
>>> the write requests and causes the read request to wait for a shorte=
r
>>> time.
>>>> With Maya's patchset ("write packing control"):
>>>> * Venkat thinks that HPI should be used, and the number-of-request=
s
>>>>   metric is too coarse, and it doesn't let you disable packing at =
the
>>> right time, and you're essentially implementing a new I/O scheduler
>>> inside
>>> the MMC subsystem without understanding the root cause for why that=
's
> necessary.
>>> According to our measurements the stop transmission (CMD12) + HPI i=
s a
> heavy operation that may take up to several milliseconds. Therefore, =
a
> massive usage of HPI can cause a degradation of performance.
>>> In addition, it doesn=92t provide a complete solution for read duri=
ng
>>> write
>>> since it doesn=92t solve the problem of =93what to do with the inte=
rrupted
> write request remainder?=94.  That is, a common interrupting read req=
uest
> will usually be followed by another one. If we just continue to write
> the
>>> interrupted write request remainder we will probably get another HP=
I
> due
>>> to the second read request, so eventually we may end up with lots o=
f
>>> HPIs
>>> and write retries. A complete solution will be: stop the current wr=
ite,
> change packing mode to non-packing, serve the read request, push back
> the
>>> write remainders to the block I/O scheduler and let him schedule th=
em
> again probably after the read burst ends (this requires block layer
> support of course).
>>> Regarding the packing control, there seem to be a confusion since t=
he
> number-of-requests is the trigger for *enabling* the packing (after i=
t
> was
>>> disabled), while a single read request disable packing. Therefore, =
the
> packing is stopped at the right time.
>>> The packing control doesn't add any scheduling policy to the MMC la=
yer.
> The write packing feature is the one changing the scheduling policy b=
y
> fetching many write requests in a row without a delay that allows rea=
d
> requests to come in the middle.
>>> By disabling the write packing, the write packing control returns t=
he
>>> old
>>> scheduling policy. It causes the MMC to fetch the requests one by o=
ne,
> thus read requests are served as before.
>>> It is correct that the trigger for enabling the write packing contr=
ol
> should be adjusted per platform and doesn't give a complete solution.
> As
>>> I
>>> mentioned above, the complete solution will include the usage of wr=
ite
> packing control, a re-insert of the write packed to the scheduler whe=
n
> a
>>> read request is fetched and usage of HPI to stop the packing that i=
s
> already transferred.
>>> To summarize -
>>> We recommend including the write packing in 3.6 due to the followin=
g
> reasons:
>>> 1. It significantly improves the write throughput
>>> 2. In some of the cases it even decreases the read latency
>>> 3. The read degradation in simultaneous read-write flows already ex=
ist,
> even without this feature
>>> As for the write packing control, it can be included in 3.6 to supp=
ly a
> partial solution for the read degradation or we can postpone it to 3.=
7
> and
>>> integrate it as part of the complete solution.
>>> Thanks,
>>> Maya
>>>> My sense is that there's no way we can solve all of these to
>>>> satisfaction in the next week (which is when the merge window will
>>> open), but that by waiting a cycle we might come up with some good
> answers.
>>>> What do other people think?  If you're excited about these patchse=
ts,
>>> now would be a fine time to come forward with your benchmarking res=
ults
> and to help understand the reads-during-writes regression.
>>
>
>
> --
> Sent by consultant of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


--=20
Sent by consultant of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum