From mboxrd@z Thu Jan 1 00:00:00 1970 From: merez@codeaurora.org Subject: Re: [PATCH v2 1/1] mmc: block: Add write packing control Date: Tue, 24 Jul 2012 13:23:16 -0700 (PDT) Message-ID: <39c8c611bf7f120d9b1c519d56b44e15.squirrel@www.codeaurora.org> References: <34e29d774773a7dfa3e1b57232249b68.squirrel@www.codeaurora.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: <34e29d774773a7dfa3e1b57232249b68.squirrel@www.codeaurora.org> Sender: linux-mmc-owner@vger.kernel.org Cc: "S, Venkatraman" , merez@codeaurora.org, Chris Ball , Muthu Kumar , linux-mmc@vger.kernel.org, linux-arm-msm@vger.kernel.org, open list , Seungwon Jeon List-Id: linux-arm-msm@vger.kernel.org Attached are the trace logs for parallel read and write lmdd operations= =2E On Tue, July 24, 2012 1:44 am, merez@codeaurora.org wrote: > On Mon, July 23, 2012 5:22 am, S, Venkatraman wrote: >> On Mon, Jul 23, 2012 at 5:13 PM, wrote: >>> On Wed, July 18, 2012 12:26 am, Chris Ball wrote: >>>> Hi, [removing Jens and the documentation list, since now we're > talking about the MMC side only] >>>> On Wed, Jul 18 2012, merez@codeaurora.org wrote: >>>>> Is there anything else that holds this patch from being pushed to >>> mmc-next? >>>> Yes, I'm still uncomfortable with the write packing patchsets for = a >>> couple of reasons, and I suspect that the sum of those reasons mean= s >>> that >>> we should probably plan on holding off merging it until after 3.6. >>>> Here are the open issues; please correct any misunderstandings: Wi= th > Seungwon's patchset ("Support packed write command"): >>>> * I still don't have a good set of representative benchmarks showi= ng >>>> what kind of performance changes come with this patchset. It se= ems >>> like we've had a small amount of testing on one controller/eMMC par= t >>> combo >>> from Seungwon, and an entirely different test from Maya, and the > results >>> aren't documented fully anywhere to the level of describing what th= e > hardware was, what the test was, and what the results were before and > after the patchset. >>> Currently, there is only one card vendor that supports packed comma= nds. > Following are our sequential write (LMDD) test results on 2 of our > targets >>> (in MB/s): >>> No packing packing >>> Target 1 (SDR 50MHz) 15 25 >>> Target 2 (DDR 50MHz) 20 30 >>>> With the reads-during-writes regression: >>>> * Venkat still has open questions about the nature of the read >>>> regression, and thinks we should understand it with blktrace bef= ore >>> trying to fix it. Maya has a theory about writes overwhelming read= s, >>> but >>> Venkat doesn't understand why this would explain the observed >>> bandwidth drop. >>> The degradation of read due to writes is not a new behavior and exi= sts > also without the write packing feature (which only increases the > degradation). Our investigation of this phenomenon led us to the > Conclusion that a new scheduling policy should be used for mobile > devices, >>> but this is not related to the current discussion of the write pack= ing > feature. >>> The write packing feature increases the degradation of read due to > write >>> since it allows the MMC to fetch many write requests in a row, inst= ead >>> of >>> fetching only one at a time. Therefore some of the read requests w= ill > have to wait for the completion of more write requests before they ca= n > be >>> issued. >> >> I am a bit puzzled by this claim. One thing I checked carefully when > reviewing write packing patches from SJeon was that the code didn't > plough through a mixed list of reads and writes and selected only > writes. >> This section of the code in "mmc_blk_prep_packed_list()", from v8 > patchset.. >> >> + if (rq_data_dir(cur) !=3D rq_data_dir(next)) { >> + put_back =3D 1; >> + break; >> + } >> >> >> means that once a read is encountered in the middle of write packing= , > the packing is stopped at that point and it is executed. Then the nex= t > blk_fetch_request should get the next read and continue as before. >> >> IOW, the ordering of reads and writes is _not_ altered when using pa= cked > commands. >> For example if there were 5 write requests, followed by 1 read, >> followed by 5 more write requests in the request_queue, the first 5 > writes will be executed as one "packed command", then the read will b= e > executed, and then the remaining 5 writes will be executed as one > "packed command". So the read does not have to wait any more than it > waited before (packing feature) > > Let me try to better explain with your example. > Without packing the MMC layer will fetch 2 write requests and wait fo= r the > first write request completion before fetching another write request. > During this time the read request could be inserted into the CFQ and = since > it has higher priority than the async write it will be dispatched in = the > next fetch. So, the result would be 2 write requests followed by one = read > request and the read would have to wait for completion of only 2 writ= e > requests. > With packing, all the 5 write requests will be fetched in a row, and = then > the read will arrive and be dispatched in the next fetch. Then the re= ad > will have to wait for the completion of 5 write requests. > > Few more clarifications: > Due to the plug list mechanism in the block layer the applications ca= n > "aggregate" several requests to be inserted into the scheduler before > waking the MMC queue thread. > This leads to a situation where there are several write requests in t= he > CFQ queue when MMC starts to do the fetches. > > If the read was inserted while we are building the packed command the= n I > agree that we should have seen less effect on the read performance. > However, the write packing statistics show that in most of the cases = the > packing stopped due to an empty queue, meaning that the read was inse= rted > to the CFQ after all the pending write requests were fetched and pack= ed. > > Following is an example for write packing statistics of a READ/WRITE > parallel scenario: > write packing statistics: > Packed 1 reqs - 448 times > Packed 2 reqs - 38 times > Packed 3 reqs - 23 times > Packed 4 reqs - 30 times > Packed 5 reqs - 14 times > Packed 6 reqs - 8 times > Packed 7 reqs - 4 times > Packed 8 reqs - 1 times > Packed 10 reqs - 1 times > Packed 34 reqs - 1 times > stopped packing due to the following reasons: > 2 times: wrong data direction (meaning a READ was fetched and stopped= the > packing) > 1 times: flush or discard > 565 times: empty queue (meaning blk_fetch_request returned NULL) > >> >> And I requested blktrace to confirm that this is indeed the behaviou= r. > > The trace logs show that in case of no packing, there are maximum of = 3-4 > requests issued before a read request, while with packing there are a= lso > cases of 6 and 7 requests dispatched before a read request. > > I'm waiting for an approval for sharing the block trace logs. > Since this is a simple test to run you can collect the trace logs and= let > us know if you reach other conclusions. > > Thanks, > Maya > >> >> Your rest of the arguments anyway depend on this assertion, so can y= ou > please clarify this. >> >>> To overcome this behavior, the solution would be to stop the write >>> packing >>> when a read request is fetched, and this is the algorithm suggested= by >>> the >>> write packing control. >>> Let's also keep in mind that lmdd benchmarking doesn't fully reflec= t > the >>> real life in which there are not many scenarios that cause massive = read > and write operations. In our user-common-scenarios tests we saw that = in > many cases the write packing decreases the read latency. It can happe= n > in >>> cases where the same amount of write requests is fetched with and >>> without >>> packing. In such a case the write packing decreases the transfer ti= me > of >>> the write requests and causes the read request to wait for a shorte= r >>> time. >>>> With Maya's patchset ("write packing control"): >>>> * Venkat thinks that HPI should be used, and the number-of-request= s >>>> metric is too coarse, and it doesn't let you disable packing at = the >>> right time, and you're essentially implementing a new I/O scheduler >>> inside >>> the MMC subsystem without understanding the root cause for why that= 's > necessary. >>> According to our measurements the stop transmission (CMD12) + HPI i= s a > heavy operation that may take up to several milliseconds. Therefore, = a > massive usage of HPI can cause a degradation of performance. >>> In addition, it doesn=92t provide a complete solution for read duri= ng >>> write >>> since it doesn=92t solve the problem of =93what to do with the inte= rrupted > write request remainder?=94. That is, a common interrupting read req= uest > will usually be followed by another one. If we just continue to write > the >>> interrupted write request remainder we will probably get another HP= I > due >>> to the second read request, so eventually we may end up with lots o= f >>> HPIs >>> and write retries. A complete solution will be: stop the current wr= ite, > change packing mode to non-packing, serve the read request, push back > the >>> write remainders to the block I/O scheduler and let him schedule th= em > again probably after the read burst ends (this requires block layer > support of course). >>> Regarding the packing control, there seem to be a confusion since t= he > number-of-requests is the trigger for *enabling* the packing (after i= t > was >>> disabled), while a single read request disable packing. Therefore, = the > packing is stopped at the right time. >>> The packing control doesn't add any scheduling policy to the MMC la= yer. > The write packing feature is the one changing the scheduling policy b= y > fetching many write requests in a row without a delay that allows rea= d > requests to come in the middle. >>> By disabling the write packing, the write packing control returns t= he >>> old >>> scheduling policy. It causes the MMC to fetch the requests one by o= ne, > thus read requests are served as before. >>> It is correct that the trigger for enabling the write packing contr= ol > should be adjusted per platform and doesn't give a complete solution. > As >>> I >>> mentioned above, the complete solution will include the usage of wr= ite > packing control, a re-insert of the write packed to the scheduler whe= n > a >>> read request is fetched and usage of HPI to stop the packing that i= s > already transferred. >>> To summarize - >>> We recommend including the write packing in 3.6 due to the followin= g > reasons: >>> 1. It significantly improves the write throughput >>> 2. In some of the cases it even decreases the read latency >>> 3. The read degradation in simultaneous read-write flows already ex= ist, > even without this feature >>> As for the write packing control, it can be included in 3.6 to supp= ly a > partial solution for the read degradation or we can postpone it to 3.= 7 > and >>> integrate it as part of the complete solution. >>> Thanks, >>> Maya >>>> My sense is that there's no way we can solve all of these to >>>> satisfaction in the next week (which is when the merge window will >>> open), but that by waiting a cycle we might come up with some good > answers. >>>> What do other people think? If you're excited about these patchse= ts, >>> now would be a fine time to come forward with your benchmarking res= ults > and to help understand the reads-during-writes regression. >> > > > -- > Sent by consultant of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" = in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > --=20 Sent by consultant of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum