* MMC performance degradation due to too many requests
@ 2018-05-14 11:30 Faiz Abbas
  2018-05-14 16:41 ` Tony Lindgren
  2018-05-15 12:39 ` Linus Walleij
  0 siblings, 2 replies; 6+ messages in thread

From: Faiz Abbas @ 2018-05-14 11:30 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, linux-omap, linux-mmc
Cc: Ulf Hansson, Linus Walleij

Hi Everyone,

I am debugging a performance degradation issue in some SD cards which
appeared around the 4.13 kernel on TI's AM335x-evm.

I compared the time taken for a simple dd command (after dropping
caches) between kernel versions 4.13 and 4.14 and see a consistent
15-20% reduction in speed. The amount of the decrease depends on the
card used.

Command:

# echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/dev/mmcblk1p2 bs=1M count=100

v4.13: ~22s for a 100M transfer
v4.14: ~28s for a 100M transfer

I also see many times the number of interrupts for mmc1 in 4.14:

v4.14:
 34:       2122      INTC  64 Level     mmc1

v4.13:
162:        457      INTC  64 Level     mmc1

This increase in interrupts is still present in the latest kernel.

I also printed out the size of the DMA transfers in the omap_hsmmc
driver during the dd command and see that the maximum size of a DMA
transfer has decreased from 1 MB to 256 kB between 4.13 and 4.14.

Full logs:
v4.13 https://pastebin.ubuntu.com/p/gK2k9GZcNd/
v4.14 https://pastebin.ubuntu.com/p/3YzCbQ79rk/

It looks like something has started splitting each request into four
times as many smaller buffers, and this is what is causing the
performance decrease.

I also ran the ftrace function profiler on the mmc* and blk* functions
and see that mmc_blk_issue_rq() is where the program spends most of
its time; the calls to it and to related functions have increased
around four times, with the average time per call reduced accordingly.

Function profile logs:
v4.13 https://pastebin.ubuntu.com/p/R5JG6FpTvy/
v4.14 https://pastebin.ubuntu.com/p/Kc6QYYwYcN/

I am not sure why calls to mmc_blk_issue_rq() have increased by so much.
Any ideas why this could be happening?

Thanks,
Faiz
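The mmc1 interrupt figures quoted above come from /proc/interrupts. A small helper along these lines (a sketch of my own, not from the thread — the field parsing matches the single-column AM335x output quoted above) sums the per-CPU counters for a named IRQ line, so the delta across a dd run can be measured:

```shell
# irq_count FILE NAME -- sum the leading per-CPU interrupt counters on the
# line of FILE (normally /proc/interrupts) whose last field is NAME.
irq_count() {
    awk -v name="$2" '$NF == name {
        total = 0
        # counters are the numeric fields right after the "IRQ:" column;
        # stop at the first non-numeric field (the chip name, e.g. INTC)
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            total += $i
        print total
    }' "$1"
}

# Demo against a saved copy of the v4.14 line quoted in the thread:
cat > /tmp/interrupts.v414 <<'EOF'
 34:       2122      INTC  64 Level     mmc1
EOF
irq_count /tmp/interrupts.v414 mmc1    # prints 2122
```

Reading the live file before and after the dd command and subtracting the two sums gives interrupts per 100 MB transfer.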
* Re: MMC performance degradation due to too many requests

From: Tony Lindgren @ 2018-05-14 16:41 UTC (permalink / raw)
To: Faiz Abbas
Cc: linux-kernel@vger.kernel.org, linux-omap, linux-mmc, Ulf Hansson, Linus Walleij

* Faiz Abbas <faiz_abbas@ti.com> [180514 11:31]:
> Hi Everyone,
>
> I am debugging a performance degradation issue in some SD cards which
> happened around 4.13 kernel on TI's AM335x-evm.
>
> I compare the time taken for a simple dd command (after dropping
> caches) to run between kernel versions 4.13 and 4.14. I see that there
> is a consistent 15-20% reduction in speed. The amount of decrease
> depends on the card used.

Care to run git bisect between v4.13 and v4.14? It should work just
fine across that range for boot and performance testing.

Regards,

Tony
* Re: MMC performance degradation due to too many requests

From: Linus Walleij @ 2018-05-15 12:39 UTC (permalink / raw)
To: Faiz Abbas
Cc: linux-kernel@vger.kernel.org, linux-omap, linux-mmc, Ulf Hansson

On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:

> Also, I see many times the number of interrupts for mmc1 in 4.14.
(...)
> I also printed out the size of the DMA transfers in the omap_hsmmc
> driver during the dd command. I see that the maximum size of the dma
> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.

This seems like my fault for removing the bounce buffer. (The name
"bounce buffer" is misleading; I suspect the right term for what it
was doing is "merge buffer" or something similar, since it coalesced
physically consecutive requests.)

commit de3ee99b097dd51938276e3af388cd4ad0f2750a
"mmc: Delete bounce buffer handling"
should be the culprit.

The bounce buffer only makes any performance impact if the host DMA
controller is unable to handle scatter-gather lists, so the number of
segments is set down to 1.

Can you verify that:
- mmc->max_segs == 1
- You were previously enabling CONFIG_MMC_BLOCK_BOUNCE

I am a bit confused, since omap_hsmmc sets max_segs to 64, indicating
that it can do scatter-gather, and the bounce buffer code only ever
kicks in if max_segs == 1.

The solution we created for SDHCI is:
commit bd9b902798ab14d19ca116b10bde581ddff8f905
"mmc: sdhci: Implement an SDHCI-specific bounce buffer"
i.e. we push this bouncing down to the driver and do not try to do it
in the MMC core.

I'd be happy to come up with something OMAP-specific as well if you
can help me figure out the code path leading to max_segs == 1.

Yours,
Linus Walleij
* Re: MMC performance degradation due to too many requests

From: Faiz Abbas @ 2018-05-15 15:11 UTC (permalink / raw)
To: Linus Walleij
Cc: linux-kernel@vger.kernel.org, linux-omap, linux-mmc, Ulf Hansson

Hi Linus,

On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:
>
>> Also, I see many times the number of interrupts for mmc1 in 4.14.
> (...)
>> I also printed out the size of the DMA transfers in the omap_hsmmc
>> driver during the dd command. I see that the maximum size of the dma
>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>
> This seems like my fault for removing the bounce buffer.
> (The name bounce buffer is misleading, I suspect the right
> term for what it was doing was "merge buffer" or something,
> coalescing physically consecutive requests.)
>
> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
> "mmc: Delete bounce buffer handling"
> should be the culprit.

I don't think so. I checked out the commit before that one and see
that the interrupt count is still too high.

> The bounce buffer only makes any performance impact if the
> host DMA controller is unable to handle scatter-gather lists,
> so the number of segments is set down to 1.
>
> Can you verify that:
> - mmc->max_segs == 1

No. It is 64 in v4.14; it was 1024 in v4.13.

v4.14 https://pastebin.ubuntu.com/p/4wXB2jb6zw/
v4.13 https://pastebin.ubuntu.com/p/dpd6kb4YJY/

> - You were previously enabling CONFIG_MMC_BLOCK_BOUNCE

Yes.

Regards,
Faiz
* Re: MMC performance degradation due to too many requests

From: Linus Walleij @ 2018-05-16 14:29 UTC (permalink / raw)
To: Faiz Abbas, Will Newton, ext Tony Lindgren
Cc: linux-kernel@vger.kernel.org, linux-omap, linux-mmc, Ulf Hansson

On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:
> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:
>>
>>> Also, I see many times the number of interrupts for mmc1 in 4.14.
>> (...)
>>> I also printed out the size of the DMA transfers in the omap_hsmmc
>>> driver during the dd command. I see that the maximum size of the dma
>>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>>
>> This seems like my fault for removing the bounce buffer.
>> (The name bounce buffer is misleading, I suspect the right
>> term for what it was doing was "merge buffer" or something,
>> coalescing physically consecutive requests.)
>>
>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>> "mmc: Delete bounce buffer handling"
>> should be the culprit.
>
> I don't think so. I checked out the commit before that one and see
> that the interrupts are still too high.

OK, phew, then I'm safe :)

>> The bounce buffer only makes any performance impact if the
>> host DMA controller is unable to handle scatter-gather lists,
>> so the number of segments is set down to 1.
>>
>> Can you verify that:
>> - mmc->max_segs == 1
>
> No. It is 64 in v4.14. It was 1024 in v4.13.

This smells like:

commit 9442400486a1ec28915e9ab31517a5665ed35705
"mmc: omap_hsmmc: Reduce max_segs for reliability"

Can you test reverting that, and notify Will Newton if this commit is
causing performance regressions?

Yours,
Linus Walleij
* Re: MMC performance degradation due to too many requests

From: Faiz Abbas @ 2018-05-17 12:54 UTC (permalink / raw)
To: Linus Walleij, Will Newton, ext Tony Lindgren
Cc: linux-kernel@vger.kernel.org, linux-omap, linux-mmc, Ulf Hansson

Hi,

On Wednesday 16 May 2018 07:59 PM, Linus Walleij wrote:
> On Tue, May 15, 2018 at 5:11 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:
>> On Tuesday 15 May 2018 06:09 PM, Linus Walleij wrote:
>>> On Mon, May 14, 2018 at 1:30 PM, Faiz Abbas <faiz_abbas@ti.com> wrote:
>>>
>>>> Also, I see many times the number of interrupts for mmc1 in 4.14.
>>> (...)
>>>> I also printed out the size of the DMA transfers in the omap_hsmmc
>>>> driver during the dd command. I see that the maximum size of the dma
>>>> transfer has decreased from 1 MB to 256kB between 4.13 and 4.14.
>>>
>>> This seems like my fault for removing the bounce buffer.
>>> (The name bounce buffer is misleading, I suspect the right
>>> term for what it was doing was "merge buffer" or something,
>>> coalescing physically consecutive requests.)
>>>
>>> commit de3ee99b097dd51938276e3af388cd4ad0f2750a
>>> "mmc: Delete bounce buffer handling"
>>> should be the culprit.
>>
>> I don't think so. I checked out the commit before that one and see
>> that the interrupts are still too high.
>
> OK, phew, then I'm safe :)
>
>>> The bounce buffer only makes any performance impact if the
>>> host DMA controller is unable to handle scatter-gather lists,
>>> so the number of segments is set down to 1.
>>>
>>> Can you verify that:
>>> - mmc->max_segs == 1
>>
>> No. It is 64 in v4.14. It was 1024 in v4.13.
>
> This smells like:
> commit 9442400486a1ec28915e9ab31517a5665ed35705
> "mmc: omap_hsmmc: Reduce max_segs for reliability"

Yes, I was able to bisect the increase in interrupts to the same
commit.
On top of v4.14:

*Without revert*: https://pastebin.ubuntu.com/p/VJgvzrCP9g/
Average speed for a 100 MB transfer: 3.7 MBps
Number of interrupts per 100 MB transfer: ~2000

*After revert*: https://pastebin.ubuntu.com/p/ktgTmKVKkN/
Average speed for a 100 MB transfer: 4.6 MBps
Number of interrupts per 100 MB transfer: ~500

Also, I'm getting 1 MB buffers in the omap_hsmmc DMA now.

@Will, will it be OK to change max_segs back to 1024?

Thanks,
Faiz
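The before/after numbers are internally consistent: shrinking the largest DMA transfer from 1 MB to 256 kB splits a 100 MB dd into roughly four times as many requests, which matches the ~4x jump in both the mmc1 interrupt count and the mmc_blk_issue_rq() calls reported earlier in the thread. A quick arithmetic check (my own, not from the thread):

```shell
# Sanity-check the ratios reported in the thread with shell arithmetic.
transfer_kb=$((100 * 1024))                               # 100 MB dd transfer
echo "requests at 1 MB/DMA:   $((transfer_kb / 1024))"    # prints 100
echo "requests at 256 kB/DMA: $((transfer_kb / 256))"     # prints 400
# Observed mmc1 interrupt counts: 457 (v4.13) vs 2122 (v4.14)
echo "interrupt ratio: ~$((2122 / 457))x"                 # prints ~4x
```

The reported speeds also line up with the timings in the first message: 100 MB in ~22 s is about 4.5 MBps, and ~28 s is about 3.6 MBps.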