From mboxrd@z Thu Jan  1 00:00:00 1970
From: Faiz Abbas
Subject: Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet
Date: Thu, 14 Mar 2019 17:11:21 +0530
Message-ID:
References: <20190215192033.24203-1-faiz_abbas@ti.com>
 <20190215192033.24203-2-faiz_abbas@ti.com>
 <8d72ff93-e07f-52b9-da85-acd54f046694@ti.com>
 <63b6631d-86e7-b8ef-ffaf-40e7d4e96cfb@intel.com>
 <842caafd-1547-1ea6-faf0-27a85a912622@ti.com>
 <2a74ed21-2e6f-1ba3-3d49-6826a5ab3e66@ti.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <2a74ed21-2e6f-1ba3-3d49-6826a5ab3e66@ti.com>
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
To: Grygorii Strashko, Adrian Hunter, linux-kernel@vger.kernel.org,
 devicetree@vger.kernel.org, linux-mmc@vger.kernel.org,
 linux-omap@vger.kernel.org
Cc: ulf.hansson@linaro.org, robh+dt@kernel.org, mark.rutland@arm.com,
 kishon@ti.com, zhang.chunyan@linaro.org
List-Id: devicetree@vger.kernel.org

Hi,

On 14/03/19 4:45 PM, Grygorii Strashko wrote:
>
>
> On 12.03.19 19:30, Rizvi, Mohammad Faiz Abbas wrote:
>> Hi Adrian,
>>
>> On 3/8/2019 7:06 PM, Adrian Hunter wrote:
>>> On 6/03/19 12:00 PM, Faiz Abbas wrote:
>>>> Adrian,
>>>>
>>>> On 25/02/19 1:47 PM, Adrian Hunter wrote:
>>>>> On 15/02/19 9:20 PM, Faiz Abbas wrote:
>>>>>> sdhci.c has two bottom halves implemented: a threaded_irq for
>>>>>> handling card insert/remove operations and a tasklet for
>>>>>> finishing mmc requests. With the addition of external dma
>>>>>> support, dmaengine APIs need to terminate in non-atomic context
>>>>>> before unmapping the dma buffers.
>>>>>>
>>>>>> To facilitate this, remove the finish_tasklet and move the call
>>>>>> of sdhci_request_done() to the threaded_irq() callback.
>>>>>
>>>>> The irq thread has a higher latency than the tasklet. The
>>>>> performance drop is measurable on the system I tried:
>>>>>
>>>>> Before:
>>>>>
>>>>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>>>>> 1+0 records in
>>>>> 1+0 records out
>>>>> 1073741824 bytes (1.1 GB) copied, 4.44502 s, 242 MB/s
>>>>>
>>>>> After:
>>>>>
>>>>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>>>>> 1+0 records in
>>>>> 1+0 records out
>>>>> 1073741824 bytes (1.1 GB) copied, 4.50898 s, 238 MB/s
>>>>>
>>>>> So we only want to resort to the thread for the error case.
>>>>
>>>> Sorry for the late response here, but this is only about a 1.6%
>>>> decrease. I tried the same commands on a dra7xx board here (with
>>>> about 5 consecutive dd reads of 1GB each) and the average decrease
>>>> was 0.3%. I believe you will also see a smaller percentage change
>>>> if you average over multiple dd commands.
>>>>
>>>> Is this really so significant that we have to maintain two
>>>> different bottom halves and keep having difficulty adding APIs
>>>> that can sleep?
>>>
>>> It is a performance drop that can be avoided, so it might as well
>>> be. Splitting the success path from the failure path is common in
>>> I/O drivers for reasons similar to the ones here: the success path
>>> can be optimized, whereas the failure path potentially needs to
>>> sleep.
>>
>> Understood. You want to keep the success path as fast as possible.
>
> Sorry, I've not completely followed this series, but I'd like to add
> my 5 cents.
>
> It's a good thing to get rid of tasklets, since the RT Linux kernel
> is actively moving towards LKML, and there everything is handled in
> threads (even networking is trying to get rid of softirqs).
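
Just to spell out the split being suggested above: here is a rough,
untested sketch (not the actual sdhci.c code) of completing requests
on the success path in the fast handler and waking the irq thread
only for the error case. sdhci_needs_sleeping_cleanup() is a made-up
helper standing in for whatever condition marks a request that needs
the sleeping dmaengine cleanup:

#include <linux/interrupt.h>

struct sdhci_host;

/* Assumed helpers, for illustration only. */
void sdhci_request_done(struct sdhci_host *host);
bool sdhci_needs_sleeping_cleanup(struct sdhci_host *host);

static irqreturn_t sdhci_irq(int irq, void *dev_id)
{
        struct sdhci_host *host = dev_id;

        /* ...read and ack the interrupt status here... */

        /* Error path: cleanup may sleep, so defer to the irq thread. */
        if (sdhci_needs_sleeping_cleanup(host))
                return IRQ_WAKE_THREAD;

        /* Success path: complete the request without a context switch. */
        sdhci_request_done(host);
        return IRQ_HANDLED;
}

static irqreturn_t sdhci_thread_irq(int irq, void *dev_id)
{
        struct sdhci_host *host = dev_id;

        /* Non-atomic context: safe to terminate DMA before unmapping. */
        sdhci_request_done(host);
        return IRQ_HANDLED;
}

The common case then never wakes the thread at all, while the error
path still gets a context that is allowed to sleep.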
> Performance is a pretty relative thing here - just try to run
> network traffic in parallel. And there is no control over a tasklet
> compared to a thread: no way to assign a priority or pin it to a CPU.

There is a 2007 LWN article (https://lwn.net/Articles/239633/) which
talks about removing tasklets altogether. I wonder what happened after
that.
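
That lack of control is a real difference: a threaded irq handler is
an ordinary kthread, so it can be prioritized and pinned, which a
tasklet cannot. A rough sketch, reusing the hypothetical sdhci_irq() /
sdhci_thread_irq() pair above; the probe-time helper name and the
choice of CPU are made up for illustration:

#include <linux/interrupt.h>
#include <linux/cpumask.h>

static int sdhci_setup_irq(struct sdhci_host *host, int irq)
{
        int ret;

        /* Fast handler for the success path, thread for the error path. */
        ret = request_threaded_irq(irq, sdhci_irq, sdhci_thread_irq,
                                   IRQF_SHARED, "sdhci", host);
        if (ret)
                return ret;

        /*
         * Hint that the irq (and its handler thread, which follows
         * the irq affinity) should run on CPU 1. Nothing comparable
         * exists for a tasklet.
         */
        irq_set_affinity_hint(irq, cpumask_of(1));
        return 0;
}

And from userspace, the resulting irq/NN-sdhci kthread can be given an
RT priority with chrt or pinned with taskset, which is exactly the
control Grygorii is describing.

Thanks,
Faiz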