From: Laurent Pinchart
To: Tomasz Figa
Cc: keiichiw@chromium.org, Linux Kernel Mailing List, Mauro Carvalho Chehab, Linux Media Mailing List, Kieran Bingham, Douglas Anderson, Alan Stern, Ezequiel Garcia, "Matwey V. Kornilov"
Subject: Re: [RFC PATCH v1] media: uvcvideo: Cache URB header data before processing
Date: Wed, 08 Aug 2018 11:42:44 +0300
Message-ID: <3411643.50e8mdYzJX@avalon>
Organization: Ideas on Board Oy
References: <20180627103408.33003-1-keiichiw@chromium.org> <11886963.8nkeRH3xvi@avalon>

Hi Tomasz,

On Wednesday, 8 August 2018 07:08:59 EEST Tomasz Figa wrote:
> On Tue, Jul 31, 2018 at 1:00 AM Laurent Pinchart wrote:
> > On Wednesday, 27 June 2018 13:34:08 EEST Keiichi Watanabe wrote:
> >> On some platforms with non-coherent DMA (e.g. ARM), USB drivers use
> >> uncached memory allocation methods. In such situations, it sometimes
> >> takes a long time to access URB buffers. This can be a cause of video
> >> flickering problems if a resolution is high and a USB controller has
> >> a very tight time limit. (e.g. dwc2) To avoid this problem, we copy
> >> header data from (uncached) URB buffer into (cached) local buffer.
> >>
> >> This change should make the elapsed time of the interrupt handler
> >> shorter on platforms with non-coherent DMA. We measured the elapsed
> >> time of each callback of uvc_video_complete without/with this patch
> >> while capturing Full HD video in
> >> https://webrtc.github.io/samples/src/content/getusermedia/resolution/.
> >> I tested it on the top of Kieran Bingham's Asynchronous UVC series
> >> https://www.mail-archive.com/linux-media@vger.kernel.org/msg128359.html.
> >> The test device was Jerry Chromebook (RK3288) with Logitech Brio 4K.
> >> I collected data for 5 seconds. (There were around 480 callbacks in
> >> this case.) The following result shows that this patch makes
> >> uvc_video_complete about 2x faster.
> >>
> >>             | average | median  | min     | max      | standard deviation
> >> w/o caching | 45319ns | 40250ns | 33834ns | 142625ns | 16611ns
> >> w/ caching  | 20620ns | 19250ns | 12250ns | 56583ns  | 6285ns
> >>
> >> In addition, we confirmed that this patch doesn't make it worse on
> >> coherent DMA architecture by performing the same measurements on a
> >> Broadwell Chromebox with the same camera.
> >>
> >>             | average | median  | min     | max     | standard deviation
> >> w/o caching | 21026ns | 21424ns | 12263ns | 23956ns | 1932ns
> >> w/ caching  | 20728ns | 20398ns | 8922ns  | 45120ns | 3368ns
> >
> > This is very interesting, and it seems related to
> > https://patchwork.kernel.org/patch/10468937/. You might have seen that
> > discussion as you got CC'ed at some point.
> >
> > I wonder whether performances couldn't be further improved by allocating
> > the URB buffers cached, as that would speed up the memcpy() as well. Have
> > you tested that by any chance ?
>
> We haven't measured it, but the issue being solved here was indeed
> significantly reduced by using cached URB buffers, even without
> Kieran's async series. After we discovered the latter, we just
> backported it and decided to further tweak the last remaining bit, to
> avoid playing too much with the DMA API in code used in production on
> several different platforms (including both ARM and x86).
>
> If you think we could change the driver to use cached buffers instead
> (as the pwc driver mentioned in another thread), I wouldn't have
> anything against it obviously.

I think there's a chance that performance could be further improved.
Furthermore, it would lead to simpler code as we wouldn't need to deal with
caching headers manually.
I would however like to see numbers before making a decision.

-- 
Regards,

Laurent Pinchart
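
For readers following the thread, the header caching idea under discussion
can be illustrated with a minimal userspace sketch in C. This is not the code
from the patch: the names hdr_cache, HDR_CACHE_MAX, cache_payload_header() and
cached_header_fid() are hypothetical, and an ordinary array stands in for the
uncached URB buffer. The only layout assumed is that of the UVC payload
header, whose first byte is bHeaderLength and whose second byte is
bmHeaderInfo (bit 0 being the frame ID). The point is to read the slow buffer
once with memcpy() and to parse every field from the cached copy afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: bHeaderLength is one byte, so 256 always suffices. */
#define HDR_CACHE_MAX 256

/* Hypothetical local (cached) copy of one payload header. */
struct hdr_cache {
	uint8_t data[HDR_CACHE_MAX];
	size_t len;
};

/*
 * Copy the payload header from src (standing in for an uncached URB
 * buffer) into the local cache. Returns the header length on success,
 * or -1 if the payload is too short or the length byte is implausible.
 */
static int cache_payload_header(const uint8_t *src, size_t src_len,
				struct hdr_cache *cache)
{
	size_t hdr_len;

	if (src_len < 2)
		return -1;

	/* Read the length byte from the slow buffer exactly once. */
	hdr_len = src[0];
	if (hdr_len < 2 || hdr_len > src_len)
		return -1;

	/* One bulk copy instead of many scattered uncached reads. */
	memcpy(cache->data, src, hdr_len);
	cache->len = hdr_len;

	return (int)hdr_len;
}

/* Frame ID bit (bit 0 of bmHeaderInfo), read from the cached copy only. */
static int cached_header_fid(const struct hdr_cache *cache)
{
	return cache->data[1] & 0x01;
}
```

A caller would invoke cache_payload_header() once per payload and use only
the cached copy for header parsing. Whether this beats allocating the URB
buffers cached in the first place is the open question above, and only
measurements can answer it.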