Date: Fri, 4 Aug 2017 12:21:09 -0600
From: Ross Zwisler
To: Dan Williams
Cc: Jens Axboe, Jerome Marchand, linux-nvdimm@lists.01.org, Dave Chinner,
    linux-kernel@vger.kernel.org, Matthew Wilcox, Christoph Hellwig,
    Minchan Kim, seungho1.park@lge.com, Jan Kara, karam.lee,
    Andrew Morton, Nitin Gupta
Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt

On Fri, Aug 04, 2017 at 11:01:08AM -0700, Dan Williams wrote:
> [ adding Dave who is working on a blk-mq + dma offload version of the
> pmem driver ]
>
> On Fri, Aug 4, 2017 at 1:17 AM, Minchan Kim wrote:
> > On Fri, Aug 04, 2017 at 12:54:41PM +0900, Minchan Kim wrote:
> [..]
> >> Thanks for the testing. Are your numbers within the noise level?
> >>
> >> I cannot understand why PMEM doesn't gain much while BTT is a significant
> >> win (8%). I guess that without rw_page, the BTT test had more chances to
> >> wait on dynamic bio allocation, and both my on-stack bio patch and rw_page
> >> reduced that significantly.
> >> However, in the no-rw_page pmem case there weren't many waits on bio
> >> allocation because the device is so fast, so the difference comes purely
> >> from the number of instructions executed. At a quick glance, bio
> >> init/submit is not trivial, so I do understand where the 12% improvement
> >> comes from, but I'm not sure it's a big difference in real practice given
> >> the maintenance burden.
> >
> > I tested pmbench 10 times on my local machine (4 cores) with zram-swap.
> > On my machine, on-stack bio is even faster than rw_page. Unbelievable.
> >
> > I guess it's really hard to get stable results under severe memory
> > pressure. This is probably a result within the noise level (see the
> > stddev below). So I think it's hard to conclude that rw_page is much
> > faster than on-stack bio.
> >
> > rw_page
> > avg 5.54us
> > stddev 8.89%
> > max 6.02us
> > min 4.20us
> >
> > onstack bio
> > avg 5.27us
> > stddev 13.03%
> > max 5.96us
> > min 3.55us
>
> The maintenance burden of having alternative submission paths is
> significant, especially as we consider the pmem driver using more
> services of the core block layer. Ideally, I'd want to complete the
> rw_page removal work before we look at the blk-mq + dma offload
> reworks.
>
> The change to introduce BDI_CAP_SYNC is interesting because we might
> have use for switching between dma offload and cpu copy based on
> whether the I/O is synchronous or otherwise hinted to be a low latency
> request. Right now the dma offload patches are using
> "bio_segments() > 1" as the gate for selecting offload vs cpu copy,
> which seems inadequate.

Okay, so based on the feedback above and from Jens[1], it sounds like we want
to go forward with removing the rw_page() interface, and instead optimize the
regular I/O path via on-stack BIOs and dma offload, correct?

If so, I'll prepare patches that fully remove the rw_page() code, and let
Minchan and Dave work on their optimizations.

[1]: https://lkml.org/lkml/2017/8/3/803