From: Ingo Molnar
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Date: Thu, 7 May 2015 19:36:41 +0200
Message-ID: <20150507173641.GA21781@gmail.com>
References: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>
Cc: Linus Torvalds, Linux Kernel Mailing List, Boaz Harrosh, Jan Kara, Mike Snitzer, Neil Brown, Benjamin Herrenschmidt, Dave Hansen, Heiko Carstens, Chris Mason, Paul Mackerras, "H. Peter Anvin", Christoph Hellwig, Alasdair Kergon, "linux-nvdimm@lists.01.org", Mel Gorman, Matthew Wilcox, Ross Zwisler, Rik van Riel, Martin Schwidefsky, Jens Axboe, Theodore Ts'o, "Martin K. Petersen"

* Dan Williams wrote:

> > Anyway, I did want to say that while I may not be convinced about
> > the approach, I think the patches themselves don't look horrible.
> > I actually like your "__pfn_t". So while I (very obviously) have
> > some doubts about this approach, it may be that the most
> > convincing argument is just in the code.
>
> Ok, I'll keep thinking about this and come back when we have a
> better story about passing mmap'd persistent memory around in
> userspace.

So is there anything fundamentally wrong with creating struct page
backing at mmap() time (and making sure aliased mmaps share struct
page arrays)?

Because if that is done, then the DMA agent won't even know about the
memory being persistent RAM. It's just a regular struct page, that
happens to point to persistent RAM.
Same goes for all the high level VM APIs, futexes, etc. Everything
will Just Work.

It will also be relatively fast: mmap() is a slowpath anyway,
comparatively speaking.

As far as RAID is concerned: that's a relatively easy situation, as
there's only a single user of the devices, the RAID context that
manages all component devices exclusively. Device to device DMA can
use the block layer directly, i.e. most of the patches you've got here
in this series, except:

  74287 C May 06 Dan Williams ( 232) ├─>[PATCH v2 09/10] dax: convert to __pfn_t

I think DAX mmap()s need struct page backing.

I think there's a simple rule: if a page is visible to user-space via
the MMU then it needs struct page backing. If it's "hidden", like
behind a RAID abstraction, it probably doesn't.

With the remaining patches a high level RAID driver ought to be able
to send pfn-to-sector and sector-to-pfn requests to other block
drivers, without any unnecessary struct page allocation overhead,
right?

As long as the pfn concept remains a clever way to reuse our
ram<->sector interfaces to implement sector<->sector IO, in the cases
where the IO has no serialization or MMU concerns, not using struct
page and using pfn_t looks natural.

The moment it starts reaching user space APIs, like in the DAX case,
and especially if it becomes user-MMU visible, it's a mistake not to
have struct page backing, I think.

(In that sense the current DAX mmap() code is already a partial
mistake.)

Thanks,

	Ingo