From: Christoph Hellwig
To: Dan Williams
Cc: Christoph Hellwig, Boaz Harrosh, linux-block@vger.kernel.org, linux-ext4, Jan Kara, Matthew Wilcox, Dave Chinner, linux-kernel@vger.kernel.org, XFS Developers, Jens Axboe, Linux MM, Al Viro, linux-nvdimm, linux-fsdevel, Andrew Morton
Date: Thu, 5 May 2016 08:22:30 -0700
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Message-ID: <20160505152230.GA3994@infradead.org>
References: <1461878218-3844-1-git-send-email-vishal.l.verma@intel.com> <1461878218-3844-6-git-send-email-vishal.l.verma@intel.com> <5727753F.6090104@plexistor.com> <20160505142433.GA4557@infradead.org>

On Thu, May 05, 2016 at 08:15:32AM -0700, Dan Williams wrote:
> > Agreed - making O_DIRECT less direct than not having it is plain stupid,
> > and I somehow missed this initially.
>
> Of course I disagree because like Dave argues in the msync case we
> should do the correct thing first and make it fast later, but also
> like Dave this arguing in circles is getting tiresome.

We should do the right thing first, and make it fast later.
But this proposal is not getting it right: it still does not handle errors for the fast path, but magically makes them work for direct I/O by generally using a less optimal path for O_DIRECT. It's getting the worst of all choices.

As far as I can tell, the only sensible option is to:

 - always try DAX-like I/O first
 - have a custom get_user_pages + rw_bytes fallback that handles bad blocks when hitting EIO

And then we need to sort out the concurrent write synchronization. Again, here I think we absolutely have to obey POSIX for the !O_DIRECT case and can avoid it for O_DIRECT, similar to the existing non-DAX semantics. If we want any special additional semantics, we _will_ need a special O_DAX flag.
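To make the proposed ordering concrete, here is a minimal userspace sketch in C of "DAX-like I/O first, slow fallback on EIO" for the write side. It is a toy model, not kernel code: struct pmem_dev, dax_like_write(), rw_bytes_write() and pmem_write() are hypothetical names, the bad[] array stands in for the namespace's bad-block list, and the caller's buffer stands in for pages pinned with get_user_pages. The rule that poison can only be cleared by overwriting whole sectors is an assumption of this sketch, not something stated in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define SECTOR_SIZE 512
#define NR_SECTORS  8

/* Hypothetical toy model of a pmem namespace: mapped bytes plus bad blocks. */
struct pmem_dev {
	unsigned char data[NR_SECTORS * SECTOR_SIZE];
	bool bad[NR_SECTORS];	/* true = media error in this sector */
};

/* True if any sector overlapping [off, off + len) is bad; requires len > 0. */
static bool range_has_bad(const struct pmem_dev *dev, size_t off, size_t len)
{
	for (size_t s = off / SECTOR_SIZE; s <= (off + len - 1) / SECTOR_SIZE; s++)
		if (dev->bad[s])
			return true;
	return false;
}

/* Fast path: plain memcpy into the mapping, standing in for DAX I/O.
 * It cannot repair media errors, so it bails out with -EIO. */
static ssize_t dax_like_write(struct pmem_dev *dev, const void *buf,
			      size_t len, size_t off)
{
	if (range_has_bad(dev, off, len))
		return -EIO;
	memcpy(dev->data + off, buf, len);
	return (ssize_t)len;
}

/* Slow path: hypothetical stand-in for an rw_bytes-style driver write that
 * clears poison. Assumption: only fully overwritten sectors can be cleared,
 * so a partially covered bad sector still fails with -EIO. */
static ssize_t rw_bytes_write(struct pmem_dev *dev, const void *buf,
			      size_t len, size_t off)
{
	size_t first = off / SECTOR_SIZE, last = (off + len - 1) / SECTOR_SIZE;

	for (size_t s = first; s <= last; s++) {
		size_t start = s * SECTOR_SIZE, end = start + SECTOR_SIZE;
		bool whole = off <= start && off + len >= end;

		if (dev->bad[s] && !whole)
			return -EIO;
	}
	memcpy(dev->data + off, buf, len);
	/* Every bad sector left in range is fully rewritten, so clear them. */
	for (size_t s = first; s <= last; s++)
		dev->bad[s] = false;
	return (ssize_t)len;
}

/* Always try the DAX-like path first; take the slow path only on -EIO. */
static ssize_t pmem_write(struct pmem_dev *dev, const void *buf,
			  size_t len, size_t off)
{
	ssize_t ret;

	if (len == 0 || off + len > sizeof(dev->data))
		return -EINVAL;
	ret = dax_like_write(dev, buf, len, off);
	if (ret == -EIO)
		ret = rw_bytes_write(dev, buf, len, off);
	return ret;
}
```

In this model, a write that fully covers a bad sector succeeds through the fallback and clears it, while a short write into a bad sector still returns -EIO. Clean ranges never leave the fast path, which is the point of trying DAX-like I/O first rather than routing all O_DIRECT through the slower path.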