From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1757682AbcEEVmR (ORCPT ); Thu, 5 May 2016 17:42:17 -0400
Received: from mga04.intel.com ([192.55.52.120]:63220 "EHLO mga04.intel.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1756783AbcEEVmP (ORCPT ); Thu, 5 May 2016 17:42:15 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,583,1455004800"; d="scan'208";a="973593569"
From: "Verma, Vishal L"
To: "Williams, Dan J" , "hch@infradead.org"
CC: "linux-kernel@vger.kernel.org" , "linux-block@vger.kernel.org" ,
    "xfs@oss.sgi.com" , "linux-nvdimm@ml01.01.org" , "linux-mm@kvack.org" ,
    "viro@zeniv.linux.org.uk" , "axboe@fb.com" , "akpm@linux-foundation.org" ,
    "linux-fsdevel@vger.kernel.org" , "linux-ext4@vger.kernel.org" ,
    "david@fromorbit.com" , "jack@suse.cz" , "matthew@wil.cx"
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Thread-Topic: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Thread-Index: AQHRoZNfS9ZF3cQEEUydwj8b8xF2vZ+mRHyAgAShZYCAAA4/AIAAa/uA
Date: Thu, 5 May 2016 21:42:12 +0000
Message-ID: <1462484521.29294.4.camel@intel.com>
References: <1461878218-3844-1-git-send-email-vishal.l.verma@intel.com>
    <1461878218-3844-6-git-send-email-vishal.l.verma@intel.com>
    <5727753F.6090104@plexistor.com>
    <20160505142433.GA4557@infradead.org>
In-Reply-To:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.232.112.171]
Content-Type: text/plain; charset="utf-8"
Content-ID: <353BB2C55CB22643BBB65F3CF6BFC153@intel.com>
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id u45LgMMj013161

On Thu, 2016-05-05 at 08:15 -0700, Dan Williams wrote:
> On Thu, May 5, 2016 at 7:24 AM, Christoph Hellwig wrote:
> >
> > On Mon, May 02, 2016 at
> > 06:41:51PM +0300, Boaz Harrosh wrote:
> > > >
> > > > All IO in a dax filesystem used to go through dax_do_io, which cannot
> > > > handle media errors, and thus cannot provide a recovery path that can
> > > > send a write through the driver to clear errors.
> > > >
> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
> > > > path for DAX filesystems, use the same direct_IO path for both DAX and
> > > > direct_io iocbs, but use the flags to identify when we are in O_DIRECT
> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
> > > > direct_IO path instead of DAX.
> > > >
> > > Really? What is your thinking here?
> > >
> > > What about all the current users of O_DIRECT? You have just made them
> > > 4 times slower and "less concurrent*" than "buffered io" users, since
> > > the direct_IO path will queue an IO request and all.
> > > (And if it is not so slow then why do we need dax_do_io at all?
> > > [Rhetorical])
> > >
> > > I hate it that you overload the semantics of a known and expected
> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
> > > and unrelated overload of the semantics of O_DIRECT.
> > Agreed - making O_DIRECT less direct than not having it is plain stupid,
> > and I somehow missed this initially.
> Of course I disagree because like Dave argues in the msync case we
> should do the correct thing first and make it fast later, but also
> like Dave this arguing in circles is getting tiresome.
>
> >
> > This whole DAX story turns into a major nightmare, and I fear all our
> > hodge podge tweaks to the semantics aren't helping it.
> >
> > It seems like we simply need an explicit O_DAX for the read/write
> > bypass if we can't sort out the semantics (error, writer synchronization)
> > just as we need a special flag for MMAP.
> I don't see how O_DAX makes this situation better if the goal is to
> accelerate unmodified applications...
>
> Vishal, at least the "delete a file with a badblock" model will still
> work for implicitly clearing errors with your changes to stop doing
> block clearing in fs/dax.c.  This combined with a new -EBADBLOCK (as
> Dave suggests) and explicit logging of I/Os that fail for this reason
> at least gives a chance to communicate errors in files to suitably
> aware applications / environments.

Agreed - I'll send out a series that has just the zeroing changes, and
drop the dax_io fallback/O_DIRECT tweak for now while we figure out the
right thing to do. That should get us to a place where we still have dax
in the presence of errors, and have _a_ path for recovery.

> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm