From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Message-ID: <57277EDA.9000803@plexistor.com>
Date: Mon, 02 May 2016 19:22:50 +0300
From: Boaz Harrosh
MIME-Version: 1.0
To: Dan Williams
CC: Vishal Verma, "linux-nvdimm@lists.01.org", linux-block@vger.kernel.org, Jan Kara, Matthew Wilcox, Dave Chinner, "linux-kernel@vger.kernel.org", XFS Developers, Jens Axboe, Linux MM, Al Viro, Christoph Hellwig, linux-fsdevel, Andrew Morton, linux-ext4
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
References: <1461878218-3844-1-git-send-email-vishal.l.verma@intel.com> <1461878218-3844-6-git-send-email-vishal.l.verma@intel.com> <5727753F.6090104@plexistor.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 05/02/2016 07:01 PM, Dan Williams wrote:
> On Mon, May 2, 2016 at 8:41 AM, Boaz Harrosh wrote:
>> On 04/29/2016 12:16 AM, Vishal Verma wrote:
>>> All IO in a dax filesystem used to go through dax_do_io, which cannot
>>> handle media errors, and thus cannot provide a recovery path that can
>>> send a write through the driver to clear errors.
>>>
>>> Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
>>> path for DAX filesystems, use the same direct_IO path for both DAX and
>>> direct_io iocbs, but use the flags to identify when we are in O_DIRECT
>>> mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
>>> direct_IO path instead of DAX.
>>>
>>
>> Really? What is your thinking here?
>>
>> What about all the current users of O_DIRECT, you have just made them
>> 4 times slower and "less concurrent*" than "buffered io" users. Since
>> direct_IO path will queue an IO request and all.
>> (And if it is not so slow then why do we need dax_do_io at all? [Rhetorical])
>>
>> I hate it that you overload the semantics of a known and expected
>> O_DIRECT flag, for special pmem quirks. This is an incompatible
>> and unrelated overload of the semantics of O_DIRECT.
>
> I think it is the opposite situation, it is undoing the premature
> overloading of O_DIRECT that went in without performance numbers.

We have tons of measurements. It is not hard to imagine the results,
though, especially in the 1000-threads case.

> This implementation clarifies that dax_do_io() handles the lack of a
> page cache for buffered I/O and O_DIRECT behaves as it nominally would
> by sending an I/O to the driver.

> It has the benefit of matching the
> error semantics of a typical block device where a buffered write could
> hit an error filling the page cache, but an O_DIRECT write potentially
> triggers the drive to remap the block.
>

I fail to see how, for writes, the device error semantics regarding
remapping of blocks are any different between buffered and direct IO.
As far as the block device is concerned, it is the exact same code path.
All the big differences are higher up, in the VFS.

And ... so you are willing to sacrifice the 99% hot path for the sake of
the 1% error path, and to piggyback on poor O_DIRECT?

Again, there are tons of O_DIRECT apps out there. Why are you forcing
them to change if they want true pmem performance?

I still believe dax_do_io() can be made more resilient to errors, and
can clear errors on writes. I am going to dig through old patches ...

Cheers
Boaz
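
The routing change being argued over in the quoted patch description can be sketched as a small userspace model. This is only an illustration under stated assumptions: the flag names and values (MODEL_IOCB_DIRECT, MODEL_IOCB_DAX), the struct, and both function names are hypothetical and are not taken from the patch or from kernel headers. It shows only which path an I/O would take before and after the proposed change, as the patch description words it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this userspace model; not the kernel's. */
#define MODEL_IOCB_DIRECT (1u << 0)  /* file was opened with O_DIRECT */
#define MODEL_IOCB_DAX    (1u << 1)  /* file lives on a DAX mount */

enum io_path {
    PATH_PAGE_CACHE,  /* ordinary buffered I/O through the page cache */
    PATH_DAX,         /* dax_do_io-style copy straight to/from pmem */
    PATH_DIRECT_IO    /* conventional direct_IO: a request sent to the driver */
};

/* Hypothetical stand-in for the per-I/O control block. */
struct model_iocb {
    unsigned int flags;
};

/* Before the patch: every I/O on a DAX mount goes through the DAX path. */
enum io_path select_path_before(const struct model_iocb *iocb)
{
    assert(iocb != NULL);
    if (iocb->flags & MODEL_IOCB_DAX)
        return PATH_DAX;
    if (iocb->flags & MODEL_IOCB_DIRECT)
        return PATH_DIRECT_IO;
    return PATH_PAGE_CACHE;
}

/* As proposed: O_DIRECT takes priority and uses the conventional path. */
enum io_path select_path_patched(const struct model_iocb *iocb)
{
    assert(iocb != NULL);
    if (iocb->flags & MODEL_IOCB_DIRECT)
        return PATH_DIRECT_IO;
    if (iocb->flags & MODEL_IOCB_DAX)
        return PATH_DAX;
    return PATH_PAGE_CACHE;
}
```

In this model the only case whose result changes is O_DIRECT on a DAX mount, which moves from PATH_DAX to PATH_DIRECT_IO. That is the case the objection above is about: existing O_DIRECT applications would start queuing requests to the driver instead of using the DAX path.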