From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 4 Nov 2021 10:43:23 -0700
From: Christoph Hellwig
To: Dan Williams
Cc: Jane Chu, nvdimm@lists.linux.dev, dave.jiang@intel.com,
	snitzer@redhat.com, "Darrick J. Wong", david@fromorbit.com,
	linux-kernel@vger.kernel.org, willy@infradead.org,
	Christoph Hellwig, dm-devel@redhat.com, vgoyal@redhat.com,
	vishal.l.verma@intel.com, linux-fsdevel@vger.kernel.org,
	ira.weiny@intel.com, linux-xfs@vger.kernel.org, agk@redhat.com
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
References: <2102a2e6-c543-2557-28a2-8b0bdc470855@oracle.com>
	<20211028002451.GB2237511@magnolia>

On Thu, Nov 04, 2021 at 09:24:15AM -0700, Dan Williams wrote:
> No, the big difference with every other modern storage device is
> access to byte-addressable storage. Storage devices get to "cheat"
> with guaranteed minimum 512-byte accesses. So you can arrange for
> writes to always be large enough to scrub the ECC bits along with the
> data. For PMEM and byte-granularity DAX accesses the "sector size" is
> a cacheline and it needed a new CPU instruction before software could
> atomically update data + ECC. Otherwise, with sub-cacheline accesses,
> a RMW cycle can't always be avoided. Such a cycle pulls poison from
> the device on the read and pushes it back out to the media on the
> cacheline writeback.

Indeed.  The fake byte addressability is the problem, and the fix is
to not do that, at least on the second attempt.

> I don't understand what overprovisioning has to do with better error
> management? No other storage device has seen fit to be as transparent
> with communicating the error list and offering ways to proactively
> scrub it. Dave and Darrick rightly saw this and said "hey, the FS
> could do a much better job for the user if it knew about this error
> list". So I don't get what this argument about spare blocks has to do
> with what XFS wants? I.e. an rmap facility to communicate files that
> have been clobbered by cosmic rays and other calamities.

Well, the answer for other interfaces (at least at the gold-plated
cost option) is internal CRCs so strong that user-visible bits
clobbered by cosmic rays don't realistically happen.  But it is a
problem with the cheaper ones, and at least SCSI and NVMe offer the
error list through the Get LBA Status command (and I bet ATA does too,
but I haven't looked into that).  Oddly enough there has never been
much interest from the fs community in those.

> > Out of the low-intrusiveness options, Jane's previous series
> > to automatically retry after calling a clear_poison operation seems
> > like the best idea so far. We just need to also think about what
> > we want to do for direct users of ->direct_access that do not use
> > the mcsafe iov_iter helpers.
>
> Those exist? Even dm-writecache uses copy_mc_to_kernel().

I'm sorry, I completely missed that it had been added.  And it's been
in for a whole year.
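
The poison behaviour discussed above can be modelled in a few lines of
userspace C.  This is only a toy sketch under stated assumptions, not
kernel code: struct toy_pmem and every toy_* function are hypothetical
names invented for illustration, the 64-byte line size is assumed, and
a machine check is reduced to a -1 return.  It shows a sub-cacheline
store failing on a poisoned line (the read half of the RMW hits the
poison), a full-line store clearing it, and a write that retries once
after a clear_poison step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64	/* assumed cacheline size, the pmem "sector size" */
#define NR_LINES  4

/* Hypothetical model of a pmem region: data plus one poison flag per line. */
struct toy_pmem {
	unsigned char data[NR_LINES][LINE_SIZE];
	bool poisoned[NR_LINES];
};

/*
 * Sub-cacheline store, modelled as a read-modify-write of one line.
 * The read half hits the poison, so the store fails and the line stays bad.
 */
static int toy_store_partial(struct toy_pmem *pm, size_t off,
			     const void *src, size_t len)
{
	size_t line = off / LINE_SIZE;

	assert(off % LINE_SIZE + len <= LINE_SIZE);	/* must stay in one line */
	if (pm->poisoned[line])
		return -1;	/* stands in for a machine check on the read */
	memcpy(&pm->data[line][off % LINE_SIZE], src, len);
	return 0;
}

/* Full-line store: data and ECC are replaced together, clearing the poison. */
static void toy_store_line(struct toy_pmem *pm, size_t line, const void *src)
{
	memcpy(pm->data[line], src, LINE_SIZE);
	pm->poisoned[line] = false;
}

/* Hypothetical clear_poison step: zero every poisoned line the range touches. */
static void toy_clear_poison(struct toy_pmem *pm, size_t off, size_t len)
{
	static const unsigned char zeroes[LINE_SIZE];
	size_t last = (off + len - 1) / LINE_SIZE;

	for (size_t line = off / LINE_SIZE; line <= last; line++)
		if (pm->poisoned[line])
			toy_store_line(pm, line, zeroes);
}

/* One pass over the range, split at line boundaries. */
static int toy_write_once(struct toy_pmem *pm, size_t off,
			  const unsigned char *src, size_t len)
{
	while (len) {
		size_t chunk = LINE_SIZE - off % LINE_SIZE;

		if (chunk > len)
			chunk = len;
		if (toy_store_partial(pm, off, src, chunk))
			return -1;
		off += chunk;
		src += chunk;
		len -= chunk;
	}
	return 0;
}

/* Byte-granular attempt first; on failure clear the poison and retry once. */
static int toy_write(struct toy_pmem *pm, size_t off, const void *src,
		     size_t len)
{
	if (off + len > (size_t)NR_LINES * LINE_SIZE)
		return -1;
	if (toy_write_once(pm, off, src, len) == 0)
		return 0;
	toy_clear_poison(pm, off, len);
	return toy_write_once(pm, off, src, len);
}
```

In this model the retry zeroes the remainder of each poisoned line it
touches.  That data was already lost to the poison, but the sketch
makes no attempt to report the loss to anyone, which a real
implementation would have to think about.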