From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E661DE92FCD for ; Fri, 6 Oct 2023 04:32:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230082AbjJFEcK (ORCPT ); Fri, 6 Oct 2023 00:32:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230128AbjJFEbo (ORCPT ); Fri, 6 Oct 2023 00:31:44 -0400 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6FD30123 for ; Thu, 5 Oct 2023 21:31:09 -0700 (PDT) Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-690fe10b6a4so1450824b3a.3 for ; Thu, 05 Oct 2023 21:31:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1696566669; x=1697171469; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ix+sRq4XQRGJaHRfH6WXpO9zrY7MmqagNFx/9Jnm3Cw=; b=1PXrH8VZnOblzuDC7aK/UG4BmUJT3RX2BSU2ovDIkmCEzmSbK9W2unlSLAt7RVfROA uk+A/E6ugLaxfW9A2XIFg3J2HEDzr8n6c2Aa17kn39z2vmedi082//EAkpShgfYAUSJl CydbxSDIB30+eGO0Da0ibs7g3KHtjrVZ8dQlY6dePodntI+1kNfJzL8U+iwhniC76fiX nwLztdRyyquTTwympjC0U8E4NJzkL/eB4XgzU43c6iiM3jtyBdZbOtRnfFmSAALGh7Sl aHvsMcP1x2wCPpxJ2lCLWZJ+qCTipRNUe8kEQnftAMToDeSR9VUp2Pqq89J2n5vMqJCC fDuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696566669; x=1697171469; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ix+sRq4XQRGJaHRfH6WXpO9zrY7MmqagNFx/9Jnm3Cw=; b=mZLoQFPThKDEGRxplZFn2Gw2sEoHudSsfM0n3ptvZUfOLUYSkKe7USNI+HFwIY7M9c ku3xyjeDHPMYirLsX/fY9dHtXZgnEDhtJyq4XmMBVbicjwC2gBDQlgBtfkuld8btvgUD 3vzILua2p33A27okCaSCRUhbsGD5RRU5ObkOB6PYMbank3iX2iUS30rV06++Klq8Yr6Q dd6HVIjlRI6VndXnpA+X4DBXNiFjhysFTsUIFKN+a9tRAWIOUUlEfOcW/IwCvbsQmBku YjgW7qQwf/pTzqwmZqLZEGh0jo1gEbwmO1fJYXUUcO+HpZm/bYhhVk8cE3v//vJPbsHm Ib4w== X-Gm-Message-State: AOJu0YyD5esdkOdRxe5YwFK88aq+TgJ4VhWuBHunFi1aAooGhY0lTzpB 42HxbWnhB/HReMgi+bQHEb6mGA== X-Google-Smtp-Source: AGHT+IHgkUMxgn1iEBVFJVYZt/afT8PIq1pP/h8RwAD8SFXjlpezcSfR4j9zUmA+wytXxYHwxrQQQg== X-Received: by 2002:a05:6a20:1447:b0:14e:3daf:fdb9 with SMTP id a7-20020a056a20144700b0014e3daffdb9mr8351717pzi.22.1696566668815; Thu, 05 Oct 2023 21:31:08 -0700 (PDT) Received: from dread.disaster.area (pa49-180-20-59.pa.nsw.optusnet.com.au. [49.180.20.59]) by smtp.gmail.com with ESMTPSA id v9-20020a62a509000000b0069029a3196dsm427960pfm.184.2023.10.05.21.31.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Oct 2023 21:31:08 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1qocUT-00A4Ul-1m; Fri, 06 Oct 2023 15:31:05 +1100 Date: Fri, 6 Oct 2023 15:31:05 +1100 From: Dave Chinner To: Bart Van Assche Cc: "Martin K. Petersen" , John Garry , axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com, djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, chandan.babu@oracle.com, dchinner@redhat.com, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, linux-api@vger.kernel.org Subject: Re: [PATCH 10/21] block: Add fops atomic write support Message-ID: References: <5d26fa3b-ec34-bc39-ecfe-4616a04977ca@oracle.com> <34c08488-a288-45f9-a28f-a514a408541d@acm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-api@vger.kernel.org On Thu, Oct 05, 2023 at 03:58:38PM -0700, Bart Van Assche wrote: > On 10/5/23 15:36, Dave Chinner wrote: > > $ lspci |grep -i nvme > > 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 > > 06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 > > $ cat /sys/block/nvme*n1/queue/write_cache > > write back > > write back > > $ > > > > That they have volatile writeback caches.... > > It seems like what I wrote has been misunderstood completely. With > "handling a power failure cleanly" I meant that power cycling a block device > does not result in read errors nor in reading data that has never been written. Then I don't see what your concern is. Single sector writes are guaranteed atomic and have been for as long as I've worked in this game. OTOH, multi-sector writes are not guaranteed to be atomic - they can get torn on sector boundaries, but the individual sectors within that write are guaranteed to be all-or-nothing. Any hardware device that does not guarantee single sector write atomicity (i.e. tears in the middle of a sector) is, by definition, broken. And we all know that broken hardware means nothing in the storage stack works as it should, so I just don't see what point you are trying to make... > Although it is hard to find information about this topic, here is what I found > online: > * About certain SSDs with power loss protection: > https://us.transcend-info.com/embedded/technology/power-loss-protection-plp > * About another class of SSDs with power loss protection: > https://www.kingston.com/en/blog/servers-and-data-centers/ssd-power-loss-protection > * About yet another class of SSDs with power loss protection: > https://phisonblog.com/avoiding-ssd-data-loss-with-phisons-power-loss-protection-2/ Yup, devices that behave as if they have non-volatile write caches. Such devices have been around for more than 30 years, they operate the same as devices without caches at all. > So far I haven't found any information about hard disks and power failure > handling. What I found is that most current hard disks protect data with ECC. > The ECC mechanism should provide good protection against reading data that > has never been written. If a power failure occurs while a hard disk is writing > a physical block, can this result in a read error after power is restored? If > so, is this behavior allowed by storage standards? If a power fail results in read errors from the storage media being reported to the OS instead of the data that was present in the sector before the power failure, then the device is broken. If there is no data in the region being read because it has never been written, then it should return zeros (no data) rather than stale data or a media read error. -Dave. -- Dave Chinner david@fromorbit.com