From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 277DAC0015E for ; Fri, 21 Jul 2023 06:38:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689921504; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=EfIfnjaxtI5M/LHDFeq0pYuTNQNtH4YOqvyg1qrJoys=; b=eC+8GQuf9q/jTxU3i6Y5Kk26pHLTZx3oWv9L6fWtsiF9gd75g0Gx1rZcy9tfs+k+NG/YTh j2+ZqFXAIiZkUIGvtXlerWBdTXCsrCTelr+t6jCOr2DGvV8+3NqJLPk+begCZhKV0USfIn ZX3APbDDOG5TXNwDOQJbjicDp6V0fpI= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-260-O3-PAcUwPu-V3vRAgAxtEg-1; Fri, 21 Jul 2023 02:38:20 -0400 X-MC-Unique: O3-PAcUwPu-V3vRAgAxtEg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AD2921C060D7; Fri, 21 Jul 2023 06:38:18 +0000 (UTC) Received: from mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (unknown [10.30.29.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1E26A1401C2E; Fri, 21 Jul 2023 06:38:15 +0000 (UTC) Received: from mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (localhost [IPv6:::1]) by mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (Postfix) with ESMTP id F30291946A43; Fri, 21 Jul 2023 06:38:07 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) by mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (Postfix) with ESMTP id 334861946587 for ; Thu, 20 Jul 2023 07:55:31 +0000 (UTC) Received: by smtp.corp.redhat.com (Postfix) id CFA201454143; Thu, 20 Jul 2023 07:55:30 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast02.extmail.prod.ext.rdu2.redhat.com [10.11.55.18]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C6B031454142 for ; Thu, 20 Jul 2023 07:55:30 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-1.mimecast.com [205.139.110.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A95048F1841 for ; Thu, 20 Jul 2023 07:55:30 +0000 (UTC) Received: from mail.lichtvoll.de (luna.lichtvoll.de [194.150.191.11]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-568-fcHqTvf6Nn6qjXNhby7rBw-1; Thu, 20 Jul 2023 03:55:25 -0400 X-MC-Unique: fcHqTvf6Nn6qjXNhby7rBw-1 Received: from 127.0.0.1 (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by mail.lichtvoll.de (Postfix) with ESMTPSA id 225317497FD; Thu, 20 Jul 2023 09:55:23 +0200 (CEST) From: Martin Steigerwald To: Theodore Ts'o Date: Thu, 20 Jul 2023 09:55:22 +0200 Message-ID: <38426448.10thIPus4b@lichtvoll.de> In-Reply-To: <20230720042034.GA5764@mit.edu> References: <20230717075035.GA9549@tomerius.de> <4835096.GXAFRqVoOG@lichtvoll.de> <20230720042034.GA5764@mit.edu> MIME-Version: 1.0 X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Mailman-Approved-At: Fri, 21 Jul 2023 06:38:07 +0000 Subject: [dm-devel] Nobarrier mount option (was: Re: File system robustness) X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-embedded@vger.kernel.org, dm-devel@redhat.com, Kai Tomerius , =?ISO-8859-1?Q?Bj=F8rn?= Forsman , Ext4 Developers List , "Alan C. Assis" Errors-To: dm-devel-bounces@redhat.com Sender: "dm-devel" X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: lichtvoll.de Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Theodore Ts'o - 20.07.23, 06:20:34 CEST: > On Wed, Jul 19, 2023 at 08:22:43AM +0200, Martin Steigerwald wrote: > > Is "nobarrier" mount option still a thing? I thought those mount > > options have been deprecated or even removed with the introduction > > of cache flush handling in kernel 2.6.37? > > Yes, it's a thing, and if your server has a UPS with a reliable power > failure / low battery feedback, it's *possible* to engineer a reliable > system. Or, for example, if you have a phone with an integrated > battery, so when you drop it the battery compartment won't open and > the battery won't go flying out, *and* the baseboard management > controller (BMC) will halt the CPU before the battery complete dies, > and gives a chance for the flash storage device to commit everything > before shutdown, *and* the BMC arranges to make sure the same thing > happens when the user pushes and holds the power button for 30 > seconds, then it could be safe. Thanks for clarification. I am aware that something like this can be done. But I did not think that is would be necessary to explicitly disable barriers, or should I more accurately write cache flushes, in such a case: I thought that nowadays a cache flush would be (almost) a no-op in the case the storage receiving it is backed by such reliability measures. I.e. that the hardware just says "I am ready" when having the I/O request in stable storage whatever that would be, even in case that would be battery backed NVRAM and/or temporary flash. At least that is what I thought was the background for not doing the "nobarrier" thing anymore: Let the storage below decide whether it is safe to basically ignore cache flushes by answering them (almost) immediately. However, not sending the cache flushes in the first place would likely still be more efficient although as far as I am aware block layer does not return back a success / failure information to the upper layers anymore since kernel 2.6.37. Seems I got to update my Linux Performance tuning slides about this once again. > We also use nobarrier for a scratch file systems which by definition > go away when the borg/kubernetes job dies, and which will *never* > survive a reboot, let alone a power failure. In such a situation, > there's no point sending the cache flush, because the partition will > be mkfs'ed on reboot. Or, in if the iSCSI or Cloud Persistent Disk > will *always* go away when the VM dies, because any persistent state > is saved to some cluster or distributed file store (e.g., to the MySQL > server, or Big Table, or Spanner, etc. In these cases, you don't > *want* the Cache Flush operation, since skipping it reduce I/O > overhead. Hmm, right. > So if you know what you are doing, in certain specialized use cases, > nobarrier can make sense, and it is used today at my $WORK's data > center for production jobs *all* the time. So we won't be making > ext4's nobarrier mount option go away; it has users. :-) I now wonder why XFS people deprecated and even removed those mount options. But maybe I better ask them separately instead of adding their list in CC. Probably by forwarding this mail to the XFS mailing list later on. Best, -- Martin -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Steigerwald Subject: Nobarrier mount option (was: Re: File system robustness) Date: Thu, 20 Jul 2023 09:55:22 +0200 Message-ID: <38426448.10thIPus4b@lichtvoll.de> References: <20230717075035.GA9549@tomerius.de> <4835096.GXAFRqVoOG@lichtvoll.de> <20230720042034.GA5764@mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7Bit Return-path: In-Reply-To: <20230720042034.GA5764@mit.edu> List-ID: To: Theodore Ts'o Cc: "Alan C. Assis" , =?ISO-8859-1?Q?Bj=F8rn?= Forsman , Kai Tomerius , linux-embedded@vger.kernel.org, Ext4 Developers List , dm-devel@redhat.com Theodore Ts'o - 20.07.23, 06:20:34 CEST: > On Wed, Jul 19, 2023 at 08:22:43AM +0200, Martin Steigerwald wrote: > > Is "nobarrier" mount option still a thing? I thought those mount > > options have been deprecated or even removed with the introduction > > of cache flush handling in kernel 2.6.37? > > Yes, it's a thing, and if your server has a UPS with a reliable power > failure / low battery feedback, it's *possible* to engineer a reliable > system. Or, for example, if you have a phone with an integrated > battery, so when you drop it the battery compartment won't open and > the battery won't go flying out, *and* the baseboard management > controller (BMC) will halt the CPU before the battery complete dies, > and gives a chance for the flash storage device to commit everything > before shutdown, *and* the BMC arranges to make sure the same thing > happens when the user pushes and holds the power button for 30 > seconds, then it could be safe. Thanks for clarification. I am aware that something like this can be done. But I did not think that is would be necessary to explicitly disable barriers, or should I more accurately write cache flushes, in such a case: I thought that nowadays a cache flush would be (almost) a no-op in the case the storage receiving it is backed by such reliability measures. I.e. that the hardware just says "I am ready" when having the I/O request in stable storage whatever that would be, even in case that would be battery backed NVRAM and/or temporary flash. At least that is what I thought was the background for not doing the "nobarrier" thing anymore: Let the storage below decide whether it is safe to basically ignore cache flushes by answering them (almost) immediately. However, not sending the cache flushes in the first place would likely still be more efficient although as far as I am aware block layer does not return back a success / failure information to the upper layers anymore since kernel 2.6.37. Seems I got to update my Linux Performance tuning slides about this once again. > We also use nobarrier for a scratch file systems which by definition > go away when the borg/kubernetes job dies, and which will *never* > survive a reboot, let alone a power failure. In such a situation, > there's no point sending the cache flush, because the partition will > be mkfs'ed on reboot. Or, in if the iSCSI or Cloud Persistent Disk > will *always* go away when the VM dies, because any persistent state > is saved to some cluster or distributed file store (e.g., to the MySQL > server, or Big Table, or Spanner, etc. In these cases, you don't > *want* the Cache Flush operation, since skipping it reduce I/O > overhead. Hmm, right. > So if you know what you are doing, in certain specialized use cases, > nobarrier can make sense, and it is used today at my $WORK's data > center for production jobs *all* the time. So we won't be making > ext4's nobarrier mount option go away; it has users. :-) I now wonder why XFS people deprecated and even removed those mount options. But maybe I better ask them separately instead of adding their list in CC. Probably by forwarding this mail to the XFS mailing list later on. Best, -- Martin