Date: Thu, 2 Apr 2026 18:22:36 -0300
From: Esteban Cerutti
To: linux-kernel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: [RFC] block/nvme: exploring asynchronous durability notification semantics

Hi,

I would like to explore whether current NVMe completion semantics
unnecessarily conflate execution completion with durability, and
whether there is room for a more explicit, asynchronous durability
notification model between host and device.

Today, a successful write completion indicates command execution, but
not necessarily physical persistence to non-volatile media unless FUA
or Flush is used. This forces the kernel and filesystems to assume
worst-case durability behavior and rely on synchronous flushes and
barriers for safety.

The device internally knows when data is staged in volatile buffers
versus committed to NAND (or equivalent persistent media), but this
information is not exposed to the host.

This RFC explores a potential extension model with two components:

1) Multi-phase completion semantics

   - Normal completion continues to signal execution.
   - The device assigns a persistence token ID.
   - When the data is physically committed to non-volatile media,
     the device emits an asynchronous durability confirmation
     referencing that token.

   This would decouple execution throughput from durability
   confirmation and potentially allow filesystems to close journal
   transactions only upon confirmed persistence, without forcing
   synchronous flush fences.

2) Advisory write intent classification

   - Host-provided hints such as EPHEMERAL, STANDARD, or CRITICAL.
   - CRITICAL writes would request immediate durability.
   - EPHEMERAL writes could tolerate extended volatile staging.

Additionally, I am curious whether host power-state awareness could be
relevant in such a model. For example, if the kernel can detect
battery-backed operation or confirmed UPS infrastructure, it could
advertise a bounded persistence relaxation window (e.g. guaranteed
power for N ms), allowing the device to safely extend volatile staging
within that window. This would be advisory and revocable, not a
mandatory trust model.

Questions for discussion:

- Has asynchronous durability acknowledgment been previously explored
  in NVMe or block-layer discussions?
- Are there fundamental architectural reasons why separating execution
  completion from durability confirmation would not be viable?
- Would such semantics belong strictly in NVMe specification work, or
  is there room for experimentation in the Linux NVMe driver as a
  prototype?
- Are there known workloads where this model would clearly fail or
  provide no measurable benefit?

This is not a proposal for immediate implementation, but an attempt to
identify whether the current binary durability model (completion vs
flush) leaves performance or efficiency on the table due to lack of
explicit state sharing between host and device.

Feedback, criticism, or pointers to prior art are very welcome.

Thanks,
Esteban Cerutti
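
To make the two components above more concrete, here is a minimal toy
model in Python. It is only a sketch under stated assumptions:
ToyDevice, Intent, JournalTransaction, write(), destage() and
poll_durable() are hypothetical names invented for illustration, and
nothing here corresponds to an existing NVMe command or Linux
block-layer interface. The model also assumes that the device commits
staged writes in submission order and that a CRITICAL write forces
everything staged before it to media as well.

```python
from enum import Enum


class Intent(Enum):
    """Advisory write intent classes named in the RFC (hypothetical encoding)."""
    EPHEMERAL = "ephemeral"
    STANDARD = "standard"
    CRITICAL = "critical"


class ToyDevice:
    """Hypothetical device: a volatile staging queue plus a durability event queue."""

    def __init__(self):
        self._next_token = 1
        self._staged = []          # tokens still in volatile staging, oldest first
        self._durable_events = []  # phase-2 confirmations not yet read by the host

    def write(self, intent=Intent.STANDARD):
        """Phase 1: execution completion. Returns the persistence token."""
        token = self._next_token
        self._next_token += 1
        self._staged.append(token)
        if intent is Intent.CRITICAL:
            # Assumption: immediate durability also commits all older staged
            # writes, so durability order matches submission order.
            self.destage()
        return token

    def destage(self):
        """Commit every staged write in order and queue one confirmation each."""
        self._durable_events.extend(self._staged)
        self._staged.clear()

    def poll_durable(self):
        """Phase 2: hand the queued durability confirmations to the host."""
        events, self._durable_events = self._durable_events, []
        return events


class JournalTransaction:
    """Host side: a transaction closes once all of its tokens are durable."""

    def __init__(self, tokens):
        self.pending = set(tokens)

    def on_durable(self, token):
        self.pending.discard(token)

    @property
    def closed(self):
        return not self.pending


if __name__ == "__main__":
    dev = ToyDevice()
    txn = JournalTransaction([dev.write(Intent.EPHEMERAL), dev.write()])
    print("closed after execution:", txn.closed)

    dev.write(Intent.CRITICAL)  # forces the two earlier writes to media too
    for token in dev.poll_durable():
        txn.on_durable(token)
    print("closed after durability events:", txn.closed)
```

In this model the host never issues a flush: the transaction stays
open after the phase-1 completions and closes only when the phase-2
confirmations for all of its tokens have arrived. A real design would
also have to cover token lifetime, event loss, and power failure
between the two phases, none of which is modelled here.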