Date: Mon, 16 Feb 2026 14:18:10 +0100
From: Pankaj Raghav
To: Jan Kara, Ojaswin Mujoo
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org,
 Andres Freund, djwong@kernel.org, john.g.garry@oracle.com,
 willy@infradead.org, hch@lst.de, ritesh.list@gmail.com,
 Luis Chamberlain, dchinner@redhat.com, Javier Gonzalez,
 gost.dev@samsung.com, tytso@mit.edu, p.raghav@samsung.com,
 vi.shah@samsung.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Buffered atomic writes
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 2/16/2026 12:38 PM, Jan Kara wrote:
> Hi!
>
> On Fri 13-02-26 19:02:39, Ojaswin Mujoo wrote:
>> Another thing that came up is to consider using write through semantics
>> for buffered atomic writes, where we are able to transition page to
>> writeback state immediately after the write and avoid any other users to
>> modify the data till writeback completes. This might affect performance
>> since we won't be able to batch similar atomic IOs but maybe
>> applications like postgres would not mind this too much. If we go with
>> this approach, we will be able to avoid worrying too much about other
>> users changing atomic data underneath us.
>>
>> An argument against this however is that it is user's responsibility to
>> not do non atomic IO over an atomic range and this shall be considered a
>> userspace usage error. This is similar to how there are ways users can
>> tear a dio if they perform overlapping writes. [1].
>
> Yes, I was wondering whether the write-through semantics would make sense
> as well. Intuitively it should make things simpler because you could
> practically reuse the atomic DIO write path. Only that you'd first copy
> data into the page cache and issue dio write from those folios. No need for
> special tracking of which folios actually belong together in atomic write,
> no need for cluttering standard folio writeback path, in case atomic write
> cannot happen (e.g. because you cannot allocate appropriately aligned
> blocks) you get the error back right away, ...
>
> Of course this all depends on whether such semantics would be actually
> useful for users such as PostgreSQL.

One issue might be performance, especially if the atomic max unit is at
the smaller end, such as 16k or 32k (which is fairly common). But it
will avoid the overlapping writes issue and can easily leverage the
direct IO path.

But one thing that postgres really cares about is the integrity of a
database block. So if there is an IO that is a multiple of the atomic
write unit (where one atomic unit encapsulates the whole DB page), it is
not a problem if tearing happens on the atomic boundaries. This fits
very well with what NVMe calls Multiple Atomicity Mode (MAM) [1].

We don't have any semantics for MAM at the moment, but supporting it
could increase performance, as we could do larger IOs and still get the
atomic guarantees certain applications care about.

[1] https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-Revision-1.1-2024.08.05-Ratified.pdf
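
To make the "tearing only on atomic boundaries" idea above concrete, here
is a minimal sketch in plain C. It is not kernel code and not an existing
API: the struct atomic_chunk type and the split_into_atomic_units() helper
are hypothetical names invented for illustration. It only shows how a large
IO that is a multiple of the atomic unit decomposes into unit-sized pieces,
each of which would be expected to be atomic on its own. The power-of-two
and alignment checks are assumptions modelled on the constraints commonly
placed on atomic writes.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one piece of a larger IO that must not tear. */
struct atomic_chunk {
	uint64_t offset;	/* byte offset in the file */
	uint64_t len;		/* always equal to the atomic unit */
};

static bool is_power_of_two(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: split [offset, offset + len) into chunks of
 * exactly 'unit' bytes. Tearing between two chunks is acceptable,
 * tearing inside a chunk is not.
 *
 * Returns the number of chunks written to 'out', or -1 if the request
 * cannot be expressed as whole, naturally aligned atomic units or does
 * not fit in 'max_chunks' entries.
 */
static int split_into_atomic_units(uint64_t offset, uint64_t len,
				   uint64_t unit,
				   struct atomic_chunk *out,
				   size_t max_chunks)
{
	uint64_t n, i;

	/* Assumption: the atomic unit is a non-zero power of two. */
	if (!is_power_of_two(unit) || len == 0)
		return -1;

	/* The IO must start on a unit boundary and cover whole units. */
	if (offset % unit != 0 || len % unit != 0)
		return -1;

	n = len / unit;
	if (n > max_chunks || n > INT_MAX)
		return -1;

	for (i = 0; i < n; i++) {
		out[i].offset = offset + i * unit;
		out[i].len = unit;
	}
	return (int)n;
}
```

For example, with a 16k atomic unit matching the DB page size, a 64k write
at offset 64k yields four chunks. Under the semantics discussed above, a
crash could leave any subset of those four pages updated, but never a
partially written page.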