From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 93E4A2F30 for ; Mon, 20 Jan 2025 05:11:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737349907; cv=none; b=h358sxchL4jx3B7STWUPu304xsP3xcUKmTveN69n4Xs6n2IFjjfriBUHllFtGmZciB04RBXI5ZbKGviojtHSJ3bOd88R1Oq3TNL80TtvJIkvgbLH26V6OLKJGrN5UjJtt5hhGlbNaDRwyu7p6HnI1oonhoLfsGY9xoIDVaX7AM8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737349907; c=relaxed/simple; bh=6/hN0uU5Wl1ajZZjuZXT7EQYyfDFmNjcnumjQGHIlDM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=IlA/TEnBf/y1naTzoocgvHF1BMEn6IY0VkjR6CB2C+SGpYjLjIZ3v0VlwyoLRi8q2VzhK4hk3IhZOFdtJulTGIdQ66yKbDyRi9Ned7eBvJGiv4xVe0zgRznmfB9rS9fMsByW2r8HUOh5gmJo9ePv/c/eCVS0+F3+vkqBYBd1SP0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com; spf=pass smtp.mailfrom=fromorbit.com; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b=XyQd/JE5; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b="XyQd/JE5" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-21644aca3a0so93002285ad.3 for ; Sun, 19 Jan 2025 21:11:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1737349905; x=1737954705; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=MewRj9QiRq20R+oAW7gPmxUVZw7d03MaRffdhRRwACc=; b=XyQd/JE56btLujCCZrIuxVQTvFK9HnOdlWO3ReG4NpXGFlha/wq8/ZX9M3Qe5oPGY3 +a+9/DO6LOFXABo8xSP6oYHmypT8TZuGzg7YQYCerOwZM/F2s1COO1hMqyBB20YdLQSP sD7kjVOCDZgUHxDKf7wkh5oJhl7ow83PEDj3pfn7pvMEr723qHdpxyhh5a0+W7FZhTft jydIFlxnJEal6eitNy8uShAvuOJ4NvZziGSyFB4eJEZHGh27aQp960gJ8DX2G/Kyf/bs UF7VKb8T1MIpUsDLP9Q4miTd4vn8E7wgipoIc0Tm421OIqCS2rkdATFb6RFLKVQVoxY6 1eEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737349905; x=1737954705; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MewRj9QiRq20R+oAW7gPmxUVZw7d03MaRffdhRRwACc=; b=cEH6C4R4axmuVGXLsKt2jKLzAA6dcIuckbS+piRmHzNhBEVCemnYf/Oo7+hlaW84Jw WRgNH8ljQ9+e/6UlnrZhaA92sutJLSpYURaFogdTTAg4JHkUSoIPVN77RdXhMmu0JsNr 86DCRT2C3aNaVskv2DAMeMR3B/B3Le3RmM6shSe2IwTdU085WyAfroDoSIFWe9t7AMNI cX7CLGMSBa9uUUphy7TyvIa88p7rTgKDPUgitFwijf3a/IY5JeEJXdtm7j/Rp0Mb93CU wqHV01QVDL7urEVVwA0rw5EcXA2AD8z88En9xzXM6F2WE1LHX+sxdR0zpCNDhRpqAn7B Re5A== X-Forwarded-Encrypted: i=1; AJvYcCX9BuPMmdaiPfCERa5pd0UYZXaeg+nRErXq5+so46laBQdFJ4X7Ikf5zbT1bk2B1RJ1YCfpxoIgNMLzsGk=@vger.kernel.org X-Gm-Message-State: AOJu0Yx5IdpNbDitB+T2tATzC5kirHBDBBWMjDdiVjWSnLtlHxpQckuH 9mWFio6b1Q7SuvR9usEkpn3w2JblgFDUXidTmR7qWo9RfUyU7tXsW3l8nUhDbG0= X-Gm-Gg: ASbGnctvoslhSJn6q6UR/e+QIN0yVP/sesr6Waz3Fna4T1t5pwaShq0cpjCGe+yYXjD iok0ZetHOX4qZSRaOl+DQSGmRnRWotrR5+PfGODKkipACLvDNibOFkN3wTs+r6XOW8q7tRGModt U/HOcMefF7ef3Gv+PS33QHWOvAIUb2aqu41+TXa3eJZNZrkbkTaNemXrdb2r+CK6GUeH9dXZfgj w9SaDetv+NI2OhbRZda49zMZNWiaIqHzznwZKGnyAohAmfQR5/c6e1h/cywAfiSRlNB9a/i5JVQ UQvRyFGzYNHYtvYCy+qThUgbVNUEHWQjBL5FXV3kT4/X4w== X-Google-Smtp-Source: AGHT+IHhbkqmjwk7Q6aME70kwcXDy+5z8I0QB6+x/heTFSjea9kOICMiW5mp/Ss+fGBt5bxAEJoGtA== X-Received: by 2002:a17:903:2302:b0:216:4a8a:2665 with SMTP id d9443c01a7336-21c3560949emr171420605ad.50.1737349904797; Sun, 19 Jan 2025 21:11:44 -0800 (PST) Received: from dread.disaster.area (pa49-186-89-135.pa.vic.optusnet.com.au. [49.186.89.135]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-21c2d403100sm52561115ad.234.2025.01.19.21.11.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 19 Jan 2025 21:11:44 -0800 (PST) Received: from dave by dread.disaster.area with local (Exim 4.98) (envelope-from ) id 1tZk4b-000000087ZK-0YIq; Mon, 20 Jan 2025 16:11:41 +1100 Date: Mon, 20 Jan 2025 16:11:41 +1100 From: Dave Chinner To: Amir Goldstein Cc: Christoph Hellwig , Brian Foster , "Darrick J. Wong" , Chi Zhiling , cem@kernel.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Chi Zhiling , John Garry Subject: Re: [PATCH] xfs: Remove i_rwsem lock in buffered read Message-ID: References: <20250113024401.GU1306365@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Sat, Jan 18, 2025 at 02:03:41PM +0100, Amir Goldstein wrote: > On Fri, Jan 17, 2025 at 11:19 PM Dave Chinner > wrote: > > On Fri, Jan 17, 2025 at 02:27:46PM +0100, Amir Goldstein wrote: > > > On Wed, Jan 15, 2025 at 10:41 PM Dave Chinner > > > wrote: For all practical purposes, we > > > could maintain a counter in inode not for submitted DIO, but > > > for files opened O_DIRECT. > > > > > > The first open O_DIRECT could serialize with in-flight > > > buffered writes holding a shared iolock and then buffered > > > writes could take SH vs. EX iolock depending on folio state > > > and on i_dio_open_count. > > > > I don't see how this can be made to work w.r.t. sane data > > coherency behaviour. e.g. how does this model serialise a new > > DIO write that is submitted after a share-locked buffered write > > has just started and passed all the "i_dio_count == 0" checks > > that enable it to use shared locking? i.e. we now have > > potentially overlapping buffered writes and DIO writes being > > done concurrently because the buffered write may not have > > instantiated folios in the page cache yet.... > > > > A shared-locked buffered write can only start when there is no > O_DIRECT file open on the inode. The first open for O_DIRECT to > increment i_dio_open_count needs to take exclusive iolock to wait > for all in-flight buffered writes then release the iolock. > > All DIO submitted via O_DIRECT fds will be safe against in-flight > share-locked buffered write. Hooking ->open() and ->release() isn't sufficient. O_DIRECT can be changed at any time via fcntl(F_SETFL) which isn't synchronised against IO in progress at all. i.e. we can't track that, nor can we even use f_ops->check_flags to reject changes to O_DIRECT state because it doesn't pass either the filp or the inode.... IOWs, as it stands logic like "is the file opened for O_DIRECT" in the IO path is inherently racy and the racing mechanisms are directly under the control of unprivileged userspace applications. Without mods from the VFS down and new hooks being implemented in XFS along with all the whacky "which serialisiation method do we use" logic and dealing with the complexities and switching between them, tracking struct files that are open for O_DIRECT isn't really a viable mechanism for enabling shared buffered writes... -Dave. -- Dave Chinner david@fromorbit.com