All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Hao-ran Zheng <zhenghaoran@buaa.edu.cn>,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	baijiaju1990@gmail.com, 21371365@buaa.edu.cn
Subject: Re: [PATCH v4] fs: Fix data race in inode_set_ctime_to_ts
Date: Sun, 24 Nov 2024 10:34:47 -0800	[thread overview]
Message-ID: <20241124183447.GP1926309@frogsfrogsfrogs> (raw)
In-Reply-To: <wxwj3mxb7xromjvy3vreqbme7tugvi7gfriyhtcznukiladeoj@o7drq3kvflfa>

On Sun, Nov 24, 2024 at 06:56:57PM +0100, Mateusz Guzik wrote:
> On Sun, Nov 24, 2024 at 09:44:35AM -0800, Darrick J. Wong wrote:
> > On Sun, Nov 24, 2024 at 05:42:53PM +0800, Hao-ran Zheng wrote:
> > > A data race may occur when the function `inode_set_ctime_to_ts()` and
> > > the function `inode_get_ctime_sec()` are executed concurrently. When
> > > two threads call `aio_read` and `aio_write` respectively, they will
> > > be distributed to the read and write functions of the corresponding
> > > file system respectively. Taking the btrfs file system as an example,
> > > the `btrfs_file_read_iter` and `btrfs_file_write_iter` functions are
> > > finally called. These two functions created a data race when they
> > > finally called `inode_get_ctime_sec()` and `inode_set_ctime_to_ns()`.
> > > The specific call stack that appears during testing is as follows:
> > > 
> > > ============DATA_RACE============
> > > btrfs_delayed_update_inode+0x1f61/0x7ce0 [btrfs]
> > > btrfs_update_inode+0x45e/0xbb0 [btrfs]
> > > btrfs_dirty_inode+0x2b8/0x530 [btrfs]
> > > btrfs_update_time+0x1ad/0x230 [btrfs]
> > > touch_atime+0x211/0x440
> > > filemap_read+0x90f/0xa20
> > > btrfs_file_read_iter+0xeb/0x580 [btrfs]
> > > aio_read+0x275/0x3a0
> > > io_submit_one+0xd22/0x1ce0
> > > __se_sys_io_submit+0xb3/0x250
> > > do_syscall_64+0xc1/0x190
> > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > ============OTHER_INFO============
> > > btrfs_write_check+0xa15/0x1390 [btrfs]
> > > btrfs_buffered_write+0x52f/0x29d0 [btrfs]
> > > btrfs_do_write_iter+0x53d/0x1590 [btrfs]
> > > btrfs_file_write_iter+0x41/0x60 [btrfs]
> > > aio_write+0x41e/0x5f0
> > > io_submit_one+0xd42/0x1ce0
> > > __se_sys_io_submit+0xb3/0x250
> > > do_syscall_64+0xc1/0x190
> > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > 
> > > To address this issue, it is recommended to add WRITE_ONCE
> > > and READ_ONCE when reading or writing the `inode->i_ctime_sec`
> > > and `inode->i_ctime_nsec` variable.
> > 
> > Excuse my ignorance, but how exactly does this annotation fix the race?
> > Does it prevent reordering of the memory accesses or something?  "it is
> > recommended" is not enough for dunces such as myself to figure out why
> > this fixes the problem. :(
> > 
> 
> It prevents the compiler from getting ideas. One hypothetical is
> splitting the load from one asm op to several, possibly resulting a
> corrupted value when racing against an update
> 
> A not hypothethical concerns some like this:
> 	time64_t sec = inode_get_ctime_sec(inode);
> 	/* do stuff with sec */
> 	/* do other stuff */
> 	/* do more stuff sec */
> 
> The compiler might have decided to throw away the read sec value and do
> the load again later. Thus if an update happened in the meantime then
> the code ends up operating on 2 different values, even though the
> programmer clearly intended it to be stable.

<nod> I figured as much, but I wanted the commit message to spell that
out for everybody, e.g. "Use READ_ONCE to force compilers to sample the
ctime values one time only, and WRITE_ONCE to prevent the compiler from
turning one line of code into multiple stores to the same address."

> However, since both sec and nsec are updated separately and there is no
> synchro, reading *both* can still result in values from 2 different
> updates which is a bug not addressed by any of the above. To my
> underestanding of the vfs folk take on it this is considered tolerable.

<nod> This is the other thing -- the commit message refers to preventing
a data race, but there is indeed nothing about access ordering that
prevents /that/ race.  Someone not an fs developer could interpret that
as "the patch does not fully fix all possible problems" whereas the
community has decided there's an acceptable tradeoff with not taking
i_rwsem.

--D

  reply	other threads:[~2024-11-24 18:34 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-20  2:43 [PATCH] fs: Fix data race in inode_set_ctime_to_ts Hao-ran Zheng
2024-11-21 11:35 ` Jan Kara
2024-11-22  3:51   ` [PATCH v2] " Hao-ran Zheng
2024-11-22 11:13     ` Christian Brauner
2024-11-22 11:22     ` Jan Kara
2024-11-22 11:48       ` 郑浩然
2024-11-22 13:06       ` [PATCH v3] " Hao-ran Zheng
2024-11-23 14:01         ` Jeff Layton
2024-11-24  8:46           ` 郑浩然
2024-11-24  9:42           ` [PATCH v4] " Hao-ran Zheng
2024-11-24 17:44             ` Darrick J. Wong
2024-11-24 17:56               ` Mateusz Guzik
2024-11-24 18:34                 ` Darrick J. Wong [this message]
2024-11-24 21:50                 ` [RFC] metadata updates vs. fetches (was Re: [PATCH v4] fs: Fix data race in inode_set_ctime_to_ts) Al Viro
2024-11-24 22:10                   ` Linus Torvalds
2024-11-24 22:24                     ` Al Viro
2024-11-24 22:34                       ` Matthew Wilcox
2024-11-24 22:43                         ` Linus Torvalds
2024-11-24 23:53                           ` Matthew Wilcox
2024-11-25  0:53                             ` Linus Torvalds
2024-11-25  1:02                               ` Linus Torvalds
2024-11-25  1:15                               ` Matthew Wilcox
2024-11-25  1:26                                 ` Linus Torvalds
2024-11-24 22:40                       ` Linus Torvalds
2024-11-24 23:05                     ` Mateusz Guzik
2024-11-24 23:19                       ` Mateusz Guzik
2024-11-24 23:41                         ` Al Viro
2024-11-24 23:38                       ` Al Viro
2024-11-25 12:20                     ` Christian Brauner
2024-11-24 22:10                   ` Dr. David Alan Gilbert
2024-11-24 22:19                     ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241124183447.GP1926309@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=21371365@buaa.edu.cn \
    --cc=baijiaju1990@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zhenghaoran@buaa.edu.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.