Linux Btrfs filesystem development
From: David Sterba <dsterba@suse.cz>
To: Filipe Manana <fdmanana@kernel.org>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix data race when accessing the last_trans field of a root
Date: Tue, 2 Jul 2024 17:46:06 +0200
Message-ID: <20240702154606.GG21023@twin.jikos.cz>
In-Reply-To: <CAL3q7H7X9zMZh-0ruaQV++mMY2q3oFTq6kW2BwOe=v+0OECGQQ@mail.gmail.com>

On Tue, Jul 02, 2024 at 04:09:42PM +0100, Filipe Manana wrote:
> On Tue, Jul 2, 2024 at 3:52 PM David Sterba <dsterba@suse.cz> wrote:
> > On Mon, Jul 01, 2024 at 11:01:53AM +0100, fdmanana@kernel.org wrote:
> > >   [  199.564372]  __s390x_sys_write+0x68/0x88
> > >   [  199.564397]  do_syscall+0x1c6/0x210
> > >   [  199.564424]  __do_syscall+0xc8/0xf0
> > >   [  199.564452]  system_call+0x70/0x98
> > >
> > > This is because we update and read last_trans concurrently without any
> > > type of synchronization. This should be generally harmless and in the
> > > worst case it can make us do extra locking (btrfs_record_root_in_trans())
> > > trigger some warnings at ctree.c or do extra work during relocation - this
> > > would probably only happen in case of load or store tearing.
> > >
> > > So fix this by always reading and updating the field using READ_ONCE()
> > > and WRITE_ONCE(), this silences KCSAN and prevents load and store tearing.
> >
> > I'm curious why you mention the load/store tearing, as we discussed this
> > last time under some READ_ONCE/WRITE_ONCE change it's not happening on
> > aligned addresses for any integer type, I provided links to intel manuals.
> 
> Yes, I do remember that.
> But that was a different case, it was about a pointer type.
> 
> This is a u64. Can't the load/store tearing happen at the very least
> on 32 bits systems?

Right, it was for a pointer type. I'll continue searching for a
definitive answer regarding 64bit types on 32bit architectures. The
tearing could likely happen when a 64bit type is split across two
cachelines, but I'd be very curious how this could happen within one
cacheline (assuming the compiler aligns 64bit types to 8 bytes).
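
A minimal sketch of the alignment argument, assuming a 64 byte cacheline
(common, but not universal). The helper name straddles_cacheline() and the
CACHELINE_SIZE constant are hypothetical and only serve the illustration.
Whether the compiler really gives a 64bit type 8 byte alignment depends on
the ABI and is taken as given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size, for illustration only. */
#define CACHELINE_SIZE 64u

/*
 * Hypothetical helper: return true if an object of 'size' bytes that
 * starts at address 'addr' crosses a cacheline boundary.
 */
static bool straddles_cacheline(uintptr_t addr, size_t size)
{
	assert(size > 0 && size <= CACHELINE_SIZE);

	/* Compare the cacheline index of the first and the last byte. */
	return (addr / CACHELINE_SIZE) != ((addr + size - 1) / CACHELINE_SIZE);
}
```

With these assumptions an 8 byte object at an 8 byte aligned address never
crosses a line, while the same object at a 4 byte aligned address such as
offset 60 does.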

> I believe that's the reason we use WRITE_ONCE/READ_ONCE in several
> places dealing with u64s.

AFAIK we do READ_ONCE/WRITE_ONCE for unlocked access as an annotation,
e.g. for the sysfs configuration values used in code. Or when there's a
fast path that reads a value outside of a lock and then again under the
lock; there it needs the fresh value, which is enforced by READ_ONCE.
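
The fast path pattern described above can be sketched in userspace roughly
as follows. This is an illustration under stated assumptions, not the btrfs
code: struct demo_root and demo_record_trans() are hypothetical, a pthread
mutex stands in for the kernel lock, and the two macros are simplified
stand-ins for the kernel ones (they rely on the GCC/Clang __typeof__
extension and skip the kernel's type check).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical structure, not the real struct btrfs_root. */
struct demo_root {
	pthread_mutex_t lock;
	uint64_t last_trans;
};

/*
 * Record 'transid' in the root. Returns 1 if the slow path updated the
 * field, 0 if the field was already up to date.
 */
static int demo_record_trans(struct demo_root *root, uint64_t transid)
{
	int updated = 0;

	assert(root);

	/* Fast path: unlocked read, marked so it is loaded exactly once. */
	if (READ_ONCE(root->last_trans) == transid)
		return 0;

	pthread_mutex_lock(&root->lock);
	/* Slow path: recheck under the lock before updating. */
	if (root->last_trans != transid) {
		WRITE_ONCE(root->last_trans, transid);
		updated = 1;
	}
	pthread_mutex_unlock(&root->lock);

	return updated;
}
```

The unlocked read only decides whether the lock can be skipped; the value
is checked again under the lock before it is written.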

The KCSAN reports should be fixed by data_race() annotation so it's not
confused by the above.

I don't see how READ_ONCE protects against load tearing on 32bit because
it's doing the same thing as on 64bit: verifying that it's a basic
scalar type and then doing an ordinary access.

https://elixir.bootlin.com/linux/latest/source/include/asm-generic/rwonce.h#L47

#ifndef __READ_ONCE
#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
#endif

#define READ_ONCE(x)							\
({									\
	compiletime_assert_rwonce_type(x);				\
	__READ_ONCE(x);							\
})

__unqual_scalar_typeof() checks types from char to long long.

An x86_64 build and also i386 build use the same file
asm-generic/rwonce.h, no code that would prevent load tearing.
The only thing that comes to mind is that it's all hidden in the address
and pointer dereference, but that still says nothing about alignment or
cacheline-straddling.
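
A userspace approximation of the macro, assuming GCC or Clang for
__typeof__ and leaving out the kernel's compile-time type check. The names
DEMO_READ_ONCE, struct demo and the two load helpers are hypothetical. The
sketch only shows that the macro is a volatile-qualified dereference of the
same address as the plain access.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of the generic macro, without the kernel's type check. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical structure holding a 64bit field. */
struct demo {
	uint32_t pad;
	uint64_t val;
};

/* Plain load: the compiler is free to merge, repeat or split it. */
static uint64_t demo_plain_load(struct demo *d)
{
	assert(d);
	return d->val;
}

/* Marked load: same address and same type, accessed as volatile. */
static uint64_t demo_once_load(struct demo *d)
{
	assert(d);
	return DEMO_READ_ONCE(d->val);
}
```

Both helpers read the same bytes at the same address; nothing in the macro
changes where the field is placed or how it is aligned.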

There are arch-specific implementations of that header that work around
some architectural oddities, but that's on Alpha (we don't care) and
ARM64 (64bit arch with 64bit pointers assumed).

Why am I so picky about that? For one I want to understand it completely.
This has been bothering me for a long time as the arguments were not
always solid and more like cargo culting (happened in the past) or
scattered in comments to articles or mail threads. If we're really
missing correct use of the _ONCE accessors then we have potential bugs
lurking somewhere.

I don't mind if we add data_race() annotations as we do generally update
code to be able to use internal tools like that, or when we use _ONCE
for the fast path or as an annotation.

The article https://lwn.net/Articles/793253/ is titled "Who's afraid of
a big bad optimizing compiler?" and its first load/store tearing
example argues with a sample 16 bit architecture that could do 2 byte
loads of a pointer. That's good for a demonstration, but I want
something real and relevant for the Linux kernel.
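
For reference, a single-threaded model of what a torn 64bit load would look
like if it were done as two 32bit loads with a store landing in between.
The function demo_torn_load() is hypothetical and the concurrent store is
simulated inline. The sketch does not claim that any real compiler or CPU
does this for an aligned u64; it only shows the effect being discussed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 64bit load split into two 32bit loads.
 * 'between' is the value a concurrent writer stores after the low half
 * has been read and before the high half is read.
 */
static uint64_t demo_torn_load(uint64_t *p, uint64_t between)
{
	uint32_t lo, hi;

	assert(p);

	lo = (uint32_t)(*p);		/* low half sees the old value */
	*p = between;			/* simulated concurrent store */
	hi = (uint32_t)(*p >> 32);	/* high half sees the new value */

	return ((uint64_t)hi << 32) | lo;
}
```

With an old value of 0x00000000ffffffff and a new value of
0x0000000100000000 the model returns 0x00000001ffffffff, which is neither
the old nor the new value.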

Thread overview: 7+ messages
2024-07-01 10:01 [PATCH] btrfs: fix data race when accessing the last_trans field of a root fdmanana
2024-07-01 14:16 ` Josef Bacik
2024-07-02 14:52 ` David Sterba
2024-07-02 15:09   ` Filipe Manana
2024-07-02 15:46     ` David Sterba [this message]
2024-07-03 23:05       ` David Sterba
2024-07-08 16:23         ` Filipe Manana