Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: David Sterba <dsterba@suse.cz>
Cc: Filipe Manana <fdmanana@kernel.org>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix data race when accessing the last_trans field of a root
Date: Thu, 4 Jul 2024 01:05:46 +0200	[thread overview]
Message-ID: <20240703230546.GR21023@twin.jikos.cz> (raw)
In-Reply-To: <20240702154606.GG21023@twin.jikos.cz>

On Tue, Jul 02, 2024 at 05:46:06PM +0200, David Sterba wrote:
> On Tue, Jul 02, 2024 at 04:09:42PM +0100, Filipe Manana wrote:
> > On Tue, Jul 2, 2024 at 3:52 PM David Sterba <dsterba@suse.cz> wrote:
> > > On Mon, Jul 01, 2024 at 11:01:53AM +0100, fdmanana@kernel.org wrote:
> > > >   [  199.564372]  __s390x_sys_write+0x68/0x88
> > > >   [  199.564397]  do_syscall+0x1c6/0x210
> > > >   [  199.564424]  __do_syscall+0xc8/0xf0
> > > >   [  199.564452]  system_call+0x70/0x98
> > > >
> > > > This is because we update and read last_trans concurrently without any
> > > > type of synchronization. This should be generally harmless and in the
> > > > worst case it can make us do extra locking (btrfs_record_root_in_trans())
> > > > trigger some warnings at ctree.c or do extra work during relocation - this
> > > > would probably only happen in case of load or store tearing.
> > > >
> > > > So fix this by always reading and updating the field using READ_ONCE()
> > > > and WRITE_ONCE(), this silences KCSAN and prevents load and store tearing.
> > >
> > > I'm curious why you mention the load/store tearing, as we discussed this
> > > last time under some READ_ONCE/WRITE_ONCE change it's not happening on
> > > aligned addresses for any integer type, I provided links to intel manuals.
> > 
> > Yes, I do remember that.
> > But that was a different case, it was about a pointer type.
> > 
> > This is a u64. Can't the load/store tearing happen at the very least
> > on 32 bits systems?
> 
> Right, it was for a pointer type. I'll continue searching for a
> definitive answer regarding 64bit types on 32bit architectures.

I have a counter example, but this is a weak "proof by compiler", yet at
least something tangible:

in space-info.c:
void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
                                        u64 chunk_size)
{
        WRITE_ONCE(space_info->chunk_size, chunk_size);
}

This uses _ONCE for the sysfs use case but demonstrates that it does not
prevent load tearing on 32bit:

000012f4 <btrfs_update_space_info_chunk_size>:
    12f4:       /-- e8 fc ff ff ff                      calll  12f5 <btrfs_update_space_info_chunk_size+0x1>
                        12f5: R_386_PC32        __fentry__
    12f9:           55                                  pushl  %ebp
    12fa:           89 e5                               movl   %esp,%ebp
    12fc:           53                                  pushl  %ebx
    12fd:           5b                                  popl   %ebx

    12fe:           89 50 44                            movl   %edx,0x44(%eax)
    1301:           89 48 48                            movl   %ecx,0x48(%eax)

    1304:           5d                                  popl   %ebp
    1305:       /-- e9 fc ff ff ff                      jmpl   1306 <btrfs_update_space_info_chunk_size+0x12>
                        1306: R_386_PC32        __x86_return_thunk
    130a:           66 90                               xchgw  %ax,%ax


eax - first parameter, a pointer to struct
edx - first half of the first parameter (u64)
ecx - second half of the first parameter

I found this example as it easy to identify, many other _ONCE uses are
static inlines or mixed in other code.

I've also tried various compilers on godbolt.org, on 32bit architectures
like MIPS32, SPARC32 and maybe others, there are two (or more)
instructions needed.

The question that remains is if the 2 instruction load/store matters in
case if the value is in one cache line as this depends on the MESI cache
synchronization protocol. The only case I see where it could tear it is
when the process gets rescheduled exactly between the instructions. An
outer lock/mutex synchronization could prevent it so it depends.

  reply	other threads:[~2024-07-03 23:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-01 10:01 [PATCH] btrfs: fix data race when accessing the last_trans field of a root fdmanana
2024-07-01 14:16 ` Josef Bacik
2024-07-02 14:52 ` David Sterba
2024-07-02 15:09   ` Filipe Manana
2024-07-02 15:46     ` David Sterba
2024-07-03 23:05       ` David Sterba [this message]
2024-07-08 16:23         ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240703230546.GR21023@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox