public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix race setting file private on concurrent lseek using same fd
Date: Tue, 10 Sep 2024 11:35:59 -0400	[thread overview]
Message-ID: <20240910153559.GA3943310@perftesting> (raw)
In-Reply-To: <a351fd3b0397b6fe8d94f11a6744e1655349d687.1725356877.git.fdmanana@suse.com>

On Tue, Sep 03, 2024 at 10:55:36AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing concurrent lseek(2) system calls against the same file
> descriptor, using multiple threads belonging to the same process, we have
> a short time window where a race happens and can result in a memory leak.
> 
> The race happens like this:
> 
> 1) A program opens a file descriptor for a file and then spawns two
>    threads (with the pthreads library for example), lets call them
>    task A and task B;
> 
> 2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at
>    file.c:find_desired_extent() while holding a read lock on the inode;
> 
> 3) At the start of find_desired_extent(), it extracts the file's
>    private_data pointer into a local variable named 'private', which has
>    a value of NULL;
> 
> 4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode
>    in shared mode and enters file.c:find_desired_extent(), where it also
>    extracts file->private_data into its local variable 'private', which
>    has a NULL value;
> 
> 5) Because it saw a NULL file private, task A allocates a private
>    structure and assigns to the file structure;
> 
> 6) Task B also saw a NULL file private so it also allocates its own file
>    private and then assigns it to the same file structure, since both
>    tasks are using the same file descriptor.
> 
>    At this point we leak the private structure allocated by task A.
> 
> Besides the memory leak, there's also the detail that both tasks end up
> using the same cached state record in the private structure (struct
> btrfs_file_private::llseek_cached_state), which can result in a
> use-after-free problem since one task can free it while the other is
> still using it (only one task took a reference count on it). Also, sharing
> the cached state is not a good idea since it could result in incorrect
> results in the future - right now it should not be a problem because it
> end ups being used only in extent-io-tree.c:count_range_bits() where we do
> range validation before using the cached state.
> 
> Fix this by protecting the private assignment and check of a file while
> holding the inode's spinlock and keep track of the task that allocated
> the private, so that it's used only by that task in order to prevent
> user-after-free issues with the cached state record as well as potentially
> using it incorrectly in the future.
> 
> Fixes: 3c32c7212f16 ("btrfs: use cached state when looking for delalloc ranges with lseek")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

We should maybe re-look at our useage of btrfs_file_private and see if we can
come up with a better solution for this style of problem.  I think the directory
readdir stuff does the same sort of thing and we might end up with a similar
issue if we have two tasks using the same fd to do readdir.

But these are just random other thoughts for future work, for this you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2024-09-10 15:36 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-03  9:55 [PATCH] btrfs: fix race setting file private on concurrent lseek using same fd fdmanana
2024-09-10 15:35 ` Josef Bacik [this message]
2024-09-10 15:38   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240910153559.GA3943310@perftesting \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox