linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sergei Trofimovich <slyfox@gentoo.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
Date: Tue, 31 May 2011 22:10:37 +0300	[thread overview]
Message-ID: <20110531221037.0454f16e@sf> (raw)
In-Reply-To: <1306790348-9553-1-git-send-email-slyfox@gentoo.org>

[-- Attachment #1: Type: text/plain, Size: 4153 bytes --]

Some clarifications:

> Patchset based on 'tmp' branch e6bd18d8938986c997c45f0ea95b221d4edec095.
All patches are against btrfs-progs.

====

The rest of rambling is about kernel code, which handles supers.
I have read what I've wrote last night (braindump of insane!)
and will try to elaborate a bit:

> Poking at valgrind warnings I have noticed very worrying problem.
> When we (over)write superblock we take 4096 continuous bytes in memory.

The common pattern in disk-io.c is the following:

    struct btrfs_super_block *sb = root->fs_info->sb // or ->sb_for_commit
    memcpy(dest, sb, BTRFS_SUPER_INFO_SIZE /* 4096 */);

With a memcpy we go out of sizeof(*sb) (~2.5K) and pick more fields from fs_info struct.
Let's look at them:
struct btrfs_fs_info {
...
        struct btrfs_super_block super_copy;
        struct btrfs_super_block super_for_commit;
        struct block_device *__bdev;
        struct super_block *sb;
        struct inode *btree_inode;
        struct backing_dev_info bdi;
        struct mutex trans_mutex;
        struct mutex tree_log_mutex;
        struct mutex transaction_kthread_mutex;
        struct mutex cleaner_mutex;
        struct mutex chunk_mutex;
        struct mutex volume_mutex;
        struct mutex ordered_operations_mutex;
        struct rw_semaphore extent_commit_sem;

        struct rw_semaphore cleanup_work_sem;

        struct rw_semaphore subvol_sem;
        struct srcu_struct subvol_srcu;

        struct list_head trans_list;
        struct list_head hashers;
        struct list_head dead_roots;
        struct list_head caching_block_groups;

        spinlock_t delayed_iput_lock;
        struct list_head delayed_iputs;

        atomic_t nr_async_submits;
...
So we copyout (and even checksum!) atomic counters and other volatile stuff
(I haven't looked at mutexes and semaphores, but i'm sure there is yet some
volatile stuff) 

> In kernel the structures reside in btrfs_fs_info structure, so we compute
> CRC for:
>     struct btrfs_super_block super_copy;
>     struct btrfs_super_block super_for_commit;
> and then write it to disk. [H]ere we have 2 issues:
> 1. kernel pointers and other random stuff leaks out to kernel.
>    It's nondeterministic and leaks out data (not too bad,
>    as it should be accessible only for root, but still)
> 2. more serious: is there guarantee, that noone will kick-in
>    between CRC computation and superblock outwrite?
> 
>    What if some of mutexes, semaphores or lists will change
>    it's internal state? Some async thread will kick it
>    an we will end-up writing superblock with invalid CRC!
>    This might well be the cause of recend superblock
>    corruptions under heavy load + hangup retorted to the list.
> 
> Consider the following call chain:
> [somewhere in write_dev_supers ...]
> 
>                         bh->b_end_io = btrfs_end_buffer_write_sync;
>                         crc = ~(u32)0;
>                         crc = btrfs_csum_data(NULL, (char *)sb +
>                                               BTRFS_CSUM_SIZE, crc,
>                                               BTRFS_SUPER_INFO_SIZE -
>                                               BTRFS_CSUM_SIZE);
>                         btrfs_csum_final(crc, sb->csum);

Now the problem should be a bit more clear: sb is a thing pointing to the middle
of fs_info. We checksum it with data after it in fs_info.

>                         bh = __getblk(device->bdev, bytenr / 4096,
>                                       BTRFS_SUPER_INFO_SIZE);
> 
>                         memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);

and here we write all the checksummed contents. Is there guard, which
prevents things from updating fs_info?

> 
>                         /* one reference for submit_bh */
>                         get_bh(bh);
> 
>                         set_buffer_uptodate(bh);
>                         lock_buffer(bh);
>                         bh->b_end_io = btrfs_end_buffer_write_sync;

Am I too paranoid about the issue?

Thanks!

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  parent reply	other threads:[~2011-05-31 19:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
2011-05-31 19:10 ` Sergei Trofimovich [this message]
2011-06-02 20:17   ` [PATCH 0/9] some fixes for bugs spotted by valgrind Andi Kleen
2011-06-02 21:13     ` Sergei Trofimovich
2011-06-02 23:26       ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110531221037.0454f16e@sf \
    --to=slyfox@gentoo.org \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).