From: Sergei Trofimovich <slyfox@gentoo.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
Date: Tue, 31 May 2011 22:10:37 +0300 [thread overview]
Message-ID: <20110531221037.0454f16e@sf> (raw)
In-Reply-To: <1306790348-9553-1-git-send-email-slyfox@gentoo.org>
[-- Attachment #1: Type: text/plain, Size: 4153 bytes --]
Some clarifications:
> Patchset based on 'tmp' branch e6bd18d8938986c997c45f0ea95b221d4edec095.
All patches are against btrfs-progs.
====
The rest of rambling is about kernel code, which handles supers.
I have read what I've wrote last night (braindump of insane!)
and will try to elaborate a bit:
> Poking at valgrind warnings I have noticed very worrying problem.
> When we (over)write superblock we take 4096 continuous bytes in memory.
The common pattern in disk-io.c is the following:
struct btrfs_super_block *sb = root->fs_info->sb // or ->sb_for_commit
memcpy(dest, sb, BTRFS_SUPER_INFO_SIZE /* 4096 */);
With a memcpy we go out of sizeof(*sb) (~2.5K) and pick more fields from fs_info struct.
Let's look at them:
struct btrfs_fs_info {
...
struct btrfs_super_block super_copy;
struct btrfs_super_block super_for_commit;
struct block_device *__bdev;
struct super_block *sb;
struct inode *btree_inode;
struct backing_dev_info bdi;
struct mutex trans_mutex;
struct mutex tree_log_mutex;
struct mutex transaction_kthread_mutex;
struct mutex cleaner_mutex;
struct mutex chunk_mutex;
struct mutex volume_mutex;
struct mutex ordered_operations_mutex;
struct rw_semaphore extent_commit_sem;
struct rw_semaphore cleanup_work_sem;
struct rw_semaphore subvol_sem;
struct srcu_struct subvol_srcu;
struct list_head trans_list;
struct list_head hashers;
struct list_head dead_roots;
struct list_head caching_block_groups;
spinlock_t delayed_iput_lock;
struct list_head delayed_iputs;
atomic_t nr_async_submits;
...
So we copyout (and even checksum!) atomic counters and other volatile stuff
(I haven't looked at mutexes and semaphores, but i'm sure there is yet some
volatile stuff)
> In kernel the structures reside in btrfs_fs_info structure, so we compute
> CRC for:
> struct btrfs_super_block super_copy;
> struct btrfs_super_block super_for_commit;
> and then write it to disk. [H]ere we have 2 issues:
> 1. kernel pointers and other random stuff leaks out to kernel.
> It's nondeterministic and leaks out data (not too bad,
> as it should be accessible only for root, but still)
> 2. more serious: is there guarantee, that noone will kick-in
> between CRC computation and superblock outwrite?
>
> What if some of mutexes, semaphores or lists will change
> it's internal state? Some async thread will kick it
> an we will end-up writing superblock with invalid CRC!
> This might well be the cause of recend superblock
> corruptions under heavy load + hangup retorted to the list.
>
> Consider the following call chain:
> [somewhere in write_dev_supers ...]
>
> bh->b_end_io = btrfs_end_buffer_write_sync;
> crc = ~(u32)0;
> crc = btrfs_csum_data(NULL, (char *)sb +
> BTRFS_CSUM_SIZE, crc,
> BTRFS_SUPER_INFO_SIZE -
> BTRFS_CSUM_SIZE);
> btrfs_csum_final(crc, sb->csum);
Now the problem should be a bit more clear: sb is a thing pointing to the middle
of fs_info. We checksum it with data after it in fs_info.
> bh = __getblk(device->bdev, bytenr / 4096,
> BTRFS_SUPER_INFO_SIZE);
>
> memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);
and here we write all the checksummed contents. Is there guard, which
prevents things from updating fs_info?
>
> /* one reference for submit_bh */
> get_bh(bh);
>
> set_buffer_uptodate(bh);
> lock_buffer(bh);
> bh->b_end_io = btrfs_end_buffer_write_sync;
Am I too paranoid about the issue?
Thanks!
--
Sergei
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
next prev parent reply other threads:[~2011-05-31 19:10 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
2011-05-31 19:10 ` Sergei Trofimovich [this message]
2011-06-02 20:17 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Andi Kleen
2011-06-02 21:13 ` Sergei Trofimovich
2011-06-02 23:26 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110531221037.0454f16e@sf \
--to=slyfox@gentoo.org \
--cc=chris.mason@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.