From: Sergei Trofimovich <slyfox@gentoo.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/9] some fixes for bugs spotted by valgrind
Date: Tue, 31 May 2011 22:10:37 +0300 [thread overview]
Message-ID: <20110531221037.0454f16e@sf> (raw)
In-Reply-To: <1306790348-9553-1-git-send-email-slyfox@gentoo.org>
[-- Attachment #1: Type: text/plain, Size: 4153 bytes --]
Some clarifications:
> Patchset based on 'tmp' branch e6bd18d8938986c997c45f0ea95b221d4edec095.
All patches are against btrfs-progs.
====
The rest of rambling is about kernel code, which handles supers.
I have read what I've wrote last night (braindump of insane!)
and will try to elaborate a bit:
> Poking at valgrind warnings I have noticed very worrying problem.
> When we (over)write superblock we take 4096 continuous bytes in memory.
The common pattern in disk-io.c is the following:
struct btrfs_super_block *sb = root->fs_info->sb // or ->sb_for_commit
memcpy(dest, sb, BTRFS_SUPER_INFO_SIZE /* 4096 */);
With a memcpy we go out of sizeof(*sb) (~2.5K) and pick more fields from fs_info struct.
Let's look at them:
struct btrfs_fs_info {
...
struct btrfs_super_block super_copy;
struct btrfs_super_block super_for_commit;
struct block_device *__bdev;
struct super_block *sb;
struct inode *btree_inode;
struct backing_dev_info bdi;
struct mutex trans_mutex;
struct mutex tree_log_mutex;
struct mutex transaction_kthread_mutex;
struct mutex cleaner_mutex;
struct mutex chunk_mutex;
struct mutex volume_mutex;
struct mutex ordered_operations_mutex;
struct rw_semaphore extent_commit_sem;
struct rw_semaphore cleanup_work_sem;
struct rw_semaphore subvol_sem;
struct srcu_struct subvol_srcu;
struct list_head trans_list;
struct list_head hashers;
struct list_head dead_roots;
struct list_head caching_block_groups;
spinlock_t delayed_iput_lock;
struct list_head delayed_iputs;
atomic_t nr_async_submits;
...
So we copyout (and even checksum!) atomic counters and other volatile stuff
(I haven't looked at mutexes and semaphores, but i'm sure there is yet some
volatile stuff)
> In kernel the structures reside in btrfs_fs_info structure, so we compute
> CRC for:
> struct btrfs_super_block super_copy;
> struct btrfs_super_block super_for_commit;
> and then write it to disk. [H]ere we have 2 issues:
> 1. kernel pointers and other random stuff leaks out to kernel.
> It's nondeterministic and leaks out data (not too bad,
> as it should be accessible only for root, but still)
> 2. more serious: is there guarantee, that noone will kick-in
> between CRC computation and superblock outwrite?
>
> What if some of mutexes, semaphores or lists will change
> it's internal state? Some async thread will kick it
> an we will end-up writing superblock with invalid CRC!
> This might well be the cause of recend superblock
> corruptions under heavy load + hangup retorted to the list.
>
> Consider the following call chain:
> [somewhere in write_dev_supers ...]
>
> bh->b_end_io = btrfs_end_buffer_write_sync;
> crc = ~(u32)0;
> crc = btrfs_csum_data(NULL, (char *)sb +
> BTRFS_CSUM_SIZE, crc,
> BTRFS_SUPER_INFO_SIZE -
> BTRFS_CSUM_SIZE);
> btrfs_csum_final(crc, sb->csum);
Now the problem should be a bit more clear: sb is a thing pointing to the middle
of fs_info. We checksum it with data after it in fs_info.
> bh = __getblk(device->bdev, bytenr / 4096,
> BTRFS_SUPER_INFO_SIZE);
>
> memcpy(bh->b_data, sb, BTRFS_SUPER_INFO_SIZE);
and here we write all the checksummed contents. Is there guard, which
prevents things from updating fs_info?
>
> /* one reference for submit_bh */
> get_bh(bh);
>
> set_buffer_uptodate(bh);
> lock_buffer(bh);
> bh->b_end_io = btrfs_end_buffer_write_sync;
Am I too paranoid about the issue?
Thanks!
--
Sergei
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
next prev parent reply other threads:[~2011-05-31 19:10 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-30 21:18 [PATCH 0/9] some fixes for bugs spotted by valgrind Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes' Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 3/9] mkfs.btrfs: fail on scandir error (-r mode) Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 5/9] mkfs.btrfs: fix symlink names writing Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 6/9] mkfs.btrfs: write zeroes instead on uninitialized data Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls Sergei Trofimovich
2011-05-30 21:19 ` [PATCH 9/9] mkfs.btrfs: fix error text in '-r' mode Sergei Trofimovich
2011-05-31 19:10 ` Sergei Trofimovich [this message]
2011-06-02 20:17 ` [PATCH 0/9] some fixes for bugs spotted by valgrind Andi Kleen
2011-06-02 21:13 ` Sergei Trofimovich
2011-06-02 23:26 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110531221037.0454f16e@sf \
--to=slyfox@gentoo.org \
--cc=chris.mason@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).