* [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk @ 2016-03-10 11:10 Alex Lyakas 2016-03-11 22:19 ` Nicholas D Steeves 2016-03-21 18:51 ` Filipe Manana 0 siblings, 2 replies; 6+ messages in thread From: Alex Lyakas @ 2016-03-10 11:10 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana, Alex Lyakas csum_dirty_buffer was issuing a warning in case the extent buffer did not look alright, but was still returning success. Let's return error in this case, and also add an additional sanity check on the extent buffer header. The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will, but it is better than to have a silent metadata corruption on disk. Signed-off-by: Alex Lyakas <alex@zadarastorage.com> --- fs/btrfs/disk-io.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4420ab2..cf85714 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -506,23 +506,34 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root, static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) { u64 start = page_offset(page); u64 found_start; struct extent_buffer *eb; eb = (struct extent_buffer *)page->private; if (page != eb->pages[0]) return 0; + found_start = btrfs_header_bytenr(eb); - if (WARN_ON(found_start != start || !PageUptodate(page))) - return 0; + /* + * Please do not consolidate these warnings into a single if. + * It is useful to know what went wrong. + */ + if (WARN_ON(found_start != start)) + return -EUCLEAN; + if (WARN_ON(!PageUptodate(page))) + return -EUCLEAN; + + ASSERT(memcmp_extent_buffer(eb, fs_info->fsid, + btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0); + return csum_tree_block(fs_info, eb, 0); } static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, struct extent_buffer *eb) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; u8 fsid[BTRFS_UUID_SIZE]; int ret = 1; -- 1.9.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk 2016-03-10 11:10 [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk Alex Lyakas @ 2016-03-11 22:19 ` Nicholas D Steeves 2016-03-13 9:51 ` Alex Lyakas 2016-03-21 18:51 ` Filipe Manana 1 sibling, 1 reply; 6+ messages in thread From: Nicholas D Steeves @ 2016-03-11 22:19 UTC (permalink / raw) To: Btrfs BTRFS On 10 March 2016 at 06:10, Alex Lyakas <alex.bolshoy@gmail.com> wrote: > csum_dirty_buffer was issuing a warning in case the extent buffer > did not look alright, but was still returning success. > Let's return error in this case, and also add an additional sanity > check on the extent buffer header. > The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will, > but it is better than to have a silent metadata corruption on disk. Does this mean there is a good chance that everyone has corrupted metadata? Is there any way to verify/rebuild it without wipefs+mkfs+restore from backups? Best regards, Nicholas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk 2016-03-11 22:19 ` Nicholas D Steeves @ 2016-03-13 9:51 ` Alex Lyakas 2016-04-06 4:04 ` Nicholas D Steeves 0 siblings, 1 reply; 6+ messages in thread From: Alex Lyakas @ 2016-03-13 9:51 UTC (permalink / raw) To: Nicholas D Steeves; +Cc: Btrfs BTRFS Nicholas, On Sat, Mar 12, 2016 at 12:19 AM, Nicholas D Steeves <nsteeves@gmail.com> wrote: > On 10 March 2016 at 06:10, Alex Lyakas <alex.bolshoy@gmail.com> wrote: >> csum_dirty_buffer was issuing a warning in case the extent buffer >> did not look alright, but was still returning success. >> Let's return error in this case, and also add an additional sanity >> check on the extent buffer header. >> The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will, >> but it is better than to have a silent metadata corruption on disk. > > Does this mean there is a good chance that everyone has corrupted > metadata? No, this definitely does not. The code that I added prevents btrfs from writing a metadata block, if it somehow got corrupted before being sent to disk. If it happens, it indicates a bug somewhere in the kernel. For example, if some other kernel module erroneously uses a page-cache entry, which does not belong to it (and contains btrfs metadata block or part of it). > Is there any way to verify/rebuild it without wipefs+mkfs+restore from backups? To verify btrfs metadata: unmount the filesystem and run "btrfs check ...". Do not specify the "repair" parameter. Another way to verify is to run "btrfs-debug-tree" and redirect its standard output to /dev/null. It should not print anything to standard error. But "btrfs check" is faster. Thanks, Alex. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk 2016-03-13 9:51 ` Alex Lyakas @ 2016-04-06 4:04 ` Nicholas D Steeves 2016-04-06 7:01 ` Duncan 0 siblings, 1 reply; 6+ messages in thread From: Nicholas D Steeves @ 2016-04-06 4:04 UTC (permalink / raw) To: Btrfs BTRFS Hi Alex, On 13 March 2016 at 05:51, Alex Lyakas <alex@zadarastorage.com> wrote: > Nicholas, > > On Sat, Mar 12, 2016 at 12:19 AM, Nicholas D Steeves <nsteeves@gmail.com> wrote: >> On 10 March 2016 at 06:10, Alex Lyakas <alex.bolshoy@gmail.com> wrote: >> Does this mean there is a good chance that everyone has corrupted >> metadata? > No, this definitely does not. > > The code that I added prevents btrfs from writing a metadata block, if > it somehow got corrupted before being sent to disk. If it happens, it > indicates a bug somewhere in the kernel. For example, if some other > kernel module erroneously uses a page-cache entry, which does not > belong to it (and contains btrfs metadata block or part of it). Oh wow, I didn't know that was possible. If I understand correctly, this patch makes using bcache a little bit safer? (I don't use it since I'm too short on free time to what is--I suspect-- something that radically increases the chances of having to restore from backup) >> Is there any way to verify/rebuild it without wipefs+mkfs+restore from backups? > To verify btrfs metadata: unmount the filesystem and run "btrfs check > ...". Do not specify the "repair" parameter. Another way to verify is > to run "btrfs-debug-tree" and redirect its standard output to > /dev/null. It should not print anything to standard error. But "btrfs > check" is faster. Ah, that's exactly what I was looking for! Thank you. It took forever, and brought me back to what it was like to fsck large ext2 volumes. Is btrfs check conceptually identical to a read-only fsck of a ext2 volume? If now how does it defer? Are the following sort of errors still an issue?: Extent back ref already exists for 2148837945344 parent 0 root 257 leaf parent key incorrect 504993210368 bad block 504993210368 ( https://btrfs.wiki.kernel.org/index.php/Btrfsck ) Cheers, Nicholas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk 2016-04-06 4:04 ` Nicholas D Steeves @ 2016-04-06 7:01 ` Duncan 0 siblings, 0 replies; 6+ messages in thread From: Duncan @ 2016-04-06 7:01 UTC (permalink / raw) To: linux-btrfs Nicholas D Steeves posted on Wed, 06 Apr 2016 00:04:36 -0400 as excerpted: > Ah, that's exactly what I was looking for! Thank you. It took forever, > and brought me back to what it was like to fsck large ext2 volumes. Is > btrfs check conceptually identical to a read-only fsck of a ext2 volume? > If now how does it defer? That question had me confused for a moment as I couldn't figure out what was "deferred", until I realized you must have meant "differ". =:^) At a suitably high level, yes, btrfs check (without --repair) is comparable to a read-only fsck of an ext* volume. They're both quite deep and thorough checks of their respective filesystems. But while ext2 had no journal and thus required such a deep fsck on reboot to recover after a bad shutdown, the later generations had a journal and avoided that, tho running an occasional fsck was still recommended. What they run automatically (on boot) is a very brief check that just checks some basics and that the filesystem looks like it should, replaying the journal if shutdown wasn't safely done, but nothing very thorough at all. And btrfs, being an atomic operation copy-on-write (cow) filesystem, is to a journaled filesystem like ext3/4, what a journaled filesystem like ext3/4 is to an earlier unjournaled filesystem like ext2. Which is why btrfs check doesn't normally need to be run -- because in general, the atomic cow nature of btrfs means that a commit is done atomically, you either get the state of the previous commit, or of the new one. There's no way to get a half-way state where only part of the data was written, which was the problem ext2 had, that the ext3 and later ext4 journal was designed to alleviate, but that an atomic cow filesystem such as btrfs does away with entirely. (Tho btrfs does have a journal, its use is far more limited. Normally, commits happen only every 30 seconds or so, and the btrfs journal simply replays any fsynced file changes between commits so they still hit the filesystem in the case of a crash before a later commit did it the normal way. IOW, it's only a shortcut to faster fsyncs, which would otherwise force a full filesystem commit before returning, much as they did on ext3. But even without that replay, a btrfs should be self-consistent to one of the two latest commits, with only the fsyncs done between commits lost if the log isn't replayed at all.) Of course all that assumes no critical bugs. There's a reason btrfs is still considered stabilizing, not fully stable and mature yet, as there are still bugs being found and fixed that prevent this ideal from being reality in more cases than we'd like, tho the filesystem is in general stable enough for many to use daily, as many including myself do, as long as they have backups just in case, and their world won't end if they actually have to use them. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk 2016-03-10 11:10 [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk Alex Lyakas 2016-03-11 22:19 ` Nicholas D Steeves @ 2016-03-21 18:51 ` Filipe Manana 1 sibling, 0 replies; 6+ messages in thread From: Filipe Manana @ 2016-03-21 18:51 UTC (permalink / raw) To: Alex Lyakas; +Cc: linux-btrfs@vger.kernel.org, Alex Lyakas On Thu, Mar 10, 2016 at 11:10 AM, Alex Lyakas <alex.bolshoy@gmail.com> wrote: > csum_dirty_buffer was issuing a warning in case the extent buffer > did not look alright, but was still returning success. > Let's return error in this case, and also add an additional sanity > check on the extent buffer header. > The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will, > but it is better than to have a silent metadata corruption on disk. > > Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/disk-io.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 4420ab2..cf85714 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -506,23 +506,34 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root, > > static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) > { > u64 start = page_offset(page); > u64 found_start; > struct extent_buffer *eb; > > eb = (struct extent_buffer *)page->private; > if (page != eb->pages[0]) > return 0; > + > found_start = btrfs_header_bytenr(eb); > - if (WARN_ON(found_start != start || !PageUptodate(page))) > - return 0; > + /* > + * Please do not consolidate these warnings into a single if. > + * It is useful to know what went wrong. > + */ > + if (WARN_ON(found_start != start)) > + return -EUCLEAN; > + if (WARN_ON(!PageUptodate(page))) > + return -EUCLEAN; > + > + ASSERT(memcmp_extent_buffer(eb, fs_info->fsid, > + btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0); > + > return csum_tree_block(fs_info, eb, 0); > } > > static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, > struct extent_buffer *eb) > { > struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; > u8 fsid[BTRFS_UUID_SIZE]; > int ret = 1; > > -- > 1.9.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-04-06 7:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-10 11:10 [PATCH 2/2] btrfs: do not write corrupted metadata blocks to disk Alex Lyakas 2016-03-11 22:19 ` Nicholas D Steeves 2016-03-13 9:51 ` Alex Lyakas 2016-04-06 4:04 ` Nicholas D Steeves 2016-04-06 7:01 ` Duncan 2016-03-21 18:51 ` Filipe Manana
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).