From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:13923 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752765AbaKZQcR (ORCPT ); Wed, 26 Nov 2014 11:32:17 -0500 Message-ID: <54760089.7020608@fb.com> Date: Wed, 26 Nov 2014 11:32:09 -0500 From: Josef Bacik MIME-Version: 1.0 To: CC: Filipe Manana , "linux-btrfs@vger.kernel.org" Subject: Re: [PATCH 6/6] Btrfs: make btrfs_abort_transaction consider existence of new block groups References: <1417015735-8581-1-git-send-email-fdmanana@suse.com> <1417015735-8581-7-git-send-email-fdmanana@suse.com> <5475FAA9.7080507@fb.com> <5475FD8A.2000602@fb.com> In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 11/26/2014 11:29 AM, Filipe David Manana wrote: > On Wed, Nov 26, 2014 at 4:19 PM, Josef Bacik wrote: >> On 11/26/2014 11:15 AM, Filipe David Manana wrote: >>> >>> On Wed, Nov 26, 2014 at 4:07 PM, Josef Bacik wrote: >>>> >>>> On 11/26/2014 10:28 AM, Filipe Manana wrote: >>>>> >>>>> >>>>> If the transaction handle doesn't have used blocks but has created new >>>>> block >>>>> groups make sure we turn the fs into readonly mode too. This is because >>>>> the >>>>> new block groups didn't get all their metadata persisted into the chunk >>>>> and >>>>> device trees, and therefore if a subsequent transaction starts, >>>>> allocates >>>>> space from the new block groups, writes data or metadata into that >>>>> space, >>>>> commits successfully and then after we unmount and mount the filesystem >>>>> again, the same space can be allocated again for a new block group, >>>>> resulting in file data or metadata corruption. >>>>> >>>>> Example where we don't abort the transaction when we fail to finish the >>>>> chunk allocation (add items to the chunk and device trees) and later a >>>>> future transaction where the block group is removed fails because it >>>>> can't >>>>> find the chunk item in the chunk tree: >>>>> >>>>> [25230.404300] WARNING: CPU: 0 PID: 7721 at fs/btrfs/super.c:260 >>>>> __btrfs_abort_transaction+0x50/0xfc [btrfs]() >>>>> [25230.404301] BTRFS: Transaction aborted (error -28) >>>>> [25230.404302] Modules linked in: btrfs dm_flakey nls_utf8 fuse xor >>>>> raid6_pq ntfs vfat msdos fat xfs crc32c_generic libcrc32c ext3 jbd ext2 >>>>> dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc >>>>> loop >>>>> psmouse i2c_piix4 i2ccore parport_pc parport processor button pcspkr >>>>> serio_raw thermal_sys evdev microcode ext4 crc16 jbd2 mbcache sr_mod >>>>> cdrom >>>>> ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common >>>>> virtio_scsi floppy e1000 ata_piix libata virtio_pci virtio_ring scsi_mod >>>>> virtio [last unloaded: btrfs] >>>>> [25230.404325] CPU: 0 PID: 7721 Comm: xfs_io Not tainted >>>>> 3.17.0-rc5-btrfs-next-1+ #1 >>>>> [25230.404326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>>>> BIOS >>>>> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 >>>>> [25230.404328] 0000000000000000 ffff88004581bb08 ffffffff813e7a13 >>>>> ffff88004581bb50 >>>>> [25230.404330] ffff88004581bb40 ffffffff810423aa ffffffffa049386a >>>>> 00000000ffffffe4 >>>>> [25230.404332] ffffffffa05214c0 000000000000240c ffff88010fc8f800 >>>>> ffff88004581bba8 >>>>> [25230.404334] Call Trace: >>>>> [25230.404338] [] dump_stack+0x45/0x56 >>>>> [25230.404342] [] warn_slowpath_common+0x7f/0x98 >>>>> [25230.404351] [] ? >>>>> __btrfs_abort_transaction+0x50/0xfc >>>>> [btrfs] >>>>> [25230.404353] [] warn_slowpath_fmt+0x48/0x50 >>>>> [25230.404362] [] __btrfs_abort_transaction+0x50/0xfc >>>>> [btrfs] >>>>> [25230.404374] [] >>>>> btrfs_create_pending_block_groups+0x10c/0x135 [btrfs] >>>>> [25230.404387] [] __btrfs_end_transaction+0x7e/0x2de >>>>> [btrfs] >>>>> [25230.404398] [] btrfs_end_transaction+0x10/0x12 >>>>> [btrfs] >>>>> [25230.404408] [] >>>>> btrfs_check_data_free_space+0x111/0x1f0 [btrfs] >>>>> [25230.404421] [] __btrfs_buffered_write+0x160/0x48d >>>>> [btrfs] >>>>> [25230.404425] [] ? cap_inode_need_killpriv+0x2d/0x37 >>>>> [25230.404429] [] ? get_page+0x1a/0x2b >>>>> [25230.404441] [] btrfs_file_write_iter+0x321/0x42f >>>>> [btrfs] >>>>> [25230.404443] [] ? handle_mm_fault+0x7f3/0x846 >>>>> [25230.404446] [] ? mutex_unlock+0x16/0x18 >>>>> [25230.404449] [] new_sync_write+0x7c/0xa0 >>>>> [25230.404450] [] vfs_write+0xb0/0x112 >>>>> [25230.404452] [] SyS_pwrite64+0x66/0x84 >>>>> [25230.404454] [] system_call_fastpath+0x16/0x1b >>>>> [25230.404455] ---[ end trace 5aa5684fdf47ab38 ]--- >>>>> [25230.404458] BTRFS warning (device sdc): >>>>> btrfs_create_pending_block_groups:9228: Aborting unused transaction(No >>>>> space >>>>> left). >>>>> [25288.084814] BTRFS: error (device sdc) in btrfs_free_chunk:2509: >>>>> errno=-2 No such entry (Failed lookup while freeing chunk.) >>>>> >>>>> Signed-off-by: Filipe Manana >>>>> --- >>>>> fs/btrfs/extent-tree.c | 5 +++-- >>>>> fs/btrfs/super.c | 2 +- >>>>> 2 files changed, 4 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>>>> index 4bf8f02..0a5e770 100644 >>>>> --- a/fs/btrfs/extent-tree.c >>>>> +++ b/fs/btrfs/extent-tree.c >>>>> @@ -9209,9 +9209,8 @@ void btrfs_create_pending_block_groups(struct >>>>> btrfs_trans_handle *trans, >>>>> int ret = 0; >>>>> >>>>> list_for_each_entry_safe(block_group, tmp, &trans->new_bgs, >>>>> bg_list) { >>>>> - list_del_init(&block_group->bg_list); >>>>> if (ret) >>>>> - continue; >>>>> + goto next; >>>>> >>>>> spin_lock(&block_group->lock); >>>>> memcpy(&item, &block_group->item, sizeof(item)); >>>>> @@ -9226,6 +9225,8 @@ void btrfs_create_pending_block_groups(struct >>>>> btrfs_trans_handle *trans, >>>>> key.objectid, >>>>> key.offset); >>>>> if (ret) >>>>> btrfs_abort_transaction(trans, extent_root, >>>>> ret); >>>>> +next: >>>>> + list_del_init(&block_group->bg_list); >>>>> } >>>>> } >>>>> >>>> >>>> I don't understand this change, logically it seems the same as what we >>>> had >>>> before. Thanks, >>> >>> >>> It isn't. Before we would not turn the fs readonly if the transaction >>> had no blocks used but had new block groups. This change just makes it >>> turn into readonly mode if it has new block groups (even if no blocks >>> were used). >>> See the trace in the log where it shows we failed to finish the chunk >>> allocation but the transaction was aborted and didn't turn the fs into >>> readonly mode. >>> >> >> Yeah the bit below makes sense, its the above change that is the same. >> Before we deleted the entry from the list and then if there is an error we >> just continue, now if there is an error you go to next and delete the entry >> and continue, which is the same thing. Thanks, > > I don't see how it's the same. > Before if we had a single new block group, when we called > abort_transaction the list of block groups was empty because we > removed the bg from the list before calling abort_transaction. This > change just ensures that for the single new block group case > abort_transaction sees there's a new block group and takes the action > of making the fs readonly. > Oooooh Jesus I see it now, sorry Reviewed-by: Josef Bacik Thanks, Josef