From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dmitry Monakhov Subject: Re: [PATCH] ext4: fix journal callback list traversal Date: Thu, 21 Mar 2013 20:11:24 +0400 Message-ID: <874ng4ybpv.fsf@openvz.org> References: <1363880952-28758-1-git-send-email-dmonakhov@openvz.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: bobijam@whamcloud.com To: linux-ext4@vger.kernel.org Return-path: Received: from mail-la0-f50.google.com ([209.85.215.50]:44271 "EHLO mail-la0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933832Ab3CUQL3 (ORCPT ); Thu, 21 Mar 2013 12:11:29 -0400 Received: by mail-la0-f50.google.com with SMTP id ec20so5400323lab.23 for ; Thu, 21 Mar 2013 09:11:27 -0700 (PDT) In-Reply-To: <1363880952-28758-1-git-send-email-dmonakhov@openvz.org> Sender: linux-ext4-owner@vger.kernel.org List-ID: On Thu, 21 Mar 2013 19:49:12 +0400, Dmitry Monakhov wrote: > It is incorrect to use list_for_each_entry_safe() for journal callback > traversial because ->next may be removed by other task: > ->ext4_mb_free_metadata() > ->ext4_mb_free_metadata() > ->ext4_journal_callback_del() > > This result in following issue: > > WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250() > Hardware name: > list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b > Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod > Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107 > Call Trace: > [] warn_slowpath_common+0xad/0xf0 > [] warn_slowpath_fmt+0x46/0x50 > [] ? ext4_journal_commit_callback+0x99/0xc0 > [] __list_del_entry+0x1c0/0x250 > [] ext4_journal_commit_callback+0x6f/0xc0 > [] jbd2_journal_commit_transaction+0x23a6/0x2570 > [] ? try_to_del_timer_sync+0x82/0xa0 > [] ? del_timer_sync+0x91/0x1e0 > [] kjournald2+0x19f/0x6a0 > [] ? wake_up_bit+0x40/0x40 > [] ? bit_spin_lock+0x80/0x80 > [] kthread+0x10e/0x120 > [] ? __init_kthread_worker+0x70/0x70 > [] ret_from_fork+0x7c/0xb0 > [] ? __init_kthread_worker+0x70/0x70 > > This patch fix the issue like follows: > - ext4_journal_commit_callback() make list truly traversial safe > simply by always starting from the list head > - fix race between ext4_journal_callback_del() and > ext4_journal_callback_try_del() OOps. Sorry. I've sent wrong patch. Please ignore it. I'll send correct version in a minute. > > Signed-off-by: Dmitry Monakhov > --- > fs/ext4/ext4_jbd2.h | 6 +++++- > fs/ext4/mballoc.c | 8 ++++---- > fs/ext4/super.c | 4 +++- > 3 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h > index 4c216b1..aeed0ba 100644 > --- a/fs/ext4/ext4_jbd2.h > +++ b/fs/ext4/ext4_jbd2.h > @@ -194,16 +194,20 @@ static inline void ext4_journal_callback_add(handle_t *handle, > * ext4_journal_callback_del: delete a registered callback > * @handle: active journal transaction handle on which callback was registered > * @jce: registered journal callback entry to unregister > + * Return true if object was sucessfully removed > */ > -static inline void ext4_journal_callback_del(handle_t *handle, > +static inline bool ext4_journal_callback_try_del(handle_t *handle, > struct ext4_journal_cb_entry *jce) > { > + bool deleted; > struct ext4_sb_info *sbi = > EXT4_SB(handle->h_transaction->t_journal->j_private); > > spin_lock(&sbi->s_md_lock); > + deleted = !list_empty(&jce->jce_list); > list_del_init(&jce->jce_list); > spin_unlock(&sbi->s_md_lock); > + return deleted; > } > > int > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index ee6614b..2815f46 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -4420,11 +4420,11 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b, > node = rb_prev(new_node); > if (node) { > entry = rb_entry(node, struct ext4_free_data, efd_node); > - if (can_merge(entry, new_entry)) { > + if (can_merge(entry, new_entry) && > + ext4_journal_callback_try_del(handle, &entry->efd_jce)) { > new_entry->efd_start_cluster = entry->efd_start_cluster; > new_entry->efd_count += entry->efd_count; > rb_erase(node, &(db->bb_free_root)); > - ext4_journal_callback_del(handle, &entry->efd_jce); > kmem_cache_free(ext4_free_data_cachep, entry); > } > } > @@ -4432,10 +4432,10 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b, > node = rb_next(new_node); > if (node) { > entry = rb_entry(node, struct ext4_free_data, efd_node); > - if (can_merge(new_entry, entry)) { > + if (can_merge(new_entry, entry) && > + ext4_journal_callback_del(handle, &entry->efd_jce)) { > new_entry->efd_count += entry->efd_count; > rb_erase(node, &(db->bb_free_root)); > - ext4_journal_callback_del(handle, &entry->efd_jce); > kmem_cache_free(ext4_free_data_cachep, entry); > } > } > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index d1ee6a8..c7e1509 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -352,7 +352,9 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn) > struct ext4_journal_cb_entry *jce, *tmp; > > spin_lock(&sbi->s_md_lock); > - list_for_each_entry_safe(jce, tmp, &txn->t_private_list, jce_list) { > + while (!list_empty(&txn->t_private_list)) { > + jce = list_entry(txn->t_private_list.next, > + struct ext4_journal_cb_entry, jce_list); > list_del_init(&jce->jce_list); > spin_unlock(&sbi->s_md_lock); > jce->jce_func(sb, jce, error); > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html