linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: Dave Kleikamp <shaggy@kernel.org>,
	jfs-discussion@lists.sourceforge.net, tytso@mit.edu,
	Jeff Mahoney <jeffm@suse.de>, Mark Fasheh <mfasheh@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	reiserfs-devel@vger.kernel.org, xfs@oss.sgi.com,
	cluster-devel@redhat.com, Joel Becker <jlbec@evilplan.org>,
	Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org,
	Steven Whitehouse <swhiteho@redhat.com>,
	ocfs2-devel@oss.oracle.com, viro@zeniv.linux.org.uk
Subject: [PATCH] ext4: Fix jbd2 warning under heavy xattr load
Date: Fri, 10 Oct 2014 16:23:17 +0200	[thread overview]
Message-ID: <1412951028-4085-13-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>

When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:

WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()

CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
 [<ffffffff816311b0>] dump_stack+0x19/0x1b
 [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
 [<ffffffff8119421e>] setxattr+0x13e/0x1e0
 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b

The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:

CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning

We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index e175e94116ac..55e611c1513c 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -517,8 +517,8 @@ static void ext4_xattr_update_super_block(handle_t *handle,
 }
 
 /*
- * Release the xattr block BH: If the reference count is > 1, decrement
- * it; otherwise free the block.
+ * Release the xattr block BH: If the reference count is > 1, decrement it;
+ * otherwise free the block.
  */
 static void
 ext4_xattr_release_block(handle_t *handle, struct inode *inode,
@@ -538,16 +538,31 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 		if (ce)
 			mb_cache_entry_free(ce);
 		get_bh(bh);
+		unlock_buffer(bh);
 		ext4_free_blocks(handle, inode, bh, 0, 1,
 				 EXT4_FREE_BLOCKS_METADATA |
 				 EXT4_FREE_BLOCKS_FORGET);
-		unlock_buffer(bh);
 	} else {
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
 		if (ce)
 			mb_cache_entry_release(ce);
+		/*
+		 * Beware of this ugliness: Releasing of xattr block references
+		 * from different inodes can race and so we have to protect
+		 * from a race where someone else frees the block (and releases
+		 * its journal_head) before we are done dirtying the buffer. In
+		 * nojournal mode this race is harmless and we actually cannot
+		 * call ext4_handle_dirty_xattr_block() with locked buffer as
+		 * that function can call sync_dirty_buffer() so for that case
+		 * we handle the dirtying after unlocking the buffer.
+		 */
+		if (ext4_handle_valid(handle))
+			error = ext4_handle_dirty_xattr_block(handle, inode,
+							      bh);
 		unlock_buffer(bh);
-		error = ext4_handle_dirty_xattr_block(handle, inode, bh);
+		if (!ext4_handle_valid(handle))
+			error = ext4_handle_dirty_xattr_block(handle, inode,
+							      bh);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 		dquot_free_block(inode, EXT4_C2B(EXT4_SB(inode->i_sb), 1));
-- 
1.8.1.4


------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk

  parent reply	other threads:[~2014-10-10 14:23 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-10 14:23 [PATCH 0/2 v2] Fix data corruption when blocksize < pagesize for mmapped data Jan Kara
2014-10-10 14:23 ` [PATCH 1/2 RESEND] bdi: Fix hung task on sync Jan Kara
2014-10-10 14:23 ` [PATCH] block: free q->flush_rq in blk_init_allocated_queue error paths Jan Kara
2014-10-10 15:19   ` Dave Jones
2014-10-10 15:32     ` Jan Kara
2014-10-10 14:23 ` [PATCH] block: improve rq_affinity placement Jan Kara
2014-10-10 14:23 ` [PATCH] block: Make rq_affinity = 1 work as expected Jan Kara
2014-10-10 14:23 ` [PATCH] block: strict rq_affinity Jan Kara
2014-10-10 14:23 ` [PATCH] ext3: Fix deadlock in data=journal mode when fs is frozen Jan Kara
2014-10-10 14:23 ` [PATCH] ext3: Speedup WB_SYNC_ALL pass Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Avoid lock inversion between i_mmap_mutex and transaction start Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] ext4: Don't check quota format when there are no quota files Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] ext4: Fix block zeroing when punching holes in indirect block files Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Fix buffer double free in ext4_alloc_branch() Jan Kara
2014-10-10 14:23 ` Jan Kara [this message]
2014-10-10 14:23 ` [PATCH] ext4: Fix zeroing of page during writeback Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Remove orphan list handling Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Speedup WB_SYNC_ALL pass Jan Kara
2014-10-10 14:23 ` [PATCH for 3.14-stable] fanotify: fix double free of pending permission events Jan Kara
2014-10-10 14:23 ` [PATCH] fs: Avoid userspace mounting anon_inodefs filesystem Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] jbd2: Avoid pointless scanning of checkpoint lists Jan Kara
2014-10-10 14:23 ` [PATCH] jbd2: Optimize jbd2_log_do_checkpoint() a bit Jan Kara
2014-10-10 14:23 ` [PATCH] lockdep: Dump info via tracing Jan Kara
2014-10-10 14:23 ` [PATCH] mm: Fixup pagecache_isize_extended() definitions for !CONFIG_MMU Jan Kara
2014-10-10 14:23 ` [PATCH] ncpfs: fix rmdir returns Device or resource busy Jan Kara
2014-10-10 14:23 ` [PATCH] ocfs2: Fix quota file corruption Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] printk: Debug patch1 Jan Kara
2014-10-10 14:23 ` [PATCH] printk: debug: Slow down printing to 9600 bauds Jan Kara
2014-10-10 14:23 ` [PATCH] printk: enable interrupts before calling console_trylock_for_printk() Jan Kara
2014-10-10 14:23 ` [PATCH] quota: Fix race between dqput() and dquot_scan_active() Jan Kara
2014-10-10 14:23 ` [PATCH] scsi: Keep interrupts disabled while submitting requests Jan Kara
2014-10-10 14:23 ` [PATCH] sync: don't block the flusher thread waiting on IO Jan Kara
2014-10-10 14:23 ` [PATCH] timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks Jan Kara
2014-10-10 14:23 ` [PATCH] udf: Avoid infinite loop when processing indirect ICBs Jan Kara
2014-10-10 14:23 ` [PATCH] udf: Print error when inode is loaded Jan Kara
2014-10-10 14:23 ` [PATCH] vfs: Allocate anon_inode_inode in anon_inode_init() Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data Jan Kara
2014-10-10 14:23 ` [PATCH RESEND] vfs: Return EINVAL for default SEEK_HOLE, SEEK_DATA implementation Jan Kara
2014-10-10 14:23 ` [PATCH] writeback: plug writeback at a high level Jan Kara
2014-10-10 14:23 ` [PATCH] x86: Fixup lockdep complaint caused by io apic code Jan Kara
2014-10-10 14:23 ` [PATCH 2/2 RESEND] bdi: Avoid oops on device removal Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext3: Don't check quota format when there are no quota files Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext4: Fix hole punching for files with indirect blocks Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext4: Fix mmap data corruption when blocksize < pagesize Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] jbd2: Simplify calling convention around __jbd2_journal_clean_checkpoint_list Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] printk: Debug patch 2 Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2014-04-07  9:38 [PATCH] ext4: Fix jbd2 warning under heavy xattr load Jan Kara
2014-04-07 14:55 ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1412951028-4085-13-git-send-email-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=jeffm@suse.de \
    --cc=jfs-discussion@lists.sourceforge.net \
    --cc=jlbec@evilplan.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=reiserfs-devel@vger.kernel.org \
    --cc=shaggy@kernel.org \
    --cc=swhiteho@redhat.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).