linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thavatchai Makphaibulchoke <thavatchai.makpahibulchoke@hp.com>
To: Theodore Ts'o <tytso@mit.edu>, T Makphaibulchoke <tmac@hp.com>,
	adilger.kernel@dilger.ca, viro@zeniv.linux.org.uk,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, aswin@hp.com,
	torvalds@linux-foundation.org, aswin_proj@groups.hp.com
Subject: Re: [PATCH v3 1/2] mbcache: decoupling the locking of local from global data
Date: Wed, 30 Oct 2013 11:32:05 -0600	[thread overview]
Message-ID: <52714295.6040605@hp.com> (raw)
In-Reply-To: <20131030144229.GC3305@thunk.org>

On 10/30/2013 08:42 AM, Theodore Ts'o wrote:
> I tried running xfstests with this patch, and it blew up on
> generic/020 test:
> 
> generic/020	[10:21:50][  105.170352] ------------[ cut here ]------------
> [  105.171683] kernel BUG at /usr/projects/linux/ext4/include/linux/bit_spinlock.h:76!
> [  105.173346] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [  105.173346] Modules linked in:
> [  105.173346] CPU: 1 PID: 8519 Comm: attr Not tainted 3.12.0-rc5-00008-gffbe1d7-dirty #1492
> [  105.173346] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  105.173346] task: f5abe560 ti: f2274000 task.ti: f2274000
> [  105.173346] EIP: 0060:[<c026b464>] EFLAGS: 00010246 CPU: 1
> [  105.173346] EIP is at hlist_bl_unlock+0x7/0x1c
> [  105.173346] EAX: f488d360 EBX: f488d360 ECX: 00000000 EDX: f2998800
> [  105.173346] ESI: f29987f0 EDI: 6954c848 EBP: f2275cc8 ESP: f2275cb8
> [  105.173346]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [  105.173346] CR0: 80050033 CR2: b76bcf54 CR3: 34844000 CR4: 000006f0
> [  105.173346] Stack:
> [  105.173346]  c026bc78 f2275d48 6954c848 f29987f0 f2275d24 c02cd7a9 f2275ce4 c02e2881
> [  105.173346]  f255d8c8 00000000 f1109020 f4a67f00 f2275d54 f2275d08 c02cd020 6954c848
> [  105.173346]  f4a67f00 f1109000 f2b0eba8 f2ee3800 f2275d28 f4f811e8 f2275d38 00000000
> [  105.173346] Call Trace:
> [  105.173346]  [<c026bc78>] ? mb_cache_entry_find_first+0x4b/0x55
> [  105.173346]  [<c02cd7a9>] ext4_xattr_block_set+0x248/0x6e7
> [  105.173346]  [<c02e2881>] ? jbd2_journal_put_journal_head+0xe2/0xed
> [  105.173346]  [<c02cd020>] ? ext4_xattr_find_entry+0x52/0xac
> [  105.173346]  [<c02ce307>] ext4_xattr_set_handle+0x1c7/0x30f
> [  105.173346]  [<c02ce4f4>] ext4_xattr_set+0xa5/0xe1
> [  105.173346]  [<c02ceb36>] ext4_xattr_user_set+0x46/0x5f
> [  105.173346]  [<c024a4da>] generic_setxattr+0x4c/0x5e
> [  105.173346]  [<c024a48e>] ? generic_listxattr+0x95/0x95
> [  105.173346]  [<c024ab0f>] __vfs_setxattr_noperm+0x56/0xb6
> [  105.173346]  [<c024abd2>] vfs_setxattr+0x63/0x7e
> [  105.173346]  [<c024ace8>] setxattr+0xfb/0x139
> [  105.173346]  [<c01b200a>] ? __lock_acquire+0x540/0xca6
> [  105.173346]  [<c01877a3>] ? lg_local_unlock+0x1b/0x34
> [  105.173346]  [<c01af8dd>] ? trace_hardirqs_off_caller+0x2e/0x98
> [  105.173346]  [<c0227e69>] ? kmem_cache_free+0xd4/0x149
> [  105.173346]  [<c01b2c2b>] ? lock_acquire+0xdd/0x107
> [  105.173346]  [<c023225e>] ? __sb_start_write+0xee/0x11d
> [  105.173346]  [<c0247383>] ? mnt_want_write+0x1e/0x3e
> [  105.173346]  [<c01b3019>] ? trace_hardirqs_on_caller+0x12a/0x17e
> [  105.173346]  [<c0247353>] ? __mnt_want_write+0x4e/0x60
> [  105.173346]  [<c024af3b>] SyS_lsetxattr+0x6a/0x9f
> [  105.173346]  [<c078d0e8>] syscall_call+0x7/0xb
> [  105.173346] Code: 00 00 00 00 5b 5d c3 55 89 e5 53 3e 8d 74 26 00 8b 58 08 89 c2 8b 43 18 e8 3f c9 fb ff f0 ff 4b 0c 5b 5d c3 8b 10 80 e2 01 75 02 <0f> 0b 55 89 e5 0f ba 30 00 89 e0 25 00 e0 ff ff ff 48 14 5d c3
> [  105.173346] EIP: [<c026b464>] hlist_bl_unlock+0x7/0x1c SS:ESP 0068:f2275cb8
> [  105.273781] ---[ end trace 1ee45ddfc1df0935 ]---
> 
> When I tried to find a potential problem, I immediately ran into this.
> I'm not entirely sure it's the problem, but it's raised a number of
> red flags for me in terms of (a) how much testing you've employed with
> this patch set, and (b) how maintaining and easy-to-audit the code
> will be with this extra locking.  The comments are good start, but
> some additional comments about exactly what assumptions a function
> assumes about locks that are held on function entry, or especially if
> the locking is different on function entry and function exit, might
> make it easier for people to audit this patch.
> 
> Or maybe this commit needs to be split up with first a conversion from
> using list_head to hlist_hl_node, and the changing the locking?  The
> bottom line is that we need to somehow make this patch easier to
> validate/review.
> 

Thanks for the comemnts.  Yes, I did run through xfstests.  My guess is that you probably ran into a race condition that I did not.

I will try to port the patch to a more recent kernel, including the mb_cache_shrink_scan() sent earlier (BTW, it looks good) and debug the problem.

Yes, those are good suggestions.  Once I find the problem, I will resubmit with more comments and also split it into two patches, as suggested. 

>> @@ -520,18 +647,23 @@ __mb_cache_entry_find(struct list_head *l, struct list_head *head,
>>  				ce->e_queued++;
>>  				prepare_to_wait(&mb_cache_queue, &wait,
>>  						TASK_UNINTERRUPTIBLE);
>> -				spin_unlock(&mb_cache_spinlock);
>> +				hlist_bl_unlock(head);
>>  				schedule();
>> -				spin_lock(&mb_cache_spinlock);
>> +				hlist_bl_lock(head);
>> +				mb_assert(ce->e_index_hash_p == head);
>>  				ce->e_queued--;
>>  			}
>> +			hlist_bl_unlock(head);
>>  			finish_wait(&mb_cache_queue, &wait);
>>  
>> -			if (!__mb_cache_entry_is_hashed(ce)) {
>> +			hlist_bl_lock(ce->e_block_hash_p);
>> +			if (!__mb_cache_entry_is_block_hashed(ce)) {
>>  				__mb_cache_entry_release_unlock(ce);
>> -				spin_lock(&mb_cache_spinlock);
>> +				hlist_bl_lock(head);
> 
> <---  are we missing a "hlist_bl_unlock(ce->e_block_hash_p);" here?
> 
> <---- Why is it that we call "hlist_bl_lock(head);" but not in the else clause?

The function __mb_cache_entry_release_unlock() releases both the e_block_hash and e_index_hash locks.  So we have to reacquire the e_index_hash (head) lock in the then part, and release the e_block_hash lock in the else part.
> 
>>  				return ERR_PTR(-EAGAIN);
>> -			}
>> +			} else
>> +				hlist_bl_unlock(ce->e_block_hash_p);
>> +			mb_assert(ce->e_index_hash_p == head);
>>  			return ce;
>>  		}
>>  		l = l->next; 
> 
> 
> Cheers,
> 
> 					- Ted
> 

Thanks,
Mak.


  reply	other threads:[~2013-10-30 23:36 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-18  0:55 [PATCH 0/2] ext4: increase mbcache scalability T Makphaibulchoke
2013-08-22 15:54 ` [PATCH v2 " T Makphaibulchoke
2013-08-22 15:54   ` [PATCH v2 1/2] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2013-08-22 16:53     ` Linus Torvalds
2013-08-22 15:33       ` Thavatchai Makphaibulchoke
2013-08-22 15:54   ` [PATCH v2 2/2] ext4: each filesystem creates and uses its own mc_cache T Makphaibulchoke
2014-01-24 18:31   ` [PATCH v4 0/3] ext4: increase mbcache scalability T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 3/3] ext4: each filesystem creates and uses its own mc_cache T Makphaibulchoke
2014-01-24 21:38     ` [PATCH v4 0/3] ext4: increase mbcache scalability Andi Kleen
2014-01-25  1:13       ` Thavatchai Makphaibulchoke
2014-01-25  6:09       ` Andreas Dilger
2014-01-27 12:27         ` Thavatchai Makphaibulchoke
2014-02-09 19:46         ` Thavatchai Makphaibulchoke
2014-02-11 19:58         ` Thavatchai Makphaibulchoke
2014-02-13  2:01           ` Andreas Dilger
2013-09-04 16:39 ` [PATCH v3 0/2] " T Makphaibulchoke
2013-09-04 16:39   ` [PATCH v3 1/2] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2013-10-30 13:27     ` Theodore Ts'o
2013-10-30 14:42     ` Theodore Ts'o
2013-10-30 17:32       ` Thavatchai Makphaibulchoke [this message]
2013-09-04 16:39   ` [PATCH v3 2/2] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke
2013-10-30 14:30     ` Theodore Ts'o
2013-09-04 20:00   ` [PATCH v3 0/2] ext4: increase mbcache scalability Andreas Dilger
2013-09-04 15:33     ` Thavatchai Makphaibulchoke
2013-09-05 15:22     ` Thavatchai Makphaibulchoke
2013-09-05  2:35   ` Theodore Ts'o
2013-09-05  9:49     ` Thavatchai Makphaibulchoke
2013-09-06  5:10       ` Andreas Dilger
2013-09-06 12:23         ` Thavatchai Makphaibulchoke
2013-09-10 20:47           ` Andreas Dilger
2013-09-10 21:02             ` Theodore Ts'o
2013-09-10 17:10               ` Thavatchai Makphaibulchoke
2013-09-11  3:13               ` Eric Sandeen
2013-09-11 11:30                 ` Theodore Ts'o
2013-09-11 16:49                   ` Eric Sandeen
2013-09-11 19:33                     ` Eric Sandeen
2013-09-11 20:32                     ` David Lang
2013-09-11 20:48                       ` Eric Sandeen
2013-09-11 21:25                         ` Theodore Ts'o
2013-09-11 20:36                           ` Thavatchai Makphaibulchoke
2013-09-12  3:42                             ` Eric Sandeen
2014-02-20 18:37 ` [PATCH V5 0/3] " T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 3/3] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke
2014-03-12 16:19 ` [PATCH v5 RESEND 0/3] ext4: increase mbcache scalability T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 3/3] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52714295.6040605@hp.com \
    --to=thavatchai.makpahibulchoke@hp.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=aswin@hp.com \
    --cc=aswin_proj@groups.hp.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tmac@hp.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).