linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Ts'o <tytso@mit.edu>
To: T Makphaibulchoke <tmac@hp.com>
Cc: adilger.kernel@dilger.ca, viro@zeniv.linux.org.uk,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, aswin@hp.com,
	torvalds@linux-foundation.org, aswin_proj@groups.hp.com
Subject: Re: [PATCH v3 1/2] mbcache: decoupling the locking of local from global data
Date: Wed, 30 Oct 2013 10:42:29 -0400	[thread overview]
Message-ID: <20131030144229.GC3305@thunk.org> (raw)
In-Reply-To: <1378312756-68597-2-git-send-email-tmac@hp.com>

On Wed, Sep 04, 2013 at 10:39:15AM -0600, T Makphaibulchoke wrote:
> The patch increases the parallelism of mb_cache_entry utilization by
> replacing list_head with hlist_bl_node for the implementation of both the
> block and index hash tables.  Each hlist_bl_node contains a built-in lock
> used to protect mb_cache's local block and index hash chains. The global
> data mb_cache_lru_list and mb_cache_list continue to be protected by the
> global mb_cache_spinlock.
> 
> Signed-off-by: T. Makphaibulchoke <tmac@hp.com>

I tried running xfstests with this patch, and it blew up on
generic/020 test:

generic/020	[10:21:50][  105.170352] ------------[ cut here ]------------
[  105.171683] kernel BUG at /usr/projects/linux/ext4/include/linux/bit_spinlock.h:76!
[  105.173346] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  105.173346] Modules linked in:
[  105.173346] CPU: 1 PID: 8519 Comm: attr Not tainted 3.12.0-rc5-00008-gffbe1d7-dirty #1492
[  105.173346] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  105.173346] task: f5abe560 ti: f2274000 task.ti: f2274000
[  105.173346] EIP: 0060:[<c026b464>] EFLAGS: 00010246 CPU: 1
[  105.173346] EIP is at hlist_bl_unlock+0x7/0x1c
[  105.173346] EAX: f488d360 EBX: f488d360 ECX: 00000000 EDX: f2998800
[  105.173346] ESI: f29987f0 EDI: 6954c848 EBP: f2275cc8 ESP: f2275cb8
[  105.173346]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  105.173346] CR0: 80050033 CR2: b76bcf54 CR3: 34844000 CR4: 000006f0
[  105.173346] Stack:
[  105.173346]  c026bc78 f2275d48 6954c848 f29987f0 f2275d24 c02cd7a9 f2275ce4 c02e2881
[  105.173346]  f255d8c8 00000000 f1109020 f4a67f00 f2275d54 f2275d08 c02cd020 6954c848
[  105.173346]  f4a67f00 f1109000 f2b0eba8 f2ee3800 f2275d28 f4f811e8 f2275d38 00000000
[  105.173346] Call Trace:
[  105.173346]  [<c026bc78>] ? mb_cache_entry_find_first+0x4b/0x55
[  105.173346]  [<c02cd7a9>] ext4_xattr_block_set+0x248/0x6e7
[  105.173346]  [<c02e2881>] ? jbd2_journal_put_journal_head+0xe2/0xed
[  105.173346]  [<c02cd020>] ? ext4_xattr_find_entry+0x52/0xac
[  105.173346]  [<c02ce307>] ext4_xattr_set_handle+0x1c7/0x30f
[  105.173346]  [<c02ce4f4>] ext4_xattr_set+0xa5/0xe1
[  105.173346]  [<c02ceb36>] ext4_xattr_user_set+0x46/0x5f
[  105.173346]  [<c024a4da>] generic_setxattr+0x4c/0x5e
[  105.173346]  [<c024a48e>] ? generic_listxattr+0x95/0x95
[  105.173346]  [<c024ab0f>] __vfs_setxattr_noperm+0x56/0xb6
[  105.173346]  [<c024abd2>] vfs_setxattr+0x63/0x7e
[  105.173346]  [<c024ace8>] setxattr+0xfb/0x139
[  105.173346]  [<c01b200a>] ? __lock_acquire+0x540/0xca6
[  105.173346]  [<c01877a3>] ? lg_local_unlock+0x1b/0x34
[  105.173346]  [<c01af8dd>] ? trace_hardirqs_off_caller+0x2e/0x98
[  105.173346]  [<c0227e69>] ? kmem_cache_free+0xd4/0x149
[  105.173346]  [<c01b2c2b>] ? lock_acquire+0xdd/0x107
[  105.173346]  [<c023225e>] ? __sb_start_write+0xee/0x11d
[  105.173346]  [<c0247383>] ? mnt_want_write+0x1e/0x3e
[  105.173346]  [<c01b3019>] ? trace_hardirqs_on_caller+0x12a/0x17e
[  105.173346]  [<c0247353>] ? __mnt_want_write+0x4e/0x60
[  105.173346]  [<c024af3b>] SyS_lsetxattr+0x6a/0x9f
[  105.173346]  [<c078d0e8>] syscall_call+0x7/0xb
[  105.173346] Code: 00 00 00 00 5b 5d c3 55 89 e5 53 3e 8d 74 26 00 8b 58 08 89 c2 8b 43 18 e8 3f c9 fb ff f0 ff 4b 0c 5b 5d c3 8b 10 80 e2 01 75 02 <0f> 0b 55 89 e5 0f ba 30 00 89 e0 25 00 e0 ff ff ff 48 14 5d c3
[  105.173346] EIP: [<c026b464>] hlist_bl_unlock+0x7/0x1c SS:ESP 0068:f2275cb8
[  105.273781] ---[ end trace 1ee45ddfc1df0935 ]---

When I tried to find a potential problem, I immediately ran into this.
I'm not entirely sure it's the problem, but it's raised a number of
red flags for me in terms of (a) how much testing you've employed with
this patch set, and (b) how maintaining and easy-to-audit the code
will be with this extra locking.  The comments are good start, but
some additional comments about exactly what assumptions a function
assumes about locks that are held on function entry, or especially if
the locking is different on function entry and function exit, might
make it easier for people to audit this patch.

Or maybe this commit needs to be split up with first a conversion from
using list_head to hlist_hl_node, and the changing the locking?  The
bottom line is that we need to somehow make this patch easier to
validate/review.

> @@ -520,18 +647,23 @@ __mb_cache_entry_find(struct list_head *l, struct list_head *head,
>  				ce->e_queued++;
>  				prepare_to_wait(&mb_cache_queue, &wait,
>  						TASK_UNINTERRUPTIBLE);
> -				spin_unlock(&mb_cache_spinlock);
> +				hlist_bl_unlock(head);
>  				schedule();
> -				spin_lock(&mb_cache_spinlock);
> +				hlist_bl_lock(head);
> +				mb_assert(ce->e_index_hash_p == head);
>  				ce->e_queued--;
>  			}
> +			hlist_bl_unlock(head);
>  			finish_wait(&mb_cache_queue, &wait);
>  
> -			if (!__mb_cache_entry_is_hashed(ce)) {
> +			hlist_bl_lock(ce->e_block_hash_p);
> +			if (!__mb_cache_entry_is_block_hashed(ce)) {
>  				__mb_cache_entry_release_unlock(ce);
> -				spin_lock(&mb_cache_spinlock);
> +				hlist_bl_lock(head);

<---  are we missing a "hlist_bl_unlock(ce->e_block_hash_p);" here?

<---- Why is it that we call "hlist_bl_lock(head);" but not in the else clause?

>  				return ERR_PTR(-EAGAIN);
> -			}
> +			} else
> +				hlist_bl_unlock(ce->e_block_hash_p);
> +			mb_assert(ce->e_index_hash_p == head);
>  			return ce;
>  		}
>  		l = l->next; 


Cheers,

					- Ted

  parent reply	other threads:[~2013-10-30 14:42 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-18  0:55 [PATCH 0/2] ext4: increase mbcache scalability T Makphaibulchoke
2013-08-22 15:54 ` [PATCH v2 " T Makphaibulchoke
2013-08-22 15:54   ` [PATCH v2 1/2] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2013-08-22 16:53     ` Linus Torvalds
2013-08-22 15:33       ` Thavatchai Makphaibulchoke
2013-08-22 15:54   ` [PATCH v2 2/2] ext4: each filesystem creates and uses its own mc_cache T Makphaibulchoke
2014-01-24 18:31   ` [PATCH v4 0/3] ext4: increase mbcache scalability T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-01-24 18:31     ` [PATCH v4 3/3] ext4: each filesystem creates and uses its own mc_cache T Makphaibulchoke
2014-01-24 21:38     ` [PATCH v4 0/3] ext4: increase mbcache scalability Andi Kleen
2014-01-25  1:13       ` Thavatchai Makphaibulchoke
2014-01-25  6:09       ` Andreas Dilger
2014-01-27 12:27         ` Thavatchai Makphaibulchoke
2014-02-09 19:46         ` Thavatchai Makphaibulchoke
2014-02-11 19:58         ` Thavatchai Makphaibulchoke
2014-02-13  2:01           ` Andreas Dilger
2013-09-04 16:39 ` [PATCH v3 0/2] " T Makphaibulchoke
2013-09-04 16:39   ` [PATCH v3 1/2] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2013-10-30 13:27     ` Theodore Ts'o
2013-10-30 14:42     ` Theodore Ts'o [this message]
2013-10-30 17:32       ` Thavatchai Makphaibulchoke
2013-09-04 16:39   ` [PATCH v3 2/2] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke
2013-10-30 14:30     ` Theodore Ts'o
2013-09-04 20:00   ` [PATCH v3 0/2] ext4: increase mbcache scalability Andreas Dilger
2013-09-04 15:33     ` Thavatchai Makphaibulchoke
2013-09-05 15:22     ` Thavatchai Makphaibulchoke
2013-09-05  2:35   ` Theodore Ts'o
2013-09-05  9:49     ` Thavatchai Makphaibulchoke
2013-09-06  5:10       ` Andreas Dilger
2013-09-06 12:23         ` Thavatchai Makphaibulchoke
2013-09-10 20:47           ` Andreas Dilger
2013-09-10 21:02             ` Theodore Ts'o
2013-09-10 17:10               ` Thavatchai Makphaibulchoke
2013-09-11  3:13               ` Eric Sandeen
2013-09-11 11:30                 ` Theodore Ts'o
2013-09-11 16:49                   ` Eric Sandeen
2013-09-11 19:33                     ` Eric Sandeen
2013-09-11 20:32                     ` David Lang
2013-09-11 20:48                       ` Eric Sandeen
2013-09-11 21:25                         ` Theodore Ts'o
2013-09-11 20:36                           ` Thavatchai Makphaibulchoke
2013-09-12  3:42                             ` Eric Sandeen
2013-09-12 12:23                             ` Theodore Ts'o
2013-09-13 12:04                               ` Thavatchai Makphaibulchoke
2013-09-13 18:59                                 ` Eric Sandeen
2013-09-17 16:43                                   ` Thavatchai Makphaibulchoke
2013-09-18 14:19                                     ` Eric Sandeen
2013-09-21 18:53                                       ` Theodore Ts'o
2013-09-23 15:35                                         ` Thavatchai Makphaibulchoke
     [not found]                                         ` <D59BF31F-934D-4DE6-9E5C-F92014F09C68@dilger.ca>
2013-09-25 18:43                                           ` Thavatchai Makphaibulchoke
2013-10-04  3:22                                             ` Andreas Dilger
2013-10-18 12:43                                               ` Thavatchai Makphaibulchoke
2013-10-19 23:06                                                 ` Eric Sandeen
2014-02-20 18:37 ` [PATCH V5 0/3] " T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-02-20 18:37   ` [PATCH V5 3/3] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke
2014-03-12 16:19 ` [PATCH v5 RESEND 0/3] ext4: increase mbcache scalability T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 1/3] fs/mbcache.c change block and index hash chain to hlist_bl_node T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 2/3] mbcache: decoupling the locking of local from global data T Makphaibulchoke
2014-03-12 16:19   ` [PATCH v5 RESEND 3/3] ext4: each filesystem creates and uses its own mb_cache T Makphaibulchoke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131030144229.GC3305@thunk.org \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=aswin@hp.com \
    --cc=aswin_proj@groups.hp.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tmac@hp.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).