From: Brian Foster <bfoster@redhat.com>
To: "Michael L. Semon" <mlsemon35@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: [RFC PATCH 00/11] xfs: introduce the free inode btree
Date: Fri, 06 Sep 2013 07:17:21 -0400 [thread overview]
Message-ID: <5229B9C1.5050505@redhat.com> (raw)
In-Reply-To: <5228F4D6.4050306@gmail.com>
On 09/05/2013 05:17 PM, Michael L. Semon wrote:
> On 09/03/2013 02:24 PM, Brian Foster wrote:
>> Hi all,
>>
>> This is an RFC for the kernel work to support a free inode btree. The free inode
>> btree (finobt) is equivalent to the existing inode allocation btree with the
>> exception that the finobt only tracks inode chunks with at least one free inode.
>> The purpose is to improve lookups for free inode clusters for inode allocation.
>>
>> Newly allocated inode chunks by definition contain free inodes and are thus
>> inserted into the finobt immediately. The record for a previously full inode
>> chunk is inserted to the finobt when the first inode is freed. A record is
>> removed from the finobt when the last free inode has been allocated or the
>> entire chunk is completely deallocated.
>>
>> Patches 1-3 refactor some ialloc btree code to introduce the new finobt type and
>> feature bit. Patches 4-7 fix up the transaction handling for inode allocation
>> and deallocation to account for the new tree. Patches 8-10 add the finobt
>> management code to insert, remove and modify records as appropriate. Patch 11
>> fixes growfs to support the finobt.
>>
>> Thoughts, reviews, flames appreciated.
>
> I'm looking for Dave's judgement call on whether I should run this code
> full-time. The patchset applied well on top of Dave's latest work--only
> a "trailing whitespace" warning on Patch #9 (I think)--and the code
> compiled without error. There was a lockdep while running xfstests,
> before generic/013 (I think), so I switched back to my normal git branch
> and have your patchset in a separate branch.
>
Hi Michael,
Thanks for giving this a burn. I actually haven't tested with lockdep
yet, so I've made a note to do so and see if I can reproduce that or any
other problems.
With regard to running "full-time," I'd suggest to hold off at least
until I post a non-RFC version (the next version will probably be a real
v1). The review feedback has shown that a few areas need some
non-trivial rework that change the behavior enough (i.e., the flush on
inactive thing, transaction reservation management, etc.) that any
issues at this point might be worth retesting for on the later version
anyways.
> My setup here is slow--x86, old IDE hardware, write cache off, debug
> kernel--but the patchset made things seem a little slower. At the
> right time--not necessarily now--performance numbers might be nice.
> I didn't time it but did a copy of 3 kernel gits to v5 1k-block-size
> XFS and just felt something was off. The copy did complete, though.
> Will try timing this on another day.
>
That's interesting. I had a chat with Dave with regard to the existing
lookup allocation algorithm and the best way to test the improved
lookup, and I am able to reproduce a nice inode allocation improvement
under certain conditions. I could understand seeing some general effect
in performance via the addition of the new tree (there's more work to do
now to manage it, after all), but my expectation is that for finobt=0
filesystems, there should be no effect whatsoever.
> Anyway, good work so far! No additional stack traces were caused by
> your code in limited testing, and the filesystems were are still
> intact.
>
Thanks again for testing. Most of my testing so far has been with finobt
enabled, so that's a good bit of news! :)
Brian
> Thanks!
>
> Michael
>
> [lockdep from xfstests generic/0-ten-something follows:]
>
> [ 763.993429] XFS (sdb4): Mounting Filesystem
> [ 764.258701] XFS (sdb4): Ending clean mount
> [ 768.798390] XFS (sdb5): Mounting Filesystem
> [ 769.061280] XFS (sdb5): Ending clean mount
> [ 770.030277] XFS (sdb4): Mounting Filesystem
> [ 770.313502] XFS (sdb4): Ending clean mount
> [ 788.932588] XFS (sdb4): Mounting Filesystem
> [ 789.256815] XFS (sdb4): Ending clean mount
> [ 792.639933] XFS (sdb4): Mounting Filesystem
> [ 792.965477] XFS (sdb4): Ending clean mount
> [ 795.166220] XFS (sdb4): Mounting Filesystem
> [ 795.507372] XFS (sdb4): Ending clean mount
> [ 802.870263] XFS (sdb4): Mounting Filesystem
> [ 803.516422] XFS (sdb4): Ending clean mount
> [ 814.376620] XFS (sdb4): Mounting Filesystem
> [ 815.050778] XFS (sdb4): Ending clean mount
> [ 823.169368]
> [ 823.170932] ======================================================
> [ 823.172146] [ INFO: possible circular locking dependency detected ]
> [ 823.172146] 3.11.0+ #5 Not tainted
> [ 823.172146] -------------------------------------------------------
> [ 823.172146] dirstress/5276 is trying to acquire lock:
> [ 823.172146] (sb_internal){.+.+.+}, at: [<c11c5e60>] xfs_trans_alloc+0x1f/0x35
> [ 823.172146]
> [ 823.172146] but task is already holding lock:
> [ 823.172146] (&(&ip->i_lock)->mr_lock){+++++.}, at: [<c1206cfb>] xfs_ilock+0x100/0x1f1
> [ 823.172146]
> [ 823.172146] which lock already depends on the new lock.
> [ 823.172146]
> [ 823.172146]
> [ 823.172146] the existing dependency chain (in reverse order) is:
> [ 823.172146]
> [ 823.172146] -> #1 (&(&ip->i_lock)->mr_lock){+++++.}:
> [ 823.172146] [<c1070a11>] __lock_acquire+0x345/0xa11
> [ 823.172146] [<c1071c45>] lock_acquire+0x88/0x17e
> [ 823.172146] [<c14bff98>] _raw_spin_lock+0x47/0x74
> [ 823.172146] [<c1116247>] __mark_inode_dirty+0x171/0x38c
> [ 823.172146] [<c111acab>] __set_page_dirty+0x5f/0x95
> [ 823.172146] [<c111b93e>] mark_buffer_dirty+0x58/0x12b
> [ 823.172146] [<c111baff>] __block_commit_write.isra.17+0x64/0x82
> [ 823.172146] [<c111c197>] block_write_end+0x2b/0x53
> [ 823.172146] [<c111c201>] generic_write_end+0x42/0x9a
> [ 823.172146] [<c11a42d5>] xfs_vm_write_end+0x60/0xbe
> [ 823.172146] [<c10bd47a>] generic_file_buffered_write+0x140/0x20f
> [ 823.172146] [<c11b2347>] xfs_file_buffered_aio_write+0x10b/0x205
> [ 823.172146] [<c11b24ee>] xfs_file_aio_write+0xad/0xec
> [ 823.172146] [<c10f0c5f>] do_sync_write+0x60/0x87
> [ 823.172146] [<c10f0e0f>] vfs_write+0x9c/0x189
> [ 823.172146] [<c10f0fc6>] SyS_write+0x49/0x81
> [ 823.172146] [<c14c14bb>] sysenter_do_call+0x12/0x32
> [ 823.172146]
> [ 823.172146] -> #0 (sb_internal){.+.+.+}:
> [ 823.172146] [<c106e972>] validate_chain.isra.35+0xfc7/0xff4
> [ 823.172146] [<c1070a11>] __lock_acquire+0x345/0xa11
> [ 823.172146] [<c1071c45>] lock_acquire+0x88/0x17e
> [ 823.172146] [<c10f36eb>] __sb_start_write+0xad/0x177
> [ 823.172146] [<c11c5e60>] xfs_trans_alloc+0x1f/0x35
> [ 823.172146] [<c120a823>] xfs_inactive+0x129/0x4a3
> [ 823.172146] [<c11c280d>] xfs_fs_evict_inode+0x6c/0x114
> [ 823.172146] [<c1106678>] evict+0x8e/0x15d
> [ 823.172146] [<c1107126>] iput+0xc4/0x138
> [ 823.172146] [<c1103504>] dput+0x1b2/0x257
> [ 823.172146] [<c10f1a30>] __fput+0x140/0x1eb
> [ 823.172146] [<c10f1b0f>] ____fput+0xd/0xf
> [ 823.172146] [<c1048477>] task_work_run+0x67/0x90
> [ 823.172146] [<c1001ea5>] do_notify_resume+0x61/0x63
> [ 823.172146] [<c14c0cfa>] work_notifysig+0x1f/0x25
> [ 823.172146]
> [ 823.172146] other info that might help us debug this:
> [ 823.172146]
> [ 823.172146] Possible unsafe locking scenario:
> [ 823.172146]
> [ 823.172146] CPU0 CPU1
> [ 823.172146] ---- ----
> [ 823.172146] lock(&(&ip->i_lock)->mr_lock);
> [ 823.172146] lock(sb_internal);
> [ 823.172146] lock(&(&ip->i_lock)->mr_lock);
> [ 823.172146] lock(sb_internal);
> [ 823.172146]
> [ 823.172146] *** DEADLOCK ***
> [ 823.172146]
> [ 823.172146] 1 lock held by dirstress/5276:
> [ 823.172146] #0: (&(&ip->i_lock)->mr_lock){+++++.}, at: [<c1206cfb>] xfs_ilock+0x100/0x1f1
> [ 823.172146]
> [ 823.172146] stack backtrace:
> [ 823.172146] CPU: 0 PID: 5276 Comm: dirstress Not tainted 3.11.0+ #5
> [ 823.172146] Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
> [ 823.172146] c1c26060 c1c26060 da34fd58 c14ba216 da34fd78 c14b7317 c15f171b da34fdb4
> [ 823.172146] dcaa1440 00000001 dcaa18b0 00000000 da34fde4 c106e972 dcaa1888 00000001
> [ 823.172146] da34fdb4 c1057e0f 00000000 00003f61 c1c28660 00000000 dcaa1888 dcaa18b0
> [ 823.172146] Call Trace:
> [ 823.172146] [<c14ba216>] dump_stack+0x16/0x18
> [ 823.172146] [<c14b7317>] print_circular_bug+0x1b8/0x1c2
> [ 823.172146] [<c106e972>] validate_chain.isra.35+0xfc7/0xff4
> [ 823.172146] [<c1057e0f>] ? sched_clock_local.constprop.3+0x39/0x131
> [ 823.172146] [<c1057fd4>] ? sched_clock_cpu+0x8f/0xe2
> [ 823.172146] [<c1070a11>] __lock_acquire+0x345/0xa11
> [ 823.172146] [<c1070a36>] ? __lock_acquire+0x36a/0xa11
> [ 823.172146] [<c1071c45>] lock_acquire+0x88/0x17e
> [ 823.172146] [<c11c5e60>] ? xfs_trans_alloc+0x1f/0x35
> [ 823.172146] [<c10f36eb>] __sb_start_write+0xad/0x177
> [ 823.172146] [<c11c5e60>] ? xfs_trans_alloc+0x1f/0x35
> [ 823.172146] [<c11c5e60>] ? xfs_trans_alloc+0x1f/0x35
> [ 823.172146] [<c1206cfb>] ? xfs_ilock+0x100/0x1f1
> [ 823.172146] [<c11c5e60>] xfs_trans_alloc+0x1f/0x35
> [ 823.172146] [<c120a823>] xfs_inactive+0x129/0x4a3
> [ 823.172146] [<c106f21f>] ? trace_hardirqs_on+0xb/0xd
> [ 823.172146] [<c14c01e5>] ? _raw_spin_unlock_irq+0x27/0x36
> [ 823.172146] [<c11c280d>] xfs_fs_evict_inode+0x6c/0x114
> [ 823.172146] [<c1106678>] evict+0x8e/0x15d
> [ 823.172146] [<c1107126>] iput+0xc4/0x138
> [ 823.172146] [<c1103504>] dput+0x1b2/0x257
> [ 823.172146] [<c10f1a30>] __fput+0x140/0x1eb
> [ 823.172146] [<c10f1b0f>] ____fput+0xd/0xf
> [ 823.172146] [<c1048477>] task_work_run+0x67/0x90
> [ 823.172146] [<c1001ea5>] do_notify_resume+0x61/0x63
> [ 823.172146] [<c14c0cfa>] work_notifysig+0x1f/0x25
> [ 824.015366] Clocksource tsc unstable (delta = 486645129 ns)
> [ 825.324019] XFS (sdb4): Mounting Filesystem
> [ 825.743317] XFS (sdb4): Ending clean mount
> [ 827.223193] XFS (sdb4): Mounting Filesystem
> [ 827.668493] XFS (sdb4): Ending clean mount
> [ 837.524673] XFS (sdb4): Mounting Filesystem
> [ 837.986097] XFS (sdb4): Ending clean mount
>
>
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-09-06 11:21 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-03 18:24 [RFC PATCH 00/11] xfs: introduce the free inode btree Brian Foster
2013-09-03 18:24 ` [RFC PATCH 01/11] xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers Brian Foster
2013-09-05 0:36 ` Dave Chinner
2013-09-03 18:24 ` [RFC PATCH 02/11] xfs: reserve v5 superblock read-only compat. feature bit for finobt Brian Foster
2013-09-05 0:39 ` Dave Chinner
2013-09-03 18:25 ` [RFC PATCH 03/11] xfs: support the XFS_BTNUM_FINOBT free inode btree type Brian Foster
2013-09-05 0:54 ` Dave Chinner
2013-09-05 16:17 ` Brian Foster
2013-09-06 0:07 ` Dave Chinner
2013-09-06 11:25 ` Brian Foster
2013-09-06 21:22 ` Dave Chinner
2013-09-03 18:25 ` [RFC PATCH 04/11] xfs: update inode allocation transaction reservations for finobt Brian Foster
2013-09-05 0:59 ` Dave Chinner
2013-09-05 16:17 ` Brian Foster
2013-09-06 0:11 ` Dave Chinner
2013-09-03 18:25 ` [RFC PATCH 05/11] xfs: update ifree " Brian Foster
2013-09-05 1:00 ` Dave Chinner
2013-09-03 18:25 ` [RFC PATCH 06/11] xfs: use correct transaction reservations in xfs_inactive() Brian Foster
2013-09-05 1:35 ` Dave Chinner
2013-09-05 16:18 ` Brian Foster
2013-09-03 18:25 ` [RFC PATCH 07/11] xfs: retry trans reservation on ENOSPC " Brian Foster
2013-09-05 1:40 ` Dave Chinner
2013-09-05 16:18 ` Brian Foster
2013-09-06 0:17 ` Dave Chinner
2013-09-06 11:30 ` Brian Foster
2013-09-03 18:25 ` [RFC PATCH 08/11] xfs: insert newly allocated inode chunks into the finobt Brian Foster
2013-09-05 2:10 ` Dave Chinner
2013-09-03 18:25 ` [RFC PATCH 09/11] xfs: use and update the finobt on inode allocation Brian Foster
2013-09-05 2:27 ` Dave Chinner
2013-09-05 16:18 ` Brian Foster
2013-09-03 18:25 ` [RFC PATCH 10/11] xfs: update the finobt on inode free Brian Foster
2013-09-05 2:54 ` Dave Chinner
2013-09-05 16:19 ` Brian Foster
2013-09-06 0:28 ` Dave Chinner
2013-09-06 11:39 ` Brian Foster
2013-09-06 21:24 ` Dave Chinner
2013-09-07 12:30 ` Brian Foster
2013-09-08 20:08 ` Michael L. Semon
2013-09-09 2:34 ` Better numbers " Michael L. Semon
2013-09-03 18:25 ` [RFC PATCH 11/11] xfs: add finobt support to growfs Brian Foster
2013-09-05 2:55 ` Dave Chinner
2013-09-05 21:17 ` [RFC PATCH 00/11] xfs: introduce the free inode btree Michael L. Semon
2013-09-06 11:17 ` Brian Foster [this message]
2013-09-06 21:35 ` Dave Chinner
2013-09-07 12:31 ` Brian Foster
2013-09-08 1:04 ` Michael L. Semon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5229B9C1.5050505@redhat.com \
--to=bfoster@redhat.com \
--cc=mlsemon35@gmail.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox