From: "Michael L. Semon" <mlsemon35@gmail.com>
To: Brian Foster <bfoster@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH v2 00/11] xfs: introduce the free inode btree
Date: Sun, 17 Nov 2013 17:43:17 -0500 [thread overview]
Message-ID: <52894685.8080603@gmail.com> (raw)
In-Reply-To: <1384353427-36205-1-git-send-email-bfoster@redhat.com>
On 11/13/2013 09:36 AM, Brian Foster wrote:
> Hi all,
>
> The free inode btree adds a new inode btree to XFS with the intent to
> track only inode chunks with at least one free inode. Patches 1-3 add
> the necessary support for the new XFS_BTNUM_FINOBT type and introduce a
> read-only v5 superblock flag. Patch 4 updates the transaction
> reservations for inode allocation operations to account for the finobt.
> Patches 5-9 add support to manage the finobt on inode chunk allocation,
> inode allocation, inode free (and chunk deletion) and growfs. Patch 10
> adds support to report finobt status in the fs geometry. Patch 11 adds
> the feature bit to the associated mask. Thoughts, reviews, flames
> appreciated.
>
> Brian
>
> v2:
> - Rebase to latest xfs tree (minor shifting around of some header bits).
> - Added "xfs: report finobt status in fs geometry" patch to series.
Very nice rebase! There might have been a whitespace issue on patch #6
for kernel and xfsprogs, but it was easy going after that.
I'm halfway through testing 4k finobt CRC filesystems on a 2.2-GB, 2-disk
md RAID-0, x86 Pentium 4, 512 MB of RAM. The current nasty setup is
kernel 3.12.0+, less the 5 most recent AIO commits/merges, and me trying
to get in the few not-merged Dave Chinner kernel/xfsprogs patches along
with your patches.
I meant to be done with 4k by now, but generic/224 caused the kernel OOM
killer to halt testing, much like it does in 256 MB RAM without finobt.
No problem: I'll thank Stan in advance for introducing me to the term
O_PONIES.
The rest of this letter is random junk that hasn't been re-tested, to
give a flavor of what might lie ahead. I'm missing a stack trace to the
effect of "Error 117: offline filesystem operation in progress" as
something later than xfstests xfs/296 was running. None of this letter
needs a reply.
Good luck!
Michael
[NOISE FOLLOWS]
***** I don't know if this one is an xfstests issue or an xfsprogs
issue. Something like this also happened in a non-finobt
`./check -g auto`...
xfs/033 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/033.out.bad)
--- tests/xfs/033.out 2013-11-11 13:46:22.367412935 -0500
+++ /var/lib/xfstests/results//xfs/033.out.bad 2013-11-17 12:57:28.010382465 -0500
@@ -17,9 +17,10 @@
- process known inodes and perform inode discovery...
bad magic number 0x0 on inode INO
bad version number 0x0 on inode INO
+inode identifier 0 mismatch on inode INO
bad magic number 0x0 on inode INO, resetting magic number
bad version number 0x0 on inode INO, resetting version number
-imap claims a free inode INO is in use, correcting imap and clearing inode
...
(Run 'diff -u tests/xfs/033.out /var/lib/xfstests/results//xfs/033.out.bad' to see the entire diff)
***** The diff for xfs/033:
19a20
> inode identifier 0 mismatch on inode INO
22c23
< imap claims a free inode INO is in use, correcting imap and clearing inode
---
> inode identifier 0 mismatch on inode INO
33,194c34,37
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime bitmap inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime summary inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting root inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared root inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< root inode lost
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing root directory
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime bitmap inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime summary inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
---
> xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
>
> fatal error -- could not iget root inode -- error - 117
> _check_xfs_filesystem: filesystem on /dev/md126 is inconsistent (r) (see /var/lib/xfstests/results//xfs/033.full)
***** This is the lone segfault so far:
xfs/291 [12832.846621] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[12832.846621] Use of these features in this kernel is at your own risk!
[12832.872608] XFS (md126): Mounting Filesystem
[12833.063779] XFS (md126): Ending clean mount
[13153.675046] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13153.675046] Use of these features in this kernel is at your own risk!
[13153.694128] XFS (md126): Mounting Filesystem
[13154.105167] XFS (md126): Ending clean mount
[13201.470358] xfs_db[17902]: segfault at 9c157f8 ip 0809b6b0 sp bfe97950 error 4 in xfs_db[8048000+90000]
[failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/291.out.bad)
--- tests/xfs/291.out 2013-11-11 13:46:26.652264785 -0500
+++ /var/lib/xfstests/results//xfs/291.out.bad 2013-11-17 16:28:05.133832908 -0500
@@ -1 +1,11 @@
QA output created by 291
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+__read_verify: XFS_CORRUPTION_ERROR
...
(Run 'diff -u tests/xfs/291.out /var/lib/xfstests/results//xfs/291.out.bad' to see the entire diff)
[13202.293470] XFS (md127): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13202.293470] Use of these features in this kernel is at your own risk!
[13202.309944] XFS (md127): Mounting Filesystem
[13202.587663] XFS (md127): Ending clean mount
***** I might not have seen this lockdep splat yet, but this
is a new merge window. This splat is repeatable and may be
independent of finobt.
xfs/078 [87803.635893]
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0+ #2 Not tainted
-------------------------------------------------------
xfs_repair/12944 is trying to acquire lock:
(timekeeper_seq){------}, at: [<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
but task is already holding lock:
(hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (hrtimer_bases.lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
[<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
[<c1055b01>] start_bandwidth_timer+0x60/0x6f
[<c105b1c2>] enqueue_task_rt+0xd3/0xfd
[<c10546aa>] enqueue_task+0x45/0x60
[<c1055813>] __sched_setscheduler+0x243/0x372
[<c1056a21>] sched_setscheduler+0x17/0x19
[<c108ae53>] watchdog_enable+0x69/0x7d
[<c1053063>] smpboot_thread_fn+0x93/0x130
[<c104c4ab>] kthread+0xb3/0xc7
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #4 (&rt_b->rt_runtime_lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c105b1ac>] enqueue_task_rt+0xbd/0xfd
[<c10546aa>] enqueue_task+0x45/0x60
[<c1055813>] __sched_setscheduler+0x243/0x372
[<c1056a21>] sched_setscheduler+0x17/0x19
[<c108ae53>] watchdog_enable+0x69/0x7d
[<c1053063>] smpboot_thread_fn+0x93/0x130
[<c104c4ab>] kthread+0xb3/0xc7
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #3 (&rq->lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c10561da>] wake_up_new_task+0x3b/0x147
[<c102d132>] do_fork+0x116/0x305
[<c102d34e>] kernel_thread+0x2d/0x33
[<c161f0b2>] rest_init+0x22/0x128
[<c19f39da>] start_kernel+0x2df/0x2e5
[<c19f3378>] i386_start_kernel+0x12e/0x131
-> #2 (&p->pi_lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
[<c1055e1a>] try_to_wake_up+0x23/0x138
[<c1055f60>] wake_up_process+0x1f/0x33
[<c104411c>] start_worker+0x25/0x28
[<c10451cc>] create_and_start_worker+0x37/0x5d
[<c1a03b34>] init_workqueues+0xd4/0x2c4
[<c19f3a99>] do_one_initcall+0xb9/0x153
[<c19f3b7e>] kernel_init_freeable+0x4b/0x17d
[<c161f1c8>] kernel_init+0x10/0xf2
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #1 (&(&pool->lock)->rlock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c104575a>] __queue_work+0x12b/0x393
[<c1045c26>] queue_work_on+0x2f/0x6a
[<c104f4b6>] clock_was_set_delayed+0x1d/0x1f
[<c1075a67>] do_adjtimex+0xf4/0x145
[<c1030ce0>] SYSC_adjtimex+0x30/0x62
[<c1030f67>] SyS_adjtimex+0x10/0x12
[<c162e53f>] sysenter_do_call+0x12/0x36
-> #0 (timekeeper_seq){------}:
[<c10648b9>] __lock_acquire+0x13a4/0x17ac
[<c106577c>] lock_acquire+0x7f/0x15e
[<c1073688>] ktime_get+0x4f/0x169
[<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
[<c104faff>] hrtimer_start_range_ns+0x26/0x2c
[<c104b17f>] common_timer_set+0xf5/0x164
[<c104bd58>] SyS_timer_settime+0xbe/0x183
[<c162dcc8>] syscall_call+0x7/0xb
other info that might help us debug this:
Chain exists of:
timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(hrtimer_bases.lock);
lock(&rt_b->rt_runtime_lock);
lock(hrtimer_bases.lock);
lock(timekeeper_seq);
*** DEADLOCK ***
2 locks held by xfs_repair/12944:
#0: (&(&new_timer->it_lock)->rlock){......}, at: [<c104b292>] __lock_timer+0xa4/0x1af
#1: (hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
stack backtrace:
CPU: 0 PID: 12944 Comm: xfs_repair Not tainted 3.12.0+ #2
Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
c1cb2e70 c1cb2e70 deb01dd8 c162748d deb01df8 c162303c c17a3306 deb01e3c
deaad0c0 deaad550 deaad550 00000002 deb01e6c c10648b9 deaad528 0000006f
c106269b deb01e20 c1c8bd08 00000003 00000000 0000000e 00000002 00000001
Call Trace:
[<c162748d>] dump_stack+0x16/0x18
[<c162303c>] print_circular_bug+0x1b8/0x1c2
[<c10648b9>] __lock_acquire+0x13a4/0x17ac
[<c106269b>] ? trace_hardirqs_off+0xb/0xd
[<c106577c>] lock_acquire+0x7f/0x15e
[<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
[<c1073688>] ktime_get+0x4f/0x169
[<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
[<c162d098>] ? _raw_spin_lock_irqsave+0x70/0x7a
[<c104f7a4>] ? __hrtimer_start_range_ns+0x28/0x35d
[<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
[<c104faff>] hrtimer_start_range_ns+0x26/0x2c
[<c104b17f>] common_timer_set+0xf5/0x164
[<c104b08a>] ? __posix_timers_find+0xa7/0xa7
[<c104bd58>] SyS_timer_settime+0xbe/0x183
[<c162dcc8>] syscall_call+0x7/0xb
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-11-17 22:43 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-13 14:36 [PATCH v2 00/11] xfs: introduce the free inode btree Brian Foster
2013-11-13 14:36 ` [PATCH v2 01/11] xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers Brian Foster
2013-11-13 16:17 ` Christoph Hellwig
2013-11-13 14:36 ` [PATCH v2 02/11] xfs: reserve v5 superblock read-only compat. feature bit for finobt Brian Foster
2013-11-13 16:18 ` Christoph Hellwig
2013-11-13 14:36 ` [PATCH v2 03/11] xfs: support the XFS_BTNUM_FINOBT free inode btree type Brian Foster
2013-11-13 14:37 ` [PATCH v2 04/11] xfs: update inode allocation/free transaction reservations for finobt Brian Foster
2013-11-13 14:37 ` [PATCH v2 05/11] xfs: insert newly allocated inode chunks into the finobt Brian Foster
2013-11-13 14:37 ` [PATCH v2 06/11] xfs: use and update the finobt on inode allocation Brian Foster
2013-11-13 14:37 ` [PATCH v2 07/11] xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper Brian Foster
2013-11-13 14:37 ` [PATCH v2 08/11] xfs: update the finobt on inode free Brian Foster
2013-11-13 14:37 ` [PATCH v2 09/11] xfs: add finobt support to growfs Brian Foster
2013-11-13 14:37 ` [PATCH v2 10/11] xfs: report finobt status in fs geometry Brian Foster
2013-11-13 14:37 ` [PATCH v2 11/11] xfs: enable the finobt feature on v5 superblocks Brian Foster
2013-11-13 16:17 ` [PATCH v2 00/11] xfs: introduce the free inode btree Christoph Hellwig
2013-11-13 17:55 ` Brian Foster
2013-11-13 21:10 ` Dave Chinner
2013-11-19 21:29 ` Brian Foster
2013-11-19 22:17 ` Dave Chinner
2013-11-17 22:43 ` Michael L. Semon [this message]
2013-11-18 22:38 ` Michael L. Semon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52894685.8080603@gmail.com \
--to=mlsemon35@gmail.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox