From: Lachlan McIlroy <lachlan@sgi.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Mark Goodwin <markgw@sgi.com>, xfs@oss.sgi.com
Subject: Re: fw: [PATCH] fix instant oops with tracing enabled
Date: Wed, 15 Oct 2008 11:27:09 +1000
Message-ID: <48F546ED.6050702@sgi.com>
In-Reply-To: <20081014131140.GB17351@lst.de>

Christoph Hellwig wrote:
> On Tue, Oct 14, 2008 at 10:40:15AM +1000, Mark Goodwin wrote:
>> Lachlan also saw some regressions after merging these patchsets:
>> . replace the mount inode list with radix tree traversals
>> . clean up sync code
> 
> What exactly?  I saw a soft lockup in 042, but it goes away when I apply
> Dave's xfs_sync_inodes_ag fix (or the half of it that applies without the
> del inodes tracking in the radix tree).

I saw this panic but I don't think it's related to the above patches:

[252921.307588] BUG: unable to handle kernel <3>BUG: scheduling while atomic: dd/16976/0xf101da90
[252921.307908] Modules linked in:
[252921.307911] Pid: 16976, comm: dd Not tainted 2.6.27-rc8 #183
[252921.307913] 
[252921.307913] Call Trace:
[252921.307920]  [<ffffffff8102fe22>] __schedule_bug+0x62/0x66
[252921.307923]  [<ffffffff8153dce1>] schedule+0x99/0x7c7
[252921.307925]  [<ffffffff8153e890>] schedule_timeout+0x22/0xb4
[252921.307929]  [<ffffffff810473f9>] ? add_wait_queue_exclusive+0x3c/0x41
[252921.307932]  [<ffffffff81198bc9>] xlog_state_get_iclog_space+0xe8/0x278
[252921.307934]  [<ffffffff8102de2d>] ? default_wake_function+0x0/0xf
[252921.307936]  [<ffffffff81198e6d>] xlog_write+0x114/0x579
[252921.307938]  [<ffffffff811761d5>] ? xfs_buf_item_pin+0x76/0x7b
[252921.307940]  [<ffffffff811993a7>] xfs_log_write+0x38/0x62
[252921.307943]  [<ffffffff811a4f6b>] _xfs_trans_commit+0x1fd/0x3c6
[252921.307945]  [<ffffffff81193e93>] xfs_iomap_write_allocate+0x2d5/0x387
[252921.307947]  [<ffffffff81194e07>] xfs_iomap+0x331/0x3ba
[252921.307950]  [<ffffffff811b0930>] xfs_map_blocks+0x30/0x69
[252921.307952]  [<ffffffff811b1a00>] xfs_page_state_convert+0x2e5/0x594
[252921.307953]  [<ffffffff811b1f1b>] ? xfs_vm_releasepage+0xae/0xbd
[252921.307955]  [<ffffffff811b1ff1>] xfs_vm_writepage+0xc7/0x109
[252921.307958]  [<ffffffff81080835>] shrink_page_list+0x31a/0x57c
[252921.307960]  [<ffffffff81080be3>] shrink_inactive_list+0x126/0x39d
[252921.307962]  [<ffffffff81080f3f>] shrink_zone+0xe5/0x10a
[252921.307964]  [<ffffffff81081436>] try_to_free_pages+0x248/0x3cf
[252921.307966]  [<ffffffff8108042f>] ? isolate_pages_global+0x0/0x34
[252921.307967]  [<ffffffff8107cc3c>] __alloc_pages_internal+0x262/0x3b6
[252921.307969]  [<ffffffff811b4284>] ? xfs_buf_get_flags+0x6b/0x165
[252921.307972]  [<ffffffff8109709f>] alloc_pages_current+0xb9/0xc2
[252921.307974]  [<ffffffff8109d66b>] new_slab+0x57/0x283
[252921.307975]  [<ffffffff8109daeb>] __slab_alloc+0x1e8/0x3dd
[252921.307977]  [<ffffffff811b0220>] ? kmem_zone_alloc+0x58/0xaa
[252921.307980]  [<ffffffff811638c1>] ? xfs_bmap_search_multi_extents+0x9a/0xda
[252921.307982]  [<ffffffff8109e07e>] kmem_cache_alloc+0x43/0x76
[252921.307983]  [<ffffffff811b0220>] kmem_zone_alloc+0x58/0xaa
[252921.307985]  [<ffffffff811b0281>] kmem_zone_zalloc+0xf/0x31
[252921.307986]  [<ffffffff811a555c>] _xfs_trans_alloc+0x25/0x5f
[252921.307988]  [<ffffffff811a562c>] xfs_trans_alloc+0x96/0xa1
[252921.307990]  [<ffffffff81193d05>] xfs_iomap_write_allocate+0x147/0x387
[252921.307991]  [<ffffffff81194db4>] ? xfs_iomap+0x2de/0x3ba
[252921.307993]  [<ffffffff81194e07>] xfs_iomap+0x331/0x3ba
[252921.307995]  [<ffffffff811b0930>] xfs_map_blocks+0x30/0x69
[252921.307996]  [<ffffffff811b1a00>] xfs_page_state_convert+0x2e5/0x594
[252921.307998]  [<ffffffff811b1ff1>] xfs_vm_writepage+0xc7/0x109
[252921.307999]  [<ffffffff8107cec6>] __writepage+0x12/0x2b
[252921.308001]  [<ffffffff8107d39a>] write_cache_pages+0x1b3/0x317
[252921.308003]  [<ffffffff8107ceb4>] ? __writepage+0x0/0x2b
[252921.308004]  [<ffffffff8107d51d>] generic_writepages+0x1f/0x25
[252921.308006]  [<ffffffff811b20ca>] xfs_vm_writepages+0x43/0x4b
[252921.308007]  [<ffffffff8107d54b>] do_writepages+0x28/0x37
[252921.308011]  [<ffffffff810bfd82>] __writeback_single_inode+0x145/0x29f
[252921.308015]  [<ffffffff812283c5>] ? prop_fraction_single+0x3d/0x5f
[252921.308017]  [<ffffffff810c0294>] generic_sync_sb_inodes+0x1d0/0x2ba
[252921.308019]  [<ffffffff810c0387>] sync_sb_inodes+0x9/0xb
[252921.308021]  [<ffffffff810c06f3>] writeback_inodes+0x64/0xad
[252921.308023]  [<ffffffff8107da26>] balance_dirty_pages_ratelimited_nr+0x16b/0x2dd
[252921.308027]  [<ffffffff8107769f>] generic_file_buffered_write+0x203/0x625
[252921.308028]  [<ffffffff8107c16d>] ? get_page_from_freelist+0x45e/0x5d0
[252921.308031]  [<ffffffff811b8b80>] ? xfs_rw_enter_trace+0xbf/0xca
[252921.308032]  [<ffffffff811b9641>] xfs_write+0x64f/0x9cf
[252921.308035]  [<ffffffff81076b4e>] ? find_lock_page+0x2b/0x61
[252921.308037]  [<ffffffff811b50c3>] __xfs_file_write+0x4c/0x4e
[252921.308038]  [<ffffffff811b50e9>] xfs_file_aio_write+0x11/0x13
[252921.308040]  [<ffffffff810a2f94>] do_sync_write+0xe2/0x126
[252921.308042]  [<ffffffff81084935>] ? __do_fault+0x326/0x36c
[252921.308044]  [<ffffffff810471d3>] ? autoremove_wake_function+0x0/0x38
[252921.308047]  [<ffffffff811e8618>] ? selinux_file_permission+0x10d/0x116
[252921.308050]  [<ffffffff811e1321>] ? security_file_permission+0x11/0x13
[252921.308052]  [<ffffffff810a3790>] vfs_write+0xae/0x157
[252921.308053]  [<ffffffff810a3c9e>] sys_write+0x47/0x6f
[252921.308055]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
[252921.308056] 
[252921.308125] paging request at ffff881829c85a78
[252921.308125] IP: [<ffffffff810297a3>] cpuacct_charge+0x2b/0x34
[252921.308125] PGD 202063 PUD 0 
[252921.308125] Oops: 0000 [1] SMP 


I saw sync get stuck in an infinite loop while running test 042; maybe it's
the same problem you saw.

[1]kdb> btp 7356
Stack traceback for pid 7356
0xffff880071d10740     7356     7390  1    2   R  0xffff880071d10ba8  sync
sp                ip                Function (args)
0xffff880058cc3c88 0xffffffff81540566 kdb_interrupt+0x66 (0xffff8800720e4ac4, 0x202, 0x0, 0xffff88007119b810, 0xffff880058cc3d48, 0xffff88007213deb8)
0xffff880058cc3ce8 0xffffffff8153ff8e _spin_unlock_irqrestore+0x8 (0xffff8800720e4ac4, 0x202)
0xffff880058cc3d20 0xffffffff81229b96 __down_read_trylock+0x3f (invalid)
0xffff880058cc3d40 0xffffffff8104a34d down_read_trylock+0x9
0xffff880058cc3d50 0xffffffff8118bcd9 xfs_ilock_nowait+0xaf (0xffff8800720e4a00, invalid)
0xffff880058cc3d80 0xffffffff811bc3d9 xfs_sync_inodes_ag+0x12a (0xffff88007119b800, invalid, invalid, 0x0)
0xffff880058cc3e00 0xffffffff811bc6ee xfs_sync_inodes+0x65 (0xffff88007119b800, invalid, 0x0)
0xffff880058cc3e40 0xffffffff811bc785 xfs_syncsub+0x67 (0xffff88007119b800, invalid, 0x0)
0xffff880058cc3e80 0xffffffff811bc9d0 xfs_sync+0x7d (0xffff88007119b800, invalid)
0xffff880058cc3eb0 0xffffffff811ba6b9 xfs_fs_sync_super+0x38 (0xffff88007e056000)
0xffff880058cc3f20 0xffffffff810a5311 sync_filesystems+0xb7 (invalid)
0xffff880058cc3f50 0xffffffff810c2deb do_sync+0x37 (0x1)
0xffff880058cc3f70 0xffffffff810c2e25 sys_sync+0xe
  not matched: from 0xffffffff8100bfad to 0xffffffff8100c025 drop_through 0 bb_jmp[7


I saw the panic in _xfs_itrace_exit(), which has now been fixed.

And I also saw this assertion:

<4>[34770.626472] Assertion failed: (index >= 0) && (index < ktp->kt_nentries), file: fs/xfs/support/ktrace.c, line: 173
<0>[34770.626511] ------------[ cut here ]------------
<2>[34770.627419] kernel BUG at fs/xfs/support/debug.c:81!

[2]kdb> bt
Stack traceback for pid 400
0xffff88007e883a00      400        2  1    2   R  0xffff88007e883e68 *xfslogd/2
sp                ip                Function (args)
0xffff88007e66fbf8 0xffffffff811bd5d5 assfail+0x1a (invalid, invalid, invalid)
0xffff88007e66fc28 0xffffffff811bdb24 ktrace_enter+0x8b (invalid, invalid, invalid, invalid, invalid, invalid, invalid, invalid, invalid)
0xffff88007e66fc78 0xffffffff81175b35 xfs_buf_item_trace+0xe6 (0xffffffff816d8948, 0xffff88007c47cb40)
0xffff88007e66fd18 0xffffffff81175b60 xfs_buf_item_committed+0x1c (0xffff88007c47cb40, 0x100000b1f)
0xffff88007e66fd38 0xffffffff811a4766 xfs_trans_chunk_committed+0x60 (0xffff880050124780, 0x100000b1f, invalid)
0xffff88007e66fd98 0xffffffff811a4873 xfs_trans_committed+0x43 (0xffff880050124670, invalid)
0xffff88007e66fdc8 0xffffffff81197b2a xlog_state_do_callback+0x19a (0xffff88007ef78400, invalid, 0xffff88007ef79000)
0xffff88007e66fe38 0xffffffff81197d6d xlog_state_done_syncing+0xda (0xffff88007ef79000, invalid)
0xffff88007e66fe68 0xffffffff81198587 xlog_iodone+0x154 (0xffff88006ac37c80)
0xffff88007e66fe98 0xffffffff811b3afb xfs_buf_iodone_work+0x65 (invalid)
0xffff88007e66feb8 0xffffffff81043cfb run_workqueue+0x7c (0xffff88007e866b80)
0xffff88007e66fed8 0xffffffff81044711 worker_thread+0xd8 (0xffff88007e866b80)
0xffff88007e66ff28 0xffffffff810470a3 kthread+0x49 (invalid)
0xffff88007e66ff48 0xffffffff8100ce89 child_rip+0xa (invalid, invalid)
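The failed assertion is a bounds check on the slot index in the ktrace ring buffer: `index` must satisfy 0 <= index < kt_nentries. Purely as an illustration (not a diagnosis of this failure, and not the ktrace.c code), here is a minimal C sketch of a fixed-size trace ring, assuming a free-running counter reduced modulo the ring size. The names `trace_ring`, `ring_next_slot`, `ring_enter` and `signed_slot` are hypothetical. It shows that an unsigned counter keeps the index in range across wrap-around, while C's `%` on a negative signed value yields a negative remainder that would trip the `index >= 0` half of the check.

```c
#include <stdio.h>

/* Hypothetical fixed-size trace ring used only for illustration; this is
 * not the layout of the XFS ktrace_t. */
#define RING_NENTRIES 64

struct trace_ring {
	unsigned int next;            /* free-running sequence counter */
	int          nentries;        /* number of slots (RING_NENTRIES here) */
	long         slots[RING_NENTRIES];
};

/* Map the next sequence number to a slot. Because both the counter and the
 * division are unsigned, the result stays in [0, nentries) even after the
 * counter wraps past UINT_MAX back to 0. */
static int ring_next_slot(struct trace_ring *r)
{
	unsigned int seq = r->next++;

	return (int)(seq % (unsigned int)r->nentries);
}

/* Record one value, checking the same bound the ktrace.c assertion checks.
 * Returns the slot used, or -1 if the index was out of range. */
static int ring_enter(struct trace_ring *r, long val)
{
	int index = ring_next_slot(r);

	if (!(index >= 0 && index < r->nentries)) {
		fprintf(stderr, "trace index %d out of range\n", index);
		return -1;
	}
	r->slots[index] = val;
	return index;
}

/* For contrast: in C99 the remainder takes the sign of the dividend, so a
 * negative signed sequence number produces a negative slot index. */
static int signed_slot(int seq, int nentries)
{
	return seq % nentries;
}
```

Starting the counter at 0xfffffffe, successive `ring_enter` calls use slots 62, 63 and then 0 as the counter wraps, whereas `signed_slot(-3, 64)` returns -3.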


> 
>>> If that
>>> series is going to be included in the current round of checkins
>>> then this patch probably isn't needed.
>> The agreed plan for 2.6.28 still has the following patchsets to go in:
>>
>> . Combine the XFS and Linux inode structures V2
>> . Track reclaimable inodes in inode cache
>> . AIL cleanup and bug fixes
>> . Account for allocated blocks when expanding directories
>> . Check for valid transaction headers in recovery
>> . fix remount rw with unrecognized options
> 
> 
> 3-6 are small bug fixes and should go in ASAP.  I'd really like to see 1
> and 2, and I volunteer to help sort out any fallout.  I'm not entirely sure
> about the AIL patches; they seem ready, but at least they don't have
> much impact on everything else.  So if you really want to reduce the
> number of patches, those would be the ones.
> 
> 
> 


Thread overview: 7+ messages
2008-10-13 22:39 fw: [PATCH] fix instant oops with tracing enabled Dave Chinner
2008-10-14  0:40 ` Mark Goodwin
2008-10-14  2:04   ` Dave Chinner
2008-10-14 13:11   ` Christoph Hellwig
2008-10-15  1:27     ` Lachlan McIlroy [this message]
2008-10-15  0:54       ` Dave Chinner
2008-10-15  2:28         ` Lachlan McIlroy
