Re: fw: [PATCH] fix instant oops with tracing enabled

public inbox for linux-xfs@vger.kernel.org

From: Lachlan McIlroy <lachlan@sgi.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Mark Goodwin <markgw@sgi.com>, xfs@oss.sgi.com
Subject: Re: fw: [PATCH] fix instant oops with tracing enabled
Date: Wed, 15 Oct 2008 11:27:09 +1000
Message-ID: <48F546ED.6050702@sgi.com>
In-Reply-To: <20081014131140.GB17351@lst.de>

Christoph Hellwig wrote:
> On Tue, Oct 14, 2008 at 10:40:15AM +1000, Mark Goodwin wrote:
>> Lachlan also saw some regressions after merging these patchsets:
>> . replace the mount inode list with radix tree traversals
>> . clean up sync code
> 
> What exactly?  I saw a soft lockup in 042, but when applying Dave's
> xfs_sync_inodes_ag fix (or the half of it that applies without the del
> inodes tracking in the radix tree) it goes away.

I saw this panic but I don't think it's related to the above patches:

[252921.307588] BUG: unable to handle kernel <3>BUG: scheduling while atomic: dd/16976/0xf101da90
[252921.307908] Modules linked in:
[252921.307911] Pid: 16976, comm: dd Not tainted 2.6.27-rc8 #183
[252921.307913] 
[252921.307913] Call Trace:
[252921.307920]  [<ffffffff8102fe22>] __schedule_bug+0x62/0x66
[252921.307923]  [<ffffffff8153dce1>] schedule+0x99/0x7c7
[252921.307925]  [<ffffffff8153e890>] schedule_timeout+0x22/0xb4
[252921.307929]  [<ffffffff810473f9>] ? add_wait_queue_exclusive+0x3c/0x41
[252921.307932]  [<ffffffff81198bc9>] xlog_state_get_iclog_space+0xe8/0x278
[252921.307934]  [<ffffffff8102de2d>] ? default_wake_function+0x0/0xf
[252921.307936]  [<ffffffff81198e6d>] xlog_write+0x114/0x579
[252921.307938]  [<ffffffff811761d5>] ? xfs_buf_item_pin+0x76/0x7b
[252921.307940]  [<ffffffff811993a7>] xfs_log_write+0x38/0x62
[252921.307943]  [<ffffffff811a4f6b>] _xfs_trans_commit+0x1fd/0x3c6
[252921.307945]  [<ffffffff81193e93>] xfs_iomap_write_allocate+0x2d5/0x387
[252921.307947]  [<ffffffff81194e07>] xfs_iomap+0x331/0x3ba
[252921.307950]  [<ffffffff811b0930>] xfs_map_blocks+0x30/0x69
[252921.307952]  [<ffffffff811b1a00>] xfs_page_state_convert+0x2e5/0x594
[252921.307953]  [<ffffffff811b1f1b>] ? xfs_vm_releasepage+0xae/0xbd
[252921.307955]  [<ffffffff811b1ff1>] xfs_vm_writepage+0xc7/0x109
[252921.307958]  [<ffffffff81080835>] shrink_page_list+0x31a/0x57c
[252921.307960]  [<ffffffff81080be3>] shrink_inactive_list+0x126/0x39d
[252921.307962]  [<ffffffff81080f3f>] shrink_zone+0xe5/0x10a
[252921.307964]  [<ffffffff81081436>] try_to_free_pages+0x248/0x3cf
[252921.307966]  [<ffffffff8108042f>] ? isolate_pages_global+0x0/0x34
[252921.307967]  [<ffffffff8107cc3c>] __alloc_pages_internal+0x262/0x3b6
[252921.307969]  [<ffffffff811b4284>] ? xfs_buf_get_flags+0x6b/0x165
[252921.307972]  [<ffffffff8109709f>] alloc_pages_current+0xb9/0xc2
[252921.307974]  [<ffffffff8109d66b>] new_slab+0x57/0x283
[252921.307975]  [<ffffffff8109daeb>] __slab_alloc+0x1e8/0x3dd
[252921.307977]  [<ffffffff811b0220>] ? kmem_zone_alloc+0x58/0xaa
[252921.307980]  [<ffffffff811638c1>] ? xfs_bmap_search_multi_extents+0x9a/0xda
[252921.307982]  [<ffffffff8109e07e>] kmem_cache_alloc+0x43/0x76
[252921.307983]  [<ffffffff811b0220>] kmem_zone_alloc+0x58/0xaa
[252921.307985]  [<ffffffff811b0281>] kmem_zone_zalloc+0xf/0x31
[252921.307986]  [<ffffffff811a555c>] _xfs_trans_alloc+0x25/0x5f
[252921.307988]  [<ffffffff811a562c>] xfs_trans_alloc+0x96/0xa1
[252921.307990]  [<ffffffff81193d05>] xfs_iomap_write_allocate+0x147/0x387
[252921.307991]  [<ffffffff81194db4>] ? xfs_iomap+0x2de/0x3ba
[252921.307993]  [<ffffffff81194e07>] xfs_iomap+0x331/0x3ba
[252921.307995]  [<ffffffff811b0930>] xfs_map_blocks+0x30/0x69
[252921.307996]  [<ffffffff811b1a00>] xfs_page_state_convert+0x2e5/0x594
[252921.307998]  [<ffffffff811b1ff1>] xfs_vm_writepage+0xc7/0x109
[252921.307999]  [<ffffffff8107cec6>] __writepage+0x12/0x2b
[252921.308001]  [<ffffffff8107d39a>] write_cache_pages+0x1b3/0x317
[252921.308003]  [<ffffffff8107ceb4>] ? __writepage+0x0/0x2b
[252921.308004]  [<ffffffff8107d51d>] generic_writepages+0x1f/0x25
[252921.308006]  [<ffffffff811b20ca>] xfs_vm_writepages+0x43/0x4b
[252921.308007]  [<ffffffff8107d54b>] do_writepages+0x28/0x37
[252921.308011]  [<ffffffff810bfd82>] __writeback_single_inode+0x145/0x29f
[252921.308015]  [<ffffffff812283c5>] ? prop_fraction_single+0x3d/0x5f
[252921.308017]  [<ffffffff810c0294>] generic_sync_sb_inodes+0x1d0/0x2ba
[252921.308019]  [<ffffffff810c0387>] sync_sb_inodes+0x9/0xb
[252921.308021]  [<ffffffff810c06f3>] writeback_inodes+0x64/0xad
[252921.308023]  [<ffffffff8107da26>] balance_dirty_pages_ratelimited_nr+0x16b/0x2dd
[252921.308027]  [<ffffffff8107769f>] generic_file_buffered_write+0x203/0x625
[252921.308028]  [<ffffffff8107c16d>] ? get_page_from_freelist+0x45e/0x5d0
[252921.308031]  [<ffffffff811b8b80>] ? xfs_rw_enter_trace+0xbf/0xca
[252921.308032]  [<ffffffff811b9641>] xfs_write+0x64f/0x9cf
[252921.308035]  [<ffffffff81076b4e>] ? find_lock_page+0x2b/0x61
[252921.308037]  [<ffffffff811b50c3>] __xfs_file_write+0x4c/0x4e
[252921.308038]  [<ffffffff811b50e9>] xfs_file_aio_write+0x11/0x13
[252921.308040]  [<ffffffff810a2f94>] do_sync_write+0xe2/0x126
[252921.308042]  [<ffffffff81084935>] ? __do_fault+0x326/0x36c
[252921.308044]  [<ffffffff810471d3>] ? autoremove_wake_function+0x0/0x38
[252921.308047]  [<ffffffff811e8618>] ? selinux_file_permission+0x10d/0x116
[252921.308050]  [<ffffffff811e1321>] ? security_file_permission+0x11/0x13
[252921.308052]  [<ffffffff810a3790>] vfs_write+0xae/0x157
[252921.308053]  [<ffffffff810a3c9e>] sys_write+0x47/0x6f
[252921.308055]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
[252921.308056] 
[252921.308125] paging request at ffff881829c85a78
[252921.308125] IP: [<ffffffff810297a3>] cpuacct_charge+0x2b/0x34
[252921.308125] PGD 202063 PUD 0 
[252921.308125] Oops: 0000 [1] SMP 


I saw sync get stuck in an infinite loop running test 042 - maybe the same
problem you saw.

[1]kdb> btp 7356
Stack traceback for pid 7356
0xffff880071d10740     7356     7390  1    2   R  0xffff880071d10ba8  sync
sp                ip                Function (args)
0xffff880058cc3c88 0xffffffff81540566 kdb_interrupt+0x66 (0xffff8800720e4ac4, 0x202, 0x0, 0xffff88007119b810, 0xffff880058cc3d48, 0xffff88007213deb8)
0xffff880058cc3ce8 0xffffffff8153ff8e _spin_unlock_irqrestore+0x8 (0xffff8800720e4ac4, 0x202)
0xffff880058cc3d20 0xffffffff81229b96 __down_read_trylock+0x3f (invalid)
0xffff880058cc3d40 0xffffffff8104a34d down_read_trylock+0x9
0xffff880058cc3d50 0xffffffff8118bcd9 xfs_ilock_nowait+0xaf (0xffff8800720e4a00, invalid)
0xffff880058cc3d80 0xffffffff811bc3d9 xfs_sync_inodes_ag+0x12a (0xffff88007119b800, invalid, invalid, 0x0)
0xffff880058cc3e00 0xffffffff811bc6ee xfs_sync_inodes+0x65 (0xffff88007119b800, invalid, 0x0)
0xffff880058cc3e40 0xffffffff811bc785 xfs_syncsub+0x67 (0xffff88007119b800, invalid, 0x0)
0xffff880058cc3e80 0xffffffff811bc9d0 xfs_sync+0x7d (0xffff88007119b800, invalid)
0xffff880058cc3eb0 0xffffffff811ba6b9 xfs_fs_sync_super+0x38 (0xffff88007e056000)
0xffff880058cc3f20 0xffffffff810a5311 sync_filesystems+0xb7 (invalid)
0xffff880058cc3f50 0xffffffff810c2deb do_sync+0x37 (0x1)
0xffff880058cc3f70 0xffffffff810c2e25 sys_sync+0xe
  not matched: from 0xffffffff8100bfad to 0xffffffff8100c025 drop_through 0 bb_jmp[7


I saw the panic in _xfs_itrace_exit() which has now been fixed.

And I also saw this assertion:

<4>[34770.626472] Assertion failed: (index >= 0) && (index < ktp->kt_nentries), file: fs/xfs/support/ktrace.c, line: 173
<0>[34770.626511] ------------[ cut here ]------------
<2>[34770.627419] kernel BUG at fs/xfs/support/debug.c:81!

[2]kdb> bt
Stack traceback for pid 400
0xffff88007e883a00      400        2  1    2   R  0xffff88007e883e68 *xfslogd/2
sp                ip                Function (args)
0xffff88007e66fbf8 0xffffffff811bd5d5 assfail+0x1a (invalid, invalid, invalid)
0xffff88007e66fc28 0xffffffff811bdb24 ktrace_enter+0x8b (invalid, invalid, invalid, invalid, invalid, invalid, invalid, invalid, invalid)
0xffff88007e66fc78 0xffffffff81175b35 xfs_buf_item_trace+0xe6 (0xffffffff816d8948, 0xffff88007c47cb40)
0xffff88007e66fd18 0xffffffff81175b60 xfs_buf_item_committed+0x1c (0xffff88007c47cb40, 0x100000b1f)
0xffff88007e66fd38 0xffffffff811a4766 xfs_trans_chunk_committed+0x60 (0xffff880050124780, 0x100000b1f, invalid)
0xffff88007e66fd98 0xffffffff811a4873 xfs_trans_committed+0x43 (0xffff880050124670, invalid)
0xffff88007e66fdc8 0xffffffff81197b2a xlog_state_do_callback+0x19a (0xffff88007ef78400, invalid, 0xffff88007ef79000)
0xffff88007e66fe38 0xffffffff81197d6d xlog_state_done_syncing+0xda (0xffff88007ef79000, invalid)
0xffff88007e66fe68 0xffffffff81198587 xlog_iodone+0x154 (0xffff88006ac37c80)
0xffff88007e66fe98 0xffffffff811b3afb xfs_buf_iodone_work+0x65 (invalid)
0xffff88007e66feb8 0xffffffff81043cfb run_workqueue+0x7c (0xffff88007e866b80)
0xffff88007e66fed8 0xffffffff81044711 worker_thread+0xd8 (0xffff88007e866b80)
0xffff88007e66ff28 0xffffffff810470a3 kthread+0x49 (invalid)
0xffff88007e66ff48 0xffffffff8100ce89 child_rip+0xa (invalid, invalid)
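
For readers unfamiliar with this kind of check, the assertion above guards the slot index of a fixed-size trace ring. The following is only a minimal sketch of that invariant, not the code in fs/xfs/support/ktrace.c: the `demo_*` names, the ring size, and both index helpers are hypothetical, and only `kt_nentries` is taken from the assertion text. It shows one general way such an index can fall out of range (signed modulo of a negative counter in C) and makes no claim that this is the cause of the failure reported here.

```c
/* Hypothetical sketch; NOT the code in fs/xfs/support/ktrace.c. */
#include <assert.h>
#include <stdint.h>

#define DEMO_NENTRIES 8	/* assumed power-of-two ring size */

struct demo_ktrace {
	int		kt_nentries;		/* number of slots in the ring */
	uint64_t	slots[DEMO_NENTRIES];
};

/*
 * Signed modulo: C truncates toward zero, so a negative counter
 * yields a negative slot index.
 */
static int demo_index_signed(int counter, int nentries)
{
	return counter % nentries;
}

/*
 * Unsigned mask: always lands in [0, nentries) when nentries is a
 * power of two.
 */
static int demo_index_masked(unsigned int counter, int nentries)
{
	return (int)(counter & (unsigned int)(nentries - 1));
}

/* The invariant from the assertion: (index >= 0) && (index < kt_nentries). */
static int demo_index_valid(const struct demo_ktrace *ktp, int index)
{
	return (index >= 0) && (index < ktp->kt_nentries);
}

/*
 * Store one entry. Returns 0 on success, or -1 where a debug kernel
 * would trip the assertion instead of writing outside the ring.
 */
static int demo_enter(struct demo_ktrace *ktp, int index, uint64_t val)
{
	if (!demo_index_valid(ktp, index))
		return -1;
	ktp->slots[index] = val;
	return 0;
}
```

In this sketch an index derived with `demo_index_signed()` from a negative counter is rejected by `demo_enter()`, while the masked variant always stays inside the ring.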


> 
>>> If that
>>> series is going to be included in the current round of checkins
>>> then this patch probably isn't needed.
>> The agreed plan for 2.6.28 still has the following patchsets to go in:
>>
>> . Combine the XFS and Linux inode structures V2
>> . Track reclaimable inodes in inode cache
>> . AIL cleanup and bug fixes
>> . Account for allocated blocks when expanding directories
>> . Check for valid transaction headers in recovery
>> . fix remount rw with unrecognized options
> 
> 
> 3-6 are small bug fixes and should go in ASAP.  I'd really like to see 1
> and 2 and volunteer to help sort out any fallout.  Not entirely sure
> about the AIL patches - they seem ready but at least they don't have
> much impact on everything else.   So if you really want to reduce the
> number of patches those would be the ones.
> 
> 
>
