All of lore.kernel.org
 help / color / mirror / Atom feed
From: Timothy Shimmin <tes@sgi.com>
To: David Chinner <dgc@sgi.com>
Cc: markgw@sgi.com, Eric Sandeen <sandeen@sandeen.net>,
	xfs-oss <xfs@oss.sgi.com>
Subject: Re: deadlocked xfs
Date: Fri, 11 Jul 2008 14:08:12 +1000	[thread overview]
Message-ID: <4876DCAC.5040903@sgi.com> (raw)
In-Reply-To: <4876D872.2060408@sgi.com>

Mark Goodwin wrote:
> 
> 
> Dave Chinner wrote:
>> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>>> Thanks for the report Eric. This looks very similar to a
>>> deadlock Lachlan recently hit in the patch for
>>> "Use atomics for iclog reference counting"
>>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>>
>>> It seems this patch can cause deadlocks under heavy log traffic.
>>> I don't think anyone has a fix yet ... Lachlan is out this week,
>>> but Tim can follow-up here ...
>>
>> Nice to know - why didn't anyone email me or report this to the
>> list when the bug was first found? I mean, I wrote that code, I know
>> what it is supposed to be doing and as a result should be able
> 
> Only recently found and didn't think it was this easy to hit.
> But no excuses ...
> 
>> help find and fix the bug. Can you please post what details you have
>> about the problem (test case, stack traces, debugging info, etc)
>> so I can try to find the problem.
> 
> See Tim's follow-up.
> 

Remaining info from bug....

> While stress testing I'm frequently hitting a deadlock in the log code where
> many threads are stuck in xlog_state_get_iclog_space().
> 
> Stack traceback for pid 16570
> 0xffff8100421fa4c0    16570     4620  0    2   D  0xffff8100421fa810  rm
> rsp                rip                Function (args)
> 0xffff81003c41d908 0xffffffff8062be32 thread_return
> 0xffff81003c41d9a0 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003c41da10 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003c41db00, 0xffff81007c1d39c0, 0xffff81003c41db08, 0xffff81003c41db0c)
> 0xffff81003c41daa0 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003c41db58, invalid, 0xffff81007c1d39c0, 0xffff81003c41dba0, 0xffff81003c41dce0, 0xffff810000000002)
> 0xffff81003c41db40 0xffffffff80395cb4 xlog_commit_record+0x6e (0xffff81007c97d338, 0xffff81007c1d39c0, 0xffff81003c41dce0, 0xffff81003c41dba0)
> 0xffff81003c41db90 0xffffffff80396c31 xfs_log_done+0x34 (invalid, 0xffff81007c1d39c0, invalid, invalid)
> 0xffff81003c41dbc0 0xffffffff803a1271 _xfs_trans_commit+0x24a (0xffff81007a99c780, invalid, 0x0)
> 0xffff81003c41dd20 0xffffffff8035eb2e xfs_bmap_finish+0xaa (0xffff81003c41de38, 0xffff81003c41ddc8, 0xffff81003c41dde0)
> 0xffff81003c41dd80 0xffffffff8038ae2b xfs_itruncate_finish+0x376 (0xffff81003c41de38, 0xffff81007b9c5670, invalid, invalid, 0x1)
> 0xffff81003c41de20 0xffffffff803ab720 xfs_inactive+0x27d (0xffff81007b9c5670)
> 0xffff81003c41de70 0xffffffff803b663e xfs_fs_clear_inode+0xbd (0xffff810073cf0438)
> 0xffff81003c41de90 0xffffffff802964af clear_inode+0x9c (0xffff810073cf0438)
> 0xffff81003c41dea0 0xffffffff80296598 generic_delete_inode+0x92 (0xffff810073cf0438)
> 0xffff81003c41dec0 0xffffffff8028d953 do_unlinkat+0xdd (invalid, invalid)
> 0xffff81003c41df80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
> Enter <q> to end, <cr> to continue:
> Stack traceback for pid 24088
> 0xffff81003b804a00    24088     4649  0    6   D  0xffff81003b804d50  xfs_io
> rsp                rip                Function (args)
> 0xffff81003e403288 0xffffffff8062be32 thread_return
> 0xffff81003e403320 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003e403390 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003e403480, 0xffff8100741499c0, 0xffff81003e403488, 0xffff81003e40348c)
> 0xffff81003e403420 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003e4034f8, invalid, 0xffff8100741499c0, 0xffff81003e5b9858, 0x0, 0x0)
> 0xffff81003e4034c0 0xffffffff80395d13 xfs_log_write+0x33 (0xffff81007c97d338)
> 0xffff81003e4034e0 0xffffffff803a1256 _xfs_trans_commit+0x22f (0xffff81003e5b9810, invalid, 0x0)
> 0xffff81003e403640 0xffffffff8038fdba xfs_iomap_write_allocate+0x2eb (0xffff81007b9e7c90, invalid, invalid, 0xffff81003e403788, 0xffff81003e4037b4)
> 0xffff81003e403730 0xffffffff80390dd6 xfs_iomap+0x35a (0xffff81007b9e7c90, 0x19edd000, invalid, invalid, 0xffff81003e403898, 0xffff81003e403804)
> 0xffff81003e4037f0 0xffffffff803acf45 xfs_map_blocks+0x2c (invalid, invalid, invalid, invalid, invalid)
> 0xffff81003e403820 0xffffffff803ae04e xfs_page_state_convert+0x28b (0xffff81007b9e6c18, 0xffffe200028ac0f0, 0xffff81003e403b58, invalid, 0x100000000)
> 0xffff81003e403910 0xffffffff803ae61e xfs_vm_writepage+0xca (0xffffe200028ac0f0, 0xffff81003e403b58)
> 0xffff81003e403940 0xffffffff80260c16 __writepage+0xd (invalid, invalid, 0xffff81007b9e6e00)
> 0xffff81003e403950 0xffffffff8026112c write_cache_pages+0x19e (0xffff81007b9e6e00, 0xffff81003e403b58, 0xffffffff80260c09, 0xffff81007b9e6e00)
> 0xffff81003e403a50 0xffffffff802612a0 do_writepages+0x23 (invalid, 0xffff81003e403b58)
> 0xffff81003e403a60 0xffffffff8029ebca __writeback_single_inode+0x150 (0xffff81007b9e6c18, 0xffff81003e403b58)
> 0xffff81003e403ae0 0xffffffff8029f124 sync_sb_inodes+0x1b4 (0xffff81007c15d3f8, 0xffff81003e403b58)
> 0xffff81003e403b20 0xffffffff8029f406 writeback_inodes+0x81 (0xffff81003e403b58)
> 0xffff81003e403b40 0xffffffff802616f9 balance_dirty_pages_ratelimited_nr+0x15e (invalid, invalid)
> 0xffff81003e403bf0 0xffffffff8025bcb6 generic_file_buffered_write+0x20c (0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0x3c7c8000, 0xffff81003e403e78, invalid, 0x0)
> 0xffff81003e403cf0 0xffffffff803b56b4 xfs_write+0x68a (0xffff81007b58d360, 0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0xffff81003e403e78, invalid)
> 0xffff81003e403df0 0xffffffff8028411f do_sync_write+0xc9 (0xffff810079de9530, 0x51d000, 0x1000, 0xffff81003e403f48)
> 0xffff81003e403f10 0xffffffff8028485f vfs_write+0xad (0xffff810079de9530, 0x51d000, invalid, 0xffff81003e403f48)
> 0xffff81003e403f40 0xffffffff80284e87 sys_pwrite64+0x50 (invalid, 0x51d000, 0x1000, 0x3c7c8000)
> 0xffff81003e403f80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
> 
> Taking a look at the log and iclog structures...
> 
> [2]kdb> xlog 0xffff81007d33d650
> xlog at 0xffff81007d33d650
> &flush_wait: 0xffff81007d33d6d0  ICLOG: 0xffff81007c193710  
> &icloglock: 0xffff81007d33d720  tail_lsn: [58:173319]  last_sync_lsn: [58:173319] 
> mp: 0xffff81007c97d338  xbuf: 0xffff81007c8e9e08  l_covered_state: need 
> flags: log 0x0 <>   logBBstart: 0 logsize: 134217728 logBBsize: 262144
> curr_cycle: 58  prev_cycle: 58  curr_block: 213344  prev_block: 213280
> iclog_bak: 0xffff81007d33d830  iclog_size: 0x8000 (32768)  num iclogs: 8
> l_iclog_hsize 512 l_iclog_heads 1
> l_sectbb_log 0 l_sectbb_mask 0
> &grant_lock: 0xffff81007d33d7d0  resHeadQ: 0x0000000000000000  wrHeadQ: 0x0000000000000000
> GResCycle: 58  GResBytes: 110263292  GWrCycle: 58  GWrBytes: 110263292
> GResBlocks: 215357 GResRemain: 508  GWrBlocks: 215357 GWrRemain: 508
> trace: 0x0000000000000000  grant_trace: use xlog value
> [2]kdb> 
> [2]kdb> xicall 0xffff81007c193710
> xlog_in_core/header at 0xffff81007c193710/0xffffc2001110d000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033f60
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212768  num_ops: 92
> cycle_data: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2001110d200  &forcesema: 0xffff81007c193710  next: 0xffff81007c1932f8 bp: 0xffff81007d102608
> log: 0xffff81007d33d650  callb: 0xffff810077915898  callb_tail: 0xffff81003e43cad8
> size: 32256 (OFFSET: 32240) trace: 0xffff81007aa78560 refcnt: 0 bwritecnt: 0 state: 0x2 <WANT_SYNC > 
> =================================================
> xlog_in_core/header at 0xffff81007c1932f8/0xffffc20011116000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033fa0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212832  num_ops: 99
> cycle_data: 7c1d39c0  127c8  121  121  121  121  122  122  123  123  36120100  3412  0  0  0  0  0  0  0  0  69020000  5  41425443  1c84  4  16  155  60  0  1cd9400  1d05600  1d24600  1d52a00  1e3 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20011116200  &forcesema: 0xffff81007c1932f8  next: 0xffff81007d297610 bp: 0xffff81007d102408
> log: 0xffff81007d33d650  callb: 0xffff81007206b898  callb_tail: 0xffff8100715010f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa781d0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d297610/0xffffc2001111f000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033fe0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212896  num_ops: 102
> cycle_data: 7c1d39c0  2dd  3d  547a  1f7d  0  24  2000000  2f6  1  11  1  4a  6730  72d9  631a  6d46  db00  0  6240000d  8400000b  9fa00025  d2e00010  c2c00001  3400001  3da00001  85600001  100000 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2001111f200  &forcesema: 0xffff81007d297610  next: 0xffff81007d296198 bp: 0xffff81007d102208
> log: 0xffff81007d33d650  callb: 0xffff81007aac3548  callb_tail: 0xffff81006cdf2788
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9ef860 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d296198/0xffffc20011128000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034020
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212960  num_ops: 105
> cycle_data: 7aa7bcd8  0  7e00001  4de00001  4d000001  56600001  49400001  5ae00001  9f600001  4f800001  1f03000  0  1  3  5  a  1f  4347  55b4  5e62  644a  716a  df1400  e59800  ea0600  efc000  f9 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20011128200  &forcesema: 0xffff81007d296198  next: 0xffff81007e59cd20 bp: 0xffff81007d102008
> log: 0xffff81007d33d650  callb: 0xffff81006cd5b798  callb_tail: 0xffff81006c4d10f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78680 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59cd20/0xffffc20000354000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034060
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213024  num_ops: 93
> cycle_data: 7aa7bcd8  0  91400001  46600001  67200001  8800001  49400001  5ae00001  9f600001  4f800001  0  4fe00001  79c00001  99600001  c9c00001  b4e0000d  ede0000d  31000017  7580003a  0  5f60   
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20000354200  &forcesema: 0xffff81007e59cd20  next: 0xffff81007e59c4f0 bp: 0xffff81007d102e08
> log: 0xffff81007d33d650  callb: 0xffff81003fd86068  callb_tail: 0xffff81003e4fca58
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9efef0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59c4f0/0xffffc2000035d000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a000340a0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213088  num_ops: 88
> cycle_data: 7c1d39c0  127c8  121  121  121  121  122  122  123  123  36120100  5d10  6a72  1  3  2030000  ba00001  35600001  3a800001  d6800001  dd200001  e0800001  83000800  6127  6164  58b7  72d 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2000035d200  &forcesema: 0xffff81007e59c4f0  next: 0xffff81007d5084f0 bp: 0xffff81007d102c08
> log: 0xffff81007d33d650  callb: 0xffff81006a091818  callb_tail: 0xffff81007206b548
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa786e0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d5084f0/0xffffc20000366000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a000340e0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213152  num_ops: 101
> cycle_data: 7aa7bcd8  5f75  6af1  1  3  e359cb09  4  7  11  143  1c  0  1ceba00  1d10800  1d29800  1d5f800  1e53000  1e97000  1ed1800  1f16400  7936448  35  de7a00  e4c800  e9dc00  ee3c00  f7a800  
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20000366200  &forcesema: 0xffff81007d5084f0  next: 0xffff81007e59c908 bp: 0xffff81007d102a08
> log: 0xffff81007d33d650  callb: 0xffff8100778a6da8  callb_tail: 0xffff8100779144b8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a981b60 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59c908/0xffffc2000036f000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034120
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213216  num_ops: 118
> cycle_data: 7aa7bcd8  5f75  6af1  1  3  0  1f01  7b  79  c42c00  0  0  0  0  0  0  0  0  0  4ce00001  b174  80000000  80000000  80000000  80000000  0  0  0  0  68800010  72e6  2000000  31c  1  8c  
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2000036f200  &forcesema: 0xffff81007e59c908  next: 0xffff81007c193710 bp: 0xffff81007d102808
> log: 0xffff81007d33d650  callb: 0xffff81006a1c8168  callb_tail: 0xffff81004714d0f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78f80 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================

      parent reply	other threads:[~2008-07-11  4:07 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-11  2:33 deadlocked xfs Eric Sandeen
2008-07-11  2:48 ` Mark Goodwin
2008-07-11  2:56   ` Eric Sandeen
2008-07-11  3:21   ` Timothy Shimmin
2008-07-11  4:04     ` Dave Chinner
2008-07-11  4:10       ` Eric Sandeen
2008-07-11  4:13         ` Timothy Shimmin
2008-07-11  4:18           ` Eric Sandeen
2008-07-11  4:26             ` Eric Sandeen
2008-07-11  4:27             ` Dave Chinner
2008-07-11  4:17       ` Christoph Hellwig
2008-07-11  4:44       ` Timothy Shimmin
2008-07-11  3:22   ` Dave Chinner
2008-07-11  3:50     ` Mark Goodwin
2008-07-11  4:02       ` Eric Sandeen
2008-07-11  4:05       ` Dave Chinner
2008-07-11  4:08       ` Timothy Shimmin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4876DCAC.5040903@sgi.com \
    --to=tes@sgi.com \
    --cc=dgc@sgi.com \
    --cc=markgw@sgi.com \
    --cc=sandeen@sandeen.net \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.