From: Christopher Li <ext3-user@chrisli.org>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: ext3-users@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
Date: Mon, 27 Oct 2003 01:20:02 -0500
Message-ID: <20031027062002.GA3544@64m.dyndns.org>
In-Reply-To: <16284.46552.776018.358472@notabene.cse.unsw.edu.au>

On Mon, Oct 27, 2003 at 05:06:16PM +1100, Neil Brown wrote:
> 
> The related kjournald is at:
>     kjournald Call Trace:    [sleep_on+75/124]
>          [journal_commit_transaction+357/4044] [do_IRQ+221/236]
>          [.text.lock.sched+131/471] [kjournald+326/540]
>          [commit_timeout+0/12] [arch_kernel_thread+40/56] 
> 
> This sleep_on is at line 87 in commit.c (journal_commit_transaction)
> where it is waiting for t_updates to be 0.  At this point, 
> t_state is T_LOCKED, so presumably those nfsd threads above are
> waiting on kjournald.  But what is kjournald really waiting for?

kjournald is waiting for the outstanding updates on the current
transaction to stop, i.e. for every open handle to call journal_stop.

> My first thought was the two nfsd threads in:
>      nfsd Call Trace:    [sleep_on+75/124]
> 	  [log_wait_commit+74/136] [journal_stop+408/432]
> 	  [journal_force_commit+78/128] [ext3_force_commit+66/112]
> 	  [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
> 	  [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
> 	  [nfsd_dispatch+207/406] [svc_process+655/1264]
> 	  [nfsd+566/944] [arch_kernel_thread+40/56] 
> 
> that are waiting on j_wait_done_commit.  However they are doing that
> from journal_stop *after* journal_stop has decremented t_updates, so
> it doesn't seem likely that kjournald is waiting on that.

That is right.

> 
> Outside of nfsd, there is an rquotad program (locally written, not the
> standard one) that is :
> 
>       rquotad Call Trace:    [sleep_on+75/124]
>             [start_this_handle+205/368] [journal_start+149/196]
>             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>             [update_atime+75/80] [do_generic_file_read+1158/1172]
>             [generic_file_read+147/400] [file_read_actor+0/224]
>             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>             [do_compat_quotactl+417/688] [resolve_dev+89/108]
>             [sys_quotactl+166/275] [system_call+51/56] 
> 
> So it is trying to start a transaction to update the atime on the
> quota file, and has a lock on some quota structures thanks to
> "read_dqblk".

This guy is waiting for the journal commit to finish, which seems
harmless to me.

> 
> At the same time, "sync" is running:
> 
>          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
>              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 
> 
> and has started an ext3 transaction (in ext3_sync_dquot) and is trying
> to get the lock that rquotad has.

That seems wrong to me. It should take the lock before it starts the
transaction. This is a lock-ranking error, for the same reason that you
can't call lock_page inside a journal transaction. BTW, it seems that in
the current bk tree, truncate still does lock_page inside a journal
transaction.
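
To illustrate the ranking rule, here is a small user-space sketch. It is
a toy model, not kernel code: toy_journal, toy_lock and every function
below are made-up names, and a non-blocking trylock stands in for the
real semaphore so that the model stays single-threaded. The only point
is which resource a caller still holds at the moment it would block.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in ext3 or JBD. */
struct toy_journal {
    int t_updates;              /* open handles on the running transaction */
};

struct toy_lock {
    bool held;                  /* stands in for the quota semaphore */
};

/* Opening a handle adds one updater to the running transaction. */
void toy_journal_start(struct toy_journal *j)
{
    j->t_updates++;
}

/* Closing a handle removes that updater again. */
void toy_journal_stop(struct toy_journal *j)
{
    assert(j->t_updates > 0);
    j->t_updates--;
}

/* The commit thread can proceed only when no handle is open. */
bool toy_commit_can_proceed(const struct toy_journal *j)
{
    return j->t_updates == 0;
}

/* Non-blocking acquire: returns false where a real caller would sleep. */
bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

/* Wrong rank: handle first, then lock. Returns true if the caller
 * would block, in which case it still holds an open handle. */
bool sync_handle_then_lock(struct toy_journal *j, struct toy_lock *l)
{
    toy_journal_start(j);
    if (!toy_trylock(l))
        return true;            /* stuck with t_updates > 0 */
    /* ... the quota block would be written here ... */
    l->held = false;
    toy_journal_stop(j);
    return false;
}

/* Right rank: lock first, then handle. Returns true if the caller
 * would block, in which case it holds no handle at all. */
bool sync_lock_then_handle(struct toy_journal *j, struct toy_lock *l)
{
    if (!toy_trylock(l))
        return true;            /* blocked, but the commit is not held up */
    toy_journal_start(j);
    /* ... the quota block would be written here ... */
    toy_journal_stop(j);
    l->held = false;
    return false;
}
```

If the lock is already held by someone else, sync_handle_then_lock
leaves t_updates at 1, so toy_commit_can_proceed returns false. With
sync_lock_then_handle the caller blocks as well, but t_updates stays at
0 and the commit can go ahead.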

> 
> Presumably the transaction that sync has started is keeping t_updates
> greater than 0, thus preventing kjournald from progressing, and so
> preventing anyone else, including rquotad, from starting a new
> transaction.  Hence a deadlock.

That is right.
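
The cycle can be written down as a wait-for graph. The sketch below is
only an illustration built from the traces quoted in this thread; the
enum, the table and in_wait_cycle are hypothetical names, and it assumes
each task is blocked on at most one other task.

```c
#include <stdbool.h>

enum task { KJOURNALD, SYNC, RQUOTAD, NFSD, NTASKS };

/* thread_waits_on[t] is the task that t is blocked on, or -1 if t
 * is runnable. The edges are read off the call traces in the thread. */
const int thread_waits_on[NTASKS] = {
    [KJOURNALD] = SYNC,         /* needs t_updates to reach 0 */
    [SYNC]      = RQUOTAD,      /* needs the quota lock */
    [RQUOTAD]   = KJOURNALD,    /* needs the commit to finish */
    [NFSD]      = KJOURNALD,    /* waits on j_wait_done_commit */
};

/* Follow the wait-for edges from start; true if we return to start. */
bool in_wait_cycle(const int *waits_on, int ntasks, int start)
{
    int cur = start;

    for (int steps = 0; steps < ntasks; steps++) {
        cur = waits_on[cur];
        if (cur < 0)
            return false;       /* reached a runnable task */
        if (cur == start)
            return true;        /* came back around: deadlock */
    }
    return false;               /* start only feeds into a cycle */
}
```

kjournald, sync and rquotad each come back to themselves, so they form
the deadlock. The nfsd threads are not on the cycle; they are stuck
behind it because they wait on kjournald.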

> 
> My guess is that ext3_sync_dquot doesn't need  ext3_journal_start at
> all but that isn't a well-informed guess.

I think you want ext3_sync_dquot to be atomic across a power failure.
The journal handle can be obtained from ext3_current_journal_handle,
which is what writepage etc. use.

Chris

Thread overview: 5+ messages
2003-10-27  6:06 EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related? Neil Brown
2003-10-27  6:20 ` Christopher Li [this message]
2003-10-27 10:01 ` Andrew Morton
2003-10-29  0:02   ` Jan Kara
2003-11-10 11:11   ` Jan Kara
