From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] GFS2 deadlock
Date: Mon, 5 Oct 2015 12:15:56 -0400 (EDT)
Message-ID: <673421950.40526151.1444061756793.JavaMail.zimbra@redhat.com>
In-Reply-To: <m2oagd5nxk.fsf@discipline.rit.edu>

----- Original Message -----
> We've just run into a deadlock.
> 
> It seems very similar to the one referenced in commit
> 44ad37d69b2cc421d5b5c7ad7fed16230685b092
> 
> is it possible that fs/gfs2/export.c:gfs2_get_dentry()
> 
> 140          inode = gfs2_ilookup(sb, inum->no_addr, 0);
> 
> should be:
> 
> 140          inode = gfs2_ilookup(sb, inum->no_addr, 1);
> 
> ?
> 
> I have a dump if more information would help.
> 
> same inode:
> this is gfs2_inode->i_iopen_gh->gh_gl
> G:  s:SH n:5/3157699 f:DIqob t:SH d:UN/104484397000 a:0 v:0 r:3 m:200
>  H: s:SH f:EH e:0 p:24919 [nfsd] gfs2_inode_lookup+0x10e/0x210 [gfs2]
> 
> this is gfs2_inode->i_gl
> G:  s:EX n:2/3157699 f:yIqob t:EX d:EX/0 a:0 v:0 r:4 m:200
>  H: s:EX f:H e:0 p:24920 [nfsd] gfs2_evict_inode+0x124/0x400 [gfs2]
>   I: n:81596/51738265 t:8 f:0x00 d:0x00000000 s:500
> 
> This is doing SEQ/PUTFH/GETATTR:
> 
> crash> bt
> PID: 24919  TASK: ffff881f9e11d160  CPU: 32  COMMAND: "nfsd"
>  #0 [ffff883f62443950] __schedule at ffffffff8165aaf4
>  #1 [ffff883f624439a0] schedule at ffffffff8165b1a7
>  #2 [ffff883f624439a8] __wait_on_freeing_inode at ffffffff811fbe1c
>  #3 [ffff883f62443a30] find_inode at ffffffff811fbed1
>  #4 [ffff883f62443a80] ilookup5_nowait at ffffffff811fbf61
>  #5 [ffff883f62443ab0] ilookup5 at ffffffff811fcb33
>  #6 [ffff883f62443ad0] gfs2_ilookup at ffffffffa080d1db [gfs2]
>  #7 [ffff883f62443af0] gfs2_get_dentry at ffffffffa0806a11 [gfs2]
>  #8 [ffff883f62443b10] gfs2_fh_to_dentry at ffffffffa0806b2c [gfs2]
>  #9 [ffff883f62443b30] exportfs_decode_fh at ffffffff81262ef2
> #10 [ffff883f62443ca0] fh_verify at ffffffffa057e977 [nfsd]
> #11 [ffff883f62443d20] nfsd4_putfh at ffffffffa058ce6d [nfsd]
> #12 [ffff883f62443d50] nfsd4_proc_compound at ffffffffa058ed57 [nfsd]
> #13 [ffff883f62443db0] nfsd_dispatch at ffffffffa057af83 [nfsd]
> #14 [ffff883f62443df0] svc_process_common at ffffffffa01a2bb0 [sunrpc]
> #15 [ffff883f62443e60] svc_process at ffffffffa01a2f53 [sunrpc]
> #16 [ffff883f62443e90] nfsd at ffffffffa057a98f [nfsd]
> #17 [ffff883f62443ec0] kthread at ffffffff81096919
> #18 [ffff883f62443f50] ret_from_fork at ffffffff8165f3a2
> 
> This is doing SEQ/PUTFH/REMOVE:
> 
> crash> bt
> PID: 24920  TASK: ffff881febf843d0  CPU: 32  COMMAND: "nfsd"
>  #0 [ffff883f62447a00] __schedule at ffffffff8165aaf4
>  #1 [ffff883f62447a50] schedule at ffffffff8165b1a7
>  #2 [ffff883f62447a58] bit_wait at ffffffff8165b9bc
>  #3 [ffff883f62447a70] bit_wait at ffffffff8165b9bc
>  #4 [ffff883f62447a80] __wait_on_bit at ffffffff8165b645
>  #5 [ffff883f62447ad0] out_of_line_wait_on_bit at ffffffff8165b6e2
>  #6 [ffff883f62447b40] gfs2_glock_dq_wait at ffffffffa07ff4f3 [gfs2]
>  #7 [ffff883f62447b60] gfs2_evict_inode at ffffffffa0818111 [gfs2]
>  #8 [ffff883f62447bf0] evict at ffffffff811fc9eb
>  #9 [ffff883f62447c20] iput at ffffffff811fd34b
> #10 [ffff883f62447c50] d_delete at ffffffff811f8c58
> #11 [ffff883f62447c80] vfs_unlink at ffffffff811ee8f9
> #12 [ffff883f62447cd0] nfsd_unlink at ffffffffa0580dcf [nfsd]
> #13 [ffff883f62447d10] nfsd4_remove at ffffffffa058debd [nfsd]
> #14 [ffff883f62447d50] nfsd4_proc_compound at ffffffffa058ed57 [nfsd]
> #15 [ffff883f62447db0] nfsd_dispatch at ffffffffa057af83 [nfsd]
> #16 [ffff883f62447df0] svc_process_common at ffffffffa01a2bb0 [sunrpc]
> #17 [ffff883f62447e60] svc_process at ffffffffa01a2f53 [sunrpc]
> #18 [ffff883f62447e90] nfsd at ffffffffa057a98f [nfsd]
> #19 [ffff883f62447ec0] kthread at ffffffff81096919
> #20 [ffff883f62447f50] ret_from_fork at ffffffff8165f3a2
> 
> Thanks,
> 
> Andy
> 
> --
> Andrew W. Elble
> aweits at discipline.rit.edu
> Infrastructure Engineer, Communications Technical Lead
> Rochester Institute of Technology
> PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

Hi Andy,

Can you tell me how you recreated this problem? It seems like a test
we should automate and run regularly as part of our regression testing.

As it stands, the NFS code path is the only one that calls gfs2_ilookup()
with non_block set to 0. So if we change it to 1, we might as well get rid
of the parameter entirely. I suspect your problem goes deeper than
this, and I'd like to understand it in more detail.

Either way, you're right: my latest set of patches will hopefully
eliminate the problem and allow for a smoother transition from unlinked
to deleted. If there's still a problem, I want to know about it and
recreate it as soon as possible.

Regards,

Bob Peterson
Red Hat File Systems




Thread overview: 4 messages
2015-10-05 15:34 [Cluster-devel] GFS2 deadlock Andrew W Elble
2015-10-05 16:03 ` Andrew W Elble
2015-10-05 16:15 ` Bob Peterson [this message]
2015-10-05 17:10   ` Andrew W Elble
