cluster-devel.redhat.com archive mirror
* [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'?
@ 2020-04-17  7:36 kbuild test robot
  2020-04-17 12:23 ` Andreas Gruenbacher
  0 siblings, 1 reply; 5+ messages in thread
From: kbuild test robot @ 2020-04-17  7:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git gfs2-iopen
head:   c748e1ec9bd20d71265a148042f6bc97ffc5f343
commit: c748e1ec9bd20d71265a148042f6bc97ffc5f343 [12/12] gfs2: Remove unused function gfs2_glock_dq_wait
config: s390-defconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout c748e1ec9bd20d71265a148042f6bc97ffc5f343
        # save the attached .config to the linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/gfs2/util.c: In function 'signal_our_withdraw':
>> fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'? [-Werror=implicit-function-declaration]
     126 |   gfs2_glock_dq_wait(&sdp->sd_journal_gh);
         |   ^~~~~~~~~~~~~~~~~~
         |   gfs2_glock_nq_init
   cc1: some warnings being treated as errors
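
The diagnostic itself is the standard C error for calling a function with no
visible declaration: patch 12/12 removes gfs2_glock_dq_wait() as unused, but
signal_our_withdraw() in the listing below still calls it. A minimal
stand-alone reproduction (a hypothetical file, not from the kernel tree):

   /* repro.c -- build with: gcc -Werror=implicit-function-declaration -c repro.c */
   void caller(void)
   {
           helper(1);   /* error: implicit declaration of function 'helper' */
   }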

vim +126 fs/gfs2/util.c

0d91061a372671a Bob Peterson 2019-02-18   92  
601ef0d52e96175 Bob Peterson 2020-01-28   93  static void signal_our_withdraw(struct gfs2_sbd *sdp)
601ef0d52e96175 Bob Peterson 2020-01-28   94  {
601ef0d52e96175 Bob Peterson 2020-01-28   95  	struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
601ef0d52e96175 Bob Peterson 2020-01-28   96  	struct inode *inode = sdp->sd_jdesc->jd_inode;
601ef0d52e96175 Bob Peterson 2020-01-28   97  	struct gfs2_inode *ip = GFS2_I(inode);
601ef0d52e96175 Bob Peterson 2020-01-28   98  	u64 no_formal_ino = ip->i_no_formal_ino;
601ef0d52e96175 Bob Peterson 2020-01-28   99  	int ret = 0;
601ef0d52e96175 Bob Peterson 2020-01-28  100  	int tries;
601ef0d52e96175 Bob Peterson 2020-01-28  101  
601ef0d52e96175 Bob Peterson 2020-01-28  102  	if (test_bit(SDF_NORECOVERY, &sdp->sd_flags))
601ef0d52e96175 Bob Peterson 2020-01-28  103  		return;
601ef0d52e96175 Bob Peterson 2020-01-28  104  
601ef0d52e96175 Bob Peterson 2020-01-28  105  	/* Prevent any glock dq until withdraw recovery is complete */
601ef0d52e96175 Bob Peterson 2020-01-28  106  	set_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
601ef0d52e96175 Bob Peterson 2020-01-28  107  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  108  	 * Don't tell dlm we're bailing until we have no more buffers in the
601ef0d52e96175 Bob Peterson 2020-01-28  109  	 * wind. If journal had an IO error, the log code should just purge
601ef0d52e96175 Bob Peterson 2020-01-28  110  	 * the outstanding buffers rather than submitting new IO. Making the
601ef0d52e96175 Bob Peterson 2020-01-28  111  	 * file system read-only will flush the journal, etc.
601ef0d52e96175 Bob Peterson 2020-01-28  112  	 *
601ef0d52e96175 Bob Peterson 2020-01-28  113  	 * During a normal unmount, gfs2_make_fs_ro calls gfs2_log_shutdown
601ef0d52e96175 Bob Peterson 2020-01-28  114  	 * which clears SDF_JOURNAL_LIVE. In a withdraw, we must not write
601ef0d52e96175 Bob Peterson 2020-01-28  115  	 * any UNMOUNT log header, so we can't call gfs2_log_shutdown, and
601ef0d52e96175 Bob Peterson 2020-01-28  116  	 * therefore we need to clear SDF_JOURNAL_LIVE manually.
601ef0d52e96175 Bob Peterson 2020-01-28  117  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  118  	clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
601ef0d52e96175 Bob Peterson 2020-01-28  119  	if (!sb_rdonly(sdp->sd_vfs))
601ef0d52e96175 Bob Peterson 2020-01-28  120  		ret = gfs2_make_fs_ro(sdp);
601ef0d52e96175 Bob Peterson 2020-01-28  121  
601ef0d52e96175 Bob Peterson 2020-01-28  122  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  123  	 * Drop the glock for our journal so another node can recover it.
601ef0d52e96175 Bob Peterson 2020-01-28  124  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  125  	if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
601ef0d52e96175 Bob Peterson 2020-01-28 @126  		gfs2_glock_dq_wait(&sdp->sd_journal_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  127  		gfs2_holder_uninit(&sdp->sd_journal_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  128  	}
601ef0d52e96175 Bob Peterson 2020-01-28  129  	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
601ef0d52e96175 Bob Peterson 2020-01-28  130  	gfs2_glock_dq(&sdp->sd_jinode_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  131  	if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
601ef0d52e96175 Bob Peterson 2020-01-28  132  		/* Make sure gfs2_unfreeze works if partially-frozen */
601ef0d52e96175 Bob Peterson 2020-01-28  133  		flush_workqueue(gfs2_freeze_wq);
601ef0d52e96175 Bob Peterson 2020-01-28  134  		atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
601ef0d52e96175 Bob Peterson 2020-01-28  135  		thaw_super(sdp->sd_vfs);
601ef0d52e96175 Bob Peterson 2020-01-28  136  	} else {
601ef0d52e96175 Bob Peterson 2020-01-28  137  		wait_on_bit(&gl->gl_flags, GLF_DEMOTE, TASK_UNINTERRUPTIBLE);
601ef0d52e96175 Bob Peterson 2020-01-28  138  	}
601ef0d52e96175 Bob Peterson 2020-01-28  139  
601ef0d52e96175 Bob Peterson 2020-01-28  140  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  141  	 * holder_uninit to force glock_put, to force dlm to let go
601ef0d52e96175 Bob Peterson 2020-01-28  142  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  143  	gfs2_holder_uninit(&sdp->sd_jinode_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  144  
601ef0d52e96175 Bob Peterson 2020-01-28  145  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  146  	 * Note: We need to be careful here:
601ef0d52e96175 Bob Peterson 2020-01-28  147  	 * Our iput of jd_inode will evict it. The evict will dequeue its
601ef0d52e96175 Bob Peterson 2020-01-28  148  	 * glock, but the glock dq will wait for the withdraw unless we have
601ef0d52e96175 Bob Peterson 2020-01-28  149  	 * exception code in glock_dq.
601ef0d52e96175 Bob Peterson 2020-01-28  150  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  151  	iput(inode);
601ef0d52e96175 Bob Peterson 2020-01-28  152  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  153  	 * Wait until the journal inode's glock is freed. This allows try locks
601ef0d52e96175 Bob Peterson 2020-01-28  154  	 * on other nodes to be successful, otherwise we remain the owner of
601ef0d52e96175 Bob Peterson 2020-01-28  155  	 * the glock as far as dlm is concerned.
601ef0d52e96175 Bob Peterson 2020-01-28  156  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  157  	if (gl->gl_ops->go_free) {
601ef0d52e96175 Bob Peterson 2020-01-28  158  		set_bit(GLF_FREEING, &gl->gl_flags);
601ef0d52e96175 Bob Peterson 2020-01-28  159  		wait_on_bit(&gl->gl_flags, GLF_FREEING, TASK_UNINTERRUPTIBLE);
601ef0d52e96175 Bob Peterson 2020-01-28  160  	}
601ef0d52e96175 Bob Peterson 2020-01-28  161  
601ef0d52e96175 Bob Peterson 2020-01-28  162  	if (sdp->sd_lockstruct.ls_ops->lm_lock == NULL) { /* lock_nolock */
601ef0d52e96175 Bob Peterson 2020-01-28  163  		clear_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
601ef0d52e96175 Bob Peterson 2020-01-28  164  		goto skip_recovery;
601ef0d52e96175 Bob Peterson 2020-01-28  165  	}
601ef0d52e96175 Bob Peterson 2020-01-28  166  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  167  	 * Dequeue the "live" glock, but keep a reference so it's never freed.
601ef0d52e96175 Bob Peterson 2020-01-28  168  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  169  	gfs2_glock_hold(gl);
601ef0d52e96175 Bob Peterson 2020-01-28  170  	gfs2_glock_dq_wait(&sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  171  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  172  	 * We enqueue the "live" glock in EX so that all other nodes
601ef0d52e96175 Bob Peterson 2020-01-28  173  	 * get a demote request and act on it. We don't really want the
601ef0d52e96175 Bob Peterson 2020-01-28  174  	 * lock in EX, so we send a "try" lock with 1CB to produce a callback.
601ef0d52e96175 Bob Peterson 2020-01-28  175  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  176  	fs_warn(sdp, "Requesting recovery of jid %d.\n",
601ef0d52e96175 Bob Peterson 2020-01-28  177  		sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  178  	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
601ef0d52e96175 Bob Peterson 2020-01-28  179  			   &sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  180  	msleep(GL_GLOCK_MAX_HOLD);
601ef0d52e96175 Bob Peterson 2020-01-28  181  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  182  	 * This will likely fail in a cluster, but succeed standalone:
601ef0d52e96175 Bob Peterson 2020-01-28  183  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  184  	ret = gfs2_glock_nq(&sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  185  
601ef0d52e96175 Bob Peterson 2020-01-28  186  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  187  	 * If we actually got the "live" lock in EX mode, there are no other
601ef0d52e96175 Bob Peterson 2020-01-28  188  	 * nodes available to replay our journal. So we try to replay it
601ef0d52e96175 Bob Peterson 2020-01-28  189  	 * ourselves. We hold the "live" glock to prevent other mounters
601ef0d52e96175 Bob Peterson 2020-01-28  190  	 * during recovery, then just dequeue it and reacquire it in our
601ef0d52e96175 Bob Peterson 2020-01-28  191  	 * normal SH mode. Just in case the problem that caused us to
601ef0d52e96175 Bob Peterson 2020-01-28  192  	 * withdraw prevents us from recovering our journal (e.g. io errors
601ef0d52e96175 Bob Peterson 2020-01-28  193  	 * and such) we still check if the journal is clean before proceeding
601ef0d52e96175 Bob Peterson 2020-01-28  194  	 * but we may wait forever until another mounter does the recovery.
601ef0d52e96175 Bob Peterson 2020-01-28  195  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  196  	if (ret == 0) {
601ef0d52e96175 Bob Peterson 2020-01-28  197  		fs_warn(sdp, "No other mounters found. Trying to recover our "
601ef0d52e96175 Bob Peterson 2020-01-28  198  			"own journal jid %d.\n", sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  199  		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
601ef0d52e96175 Bob Peterson 2020-01-28  200  			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
601ef0d52e96175 Bob Peterson 2020-01-28  201  				sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  202  		gfs2_glock_dq_wait(&sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  203  		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
601ef0d52e96175 Bob Peterson 2020-01-28  204  				   &sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  205  		gfs2_glock_nq(&sdp->sd_live_gh);
601ef0d52e96175 Bob Peterson 2020-01-28  206  	}
601ef0d52e96175 Bob Peterson 2020-01-28  207  
601ef0d52e96175 Bob Peterson 2020-01-28  208  	gfs2_glock_queue_put(gl); /* drop the extra reference we acquired */
601ef0d52e96175 Bob Peterson 2020-01-28  209  	clear_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
601ef0d52e96175 Bob Peterson 2020-01-28  210  
601ef0d52e96175 Bob Peterson 2020-01-28  211  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  212  	 * At this point our journal is evicted, so we need to get a new inode
601ef0d52e96175 Bob Peterson 2020-01-28  213  	 * for it. Once done, we need to call gfs2_find_jhead which
601ef0d52e96175 Bob Peterson 2020-01-28  214  	 * calls gfs2_map_journal_extents to map it for us again.
601ef0d52e96175 Bob Peterson 2020-01-28  215  	 *
601ef0d52e96175 Bob Peterson 2020-01-28  216  	 * Note that we don't really want it to look up a FREE block. The
601ef0d52e96175 Bob Peterson 2020-01-28  217  	 * GFS2_BLKST_FREE simply overrides a block check in gfs2_inode_lookup
601ef0d52e96175 Bob Peterson 2020-01-28  218  	 * which would otherwise fail because it requires grabbing an rgrp
601ef0d52e96175 Bob Peterson 2020-01-28  219  	 * glock, which would fail with -EIO because we're withdrawing.
601ef0d52e96175 Bob Peterson 2020-01-28  220  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  221  	inode = gfs2_inode_lookup(sdp->sd_vfs, DT_UNKNOWN,
601ef0d52e96175 Bob Peterson 2020-01-28  222  				  sdp->sd_jdesc->jd_no_addr, no_formal_ino,
601ef0d52e96175 Bob Peterson 2020-01-28  223  				  GFS2_BLKST_FREE);
601ef0d52e96175 Bob Peterson 2020-01-28  224  	if (IS_ERR(inode)) {
601ef0d52e96175 Bob Peterson 2020-01-28  225  		fs_warn(sdp, "Reprocessing of jid %d failed with %ld.\n",
601ef0d52e96175 Bob Peterson 2020-01-28  226  			sdp->sd_lockstruct.ls_jid, PTR_ERR(inode));
601ef0d52e96175 Bob Peterson 2020-01-28  227  		goto skip_recovery;
601ef0d52e96175 Bob Peterson 2020-01-28  228  	}
601ef0d52e96175 Bob Peterson 2020-01-28  229  	sdp->sd_jdesc->jd_inode = inode;
601ef0d52e96175 Bob Peterson 2020-01-28  230  
601ef0d52e96175 Bob Peterson 2020-01-28  231  	/*
601ef0d52e96175 Bob Peterson 2020-01-28  232  	 * Now wait until recovery is complete.
601ef0d52e96175 Bob Peterson 2020-01-28  233  	 */
601ef0d52e96175 Bob Peterson 2020-01-28  234  	for (tries = 0; tries < 10; tries++) {
7d9f9249580e05a Bob Peterson 2019-02-18  235  		ret = check_journal_clean(sdp, sdp->sd_jdesc, false);
601ef0d52e96175 Bob Peterson 2020-01-28  236  		if (!ret)
601ef0d52e96175 Bob Peterson 2020-01-28  237  			break;
601ef0d52e96175 Bob Peterson 2020-01-28  238  		msleep(HZ);
601ef0d52e96175 Bob Peterson 2020-01-28  239  		fs_warn(sdp, "Waiting for journal recovery jid %d.\n",
601ef0d52e96175 Bob Peterson 2020-01-28  240  			sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  241  	}
601ef0d52e96175 Bob Peterson 2020-01-28  242  skip_recovery:
601ef0d52e96175 Bob Peterson 2020-01-28  243  	if (!ret)
601ef0d52e96175 Bob Peterson 2020-01-28  244  		fs_warn(sdp, "Journal recovery complete for jid %d.\n",
601ef0d52e96175 Bob Peterson 2020-01-28  245  			sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  246  	else
601ef0d52e96175 Bob Peterson 2020-01-28  247  		fs_warn(sdp, "Journal recovery skipped for %d until next "
601ef0d52e96175 Bob Peterson 2020-01-28  248  			"mount.\n", sdp->sd_lockstruct.ls_jid);
601ef0d52e96175 Bob Peterson 2020-01-28  249  	fs_warn(sdp, "Glock dequeues delayed: %lu\n", sdp->sd_glock_dqs_held);
601ef0d52e96175 Bob Peterson 2020-01-28  250  	sdp->sd_glock_dqs_held = 0;
601ef0d52e96175 Bob Peterson 2020-01-28  251  	wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY);
601ef0d52e96175 Bob Peterson 2020-01-28  252  }
601ef0d52e96175 Bob Peterson 2020-01-28  253  

:::::: The code at line 126 was first introduced by commit
:::::: 601ef0d52e9617588fcff3df26953592f2eb44ac gfs2: Force withdraw to replay journals and wait for it to finish

:::::: TO: Bob Peterson <rpeterso@redhat.com>
:::::: CC: Bob Peterson <rpeterso@redhat.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 19349 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20200417/2336dfce/attachment.gz>


* [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'?
  2020-04-17  7:36 [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'? kbuild test robot
@ 2020-04-17 12:23 ` Andreas Gruenbacher
  2020-04-17 13:09   ` Bob Peterson
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Gruenbacher @ 2020-04-17 12:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

On Fri, Apr 17, 2020 at 9:37 AM kbuild test robot <lkp@intel.com> wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git gfs2-iopen
> head:   c748e1ec9bd20d71265a148042f6bc97ffc5f343
> commit: c748e1ec9bd20d71265a148042f6bc97ffc5f343 [12/12] gfs2: Remove unused function gfs2_glock_dq_wait
> config: s390-defconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         git checkout c748e1ec9bd20d71265a148042f6bc97ffc5f343
>         # save the attached .config to the linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add the following tag as appropriate
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    fs/gfs2/util.c: In function 'signal_our_withdraw':
> >> fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'? [-Werror=implicit-function-declaration]
>      126 |   gfs2_glock_dq_wait(&sdp->sd_journal_gh);
>          |   ^~~~~~~~~~~~~~~~~~
>          |   gfs2_glock_nq_init
>    cc1: some warnings being treated as errors

commit "gfs2: Force withdraw to replay journals and wait for it to
finish" adds three new users of gfs2_glock_dq_wait in function
signal_our_withdraw. Is the waiting there done for a reason, or can we
replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
in that function?
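
For reference, here is roughly what the helpers in question look like in
fs/gfs2/glock.c (a sketch of the pre-removal code, not a verbatim copy):

	/* Drop the holder, then block until the demote has completed. */
	void gfs2_glock_dq_wait(struct gfs2_holder *gh)
	{
		struct gfs2_glock *gl = gh->gh_gl;

		gfs2_glock_dq(gh);
		might_sleep();
		wait_on_bit(&gl->gl_flags, GLF_DEMOTE, TASK_UNINTERRUPTIBLE);
	}

	/* Drop the holder without waiting, then release it. */
	void gfs2_glock_dq_uninit(struct gfs2_holder *gh)
	{
		gfs2_glock_dq(gh);
		gfs2_holder_uninit(gh);
	}

So the question is whether anything depends on the demote having completed
before the caller proceeds.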

Thanks,
Andreas





* [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'?
  2020-04-17 12:23 ` Andreas Gruenbacher
@ 2020-04-17 13:09   ` Bob Peterson
  2020-04-17 14:27     ` [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function Andreas Gruenbacher
  0 siblings, 1 reply; 5+ messages in thread
From: Bob Peterson @ 2020-04-17 13:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> Bob,
> 
> commit "gfs2: Force withdraw to replay journals and wait for it to
> finish" adds three new users of gfs2_glock_dq_wait in function
> signal_our_withdraw. Is the waiting there done for a reason, or can we
> replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
> in that function?
> 
> Thanks,
> Andreas

Hi Andreas,

I remember debugging cases in which we needed to wait.
When we didn't wait for glocks to be demoted, the node just reacquired
the glocks too quickly, before other nodes could attempt recovery.
When the withdrawing node tried to reacquire them, DLM simply granted
them on the same node, which is the only node that must not do recovery.

Addressing each instance separately:

(1) We would dequeue our journal glock, then dequeue the live glock.
    The other nodes would see the callback for the "live" glock and
    attempt recovery; however, they were denied access to the journal
    glock because it was still held on the recovering node. That's
    because the glock state machine (but more importantly the dlm)
    had not yet demoted the lock completely when this took place.
    So none of the nodes performed recovery.
(2) We might be able to get rid of the "wait" for the "live" glock.
    I can't think of a case in which that would behave badly, but
    bear in mind it's been more than a year since I originally wrote
    that. It's probably closer to 2 years now.
(3) We might be able to get rid of the third "wait" which is also for
    the "live" glock. I don't remember the circumstances.

TBH, I wouldn't feel comfortable getting rid of any of those waits
until I do some heavy experimentation on my 5-node cluster with the
recovery tests.

Bob




* [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function
  2020-04-17 13:09   ` Bob Peterson
@ 2020-04-17 14:27     ` Andreas Gruenbacher
  2020-04-17 14:35       ` Bob Peterson
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Gruenbacher @ 2020-04-17 14:27 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

On Fri, Apr 17, 2020 at 3:09 PM Bob Peterson <rpeterso@redhat.com> wrote:
> ----- Original Message -----
> > Bob,
> >
> > commit "gfs2: Force withdraw to replay journals and wait for it to
> > finish" adds three new users of gfs2_glock_dq_wait in function
> > signal_our_withdraw. Is the waiting there done for a reason, or can we
> > replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
> > in that function?
> >
> > Thanks,
> > Andreas
>
> Hi Andreas,
>
> I remember debugging cases in which we needed to wait.
> When we didn't wait for glocks to be demoted, the node just reacquired
> the glocks too quickly, before other nodes could attempt recovery.
> When the withdrawing node tried to reacquire them, DLM simply granted
> them on the same node, which is the only node that must not do recovery.
>
> Addressing each instance separately:
>
> (1) We would dequeue our journal glock, then dequeue the live glock.
>     The other nodes would see the callback for the "live" glock and
>     attempt recovery; however, they were denied access to the journal
>     glock because it was still held on the recovering node. That's
>     because the glock state machine (but more importantly the dlm)
>     had not yet demoted the lock completely when this took place.
>     So none of the nodes performed recovery.

Hmm, sdp->sd_journal_gh has the GL_NOCACHE flag set, so the demote
should happen immediately.  On the other hand, sdp->sd_live_gh doesn't
have that flag set, so that may be the actual problem here.
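
For context, the GL_NOCACHE handling in gfs2_glock_dq is roughly the
following (an excerpt from memory, so treat it as a sketch):

	void gfs2_glock_dq(struct gfs2_holder *gh)
	{
		struct gfs2_glock *gl = gh->gh_gl;
		...
		/* GL_NOCACHE: request an immediate demote to UNLOCKED
		 * instead of keeping the granted lock cached locally. */
		if (gh->gh_flags & GL_NOCACHE)
			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
		...
	}

Without GL_NOCACHE, the dequeue completes locally and dlm keeps the lock
granted to this node until something else forces a demote.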

> (2) We might be able to get rid of the "wait" for the "live" glock.
>     I can't think of a case in which that would behave badly, but
>     bear in mind it's been more than a year since I originally wrote
>     that. It's probably closer to 2 years now.
> (3) We might be able to get rid of the third "wait" which is also for
>     the "live" glock. I don't remember the circumstances.
>
> TBH, I wouldn't feel comfortable getting rid of any of those waits
> until I do some heavy experimentation on my 5-node cluster with the
> recovery tests.

I agree that testing will be needed.  What do you think of the below
patch?

Thanks,
Andreas

---
 fs/gfs2/ops_fstype.c |  2 +-
 fs/gfs2/util.c       | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e2b69ffcc6a8..98b2577b815b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -405,7 +405,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh,
 	error = gfs2_glock_nq_num(sdp,
 				  GFS2_LIVE_LOCK, &gfs2_nondisk_glops,
 				  LM_ST_SHARED,
-				  LM_FLAG_NOEXP | GL_EXACT,
+				  LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				  &sdp->sd_live_gh);
 	if (error) {
 		fs_err(sdp, "can't acquire live glock: %d\n", error);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 9b64d40ab379..6e48cef79c53 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -122,10 +122,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	/*
 	 * Drop the glock for our journal so another node can recover it.
 	 */
-	if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
-		gfs2_glock_dq_wait(&sdp->sd_journal_gh);
-		gfs2_holder_uninit(&sdp->sd_journal_gh);
-	}
+	if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+		gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
 	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
 	gfs2_glock_dq(&sdp->sd_jinode_gh);
 	if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
@@ -167,7 +165,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 * Dequeue the "live" glock, but keep a reference so it's never freed.
 	 */
 	gfs2_glock_hold(gl);
-	gfs2_glock_dq_wait(&sdp->sd_live_gh);
+	gfs2_glock_dq(&sdp->sd_live_gh);
 	/*
 	 * We enqueue the "live" glock in EX so that all other nodes
 	 * get a demote request and act on it. We don't really want the
@@ -175,7 +173,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 */
 	fs_warn(sdp, "Requesting recovery of jid %d.\n",
 		sdp->sd_lockstruct.ls_jid);
-	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+	gfs2_holder_reinit(LM_ST_EXCLUSIVE,
+			   LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | GL_NOCACHE,
 			   &sdp->sd_live_gh);
 	msleep(GL_GLOCK_MAX_HOLD);
 	/*
@@ -199,8 +198,9 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
 			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
 				sdp->sd_lockstruct.ls_jid);
-		gfs2_glock_dq_wait(&sdp->sd_live_gh);
-		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+		gfs2_glock_dq(&sdp->sd_live_gh);
+		gfs2_holder_reinit(LM_ST_SHARED,
+				   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				   &sdp->sd_live_gh);
 		gfs2_glock_nq(&sdp->sd_live_gh);
 	}

base-commit: bd437e630fee648b79999e617d48bb07ae76f8eb
prerequisite-patch-id: e3b7fbfc1e67b4065cb6c928c29c7ed8bee0fc1c
prerequisite-patch-id: 35e9388872b2d6ffc70c4fc5497d3fdec97d6392
-- 
2.25.2




* [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function
  2020-04-17 14:27     ` [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function Andreas Gruenbacher
@ 2020-04-17 14:35       ` Bob Peterson
  0 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2020-04-17 14:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> Hi Bob,
> 
> Hmm, sdp->sd_journal_gh has the GL_NOCACHE flag set, so the demote
> should happen immediately.  On the other hand, sdp->sd_live_gh doesn't
> have that flag set, so that may be the actual problem here.
> 
> I agree that testing will be needed.  What do you think of the below
> patch?
> 
> Thanks,
> Andreas

Hi Andreas,

I'll test it out on my cluster with some recovery tests and let you
know how it goes.

Bob



