From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bob Peterson <rpeterso@redhat.com>
Date: Fri, 28 Sep 2018 08:55:58 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve
	performance	improvements
In-Reply-To: <06359a927ea74713adaa1d0a5aec74f6@AMSPEX02CL02.citrite.net>
References: <1537455133-48589-1-git-send-email-mark.syms@citrix.com>
	<77971123.14918571.1537463869960.JavaMail.zimbra@redhat.com>
	<b948a97d0e9c48c7964d9fd8257e1283@AMSPEX02CL02.citrite.net>
	<552791371.16890843.1538137438323.JavaMail.zimbra@redhat.com>
	<06359a927ea74713adaa1d0a5aec74f6@AMSPEX02CL02.citrite.net>
Message-ID: <1591671579.16895938.1538139358115.JavaMail.zimbra@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

----- Original Message -----
> Hi Bob,
> 
> No, we haven't but it wouldn't be hard for us to replace our patches in our
> internal patchqueue with these and try them. Will let you know what we find.
> 
> We have also seen, what we think is an unrelated issue where we get the
> following backtrace in kern.log and our system stalls
> 
> Sep 21 21:19:09 cl15-05 kernel: [21389.462707] INFO: task python:15480
> blocked for more than 120 seconds.
> Sep 21 21:19:09 cl15-05 kernel: [21389.462749]       Tainted: G           O
> 4.4.0+10 #1
> Sep 21 21:19:09 cl15-05 kernel: [21389.462763] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 21 21:19:09 cl15-05 kernel: [21389.462783] python          D
> ffff88019628bc90     0 15480      1 0x00000000
> Sep 21 21:19:09 cl15-05 kernel: [21389.462790]  ffff88019628bc90
> ffff880198f11c00 ffff88005a509c00 ffff88019628c000
> Sep 21 21:19:09 cl15-05 kernel: [21389.462795]  ffffc90040226000
> ffff88019628bd80 fffffffffffffe58 ffff8801818da418
> Sep 21 21:19:09 cl15-05 kernel: [21389.462799]  ffff88019628bca8
> ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
> Sep 21 21:19:09 cl15-05 kernel: [21389.462803] Call Trace:
> Sep 21 21:19:09 cl15-05 kernel: [21389.462815]  [<ffffffff815a1cd4>]
> schedule+0x64/0x80
> Sep 21 21:19:09 cl15-05 kernel: [21389.462877]  [<ffffffffa0663624>]
> find_insert_glock+0x4a4/0x530 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462891]  [<ffffffffa0660c20>] ?
> gfs2_holder_wake+0x20/0x20 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462903]  [<ffffffffa06639ed>]
> gfs2_glock_get+0x3d/0x330 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462928]  [<ffffffffa066cff2>]
> do_flock+0xf2/0x210 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462933]  [<ffffffffa0671ad0>] ?
> gfs2_getattr+0xe0/0xf0 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462938]  [<ffffffff811ba2fb>] ?
> cp_new_stat+0x10b/0x120
> Sep 21 21:19:09 cl15-05 kernel: [21389.462943]  [<ffffffffa066d188>]
> gfs2_flock+0x78/0xa0 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462946]  [<ffffffff812021e9>]
> SyS_flock+0x129/0x170
> Sep 21 21:19:09 cl15-05 kernel: [21389.462948]  [<ffffffff815a57ee>]
> entry_SYSCALL_64_fastpath+0x12/0x71
> 
> We think there is a possibility, given that this code path only gets entered
> if a glock is being destroyed, that there is a time of check, time of use
> issue here where by the time that schedule gets called the thing which we
> expect to be waking us up has completed dying and therefore won't trigger a
> wakeup for us. We only seen this a couple of times in fairly intensive VM
> stress tests where a lot of flocks get used on a small number of lock files
> (we use them to ensure consistent behaviour of disk activation/deactivation
> and also access to the database with the system state) but it's concerning
> nonetheless. We're looking at replacing the call to schedule with
> schedule_timeout with a timeout of maybe HZ to ensure that we will always
> get out of the schedule operation and retry. Is this something you think you
> may have seen or have any ideas on?
> 
> Thanks,
> 
> 	Mark.

Hi Mark,

It's very common to get call traces like that when one of the nodes
in a cluster goes down and the other nodes all wait for the failed node
to be fenced, etc. The node failure causes dlm to temporarily stop
granting locks until the issue is resolved. This is expected behavior,
and dlm recovery should eventually grant the lock once the node is
properly removed from the cluster. I haven't seen it on an flock glock,
because I personally don't often run with flocks, but it often happens
to me with other glocks when I do recovery testing (which I've been
doing a lot of lately).

So is there a node failure in your case? If there's a node failure, dlm
should recover the locks and allow the waiter to continue normally. If
it's not a node failure, it's hard to say... I know Andreas fixed some
problems with the rcu locking we do to protect the glock rhashtable.
Perhaps the kernel you're using is missing one of his patches? Or maybe
it's a new bug. Adding Andreas to the cc.

Regards,

Bob Peterson