cluster-devel.redhat.com archive mirror
From: Mark Syms <Mark.Syms@citrix.com>
To: cluster-devel@redhat.com
Subject: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements
Date: Tue, 2 Oct 2018 13:50:27 +0000
Message-ID: <d8c09a7a162c41ea9100fd2608c65242@AMSPEX02CL02.citrite.net>
In-Reply-To: <0ae4b647a6f74baa8740d16686526b28@AMSPEX02CL02.citrite.net>

Just to follow up on this further. We made a change to swap the call to schedule() for schedule_timeout() and added a printk() for the case where schedule_timeout() actually timed out rather than being woken. Running our multi-VM, multi-host stress test we saw one instance of the printk message and no stuck tasks at any point within a 24-hour stress test, so it looks like this is beneficial, at least in our kernel.

I'll clean the change up and send it on, but it's likely only required due to a race elsewhere, possibly one that has already been fixed in your latest codebase. Our suspicion is that the code cleaning up the hashtable checks whether there are any waiters to notify on completion of the cleanup immediately before we start to wait, and then never notifies because it isn't aware of our waiter. We have no concrete proof of that, though, just the observed outcome that we get into schedule() and stick there.
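
For reference, the shape of the change (a sketch against the upstream find_insert_glock() retry loop in fs/gfs2/glock.c; the identifiers match current upstream, our 4.4-based kernel differs in detail, so treat this as illustrative rather than the final patch):

    /*
     * Sketch only: if the glock we found is dying, wait with a
     * bounded timeout instead of indefinitely, so a missed wakeup
     * costs one retry rather than a permanent hang.
     * prepare_to_wait() earlier in the loop has already set
     * TASK_UNINTERRUPTIBLE, which schedule_timeout() honours.
     * schedule_timeout() returns the jiffies remaining, i.e. 0 only
     * when the full timeout elapsed and nobody woke us; that is the
     * case we log.
     */
    if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
            rcu_read_unlock();
            if (!schedule_timeout(HZ))
                    printk(KERN_WARNING
                           "gfs2: find_insert_glock() wait timed out\n");
            goto again;
    }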

Mark.

-----Original Message-----
From: Mark Syms 
Sent: 28 September 2018 14:57
To: 'Bob Peterson' <rpeterso@redhat.com>
Cc: cluster-devel@redhat.com; Tim Smith <tim.smith@citrix.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Andreas Gruenbacher <agruenba@redhat.com>
Subject: RE: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements

The hosts stayed up, and in the first occurrence of this (which we caught in flight, as opposed to only seeing the aftermath in automation logs) the locks all actually unwedged after about 2 hours.

	Mark.

-----Original Message-----
From: Bob Peterson <rpeterso@redhat.com>
Sent: 28 September 2018 13:56
To: Mark Syms <Mark.Syms@citrix.com>
Cc: cluster-devel@redhat.com; Tim Smith <tim.smith@citrix.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Andreas Gruenbacher <agruenba@redhat.com>
Subject: Re: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements

----- Original Message -----
> Hi Bob,
> 
> No, we haven't, but it wouldn't be hard for us to replace our patches
> in our internal patch queue with these and try them. We'll let you know what we find.
> 
> We have also seen what we think is an unrelated issue, where we get
> the following backtrace in kern.log and our system stalls:
> 
> Sep 21 21:19:09 cl15-05 kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
> Sep 21 21:19:09 cl15-05 kernel: [21389.462749]       Tainted: G           O    4.4.0+10 #1
> Sep 21 21:19:09 cl15-05 kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 21 21:19:09 cl15-05 kernel: [21389.462783] python          D ffff88019628bc90     0 15480      1 0x00000000
> Sep 21 21:19:09 cl15-05 kernel: [21389.462790]  ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
> Sep 21 21:19:09 cl15-05 kernel: [21389.462795]  ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
> Sep 21 21:19:09 cl15-05 kernel: [21389.462799]  ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
> Sep 21 21:19:09 cl15-05 kernel: [21389.462803] Call Trace:
> Sep 21 21:19:09 cl15-05 kernel: [21389.462815]  [<ffffffff815a1cd4>] schedule+0x64/0x80
> Sep 21 21:19:09 cl15-05 kernel: [21389.462877]  [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462891]  [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462903]  [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462928]  [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462933]  [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462938]  [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
> Sep 21 21:19:09 cl15-05 kernel: [21389.462943]  [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
> Sep 21 21:19:09 cl15-05 kernel: [21389.462946]  [<ffffffff812021e9>] SyS_flock+0x129/0x170
> Sep 21 21:19:09 cl15-05 kernel: [21389.462948]  [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71
> 
> We think there is a possibility, given that this code path only gets
> entered if a glock is being destroyed, that there is a
> time-of-check/time-of-use issue here: by the time schedule() gets
> called, the thing we expect to wake us up has finished dying and
> therefore won't trigger a wakeup for us. We have only seen this a
> couple of times, in fairly intensive VM stress tests where a lot of
> flocks get used on a small number of lock files (we use them to ensure
> consistent behaviour of disk activation/deactivation, and also access
> to the database that holds the system state; the pattern is sketched
> below), but it's concerning nonetheless. We're looking at replacing
> the call to schedule() with schedule_timeout() with a timeout of maybe
> HZ, to ensure that we always get out of the schedule operation and
> retry. Is this something you think you may have seen, or have any
> ideas on?
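> 
> To make the lock-file pattern concrete, each management operation
> serialises roughly as below (a hypothetical helper for illustration,
> not our actual code):
> 
>     #include <fcntl.h>
>     #include <sys/file.h>
>     #include <unistd.h>
> 
>     /* Hypothetical sketch of the lock-file pattern.  On GFS2 the
>      * flock() is backed by a glock, which is what the stuck task in
>      * the trace above is waiting on (do_flock -> gfs2_glock_get). */
>     int with_lockfile(const char *path)
>     {
>             int fd = open(path, O_RDWR | O_CREAT, 0644);
>             if (fd < 0)
>                     return -1;
>             if (flock(fd, LOCK_EX) < 0) {  /* may block in gfs2_flock() */
>                     close(fd);
>                     return -1;
>             }
>             /* ... disk activation/deactivation or state-database access ... */
>             flock(fd, LOCK_UN);
>             close(fd);
>             return 0;
>     }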
> 
> Thanks,
> 
> 	Mark.

Hi Mark,

It's very common to get call traces like that when one of the nodes in a cluster goes down and the other nodes all wait for the failed node to be fenced, etc. The node failure causes dlm to temporarily stop granting locks until the issue is resolved. This is expected behavior, and dlm recovery should eventually grant the lock once the node is properly removed from the cluster. I haven't seen it on an flock glock, because I personally don't often run with flocks, but it often happens to me with other glocks when I do recovery testing (which I've been doing a lot of lately).

So is there a node failure in your case? If there's a node failure, dlm should recover the locks and allow the waiter to continue normally. If it's not a node failure, it's hard to say... I know Andreas fixed some problems with the rcu locking we do to protect the glock rhashtable.
Perhaps the kernel you're using is missing one of his patches? Or maybe it's a new bug. Adding Andreas to the cc.

Regards,

Bob Peterson




Thread overview: 18+ messages
2018-09-20 14:52 [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements Mark Syms
2018-09-20 14:52 ` [Cluster-devel] [PATCH 1/2] Add some randomisation to the GFS2 resource group allocator Mark Syms
2018-09-20 14:52 ` [Cluster-devel] [PATCH 2/2] GFS2: Avoid recently demoted rgrps Mark Syms
2018-09-20 17:17 ` [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements Bob Peterson
2018-09-20 17:47   ` Mark Syms
2018-09-20 18:16     ` Steven Whitehouse
2018-09-28 12:23     ` Bob Peterson
2018-09-28 12:36       ` Mark Syms
2018-09-28 12:50         ` Mark Syms
2018-09-28 13:18           ` Steven Whitehouse
2018-09-28 13:43             ` Tim Smith
2018-09-28 13:59               ` Bob Peterson
2018-09-28 14:11                 ` Mark Syms
2018-09-28 15:09                 ` Tim Smith
2018-09-28 15:09               ` Steven Whitehouse
2018-09-28 12:55         ` Bob Peterson
2018-09-28 13:56           ` Mark Syms
2018-10-02 13:50             ` Mark Syms [this message]
