From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
Date: Mon, 8 Oct 2018 13:56:29 +0100 [thread overview]
Message-ID: <71a4814c-9300-2ff8-2617-4fda64625b46@redhat.com> (raw)
In-Reply-To: <1539002191-40831-2-git-send-email-mark.syms@citrix.com>
Hi,
On 08/10/18 13:36, Mark Syms wrote:
> During a VM stress test we encountered a system lockup and kern.log contained
>
> kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
> kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
> kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
> kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
> kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
> kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
> kernel: [21389.462803] Call Trace:
> kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
> kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
> kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
> kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
> kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
> kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
> kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
> kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
> kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
> kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71
>
> On examination of the code it was determined that this code path
> is only taken if the selected glock is marked as dead, the supposition
> therefore is that by the time schedule as called the glock had been
> cleaned up and therefore nothing woke the schedule. Instead of calling
> schedule, call schedule_timeout(HZ) so at least we get a chance to
> re-evaluate.
>
> On repeating the stress test, the printk message was seen once in the logs
> across four servers but no further occurences nor were there any stuck task
> log entries. This indicates that when the timeout occured the code repeated
> the lookup and did not find the same glock entry but as we hadn't been
> woken this means that we would never have been woken.
>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> ---
> fs/gfs2/glock.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4614ee2..0a59a01 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
> }
> if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
> rcu_read_unlock();
> - schedule();
> + if (schedule_timeout(HZ) == 0)
> + printk(KERN_INFO "find_insert_glock schedule timed out\n");
> goto again;
> }
> out:
That is a bit odd. In fact that whole function looks odd. I wonder why
it needs to wait in the first place. It should be a simple comparison
here. If the glock is dead then nothing else should touch it, so we are
safe to add a new one into the hash table. The wait is almost certainly
a bug,
Steve.
next prev parent reply other threads:[~2018-10-08 12:56 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
2018-10-08 12:56 ` Steven Whitehouse [this message]
2018-10-08 12:59 ` Mark Syms
2018-10-08 13:03 ` Steven Whitehouse
2018-10-08 13:10 ` Tim Smith
2018-10-08 13:13 ` Steven Whitehouse
2018-10-08 13:26 ` Tim Smith
2018-10-09 8:13 ` Mark Syms
2018-10-09 8:41 ` Steven Whitehouse
2018-10-09 8:50 ` Mark Syms
2018-10-09 12:34 ` Andreas Gruenbacher
2018-10-09 14:00 ` Tim Smith
2018-10-09 14:47 ` Andreas Gruenbacher
2018-10-09 15:34 ` Tim Smith
2018-10-08 13:25 ` Bob Peterson
2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
2018-10-08 12:57 ` Steven Whitehouse
2018-10-08 20:10 ` Bob Peterson
2018-10-08 20:53 ` Mark Syms
[not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
[not found] ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
2018-10-09 15:08 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
2018-10-10 8:22 ` Tim Smith
2018-10-11 8:14 ` Mark Syms
2018-10-11 19:28 ` Mark Syms
[not found] ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
2019-03-06 16:14 ` Andreas Gruenbacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=71a4814c-9300-2ff8-2617-4fda64625b46@redhat.com \
--to=swhiteho@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).