Linux Btrfs filesystem development
* Need some assistance/direction in determining a system hang during heavy IO
@ 2017-10-26 15:40 Cheyenne Wills
  2017-10-26 17:41 ` Roman Mamedov
  0 siblings, 1 reply; 3+ messages in thread
From: Cheyenne Wills @ 2017-10-26 15:40 UTC
  To: linux-btrfs

A while back I opened an issue in the kernel bugzilla under the
IO/Storage Block Layer component, though it may have gone to the wrong queue.

I was wondering if someone could take a quick look at
https://bugzilla.kernel.org/show_bug.cgi?id=193331 and see whether the
problem rings a bell or is filed against the wrong component, or give
me some direction on what information I should try to collect.

Briefly: since upgrading a system from the 4.0.5 kernel to 4.9.5 (and
later), I'm seeing a blocked-task timeout under heavy IO against a
multi-LUN btrfs filesystem.  I've tried a 4.12.12 kernel and am still
getting the hang.  I can reliably reproduce it by running a btrfs scrub
against the filesystem.  I first ran into the hang while running a
filesystem backup, and also when the main application (an FTP server
in this case) was performing a lot of read IO.
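As a starting point for collecting information, the hung-task reports can be pulled out of the kernel log so you can see which tasks (e.g. the scrub, the backup, or btrfs kworkers) block first. This is a minimal sketch, assuming the 4.x report format "INFO: task <comm>:<pid> blocked for more than <N> seconds."; the function name `find_hung_tasks` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed format of the hung-task detector's report in 4.x kernels, e.g.
#   "[  360.123456] INFO: task btrfs:4321 blocked for more than 120 seconds."
# search() tolerates any prefix such as a dmesg timestamp. The non-greedy
# comm group backtracks, so task names containing ':' (kworker/u16:2) still parse.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> List[Tuple[str, int, int]]:
    """Return (command, pid, timeout_seconds) for each hung-task report in log_text."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

Feed it the text captured from `dmesg` (ideally saved before the hard reboot, e.g. over a serial or network console) and compare which processes appear first across reproductions.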

Thanks in advance.

Cheyenne Wills


* Re: Need some assistance/direction in determining a system hang during heavy IO
From: Roman Mamedov @ 2017-10-26 17:41 UTC
  To: Cheyenne Wills; +Cc: linux-btrfs

On Thu, 26 Oct 2017 09:40:19 -0600
Cheyenne Wills <cheyenne.wills@gmail.com> wrote:

> Briefly when I upgraded a system from 4.0.5 kernel to 4.9.5 (and
> later) I'm seeing a blocked task timeout with heavy IO against a
> multi-lun btrfs filesystem.  I've tried a 4.12.12 kernel and am still
> getting the hang.

There is now 4.9.58 (fifty-three versions later!), and the 4.12 series is
long abandoned and gone from the charts altogether. So, just in case: did you
check with the latest kernels?

Also, keep in mind that the 120-second hung-task warnings are just that:
warnings, not an error condition by themselves. You can disable them or
increase the timeout via sysctl (kernel.hung_task_timeout_secs). And it is not
clear from your reports whether you only get warnings and everything returns
to normal after the load subsides, or the FS locks up "for good", i.e. with
all access attempts hanging indefinitely and no way to unmount the FS or
otherwise recover.
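For reference, the sysctl mentioned above is exposed at /proc/sys/kernel/hung_task_timeout_secs (present when the kernel is built with CONFIG_DETECT_HUNG_TASK). A minimal sketch of reading and changing it, assuming root privileges for the write; the helper names are hypothetical, and the `path` parameter exists only so the functions can be exercised against a scratch file.

```python
from pathlib import Path

# procfs location of the kernel.hung_task_timeout_secs sysctl.
# Writing requires root; a value of 0 disables the hung-task warnings.
HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_hung_task_timeout(path: Path = HUNG_TASK_TIMEOUT) -> int:
    """Return the current hung-task timeout in seconds (0 means disabled)."""
    return int(path.read_text().strip())


def set_hung_task_timeout(seconds: int, path: Path = HUNG_TASK_TIMEOUT) -> None:
    """Same effect as `sysctl -w kernel.hung_task_timeout_secs=<seconds>`."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    path.write_text(f"{seconds}\n")
```

The change does not survive a reboot; to make it persistent, set kernel.hung_task_timeout_secs in /etc/sysctl.conf or a file under /etc/sysctl.d/. Raising the timeout only silences the warning, so it helps tell a slow-but-recovering FS apart from a true lockup.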

-- 
With respect,
Roman


* Re: Need some assistance/direction in determining a system hang during heavy IO
From: Cheyenne Wills @ 2017-10-26 18:04 UTC
  To: Roman Mamedov; +Cc: linux-btrfs

On Thu, Oct 26, 2017 at 11:41 AM, Roman Mamedov <rm@romanrm.net> wrote:
> On Thu, 26 Oct 2017 09:40:19 -0600
> Cheyenne Wills <cheyenne.wills@gmail.com> wrote:
>
>> Briefly when I upgraded a system from 4.0.5 kernel to 4.9.5 (and
>> later) I'm seeing a blocked task timeout with heavy IO against a
>> multi-lun btrfs filesystem.  I've tried a 4.12.12 kernel and am still
>> getting the hang.
>
> There is now 4.9.58 (fifty three versions later!) and 4.12 series is long
> abandoned and gone from the charts altogether. So just in case, did you check
> with the latest kernels?
>
> Also, keep in mind the 120 second warnings are just that, and not an error
> condition by themselves. You can disable them or increase the maximum timeout
> in sysctl settings. And it is not clear from your reports if you only get
> warnings and after the load subsides everything is back to normal, or the FS
> locks out "for good", i.e. with all access attempts hanging indefinitely and
> no way to unmount the FS or otherwise recover.

Thank you.

The whole system ends up hanging and doesn't recover.  A hard reboot is
required to recover, since I can't even shut the system down normally.

The problem first appeared a while back, when I upgraded from a 4.0.5
kernel to a 4.4.21 kernel.  I opened the bug against a 4.9.5 kernel
(back in January 2017).  Since then I've tried various kernel levels,
the latest (in the last couple of days) being 4.12.12 (I'm running a
Gentoo system, and gentoo-sources 4.12.12 is their highest stable
level).  Every level I've tried since 4.4.21 has failed so far.

I'm about to try an old 4.1 kernel to start bisecting where the
problem was introduced (somewhere between 4.0.5, which works, and
4.4.21, which fails).  I can also try a newer ("unstable") 4.13
gentoo-sources kernel to see if anything more recent fixes the
problem.
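The version-level bisection described above can be sketched as a plain binary search: with a known-good first entry and a known-bad last entry, each test halves the remaining range. This is a minimal sketch, assuming a single regression (once a version fails, every later one fails too); `first_bad_version` and the `is_bad` callback (boot the kernel, run a btrfs scrub, report whether it hung) are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of kernel versions for the first failing one.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    the failure, once introduced, persists in every later version.
    """
    good, bad = 0, len(versions) - 1
    # Invariant: versions[good] works and versions[bad] fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return versions[bad]
```

For example, `first_bad_version(["4.0.5", "4.1", "4.2", "4.3", "4.4.21"], is_bad)` needs at most two test boots. Once the failing release is narrowed down, `git bisect start`, `git bisect good <tag>` and `git bisect bad <tag>` on the kernel tree apply the same halving at commit granularity.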

Thanks

Cheyenne Wills

