linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Katsubo <dma_k@mail.ru>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Process is blocked for more than 120 seconds
Date: Thu, 16 Jun 2016 03:05:38 +0200	[thread overview]
Message-ID: <5761FB62.30001@mail.ru> (raw)
In-Reply-To: <564328B8.2020509@mail.ru>

On 2015-11-11 12:38, Dmitry Katsubo wrote:
> On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
>> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>>> Hi everyone,
>>>
>>> I have noticed the following in the log. The system continues to run,
>>> but I am not sure for how long it will be stable. Should I start
>>> worrying? Thanks in advance for the opinion.
>>>
>> This just means that a process was stuck in the D state (uninterruptible
>> I/O sleep) for more than 120 seconds.  Depending on a number of factors,
>> this happening could mean:
>> 1. Absolutely nothing (if you have low-powered or older hardware, for
>> example, I get these regularly on a first generation Raspberry Pi if I
>> don't increase the timeout significantly)
>> 2. The program is doing a very large chunk of I/O (usually with the
>> O_DIRECT flag, although this probably isn't the case here)
>> 3. There's a bug in the blocked program (this is rarely the case when
>> this type of thing happens)
>> 4. There's a bug in the kernel (which is why this dumps a stack trace)
>> 5. The filesystem itself is messed up somehow, and the kernel isn't
>> handling it properly (technically a bug, but a more specific case of it).
>> 6. You're hardware is misbehaving, failing, or experienced a transient
>> error.
>>
>> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
>> most likely cause, as all of the listed programs (I'm assuming that
>> 'master' is from postfix) are relatively well audited, and all of them
>> hit this at the same time.
>>
>> For what it's worth, if you want you can do:
>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>> like the message says to stop these from appearing in the future, or use
>> some arbitrary number to change the timeout before these messages appear
>> (I usually use at least 150 on production systems, and more often 300,
>> although on something like a Raspberry Pi I often use timeouts as high
>> as 1800 seconds).
> 
> Thanks for comments, Austin.
> 
> The system is "normal" PC, running Intel Core 2 Duo Mobile @1.66GHz.
> "master" is indeed a postfix process.
> 
> I haven't seen anything like that when I was on 3.16 kernel, but after I
> have upgraded to 4.2.3, I caught that message. I/O and CPU load are
> usually low, but it could be (6) from your list, as the system is
> generally very old (5+ years).
> 
> As the problem appeared only once for passed 15 days, I think it is just
> a transient error. Thanks for clarifying the possible reasons.

The problem (rarely) re-occurs. It does not happen on XFS filesystem (root)
but only on btrfs. I will increase timeout up to 300 and see what happens.

=== cut dmesg ===
INFO: task fail2ban-server:1747 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fail2ban-server D 0000001f     0  1747      1 0x00000000
 f1ca1bc0 00200086 f2d24190 0000001f 00000000 f79ca4c0 f3d21bc0 00000000
 c1168726 f1ca1bc0 e9152000 00000000 e9151d8c c156075f c0d25a90 f1ca1bc0
 e9151db4 c1561ed4 00000000 00000000 f1ca1bc0 00000002 eab98940 c0d25a90
Call Trace:
 [<c1168726>] ? __filemap_fdatawrite_range+0xb6/0xf0
 [<c156075f>] ? schedule+0x3f/0xd0
 [<c1561ed4>] ? __rt_mutex_slowlock+0x74/0x140
 [<c1562373>] ? rt_mutex_slowlock+0xf3/0x250
 [<f8c2bb8e>] ? btrfs_write_marked_extents+0xae/0x190 [btrfs]
 [<c1562575>] ? rt_mutex_lock+0x45/0x50
 [<f8c78515>] ? btrfs_sync_log+0x1d5/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9ab5>] ? SyS_fdatasync+0x15/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
 [<c1560000>] ? pci_mmcfg_check_reserved+0x90/0xb0
INFO: task cleanup:2093 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cleanup         D d0f5de0c     0  2093  28135 0x00000000
 eab99bc0 00200086 c1836760 d0f5de0c 00000002 f79ca4c0 f3d21bc0 00000000
 00000001 eab99bc0 d0f5e000 f325d2c4 d0f5ddfc c156075f f325d000 00000088
 d0f5de30 f8c72f36 00000310 f325d290 00000000 eab99bc0 c10afd70 f325d2dc
Call Trace:
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c72f36>] ? wait_log_commit+0xc6/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c786aa>] ? btrfs_sync_log+0x36a/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c792d4>] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs]
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<c11c6100>] ? do_sys_truncate+0xb0/0xb0
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9a92>] ? SyS_fsync+0x12/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
INFO: task lmtpd:2101 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lmtpd           D db0c7e0c     0  2101   7894 0x00000000
 cd540940 00200086 c1836760 db0c7e0c 00000002 f79674c0 c1706dc0 00000000
 00000001 cd540940 db0c8000 f325d2e4 db0c7dfc c156075f f325d000 00000087
 db0c7e30 f8c72f36 00000314 f325d290 00000000 cd540940 c10afd70 f325d2fc
Call Trace:
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c72f36>] ? wait_log_commit+0xc6/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c783e5>] ? btrfs_sync_log+0xa5/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c792d4>] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs]
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9ab5>] ? SyS_fdatasync+0x15/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
INFO: task nmbd:2104 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nmbd            D f6a652d4     0  2104   1702 0x00000000
 eab9ca00 00200086 c1836760 f6a652d4 00000002 f79ca4c0 f3d21bc0 00000000
 c10929fb eab9ca00 ca910000 f2fe2ea0 ca90fd20 c156075f f2023900 f325d1dc
 ca90fd48 f8c2b70f 00000000 eab9ca00 c10afd70 f6a652ec dfd2bdb4 98141c71
Call Trace:
 [<c10929fb>] ? migrate_enable+0x8b/0x150
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c2b70f>] ? wait_current_trans.isra.21+0xaf/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c2d09a>] ? start_transaction+0x29a/0x500 [btrfs]
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c2d31c>] ? btrfs_start_transaction+0x1c/0x20 [btrfs]
 [<f8c3e5d0>] ? btrfs_create+0x40/0x240 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1563445>] ? _raw_spin_lock+0x15/0x40
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c11d3bab>] ? vfs_create+0xcb/0x140
 [<c11d626c>] ? path_openat+0x103c/0x1130
 [<c119ec91>] ? page_add_file_rmap+0x11/0x50
 [<c11d7ce8>] ? do_filp_open+0x68/0xe0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<c10929fb>] ? migrate_enable+0x8b/0x150
 [<c11e5375>] ? __alloc_fd+0xa5/0x160
 [<c11c7af6>] ? do_sys_open+0x116/0x220
 [<c11c7c22>] ? SyS_open+0x22/0x30
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
=== end ===

-- 
With best regards,
Dmitry

      reply	other threads:[~2016-06-16  1:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 11:43 Process is blocked for more than 120 seconds Dmitry Katsubo
2015-11-07 12:22 ` Dmitry Katsubo
2015-11-09 13:25   ` Austin S Hemmelgarn
2015-11-11 11:38     ` Dmitry Katsubo
2016-06-16  1:05       ` Dmitry Katsubo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5761FB62.30001@mail.ru \
    --to=dma_k@mail.ru \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).