All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Katsubo <dma_k@mail.ru>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Process is blocked for more than 120 seconds
Date: Thu, 16 Jun 2016 03:05:38 +0200	[thread overview]
Message-ID: <5761FB62.30001@mail.ru> (raw)
In-Reply-To: <564328B8.2020509@mail.ru>

On 2015-11-11 12:38, Dmitry Katsubo wrote:
> On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
>> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>>> Hi everyone,
>>>
>>> I have noticed the following in the log. The system continues to run,
>>> but I am not sure for how long it will be stable. Should I start
>>> worrying? Thanks in advance for the opinion.
>>>
>> This just means that a process was stuck in the D state (uninterruptible
>> I/O sleep) for more than 120 seconds.  Depending on a number of factors,
>> this happening could mean:
>> 1. Absolutely nothing (if you have low-powered or older hardware, for
>> example, I get these regularly on a first generation Raspberry Pi if I
>> don't increase the timeout significantly)
>> 2. The program is doing a very large chunk of I/O (usually with the
>> O_DIRECT flag, although this probably isn't the case here)
>> 3. There's a bug in the blocked program (this is rarely the case when
>> this type of thing happens)
>> 4. There's a bug in the kernel (which is why this dumps a stack trace)
>> 5. The filesystem itself is messed up somehow, and the kernel isn't
>> handling it properly (technically a bug, but a more specific case of it).
>> 6. You're hardware is misbehaving, failing, or experienced a transient
>> error.
>>
>> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
>> most likely cause, as all of the listed programs (I'm assuming that
>> 'master' is from postfix) are relatively well audited, and all of them
>> hit this at the same time.
>>
>> For what it's worth, if you want you can do:
>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>> like the message says to stop these from appearing in the future, or use
>> some arbitrary number to change the timeout before these messages appear
>> (I usually use at least 150 on production systems, and more often 300,
>> although on something like a Raspberry Pi I often use timeouts as high
>> as 1800 seconds).
> 
> Thanks for comments, Austin.
> 
> The system is "normal" PC, running Intel Core 2 Duo Mobile @1.66GHz.
> "master" is indeed a postfix process.
> 
> I haven't seen anything like that when I was on 3.16 kernel, but after I
> have upgraded to 4.2.3, I caught that message. I/O and CPU load are
> usually low, but it could be (6) from your list, as the system is
> generally very old (5+ years).
> 
> As the problem appeared only once for passed 15 days, I think it is just
> a transient error. Thanks for clarifying the possible reasons.

The problem (rarely) re-occurs. It does not happen on XFS filesystem (root)
but only on btrfs. I will increase timeout up to 300 and see what happens.

=== cut dmesg ===
INFO: task fail2ban-server:1747 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fail2ban-server D 0000001f     0  1747      1 0x00000000
 f1ca1bc0 00200086 f2d24190 0000001f 00000000 f79ca4c0 f3d21bc0 00000000
 c1168726 f1ca1bc0 e9152000 00000000 e9151d8c c156075f c0d25a90 f1ca1bc0
 e9151db4 c1561ed4 00000000 00000000 f1ca1bc0 00000002 eab98940 c0d25a90
Call Trace:
 [<c1168726>] ? __filemap_fdatawrite_range+0xb6/0xf0
 [<c156075f>] ? schedule+0x3f/0xd0
 [<c1561ed4>] ? __rt_mutex_slowlock+0x74/0x140
 [<c1562373>] ? rt_mutex_slowlock+0xf3/0x250
 [<f8c2bb8e>] ? btrfs_write_marked_extents+0xae/0x190 [btrfs]
 [<c1562575>] ? rt_mutex_lock+0x45/0x50
 [<f8c78515>] ? btrfs_sync_log+0x1d5/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9ab5>] ? SyS_fdatasync+0x15/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
 [<c1560000>] ? pci_mmcfg_check_reserved+0x90/0xb0
INFO: task cleanup:2093 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cleanup         D d0f5de0c     0  2093  28135 0x00000000
 eab99bc0 00200086 c1836760 d0f5de0c 00000002 f79ca4c0 f3d21bc0 00000000
 00000001 eab99bc0 d0f5e000 f325d2c4 d0f5ddfc c156075f f325d000 00000088
 d0f5de30 f8c72f36 00000310 f325d290 00000000 eab99bc0 c10afd70 f325d2dc
Call Trace:
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c72f36>] ? wait_log_commit+0xc6/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c786aa>] ? btrfs_sync_log+0x36a/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c792d4>] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs]
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<c11c6100>] ? do_sys_truncate+0xb0/0xb0
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9a92>] ? SyS_fsync+0x12/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
INFO: task lmtpd:2101 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lmtpd           D db0c7e0c     0  2101   7894 0x00000000
 cd540940 00200086 c1836760 db0c7e0c 00000002 f79674c0 c1706dc0 00000000
 00000001 cd540940 db0c8000 f325d2e4 db0c7dfc c156075f f325d000 00000087
 db0c7e30 f8c72f36 00000314 f325d290 00000000 cd540940 c10afd70 f325d2fc
Call Trace:
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c72f36>] ? wait_log_commit+0xc6/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c783e5>] ? btrfs_sync_log+0xa5/0x9a0 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c792d4>] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs]
 [<f8c474ee>] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [<f8c47120>] ? start_ordered_ops+0x40/0x40 [btrfs]
 [<c11f9767>] ? vfs_fsync_range+0x47/0xb0
 [<c11f983c>] ? do_fsync+0x3c/0x60
 [<c11f9ab5>] ? SyS_fdatasync+0x15/0x20
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
INFO: task nmbd:2104 blocked for more than 120 seconds.
      Tainted: G        W       4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nmbd            D f6a652d4     0  2104   1702 0x00000000
 eab9ca00 00200086 c1836760 f6a652d4 00000002 f79ca4c0 f3d21bc0 00000000
 c10929fb eab9ca00 ca910000 f2fe2ea0 ca90fd20 c156075f f2023900 f325d1dc
 ca90fd48 f8c2b70f 00000000 eab9ca00 c10afd70 f6a652ec dfd2bdb4 98141c71
Call Trace:
 [<c10929fb>] ? migrate_enable+0x8b/0x150
 [<c156075f>] ? schedule+0x3f/0xd0
 [<f8c2b70f>] ? wait_current_trans.isra.21+0xaf/0xf0 [btrfs]
 [<c10afd70>] ? wake_atomic_t_function+0x70/0x70
 [<f8c2d09a>] ? start_transaction+0x29a/0x500 [btrfs]
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<f8c2d31c>] ? btrfs_start_transaction+0x1c/0x20 [btrfs]
 [<f8c3e5d0>] ? btrfs_create+0x40/0x240 [btrfs]
 [<c106ec71>] ? pin_current_cpu+0x71/0x1a0
 [<c1563445>] ? _raw_spin_lock+0x15/0x40
 [<c1091cda>] ? preempt_count_add+0x8a/0xb0
 [<c11d3bab>] ? vfs_create+0xcb/0x140
 [<c11d626c>] ? path_openat+0x103c/0x1130
 [<c119ec91>] ? page_add_file_rmap+0x11/0x50
 [<c11d7ce8>] ? do_filp_open+0x68/0xe0
 [<c106edb3>] ? unpin_current_cpu+0x13/0x70
 [<c10929fb>] ? migrate_enable+0x8b/0x150
 [<c11e5375>] ? __alloc_fd+0xa5/0x160
 [<c11c7af6>] ? do_sys_open+0x116/0x220
 [<c11c7c22>] ? SyS_open+0x22/0x30
 [<c100393d>] ? do_fast_syscall_32+0x8d/0x150
 [<c1563e28>] ? sysenter_past_esp+0x3d/0x61
=== end ===

-- 
With best regards,
Dmitry

      reply	other threads:[~2016-06-16  1:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 11:43 Process is blocked for more than 120 seconds Dmitry Katsubo
2015-11-07 12:22 ` Dmitry Katsubo
2015-11-09 13:25   ` Austin S Hemmelgarn
2015-11-11 11:38     ` Dmitry Katsubo
2016-06-16  1:05       ` Dmitry Katsubo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5761FB62.30001@mail.ru \
    --to=dma_k@mail.ru \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.