Subject: Re: Process is blocked for more than 120 seconds
To: linux-btrfs
References: <56374C7D.5080609@mail.ru> <563DECF0.8080901@mail.ru> <56409EE4.6070605@gmail.com>
From: Dmitry Katsubo
Message-ID: <564328B8.2020509@mail.ru>
Date: Wed, 11 Nov 2015 12:38:32 +0100
In-Reply-To: <56409EE4.6070605@gmail.com>

On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>> Hi everyone,
>>
>> I have noticed the following in the log. The system continues to run,
>> but I am not sure for how long it will be stable. Should I start
>> worrying? Thanks in advance for the opinion.
>>
> This just means that a process was stuck in the D state (uninterruptible
> I/O sleep) for more than 120 seconds. Depending on a number of factors,
> this happening could mean:
> 1. Absolutely nothing (if you have low-powered or older hardware, for
> example, I get these regularly on a first generation Raspberry Pi if I
> don't increase the timeout significantly)
> 2. The program is doing a very large chunk of I/O (usually with the
> O_DIRECT flag, although this probably isn't the case here)
> 3. There's a bug in the blocked program (this is rarely the case when
> this type of thing happens)
> 4. There's a bug in the kernel (which is why this dumps a stack trace)
> 5. The filesystem itself is messed up somehow, and the kernel isn't
> handling it properly (technically a bug, but a more specific case of it).
> 6. Your hardware is misbehaving, failing, or experienced a transient
> error.
>
> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
> most likely cause, as all of the listed programs (I'm assuming that
> 'master' is from postfix) are relatively well audited, and all of them
> hit this at the same time.
>
> For what it's worth, if you want you can do:
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> like the message says to stop these from appearing in the future, or use
> some arbitrary number to change the timeout before these messages appear
> (I usually use at least 150 on production systems, and more often 300,
> although on something like a Raspberry Pi I often use timeouts as high
> as 1800 seconds).

Thanks for the comments, Austin. The system is a "normal" PC, running an
Intel Core 2 Duo Mobile @1.66GHz. "master" is indeed a postfix process. I
hadn't seen anything like that when I was on the 3.16 kernel, but after I
upgraded to 4.2.3, I caught that message.

I/O and CPU load are usually low, but it could be (6) from your list, as
the system is generally very old (5+ years). As the problem has appeared
only once in the past 15 days, I think it is just a transient error.
Thanks for clarifying the possible reasons.

-- 
With best regards,
Dmitry
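
Note on checking this by hand: the D state and the timeout discussed above can both be read from /proc. The following is only a rough sketch, not something taken from the thread. It assumes a Linux system with /proc mounted, and the helper names (parse_stat_line, blocked_tasks, read_hung_task_timeout) are made up for illustration. It inspects only the main thread of each process, whereas the kernel's hung task detector looks at every task, so treat the output as a hint rather than a diagnosis.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes whose main thread is in the D state."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    result = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            result.append((pid, comm))
    return result


def read_hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the current timeout in seconds, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("hung_task_timeout_secs:", read_hung_task_timeout())
    for pid, comm in blocked_tasks():
        print("blocked: pid=%d comm=%s" % (pid, comm))
```

A process showing up here once is normal during heavy I/O; it becomes interesting only if the same PID stays in the list for a time approaching the configured timeout.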