public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Troels Hansen <th@casalogic.dk>
Cc: linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: Strange XFS problem
Date: Thu, 13 Sep 2018 16:18:33 +1000
Message-ID: <20180913061833.GB27618@dastard>
In-Reply-To: <1406319049.4700361.1536816119283.JavaMail.zimbra@casalogic.dk>

On Thu, Sep 13, 2018 at 07:21:59AM +0200, Troels Hansen wrote:
> 
> > 
> > What happens on your network every 14 days or so? Is there a rogue
> > client side backup or admin task running somewhere?
> > 
> 
> Well, we run nightly backups, but that's read ops.

Yup, but that can get stuck modifying atime, like the bacula process
in the hung process traces. :)

Hmmm - just a thought - it's hardware raid - it's not running a
background admin op like a media scrub every 14 days, is it?

> When I look at the load, it's not particularly more loaded at that time than under normal work.

OK.

> > Does this repeat every 120s?
> 
> No, what I sent is the full trace. It happened around 23:23, but
> no more XFS errors in the log (which is on the ext4 OS disk).

Ok, so those processes reported as hung have been woken and made
progress again. It seems like a temporary overload situation.

> It was working when I came in the following morning around 6:45,
> and worked for some time, but initially failed, and we had to
> reboot the server to get NFS exports to work.  But, as I said,
> even though the fs was inaccessible from NFS I could `ls` the
> filesystem locally, but we really have no indication of it being
> an NFS problem, as we only see the XFS problem.

That could be the same problem, with all the kernel nfsds blocked
waiting for the filesystem so no new NFS requests could be
processed.  How many kernel nfsd threads do you run?  Local
operations can still be done (they don't go through the nfsds), and
they won't be slow if they hit the caches rather than having to
retrieve data from disk.
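
For anyone wanting to answer the thread-count question quickly, here is
a minimal sketch. It assumes the nfsd control filesystem is mounted at
its usual location, /proc/fs/nfsd, and the helper name
nfsd_thread_count is only an illustration, not part of any tool:

```python
from pathlib import Path


def nfsd_thread_count(path="/proc/fs/nfsd/threads"):
    """Return the configured number of kernel nfsd threads.

    Returns None if the file is missing or unreadable, e.g. when the
    NFS server is not running or the nfsd filesystem is not mounted.
    """
    try:
        # The file holds a single integer followed by a newline.
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    print("kernel nfsd threads:", nfsd_thread_count())
```

If every one of those threads is blocked waiting on the filesystem, no
new NFS requests can be serviced, which would match the symptom
described above.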

> It could also boil down to a NFS problem, I just wasn't sure how
> to read the XFS trace.

Like you, I don't think this is an NFS problem - it smells more of
how huge hardware writeback caches in front of slow disks using
RAID5/6 behave.

i.e. flushing 100MB of sequential write data from the cache takes a
fraction of a second, while flushing 100MB of random 4k write data to
RAID5 luns can take minutes. While the hardware cache and its flushing
are supposed to be completely invisible to the OS, we can see their
impact via unexpectedly high device utilisations and long IO times
for otherwise normal IO loads.
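
A rough back-of-the-envelope sketch of that difference follows. Every
number in it is an illustrative assumption rather than a measurement
from this system: 500MB/s of sequential array throughput, 4 spindles
at about 150 IOPS each, and the classic RAID5 small-write penalty of 4
disk IOs per write (read data, read parity, write data, write parity).

```python
def sequential_flush_seconds(cache_mb, array_mb_per_s):
    """Time to stream cache_mb of contiguous data to the array."""
    return cache_mb / array_mb_per_s


def random_flush_seconds(cache_mb, io_kb, disks, iops_per_disk,
                         write_penalty=4):
    """Time to flush cache_mb of small random writes to RAID5.

    write_penalty is the number of disk IOs per logical write; 4 is
    the usual read-modify-write figure for RAID5 (assumed here).
    """
    logical_writes = cache_mb * 1024 / io_kb
    disk_ios = logical_writes * write_penalty
    return disk_ios / (disks * iops_per_disk)


if __name__ == "__main__":
    # Assumed figures, for illustration only.
    seq = sequential_flush_seconds(cache_mb=100, array_mb_per_s=500)
    rnd = random_flush_seconds(cache_mb=100, io_kb=4, disks=4,
                               iops_per_disk=150)
    print(f"sequential: {seq:.1f}s  random 4k: {rnd / 60:.1f}min")
```

With those assumed figures the sequential flush is well under a second
and the random 4k flush is a few minutes, which is the gap described
above. Real controllers coalesce and reorder writes, so treat this as
an order-of-magnitude estimate only.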

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
