From: Kevin Richter <xfs@pzystorm.de>
To: xfs@oss.sgi.com
Subject: Re: XFS blocked task in xlog_cil_force_lsn
Date: Wed, 18 Dec 2013 11:27:54 +0100
Message-ID: <52B178AA.6040302@pzystorm.de>
In-Reply-To: <52B118A9.8080905@hardwarefreak.com>

Thanks for your mails!

> This is unusual. How long have you waited?

For the reboot? One night.
When the copy process hangs, I have waited up to several hours, but it
usually recovers after several minutes.

> 1.  Switch to deadline.  CFQ is not suitable for RAID storage, and not
> suitable for XFS.  This may not be a silver bullet but it will help.

Can I switch it without data loss while my copy process (from a separate
disk to this RAID) is running? Otherwise I would rather wait a bit,
because the copy has now been running for 8 hours without kernel panics.
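
For reference, a minimal sketch of how the scheduler could be switched
through sysfs, assuming the member disks are sd[cdefg] as listed elsewhere
in this thread. The function name set_scheduler and the SYSFS_ROOT override
are hypothetical and exist only so the loop can be tried against a scratch
directory first. The elevator can normally be changed at runtime by writing
to this file, and the setting does not persist across a reboot.

```shell
#!/bin/sh
# Sketch: select an I/O scheduler for each RAID member disk.
# SYSFS_ROOT is a hypothetical override for dry runs; on a real system it
# defaults to /sys and the writes need root.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

set_scheduler() {
    # $1 = scheduler name, remaining arguments = disk names (e.g. sdc sdd)
    sched="$1"; shift
    for disk in "$@"; do
        f="$SYSFS_ROOT/block/$disk/queue/scheduler"
        if [ -w "$f" ]; then
            echo "$sched" > "$f"
        else
            echo "skip $disk: $f not writable" >&2
        fi
    done
}

# Example (as root): set_scheduler deadline sdc sdd sde sdf sdg
```

Afterwards, 'cat /sys/block/sdc/queue/scheduler' should show the active
scheduler in square brackets.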

> 2.  Post your chunk size and RAID6 stripe_cache_size value.  They may be
> sub optimal for your workload.

$ cat /sys/block/md2/md/stripe_cache_size
256
$ mdadm --detail /dev/md2 | grep Chunk
Chunk Size : 512K
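
To put the stripe_cache_size value in perspective, here is a rough sketch
of the memory it implies, using the relation given in the kernel md
documentation (page size * number of member disks * stripe_cache_size).
The 4096-byte page size is an assumption for this x86 machine, the helper
name stripe_cache_bytes is made up for illustration, and the value 4096 in
the second call is only a hypothetical example, not a recommendation from
this thread.

```shell
#!/bin/sh
# Approximate memory used by the md stripe cache, in bytes.
stripe_cache_bytes() {
    # $1 = stripe_cache_size (entries)
    # $2 = number of member disks
    # $3 = page size in bytes (assumed 4096 here)
    echo $(( $1 * $2 * $3 ))
}

stripe_cache_bytes 256 5 4096     # current setting: 5242880 bytes (5 MiB)
stripe_cache_bytes 4096 5 4096    # hypothetical larger setting: 83886080 bytes (80 MiB)
```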

> 3.  Post 'xfs_info /dev/mdX'

There is a LUKS volume on top of /dev/md2, named '6tb', so the filesystem
lives on /dev/mapper/6tb:
> $ xfs_info /dev/md2
> xfs_info: /dev/md2 is not a mounted XFS filesystem
> $ xfs_info /dev/mapper/6tb
> meta-data=/dev/mapper/6tb        isize=256    agcount=32, agsize=45631360 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1460203520, imaxpct=5
>          =                       sunit=128    swidth=384 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
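
As a sanity check, the sunit/swidth values above can be compared with the
array geometry (512K chunk, 5 disks, RAID6 and therefore 2 parity disks).
The sketch below only compares the numbers; the helper name
check_xfs_geometry is hypothetical, and it does not verify whether the LUKS
payload offset preserves the alignment on disk.

```shell
#!/bin/sh
# Compare XFS stripe geometry (in filesystem blocks) with the md geometry.
check_xfs_geometry() {
    # $1 = bsize in bytes, $2 = sunit in blocks, $3 = swidth in blocks
    # $4 = md chunk size in KiB, $5 = total disks, $6 = parity disks
    su_bytes=$(( $2 * $1 ))
    data_disks=$(( $5 - $6 ))
    if [ "$su_bytes" -eq $(( $4 * 1024 )) ] &&
       [ "$3" -eq $(( $2 * data_disks )) ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# 128 blks * 4096 = 512 KiB (one chunk); 384 blks = 3 data disks * 128
check_xfs_geometry 4096 128 384 512 5 2    # prints "aligned"
```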


> 4.  You're getting a lot of kswapd timeouts because you have swap and
> the md/RAID6 array on the same disks.  Relocate swap to disks that are
> not part of this RAID6.  Small SSDs are cheap and fast.  Buy one and put
> swap on it.  Or install more RAM in the machine.  Going the SSD route is
> better as it gives flexibility.  For instance, you can also relocate
> your syslog files to it and anything else that does IO without eating
> lots of space.  This decreases the IOPS load on your rust.

No, swap is not on any of the RAID disks.

> # cat /proc/swaps
> Filename                                Type            Size    Used    Priority
> /dev/sda3                               partition       7812496 0       -1
sda is not part of the RAID; the array consists of sd[cdefg].


> 5.  Describe in some detail the workload(s) causing the heavy IO, and
> thus these timeouts.

cd /oldharddisk
cp -av . /raid/

/oldharddisk is an old 1TB hard disk (mounted); /raid is the 6TB RAID
described above.

The load during this copy process (2 CPUs with 4 cores each):
> top - 11:13:37 up 4 days, 21:32,  2 users,  load average: 12.95, 11.33, 10.32
> Tasks: 155 total,   2 running, 153 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us,  5.7%sy,  0.0%ni, 82.1%id, 11.8%wa,  0.0%hi,  0.3%si,  0.0%st
> Mem:  32916276k total, 32750240k used,   166036k free, 10076760k buffers
> Swap:  7812496k total,        0k used,  7812496k free, 21221136k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   699 root      20   0     0    0    0 S   11  0.0 248:17.59 md2_raid6

I don't know what is consuming all of the 32GB of RAM. 'top' sorted by
memory consumption does not tell me: all entries show only 0.0% or 0.1%.
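
One way to read the Mem: and Swap: lines above: in this version of top,
"used" includes buffers and page cache, so most of the 32GB is cache that
the kernel can reclaim, not process memory. A small sketch of the
arithmetic, with the hypothetical helper name app_used_kib and the figures
taken from the top output in this mail:

```shell
#!/bin/sh
# Memory used by processes and the kernel, excluding buffers and page cache.
app_used_kib() {
    # $1 = used, $2 = buffers, $3 = cached (all in KiB, as printed by top)
    echo $(( $1 - $2 - $3 ))
}

app_used_kib 32750240 10076760 21221136    # prints 1452344 (about 1.4 GiB)
```

This is consistent with no single process showing more than 0.1% in the
%MEM column.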




Thanks,
Kevin




On 18.12.2013 04:38, Stan Hoeppner wrote:
> On 12/17/2013 8:05 PM, Kevin Richter wrote:
>> Hi,
>>
>> around April 2012 there was a similar thread on this list which I have
>> found via Google, so my mail topic is the same.
>>
>> I have a RAID6 array with 5 disks (each 2TB, net: 6TB). While copying
>> under heavy load, tasks always get blocked like this. At the bottom of
>> this message I have included some lines from the syslog.
>>
>> Even a reboot is now not possible anymore, because the whole system
>> hangs while executing the "sync" command in one of the shutdown scripts.
>>
>> So at first I thought that my disks were faulty.
>> But I ran a short and a long self-test on all 5 disks with
>> smartmontools: no errors.
>>
>> Then I even recreated the whole array, but there was no improvement.
>>
>> Details about my server: 3.2.0-57-generic, Ubuntu 12.04.3 LTS
>> Details about the array: soft array with mdadm v3.2.5, no hardware raid
>> controller in the server
>>
>> The scheduler of the raid disks:
>>> $ cat /sys/block/sd[cdefg]/queue/scheduler
>>> noop deadline [cfq]
>>> noop deadline [cfq]
>>> noop deadline [cfq]
>>> noop deadline [cfq]
>>> noop deadline [cfq]
>>
>>
>> Any ideas what I can do?
> 
> Your workload is seeking the disks to death, which is why you're getting
> these timeouts.  The actuators simply can't keep up.
> 
> 1.  Switch to deadline.  CFQ is not suitable for RAID storage, and not
> suitable for XFS.  This may not be a silver bullet but it will help.
> 
> 2.  Post your chunk size and RAID6 stripe_cache_size value.  They may be
> sub optimal for your workload.  For the latter
> 
> $ cat /sys/block/mdX/md/stripe_cache_size
> 
> 3.  Post 'xfs_info /dev/mdX'
> 
> 4.  You're getting a lot of kswapd timeouts because you have swap and
> the md/RAID6 array on the same disks.  Relocate swap to disks that are
> not part of this RAID6.  Small SSDs are cheap and fast.  Buy one and put
> swap on it.  Or install more RAM in the machine.  Going the SSD route is
> better as it gives flexibility.  For instance, you can also relocate
> your syslog files to it and anything else that does IO without eating
> lots of space.  This decreases the IOPS load on your rust.
> 
> 5.  Describe in some detail the workload(s) causing the heavy IO, and
> thus these timeouts.
> 

