From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <53314EED.2000004@1st-setup.nl>
Date: Tue, 25 Mar 2014 10:39:57 +0100
From: "Michel Verbraak(1st-Setup)"
Subject: Re: xfs blocks (blocked for more than 120 seconds)
References: <532FF9DD.5080700@1st-setup.nl> <20140324173636.GD18572@destitution>
In-Reply-To: <20140324173636.GD18572@destitution>
List-Id: XFS Filesystem from SGI
To: Dave Chinner
Cc: xfs@oss.sgi.com

On 24-03-14 18:36, Dave Chinner wrote:

> On Mon, Mar 24, 2014 at 10:24:45AM +0100, Michel Verbraak(1st-Setup) wrote:
>> Hi,
>>
>> One of our systems that uses XFS has a problem, but we are unable to
>> find the cause. Recently, on two occasions (Tuesday the 4th of March
>> and Friday the 21st of March), we had to reboot the system to get it
>> up and running again.
>>
>> What happens:
>> - The programs handling files on the XFS disc stop working when
>> creating, deleting or writing files. They do not return an error;
>> they just wait for the command to complete.
>> - One of our programs, a Java application, goes to very high CPU usage
>> (50%), where it normally sits at 1%. This could be something in our Java
>> application, but it happens at the moment file handling gets stuck.
>> - A graceful restart of the programs does not succeed, and a kill -9
>> does not work either.
>> - Rebooting the server in the normal fashion does not work. As it is a
>> virtual machine, we have to do a hard shutdown (unplug power) and start
>> it up again to get it running.
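
Processes that neither fail nor respond to kill -9 are consistent with tasks in uninterruptible sleep, which is the state D shown in the kernel traces in this message. A minimal Python sketch for listing such tasks, assuming a Linux /proc filesystem; the sample stat line is made up for illustration and shortened to its first few fields:

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on
    whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked

# Hypothetical stat line, not taken from the affected system.
SAMPLE_STAT = "13864 (flush-8:16) D 2 0 0 0 -1"

if __name__ == "__main__":
    print(parse_stat(SAMPLE_STAT))
```

Running `uninterruptible_tasks()` while the hang is in progress would show which programs are stuck, provided a shell is still responsive.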

> ......
>> I have the following details for you:
>>
>> System OS: Ubuntu 12.04 LTS
>> Kernel: 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013
>> x86_64 x86_64 x86_64 GNU/Linux
>> Server: Virtual machine in a VMware setup.
>> Disc: 300GB direct attached LUN
>>
>> We have an exact clone of this system for our acceptance environment. In
>> that environment we are unable to reproduce the problem.
>>
>> The difference between the two days is that on 2014-03-21 our services
>> were quite busy, with a lot of file changes on the XFS disc, while on
>> 2014-03-04 the system was very quiet at the moment the kernel traces
>> appeared and the services got stuck.
>>
>> Any help is appreciated.
>>
>> Regards, Michel Verbraak.
>>
>> We see the following in the syslog on both occasions (2014-03-04 and 2014-03-21):
>>
> ....

>> Mar 21 06:32:20 ealxs00169 kernel: [1412280.930543] flush-8:16      D
>> 0000000000000000     0 13864      2 0x00000000
>> [<ffffffff8165b34f>] schedule+0x3f/0x60
>> [<ffffffff8165b3ff>] io_schedule+0x8f/0xd0
>> [<ffffffff8111836e>] sleep_on_page+0xe/0x20
>> [<ffffffff8165baca>] __wait_on_bit_lock+0x5a/0xc0
>> [<ffffffff81118357>] __lock_page+0x67/0x70
>> [<ffffffff81122bd4>] write_cache_pages+0x3d4/0x460
>> [<ffffffff81122caa>] generic_writepages+0x4a/0x70
>> [<ffffffffa007980d>] xfs_vm_writepages+0x4d/0x60 [xfs]
>> [<ffffffff81123b71>] do_writepages+0x21/0x40
>> [<ffffffff811a2990>] writeback_single_inode+0x180/0x430
>> [<ffffffff811a3056>] writeback_sb_inodes+0x1b6/0x270
>> [<ffffffff811a31ae>] __writeback_inodes_wb+0x9e/0xd0
>> [<ffffffff811a345b>] wb_writeback+0x27b/0x330
>> [<ffffffff811a35af>] wb_check_old_data_flush+0x9f/0xb0
>> [<ffffffff811a4481>] wb_do_writeback+0x151/0x1d0
>> [<ffffffff811a4583>] bdi_writeback_thread+0x83/0x2a0
>> [<ffffffff8108b27c>] kthread+0x8c/0xa0
> Writeback is blocked on a locked page, and is waiting for IO
> completion.
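
Reading a trace like this means walking the frames from the innermost call outward to find where the filesystem hands over to generic code. A rough Python sketch of that reading, using a few frames copied from the trace; the regular expression and helper names are illustrative assumptions, not part of any kernel tooling:

```python
import re

# Matches one stack frame, for example:
#   [<ffffffff8165b34f>] schedule+0x3f/0x60
#   [<ffffffffa007980d>] xfs_vm_writepages+0x4d/0x60 [xfs]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frames(trace_text):
    """Return (function, module) pairs, innermost frame first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("func"), match.group("module")))
    return frames

def module_frames(frames, module):
    """Return the functions in the trace that belong to one kernel module."""
    return [func for func, mod in frames if mod == module]

# A subset of the flush-8:16 trace above.
SAMPLE_TRACE = """\
[<ffffffff8165b34f>] schedule+0x3f/0x60
[<ffffffff8165b3ff>] io_schedule+0x8f/0xd0
[<ffffffff81118357>] __lock_page+0x67/0x70
[<ffffffffa007980d>] xfs_vm_writepages+0x4d/0x60 [xfs]
[<ffffffff81123b71>] do_writepages+0x21/0x40
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE_TRACE)
    print("innermost:", frames[0][0])
    print("xfs frames:", module_frames(frames, "xfs"))
```

In this trace the only XFS frame is `xfs_vm_writepages`; everything below it is generic page-cache code sleeping in `__lock_page`, which matches the description of writeback waiting on a locked page.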

We currently have the following options in fstab for the filesystem: defaults,noatime,inode64,barrier=no

Everywhere I read, it says that to turn off barriers you should specify "nobarrier". Is our way of disabling them wrong?
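
The XFS mount option documentation lists the option names as `barrier` and `nobarrier`. One way to see which options the kernel actually applied, rather than what fstab requested, is to read them back from /proc/mounts. A minimal Python sketch, assuming the running kernel reports `nobarrier` there when barriers are off; the sample mount table and the /data mount point are hypothetical:

```python
def mount_options(mounts_text, device):
    """Return the option list for `device` from /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass.
    Returns None if the device is not mounted.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            return fields[3].split(",")
    return None

def barriers_disabled(options):
    """True if the effective option list contains nobarrier."""
    return "nobarrier" in options

# Hypothetical contents, NOT taken from the affected system.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb /data xfs rw,noatime,attr2,inode64,noquota 0 0
"""

if __name__ == "__main__":
    options = mount_options(SAMPLE_MOUNTS, "/dev/sdb")
    print(options)
    print("barriers disabled:", barriers_disabled(options))
```

On the real system the text would come from `open("/proc/mounts").read()`. If `nobarrier` is missing from the list for /dev/sdb, the fstab setting did not disable barriers.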

The disc used has "write cache" disabled:
[    2.875792] sd 3:0:0:0: Attached scsi generic sg2 type 0
[    2.876376] sd 3:0:0:0: [sdb] 629145600 512-byte logical blocks: (322 GB/300 GiB)
[    2.876879] sd 3:0:0:0: [sdb] Write Protect is off
[    2.877050] sd 3:0:0:0: [sdb] Mode Sense: 87 00 00 08
[    2.877890] sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    2.885634] sdb: unknown partition table
...
[    5.132308] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[    5.155268] SGI XFS Quota Management subsystem
[    5.159345] XFS (sdb): Mounting Filesystem
..

The following is the output of xfs_info on sdb:
meta-data=/dev/sdb               isize=256    agcount=4, agsize=19660800 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=78643200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=38400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
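
As a sanity check, the geometry reported by xfs_info can be compared with the device size from the kernel log. A short Python sketch of that arithmetic, using only the numbers printed above; the variable names are chosen here for illustration:

```python
# Values copied from the xfs_info output and the sd kernel messages.
AG_COUNT = 4
AG_SIZE_BLOCKS = 19660800
DATA_BLOCKS = 78643200
LOG_BLOCKS = 38400
BLOCK_SIZE = 4096          # bsize, in bytes
DEVICE_SECTORS = 629145600
SECTOR_SIZE = 512          # logical block size, in bytes

GIB = 2 ** 30
MIB = 2 ** 20

# The allocation groups together should cover all data blocks.
blocks_from_ags = AG_COUNT * AG_SIZE_BLOCKS

# Filesystem size and device size, both in bytes.
fs_bytes = DATA_BLOCKS * BLOCK_SIZE
device_bytes = DEVICE_SECTORS * SECTOR_SIZE

# Size of the internal log.
log_bytes = LOG_BLOCKS * BLOCK_SIZE

if __name__ == "__main__":
    print("AGs cover all data blocks:", blocks_from_ags == DATA_BLOCKS)
    print("filesystem size (GiB):", fs_bytes / GIB)
    print("device size (GiB):", device_bytes / GIB)
    print("log size (MiB):", log_bytes / MIB)
```

The filesystem and the device come out at the same size, which fits the "unknown partition table" message: the filesystem sits directly on sdb with no partitions.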

>> Mar 21 06:34:20 ealxs00169 kernel: [1412400.891181] archiver.pl     D
>> [<ffffffff8165b34f>] schedule+0x3f/0x60
>> [<ffffffff8165b995>] schedule_timeout+0x2a5/0x320
>> [<ffffffff8165c5f0>] __down_common+0xa5/0xf5
>> [<ffffffff8165c6b3>] __down+0x1d/0x1f
> .....
>> Cheers,
>>
>> Dave.

Michel.