From: Andrew Martin <amartin@xes-inc.com>
To: stan@hardwarefreak.com
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: Automatically drop caches after mdadm fails a drive out of an array?
Date: Wed, 12 Feb 2014 08:44:48 -0600 (CST)
Message-ID: <1113712961.13129.1392216288183.JavaMail.zimbra@xes-inc.com>
In-Reply-To: <52FABC1A.1010804@hardwarefreak.com>
Stan,
----- Original Message -----
> From: "Stan Hoeppner" <stan@hardwarefreak.com>
> To: "Andrew Martin" <amartin@xes-inc.com>, "NeilBrown" <neilb@suse.de>
> Cc: linux-raid@vger.kernel.org
> Sent: Tuesday, February 11, 2014 6:11:06 PM
> Subject: Re: Automatically drop caches after mdadm fails a drive out of an array?
>
> On 2/11/2014 5:10 PM, Andrew Martin wrote:
> > Neil,
> >
> > ----- Original Message -----
> >> From: "NeilBrown" <neilb@suse.de>
> >> To: "Andrew Martin" <amartin@xes-inc.com>
> >> Cc: linux-raid@vger.kernel.org
> >> Sent: Tuesday, February 11, 2014 1:54:20 PM
> >> Subject: Re: Automatically drop caches after mdadm fails a drive out of an array?
> >>
> >> On Tue, 11 Feb 2014 11:11:04 -0600 (CST) Andrew Martin <amartin@xes-inc.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am running mdadm 3.2.5 on an Ubuntu 12.04 fileserver with a 10-drive
> >>> RAID6 array (10x1TB). Recently, /dev/sdb started failing:
> >>> Feb 10 13:49:29 myfileserver kernel: [17162220.838256] sas: command
> >>> 0xffff88010628f600, task 0xffff8800466241c0, timed out:
> >>> BLK_EH_NOT_HANDLED
> >>>
> >>> Around this same time, a few users attempted to access a directory on
> >>> this RAID array over CIFS, which they had accessed earlier in the day.
> >>> When they attempted to access it this time, the directory was empty.
> >>> The emptiness of the folder was confirmed via a local shell on the
> >>> fileserver, which reported the same information. At around 13:50, mdadm
> >>> dropped /dev/sdb from the RAID array:
> >>
> >> The directory being empty can have nothing to do with the device failure.
> >> md/raid will never let bad data into the page cache in the manner you
> >> suggest.
> >
> > Thank you for the clarification. What other possibilities could have
> > triggered this behavior? I am also using LVM and DRBD on top of the md
> > device.
>
> The filesystem told you the directory was empty. Directories and files
> are filesystem structures. Why are you talking about all the layers of
> the stack below the filesystem, but not the filesystem itself? What
> filesystem is this? Are there any FS related errors in dmesg?
It seemed unlikely to be a coincidence that md failed the drive out of
the RAID array at the same time these filesystem-level problems appeared.
Yes, there were also filesystem errors, logged immediately after md
dropped the device. This is an ext4 filesystem:
13:50:31 mdadm[1897]: Fail event detected on md device /dev/md2, component device /dev/sdb
13:50:31 smbd[3428]: [2014/02/10 13:50:31.226854, 0] smbd/process.c:2439(keepalive_fn)
13:50:31 smbd[13539]: [2014/02/10 13:50:31.227084, 0] smbd/process.c:2439(keepalive_fn)
13:50:31 kernel: [17162282.624858] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:50:31 kernel: [17162282.823733] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:50:31 kernel: [17162282.832886] /build/buildd/linux-3.2.0/drivers/scsi/mvsas/mv_sas.c 1863:port 2 slot 45 rx_desc 3002D has error info8000000080000000.
13:50:31 kernel: [17162282.832920] /build/buildd/linux-3.2.0/drivers/scsi/mvsas/mv_94xx.c 626:command active 30305FFF, slot [2d].
13:50:31 kernel: [17162282.991884] /build/buildd/linux-3.2.0/drivers/scsi/mvsas/mv_sas.c 1863:port 3 slot 52 rx_desc 30034 has error info8000000080000000.
13:50:31 kernel: [17162282.991892] /build/buildd/linux-3.2.0/drivers/scsi/mvsas/mv_94xx.c 626:command active 302FFFFF, slot [34].
13:50:31 kernel: [17162282.992072] /build/buildd/linux-3.2.0/drivers/scsi/mvsas/mv_sas.c 1863:port 2 slot 53 rx_desc 30035 has error info8000000080000000.
...
13:52:03 kernel: [17162374.423961] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:52:04 kernel: [17162375.839851] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:52:08 kernel: [17162380.135391] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:52:13 kernel: [17162385.108358] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
13:52:17 kernel: [17162388.166515] EXT4-fs error (device drbd0): htree_dirblock_to_tree:587: inode #148638560: block 1189089581: comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2004033568, rec_len=29801, name_len=99
...
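For reference, here is a minimal Python sketch of how the numbers in the
repeated EXT4-fs error above can be read. It is not part of any ext4 or
mdadm tooling. The regular expression and the helper names
(parse_bad_entry, entry_header_bytes) are hypothetical, and the byte
layout assumes the usual ext4 directory entry header: a little-endian
32-bit inode, a 16-bit rec_len, then an 8-bit name_len.

```python
import re
import struct

# Hypothetical pattern for the "bad entry in directory" lines quoted above.
# The field names come from the log text itself.
EXT4_BAD_ENTRY = re.compile(
    r"EXT4-fs error \(device (?P<device>\w+)\): .*?"
    r"inode #(?P<dir_inode>\d+): block (?P<block>\d+): .*?"
    r"inode=(?P<inode>\d+), rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def parse_bad_entry(line):
    """Return the fields of one EXT4-fs 'bad entry' line, or None if no match."""
    m = EXT4_BAD_ENTRY.search(line)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "device"}
    fields["device"] = m.group("device")
    return fields


def entry_header_bytes(inode, rec_len, name_len):
    """Re-pack the logged fields in the assumed on-disk order (le32, le16, u8)."""
    return struct.pack("<IHB", inode, rec_len, name_len)


line = ("13:50:31 kernel: [17162282.624858] EXT4-fs error (device drbd0): "
        "htree_dirblock_to_tree:587: inode #148638560: block 1189089581: "
        "comm smbd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), "
        "inode=2004033568, rec_len=29801, name_len=99")

entry = parse_bad_entry(line)
raw = entry_header_bytes(entry["inode"], entry["rec_len"], entry["name_len"])

print(entry["rec_len"] % 4)  # 1, so the "rec_len % 4 != 0" check fails
print(raw)                   # b'  switc'
```

Under that layout assumption, the first seven bytes of block 1189089581
decode to printable ASCII rather than a plausible directory entry. That
reading is an inference from the logged values, not something the kernel
message itself states.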
Thanks,
Andrew