From: Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
To: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org,
david.barham-kv7WeFo6aLtBDgjK7y7TUQ@public.gmane.org
Subject: Re: NILFS hanging SLES 11 - advise on diagnosis needed
Date: Fri, 02 Oct 2009 20:42:07 +0900 (JST)
Message-ID: <20091002.204207.26501663.ryusuke@osrg.net>
In-Reply-To: <80225669DEC80D488C4A0B5150466E1802F17EC3-q6sIb7dcTNEDs3eufgyKiYGWD8NUZu9dQQ4Iyu8u01E@public.gmane.org>
Hi,
On Fri, 2 Oct 2009 12:11:14 +0200, "Barham, David" wrote:
> Hi
> I'm running SLES 11, 2.6.27.19-5-default with NILFS2 nilfs-2.0.16. I have a 1.5 TB NILFS2 partition which I am setting up with the intention of using Robocopy from various PCs via samba. The robocopy scripts run nightly and a checkpoint is taken once a night. A script stops samba, unmounts the previous week's checkpoint, deletes the checkpoint, creates a new one, mounts it, and restarts samba. This should mean that at any time the user can go back to 'snapshot_{DAY}' to get their files back.
>
> So far so good.
>
> However, as I copy the previously backed-up files from the previous Linux machine where I was doing this (which only gave a 'current' copy with reiserfs), I'm finding that the new machine occasionally hangs. The OS just locks up: the console screen is frozen, but the host still responds to ping.
>
> I'm trying to work out what is causing the hang. I'm getting various messages in the log from smartd relating to the disk which houses the NILFS partition, along the lines of:
>
> Oct 2 09:56:59 cpli6008 syslog-ng[1933]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=947', processed='center(received)=478', processed='destination(newsnotice)=0', processed='destination(acpid)=0', processed='destination(firewall)=0', processed='destination(mail)=12', processed='destination(mailinfo)=12', processed='destination(console)=151', processed='destination(newserr)=0', processed='destination(newscrit)=0', processed='destination(messages)=466', processed='destination(mailwarn)=0', processed='destination(localmessages)=0', processed='destination(netmgm)=0', processed='destination(mailerr)=0', processed='destination(xconsole)=151', processed='destination(warn)=155', processed='source(src)=478'
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 112
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 115 to 117
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 189 High_Fly_Writes changed from 88 to 87
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 61
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 40 to 39
> Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 50 to 51
>
> {machine stops responding and gets power cycled}
>
> Oct 2 10:10:58 cpli6008 syslog-ng[1948]: syslog-ng starting up; version='2.0.9'
>
> Do folks think that the hang is NILFS or dodgy hardware/reporting
> from smartd? Is there any advice on getting some debug or status
> information from NILFS to help show it isn't the cause of the
> problem? I would have expected that if it went bang I'd have seen
> something 'worrying' in the log.
The nilfs2 standalone module has a debug mode. You can enable it by
uncommenting the following line (i.e. CONFIG_NILFS_DEBUG=y) in
nilfs2-module/fs/Makefile before compiling:
ifndef CONFIG_NILFS
EXTERNAL_BUILD=y
CONFIG_NILFS=m
# Uncomment below to do debug build.
CONFIG_NILFS_DEBUG=y
# Uncomment below to enable bmap validity check.
#CONFIG_NILFS_BMAP_DEBUG=y
endif
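If you would rather script the edit than do it by hand, something like
the following might do. It is only a minimal sketch: it assumes the
Makefile as shipped carries the line in commented form
(#CONFIG_NILFS_DEBUG=y), and the helper name enable_nilfs_debug is made
up for illustration, not part of the nilfs2 package. It works on the
Makefile text only and leaves every other line, including the
CONFIG_NILFS_BMAP_DEBUG line, as it is.

```python
import re


def enable_nilfs_debug(makefile_text: str) -> str:
    """Return makefile_text with the CONFIG_NILFS_DEBUG=y line uncommented.

    Hypothetical helper: assumes the line is shipped as
    '#CONFIG_NILFS_DEBUG=y'. An already-uncommented line is left alone.
    """
    # Match only the exact debug switch, not the BMAP variant or prose comments.
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(CONFIG_NILFS_DEBUG=y)[ \t]*$", re.MULTILINE
    )
    return pattern.sub(r"\1", makefile_text)


if __name__ == "__main__":
    # Path taken from the message above; adjust to where the source was unpacked.
    path = "nilfs2-module/fs/Makefile"
    with open(path, encoding="utf-8") as f:
        original = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(enable_nilfs_debug(original))
```

Running it twice is harmless, since a line that is already uncommented
no longer matches the pattern.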
By the way, I'm planning to release nilfs-2.0.17 tomorrow in order to
fix the file system corruption problems that occur infrequently and
were reported on this list.
The bugfix has already been merged into mainline and was also sent to
the -stable trees for 2.6.30 and 2.6.31, but it has not been applied
there yet.
Your problem looks like a hardware problem to me, but I think the new
version is worth a try.
Cheers,
Ryusuke Konishi