From: Martin Wilderoth <martin.wilderoth@linserv.se>
To: Gregory Farnum <gregory.farnum@dreamhost.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: osd stops
Date: Tue, 12 Apr 2011 20:05:58 +0200 (CEST)
Message-ID: <688456938.14487.1302631558862.JavaMail.root@mail.linserv.se>
In-Reply-To: <64610990.14485.1302631358989.JavaMail.root@mail.linserv.se>
Thanks for the answer; now I know the reason. Some of my OSDs were 90% full, and dmesg also shows btrfs errors on the hosts. I will run the test with another filesystem, ext3 :-) or is some other filesystem better? It's a BackupPC filesystem with a lot of hard links and data that I would like to test running in Ceph.
----- Original message -----
From: "Gregory Farnum" <gregory.farnum@dreamhost.com>
To: "Martin Wilderoth" <martin.wilderoth@linserv.se>
Cc: ceph-devel@vger.kernel.org
Sent: Tuesday, 12 Apr 2011 19:24:27
Subject: Re: osd stops
Ah. It looks like you're running btrfs and you have a very full disk. Unfortunately btrfs doesn't handle low-disk situations (above ~80% utilization -- yes, it's annoying) very well and so it's failing to perform pretty basic tasks and is propagating those failures up to the OSD. If you really need to run that close to full utilization you're going to need to use another underlying filesystem, or add more disks/nodes to spread the data across.
Sorry. :(
-Greg
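A minimal sketch of how one might watch for the condition described above, flagging data directories whose filesystem is at or above roughly 80% utilization. The 0.80 threshold comes from the figure in this message; the function names and the example path are hypothetical, and this is not part of Ceph. It relies only on Python's standard shutil.disk_usage, whose numbers may be only approximate on btrfs.

```python
import shutil

# Threshold taken from the ~80% figure mentioned in the message (an assumption,
# not a documented btrfs or Ceph limit).
FULL_THRESHOLD = 0.80


def utilization(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def over_threshold(paths, threshold=FULL_THRESHOLD, usage_fn=utilization):
    """Return (path, fraction) pairs for paths at or above the threshold."""
    flagged = []
    for path in paths:
        fraction = usage_fn(path)
        if fraction >= threshold:
            flagged.append((path, fraction))
    return flagged


if __name__ == "__main__":
    # Hypothetical list of directories to check; replace "/" with the
    # mount points that hold your OSD data.
    for path, fraction in over_threshold(["/"]):
        print(f"{path}: {fraction:.0%} used, at or above threshold")
```

Run periodically (for example from cron), a check like this could give a warning before the filesystem gets full enough to start returning errors to the OSD.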
On Tuesday, April 12, 2011 at 9:26 AM, Martin Wilderoth wrote:
> I have been doing some tests, and I always seem to hit the same problem.
> I have been transferring data and suddenly I get an I/O error and a superblock problem.
> This occurs when the filesystem is filled to approx. 80%.
>
> ceph health reports no error. I restart the system (-a stop, then -a start);
> after that the system is degraded and the osd stops.
>
> The log of the first failing osd shows:
>
> 2011-04-12 17:51:07.716513 7f02365b8700 -- 0.0.0.0:6802/20180 >> 10.0.6.12:6802/13633 pipe(0x2e1da00 sd=22 pgs=0 cs=0 l=0).fault first fault
> 2011-04-12 17:51:07.716868 7f02365b8700 -- 0.0.0.0:6802/20180 >> 10.0.6.12:6802/13633 pipe(0x2e1da00 sd=22 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/15976 not 10.0.6.12:6802/13633 - wrong node!
> os/FileStore.cc: In function 'void FileStore::sync_entry()', in thread '0x7f023f9ce700'
> os/FileStore.cc: 2674: FAILED assert(r == 0)
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (FileStore::sync_entry()+0x1975) [0x59f165]
> 2: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
> 3: (()+0x68ba) [0x7f024602b8ba]
> 4: (clone()+0x6d) [0x7f0244cc002d]
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (FileStore::sync_entry()+0x1975) [0x59f165]
> 2: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
> 3: (()+0x68ba) [0x7f024602b8ba]
> 4: (clone()+0x6d) [0x7f0244cc002d]
> *** Caught signal (Aborted) **
> in thread 0x7f023f9ce700
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: /usr/bin/cosd() [0x61e42c]
> 2: (()+0xef60) [0x7f0246033f60]
> 3: (gsignal()+0x35) [0x7f0244c23165]
> 4: (abort()+0x180) [0x7f0244c25f70]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f02454b6dc5]
> 6: (()+0xcb166) [0x7f02454b5166]
> 7: (()+0xcb193) [0x7f02454b5193]
> 8: (()+0xcb28e) [0x7f02454b528e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6061e3]
> 10: (FileStore::sync_entry()+0x1975) [0x59f165]
> 11: (FileStore::SyncThread::entry()+0xd) [0x5a8a7d]
> 12: (()+0x68ba) [0x7f024602b8ba]
> 13: (clone()+0x6d) [0x7f0244cc002d]
>
> the second failing osd
>
> 2011-04-12 18:03:36.036420 7f39c6ce7700 FileStore: sync_entry timed out after 600 seconds.
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 2011-04-12 18:03:36.036494 1: (SafeTimer::timer_thread()+0x36b) [0x601afb]
> 2011-04-12 18:03:36.036509 2: (SafeTimerThread::entry()+0xd) [0x6042cd]
> 2011-04-12 18:03:36.036528 3: (()+0x68ba) [0x7f39d034a8ba]
> 2011-04-12 18:03:36.036541 4: (clone()+0x6d) [0x7f39cefdf02d]
> 2011-04-12 18:03:36.036551 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)', in thread '0x7f39c6ce7700'
> os/FileStore.cc: 2573: FAILED assert(0)
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
> 2: (SafeTimer::timer_thread()+0x36b) [0x601afb]
> 3: (SafeTimerThread::entry()+0xd) [0x6042cd]
> 4: (()+0x68ba) [0x7f39d034a8ba]
> 5: (clone()+0x6d) [0x7f39cefdf02d]
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
> 2: (SafeTimer::timer_thread()+0x36b) [0x601afb]
> 3: (SafeTimerThread::entry()+0xd) [0x6042cd]
> 4: (()+0x68ba) [0x7f39d034a8ba]
> 5: (clone()+0x6d) [0x7f39cefdf02d]
> *** Caught signal (Aborted) **
> in thread 0x7f39c6ce7700
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: /usr/bin/cosd() [0x61e42c]
> 2: (()+0xef60) [0x7f39d0352f60]
> 3: (gsignal()+0x35) [0x7f39cef42165]
> 4: (abort()+0x180) [0x7f39cef44f70]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39cf7d5dc5]
> 6: (()+0xcb166) [0x7f39cf7d4166]
> 7: (()+0xcb193) [0x7f39cf7d4193]
> 8: (()+0xcb28e) [0x7f39cf7d428e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x6061e3]
> 10: (SyncEntryTimeout::finish(int)+0xf4) [0x5a0b34]
> 11: (SafeTimer::timer_thread()+0x36b) [0x601afb]
> 12: (SafeTimerThread::entry()+0xd) [0x6042cd]
> 13: (()+0x68ba) [0x7f39d034a8ba]
> 14: (clone()+0x6d) [0x7f39cefdf02d]
>
> regards Martin
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>