From: Zenon Panoussis <oracle@provocation.net>
To: ceph-devel@vger.kernel.org
Subject: Re: Suicide
Date: Tue, 19 Apr 2011 00:38:47 +0200
Message-ID: <4DACBD77.9020600@provocation.net>
In-Reply-To: <CB9E56C259C44656B7B9DD670A548B8A@gmail.com>


On 04/18/2011 11:21 PM, Gregory Farnum wrote:

> I looked through your logs a bit and noticed that the OSD on node01 is 
> crashing due to high latencies on disk access (I think the defaults for
> this case are it asserts out if there's no progress after 10 minutes or
> something). 

First of all, thank you for plowing through those huge logs. That's a feat
in and of itself.

Could you please post an example where you found the OSD crashing, so that
I and others know what log entries to look for?

> Based on that, I pretty much have to guess that there's just too much 
> stress on your disk and it's going to cause problems. You can try loosening
> the various configurable timeouts to let it run longer but it seems like
> really you just need beefier disks for the amount of stuff you're doing to
> them. 

My hardware is indeed very primitive, but to prevent this from happening
I would have to make sure that the disks always have more capacity than
the network. In a real-world setup, with gigabit or multi-gigabit
networking and multiple applications doing disk I/O simultaneously, this
is infeasible. I also suspect that it would go against the layering of
O/S subsystems.

What I mean is this: if an application tries to write data to the file
system and fails, the application should either hang or time out and
bail out, but the file system itself should not crash. The application
is always agnostic about the file system, so the file system should
never acknowledge more data than it can promise to actually process.
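To make the principle concrete, here is a minimal Python sketch, not ceph
code. BoundedWriteBuffer is a hypothetical name, and its capacity and
timeout are illustrative parameters, not real ceph settings. It only
acknowledges a write once there is room for it and otherwise times out,
so the caller gets an error instead of the storage layer accepting data
it cannot handle.

```python
import queue


class BoundedWriteBuffer:
    """Hypothetical sketch: accept writes only while there is room.

    'capacity' and 'timeout' are illustrative, not ceph configuration.
    """

    def __init__(self, capacity: int, timeout: float) -> None:
        # A bounded queue stands in for the buffer in front of the disk.
        self._q: "queue.Queue[bytes]" = queue.Queue(maxsize=capacity)
        self._timeout = timeout

    def write(self, data: bytes) -> bool:
        """Return True once the data is queued, i.e. acknowledged.

        If the buffer stays full for 'timeout' seconds because the disk
        side is too slow, raise TimeoutError instead of acknowledging
        data that cannot be processed.
        """
        try:
            self._q.put(data, timeout=self._timeout)
        except queue.Full:
            raise TimeoutError("backing store too slow; write not acknowledged")
        return True

    def drain_one(self) -> bytes:
        """Called by the (slow) disk side to take the next pending write."""
        return self._q.get()
```

With capacity=2 and nothing draining the buffer, the first two writes are
acknowledged and the third raises TimeoutError; once drain_one() frees a
slot, writes are accepted again. The application hangs or bails out, and
nothing underneath it crashes.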

In the case of ceph, things get complicated by the fact that ceph appears
as a file system to the applications using it, but itself depends on an
underlying file system for its disk access. As a result, ceph is responsible
for the data it accepts from applications, but has no way to meet this
responsibility if the underlying file system lets it down.

I don't know how this problem can be truly solved, but some trickery with
I/O buffers might go a long way towards mitigating it. Another option might
be calls between the monitor and the client that report available capacity.
Every networked file system faces a similar problem, so looking at how NFS
or Samba deal with it could provide ideas or even ready code.
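A rough sketch of what such capacity calls could look like, written as
plain credit-based flow control in Python. CapacityGranter and
send_with_credits are hypothetical names, not part of ceph's monitor or
client protocol. The server grants at most the capacity it has free, the
client sends only what it was granted, and credits come back only after
the data has actually reached disk.

```python
class CapacityGranter:
    """Hypothetical server side: hands out write credits up to free capacity."""

    def __init__(self, capacity_bytes: int) -> None:
        self._available = capacity_bytes

    def request(self, nbytes: int) -> int:
        """Grant at most nbytes, never more than currently free capacity."""
        granted = min(nbytes, self._available)
        self._available -= granted
        return granted

    def flushed(self, nbytes: int) -> None:
        """Return credits once data has actually been written to disk."""
        self._available += nbytes


def send_with_credits(granter: CapacityGranter, payload: bytes, chunk: int) -> list[bytes]:
    """Hypothetical client side: send only what the server has granted.

    Returns the pieces that were accepted. Stops, instead of over-sending,
    as soon as no credit is available; a real client would wait or time out.
    """
    sent = []
    offset = 0
    while offset < len(payload):
        granted = granter.request(min(chunk, len(payload) - offset))
        if granted == 0:
            break
        sent.append(payload[offset:offset + granted])
        offset += granted
    return sent
```

With a 10-byte capacity and 4-byte chunks, a 25-byte payload is accepted
as pieces of 4, 4 and 2 bytes; the remaining 15 bytes wait until flushed()
hands credits back.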

> IIRC you're running a monitor and an OSD on the same 2.5" physical disk, 
> which means they're colliding on stuff like sync() calls.

Indeed, I'm running the entire system on a dirt-cheap 2.5" disk. Still, good
software on bad hardware should run slowly or not at all, not try to run
fast and then crash and corrupt its data.

> This general slowness doesn't explain the mds log corruption, although it 
> might be one of the trigger conditions. I added another assert in the 
> Journaler code which might have caused the problem (though I don't think 
> it could have) but don't have any other new ideas.

I'll test again as soon as 0.27 is out (BTW, is 0.27 blocked by 0.26.1, or
are they released independently of each other?).

Z

