From: Amon Ott <a.ott@m-privacy.de>
To: ceph-devel@vger.kernel.org
Subject: OSD deadlock with cephfs client and OSD on same machine
Date: Tue, 29 May 2012 09:44:33 +0200
Message-ID: <201205290944.33983.a.ott@m-privacy.de>

Hello again!

On Linux, if you run an OSD on an ext4 filesystem, have a cephfs kernel client
mounted on the same system, and have no syncfs system call (as is to be
expected with libc6 < 2.14 or kernel < 2.6.39), the OSD deadlocks in
sys_sync(). Only a reboot recovers the system.

After some investigation of the code, this is what I found:
In src/common/sync_filesystem.h, the function sync_filesystem() first tries
syncfs() (not available), then a btrfs ioctl sync (not available on
non-btrfs), and finally falls back to sync(). sys_sync tries to sync all
filesystems, including the journal device, the OSD storage area and the
cephfs mount. Under some load, when the OSD calls sync(), the cephfs sync
waits for the local OSD, which is itself already waiting for its storage to
sync, which the kernel only wants to do after the cephfs sync. Deadlock.

The function sync_filesystem() is called by FileStore::sync_entry() in
src/os/FileStore.cc, but only on non-btrfs storage and if
filestore_fsync_flushes_journal_data is false. After forcing this to true in
the OSD config, our test cluster survived three days of heavy load (and is
still running fine) instead of deadlocking all nodes within an hour.
Reproduced with 0.47.2 and kernel 3.2.18, but the related code seems
unchanged in current master.
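
For reference, the workaround would look roughly like this in ceph.conf. The
option name is the one given above; placing it in the [osd] section is an
assumption about a typical setup.

```
[osd]
    filestore_fsync_flushes_journal_data = true
```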

Conclusion: If you want to run an OSD and a cephfs kernel client on the same
Linux server and have a libc6 before 2.14 (e.g. Debian's newest in
experimental is 2.13) or a kernel before 2.6.39, either do not use ext4 (but
btrfs is still unstable) or accept the risk of data loss from missed syncs
that comes with the workaround of forcing
filestore_fsync_flushes_journal_data to true.

Please consider emitting a prominent warning, at least at build time, if
syncfs() is not available, e.g. "No syncfs() syscall, please expect a
deadlock when running osd on non-btrfs together with a local cephfs mount."
Even better would be a quick runtime test for missing syncfs() and storage
on non-btrfs that prints a warning if a deadlock is possible.
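
A runtime test along these lines could be sketched as follows. This is only
an illustration under stated assumptions, not existing Ceph code: all
function names and the OSD_BTRFS_MAGIC macro are hypothetical. syncfs is
probed with an invalid file descriptor, so an implemented syscall fails with
EBADF without syncing anything and a missing one fails with ENOSYS. Btrfs is
detected by comparing the statfs() f_type against the btrfs superblock magic
0x9123683E.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/vfs.h>
#include <unistd.h>

/* btrfs superblock magic, under a local (hypothetical) macro name. */
#define OSD_BTRFS_MAGIC 0x9123683EUL

/* Returns 1 if the running kernel implements syncfs, 0 otherwise. */
static int have_syncfs(void)
{
#ifdef SYS_syncfs
    errno = 0;
    /* fd -1 is invalid: EBADF means "implemented", ENOSYS means "missing". */
    if (syscall(SYS_syncfs, -1) == -1 && errno == ENOSYS)
        return 0;
    return 1;
#else
    return 0;  /* syscall number unknown at build time */
#endif
}

/* Returns 1 if path is on btrfs, 0 if not, -1 if statfs() failed. */
static int on_btrfs(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return -1;
    return (unsigned long)st.f_type == OSD_BTRFS_MAGIC;
}

/* Deadlock is possible without syncfs unless storage is known to be btrfs. */
static int deadlock_possible(int has_syncfs, int is_btrfs)
{
    return !has_syncfs && is_btrfs != 1;
}

/* Prints the warning proposed above if the OSD data path is at risk. */
static void warn_if_deadlock_possible(const char *osd_data_path)
{
    if (deadlock_possible(have_syncfs(), on_btrfs(osd_data_path)))
        fprintf(stderr, "No syncfs() syscall, please expect a deadlock when "
                "running osd on non-btrfs together with a local cephfs "
                "mount.\n");
}
```

Note that this probe only checks the kernel; a build against libc6 < 2.14
would additionally need to call syncfs through syscall() rather than the
missing libc wrapper.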

As a side effect, the lockup we experienced seems to be a good way to
reproduce the long-standing bug 1047: when our cluster tried to recover, all
MDS instances died with those symptoms. It seems that a partial sync of the
journal or data partition causes that broken state.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649