From: Kevan Rehm <krehm@cray.com>
To: "ceph-devel (ceph-devel@vger.kernel.org)" <ceph-devel@vger.kernel.org>
Subject: bluestore bug#15724: bluefs _replay file with link count 0:
Date: Thu, 14 Jul 2016 18:49:26 +0000 [thread overview]
Message-ID: <D3AD48E4.B62F9%krehm@cray.com> (raw)
Greetings,
Bug #15724 reports several occurrences of an OSD that will not start due
to the message "bluefs _replay file with link count 0:". I have also had
several occurrences of this in the last couple of days, using a ceph built
from master a couple of weeks ago.
I jacked up the bluefs debug tracing and have been working through the
reading of the transaction log by the OSD during startup, and I can see
what appears to be a bad transaction that is causing this, see below.
(Probably also the cause of the segmentation fault that occurred just
before this incident.)
Before I fry any more brain cells, I'd like to know if someone is already
fixing this, or if someone has any advice to offer, like where the problem
most likely lies. If not, I'll keep digging.
Here is what we see after the transaction log has been read, the OSD gets
excited and bails:
2016-07-13 16:03:03.933669 7fb6617c1800 20 bluefs _read got 4096
2016-07-13 16:03:03.933671 7fb6617c1800 10 bluefs _replay 0x5cf000: stop:
uuid 4ef0859e-06e8-51c2-9612-add7269b0100 != super.uuid
7eb30434-317a-4844-b8d1-6f8cb37585d1
2016-07-13 16:03:03.933678 7fb6617c1800 10 bluefs _replay log file size
was 0x5cf000
2016-07-13 16:03:03.933731 7fb6617c1800 -1 bluefs _replay file with link
count 0: file(ino 113 size 71354938 mtime 2016-07-12 17:05:53.738683 bdev
1 extents
[1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+20
97152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:9604956
16+58720256])
2016-07-13 16:03:03.933739 7fb6617c1800 -1 bluefs mount failed to replay
log: (5) Input/output error
2016-07-13 16:03:03.933753 7fb6617c1800 20 bluefs _stop_alloc
2016-07-13 16:03:03.933754 7fb6617c1800 10 bitmapalloc:shutdown instance
140421450426752
2016-07-13 16:03:03.933924 7fb6617c1800 -1
bluestore(/var/lib/ceph/osd/ceph-20) _open_db failed bluefs mount: (5)
Input/output error
Starting from the front of the log, here are the first references to ino
113 in the transaction log:
2016-07-13 16:03:03.900994 7fb6617c1800 20 bluefs _replay 0x562000:
op_file_update file(ino 113 size 0 mtime 2016-07-12 17:05:48.312770 bdev
1 extents [])
2016-07-13 16:03:03.900996 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90 (new)
2016-07-13 16:03:03.900997 7fb6617c1800 20 bluefs _replay 0x562000:
op_dir_link db/000145.log to 113
2016-07-13 16:03:03.900998 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90
Š
2016-07-13 16:03:03.901047 7fb6617c1800 20 bluefs _replay 0x56e000:
op_file_update file(ino 113 size 23 mtime 2016-07-12 17:05:48.314814 bdev
1 extents [1:904921088+1048576])
2016-07-13 16:03:03.901048 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90
Lots of op_file_updates are applied to ino 113 in the log, leading up to
this final one.
2016-07-13 16:03:03.924173 7fb6617c1800 20 bluefs _replay 0x5ab000:
op_file_update file(ino 113 size 71359362 mtime 2016-07-12
17:05:49.420150 bdev 1 extents
[1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+20
97152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:9604956
16+58720256])
2016-07-13 16:03:03.924176 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90
Next ino 113's filename is renamed from 000145.log to 000151.log, followed
by an update, so it must be coming from BlueRocksEnv::ReuseWritableFile().
2016-07-13 16:03:03.925181 7fb6617c1800 10 bluefs _replay 0x5b4000:
txn(seq 77743 len 982 crc 98786513)
2016-07-13 16:03:03.925182 7fb6617c1800 20 bluefs _replay 0x5b4000:
op_dir_link db/000151.log to 113
2016-07-13 16:03:03.925183 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90
2016-07-13 16:03:03.925184 7fb6617c1800 20 bluefs _replay 0x5b4000:
op_dir_unlink db/000145.log
2016-07-13 16:03:03.925186 7fb6617c1800 20 bluefs _replay 0x5b4000:
op_file_update file(ino 113 size 71359362 mtime 2016-07-12
17:05:53.738683 bdev 1 extents
[1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+20
97152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:9604956
16+58720256])
2016-07-13 16:03:03.925197 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafbb90
In txn 77752, ino 113 and its filename get removed via a
BlueRocksEnv::DeleteFile() call, followed by an update with the original
allocation. Bad news. Don't yet know where the update is coming from.
So now we have an ino 113 with no links, and a series of extents which it
doesn't really own.
2016-07-13 16:03:03.929686 7fb6617c1800 20 bluefs _replay 0x5be000:
op_file_remove 113
2016-07-13 16:03:03.929687 7fb6617c1800 20 bluefs _replay 0x5be000:
op_dir_unlink db/000150.log
Š
2016-07-13 16:03:03.929689 7fb6617c1800 20 bluefs _replay 0x5be000:
op_file_update file(ino 113 size 71354938 mtime 2016-07-12
17:05:53.738683 bdev 1 extents
[1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+20
97152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:9604956
16+58720256])
2016-07-13 16:03:03.929693 7fb6617c1800 30 bluefs _get_file ino 113 =
0x7fb66aafc880 (new)
The end result is an ino without a link, and an unhappy OSD.
2016-07-13 16:03:03.933669 7fb6617c1800 20 bluefs _read got 4096
2016-07-13 16:03:03.933671 7fb6617c1800 10 bluefs _replay 0x5cf000: stop:
uuid 4ef0859e-06e8-51c2-9612-add7269b0100 != super.uuid
7eb30434-317a-4844-b8d1-6f8cb37585d1
2016-07-13 16:03:03.933678 7fb6617c1800 10 bluefs _replay log file size
was 0x5cf000
2016-07-13 16:03:03.933731 7fb6617c1800 -1 bluefs _replay file with link
count 0: file(ino 113 size 71354938 mtime 2016-07-12 17:05:53.738683 bdev
1 extents
[1:904921088+1048576,1:910163968+2097152,1:916455424+2097152,1:931135488+20
97152,1:937426944+2097152,1:943718400+2097152,1:950009856+2097152,1:9604956
16+58720256])
I do have the OSD logs showing the original segfault plus the error
mentioned here, but I had to fix the block device so that I could move on.
I can upload the logs if someone is interested, but they're large.
Thanks, Kevan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next reply other threads:[~2016-07-14 18:49 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-14 18:49 Kevan Rehm [this message]
2016-07-14 19:49 ` bluestore bug#15724: bluefs _replay file with link count 0: Kevan Rehm
-- strict thread matches above, loose matches on Subject: below --
2016-07-15 20:21 Kevan Rehm
2016-07-16 7:23 ` Varada Kari
2016-07-16 11:23 ` Kevan Rehm
2016-07-16 12:16 ` Varada Kari
2016-07-16 14:35 ` Kevan Rehm
2016-07-19 14:09 ` Kevan Rehm
2016-07-19 14:50 ` Varada Kari
2016-07-19 15:08 ` Kevan Rehm
2016-07-19 16:40 ` Varada Kari
2016-07-19 17:04 ` Kevan Rehm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D3AD48E4.B62F9%krehm@cray.com \
--to=krehm@cray.com \
--cc=ceph-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.